Does speech recognition have a future in business tech?

Once a simple tool for dictation, speech recognition is being revolutionized by AI to improve customer experiences and drive inclusivity in the workforce


Speech recognition technology has come a long way since the days of sitting in front of a 1990s PC, wearing a single padded earphone with a long stick microphone attached, dictating words into software such as DragonDictate.

Today’s computers, tablets, and smartphones – as well as cars and home speakers – hear our voice commands and act on what we say or ask.

For businesses, the power and opportunity of speech recognition are now fueling transformation, breathing new life into apps, manufacturing processes, customer interactions, and the working lives of employees.

Camelia Suciu, director, solutions engineering EMEA at Twilio, believes speech recognition is making a definite comeback, driven by consumer frustration at the seemingly endless numbered choices presented by automated phone tree menus.

“Historically, speech systems were frustrating due to latency delays and challenges in recognizing and understanding requests,” she says. “Silence on a voice call is deafening, and the variety of accents, terms, languages, and limitless requests adds complexity, ultimately falling short of customer expectations.

“But in the race to offer personalised, effective customer service at scale, many brands are turning to AI-assisted voice solutions to provide more natural, adaptable conversations.”

These advances are being driven by natural language processing (NLP), Suciu says, moving us into an era where chatbots understand message intent, handle grammatical errors, and even switch languages mid-conversation.

Speech-to-text (STT) and text-to-speech (TTS) functionality has been used to integrate large language models into this workflow, enabling automated responses that mimic speech from a real person. And new multimodal AI models can process audio input directly, without a separate model to first turn speech into text, lowering latency and improving their inherent ability to process a user’s speech with the widest possible context.
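As a rough illustration of that STT, LLM, and TTS workflow, the sketch below wires the three stages into a single conversational turn. Everything here is hypothetical scaffolding (the `handle_utterance` function and the toy stand-ins are not any vendor's real API), but the shape of the loop matches the pipeline described above.

```python
# Hypothetical sketch of the STT -> LLM -> TTS voice-bot loop. None of
# these classes correspond to a real vendor API; they stand in for
# whichever speech and language services a business actually uses.
from dataclasses import dataclass

@dataclass
class Turn:
    user_text: str      # what the caller said, after speech-to-text
    reply_text: str     # what the language model decided to answer
    reply_audio: bytes  # synthesized speech, ready to play back

def handle_utterance(audio: bytes, stt, llm, tts, history: list) -> Turn:
    """One conversational turn: transcribe, respond, then speak."""
    user_text = stt(audio)                 # 1. speech-to-text
    history.append({"role": "user", "content": user_text})
    reply_text = llm(history)              # 2. LLM generates the reply
    history.append({"role": "assistant", "content": reply_text})
    reply_audio = tts(reply_text)          # 3. text-to-speech
    return Turn(user_text, reply_text, reply_audio)

# Toy stand-ins so the loop runs end to end:
fake_stt = lambda audio: audio.decode()
fake_llm = lambda hist: f"You said: {hist[-1]['content']}"
fake_tts = lambda text: text.encode()

turn = handle_utterance(b"where is my order?", fake_stt, fake_llm, fake_tts, [])
```

Because each of the three hops adds latency, multimodal models that accept audio directly can skip the explicit transcription stage and respond faster, which is the advantage noted above.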

“AI agents can engage in human-like dialogue and perform actions on behalf of a business, taking us far beyond basic FAQ responses. Even better, when AI agents can engage with customers on the channels they prefer, and leverage contextual data, you get a rich and dynamic customer experience that businesses haven’t been able to imagine or scale previously,” adds Suciu.

Changing the tone of speech recognition

According to Vitor Monteiro, co-founder at Unflow, a software and AI innovation studio, businesses are looking far beyond just ‘talking’ when it comes to the next steps for speech recognition. He tells ITPro that understanding tone is going to be a game-changer.

“We’re heading toward a future where tone of voice will shape how software responds to us,” he explains. “Advanced speech recognition combined with LLMs is making it possible for systems to detect urgency, frustration, or decisiveness – and act accordingly. That means yes, if you shout at your task manager, it could move that task to the top of your list.”

In an internal business context, Monteiro gives an example of a logistics operator reporting a supply issue mid-shift. By changing their tone in this scenario, the system not only transcribes the voice input but also detects the urgency and flags it for immediate attention.

He adds: “We’re not just transcribing anymore; we’re interpreting, prioritising, and responding in real time. As speech becomes a fully context-aware interface, leaders should start asking: will teams need to mind their manners when speaking to systems that can genuinely read the room?”
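Monteiro's logistics example can be reduced to a toy routing rule. In a real deployment the urgency score would come from an acoustic model trained on features such as pitch, energy, and speaking rate; `score_urgency` below is a hypothetical stand-in with made-up weights, used only to show how a tone signal could reorder a work queue.

```python
# Toy illustration of tone-aware routing. A real system would derive
# urgency from acoustic features via a trained model; score_urgency is
# a stand-in with invented weights over normalised (0..1) features.
def score_urgency(features: dict) -> float:
    """Hypothetical scorer: louder, faster, more agitated speech -> more urgent."""
    return (0.4 * features["loudness"]
            + 0.3 * features["speaking_rate"]
            + 0.3 * features["pitch_variance"])

def route_report(transcript: str, features: dict, queue: list) -> list:
    """Transcribed reports are queued; urgent-sounding ones jump the line."""
    item = {"text": transcript, "urgency": score_urgency(features)}
    if item["urgency"] >= 0.7:
        queue.insert(0, item)   # flag for immediate attention
    else:
        queue.append(item)
    return queue

queue = []
route_report("Routine stock count complete",
             {"loudness": 0.2, "speaking_rate": 0.3, "pitch_variance": 0.1}, queue)
route_report("Pallet line is down, shipment blocked!",
             {"loudness": 0.9, "speaking_rate": 0.8, "pitch_variance": 0.9}, queue)
```

The calm report lands at the back of the queue while the agitated one is flagged to the front: the transcript alone is identical plain text, and it is the tone signal that changes the system's behaviour.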

Monteiro also cites a major societal positive from this advancement: inclusion. He believes speech recognition will aid neurodivergent employees, or anyone not best served by “rigid, text-heavy workflows”. For instance, a product manager may prefer to brainstorm aloud during a walk with their voice notes automatically transcribed, summarised, and added to a shared board.

“We’re looking at a workplace that’s not only more efficient, but fundamentally more human,” Monteiro predicts. “Businesses that integrate automatic speech recognition (ASR) with LLMs can unlock productivity and accessibility in ways previously unimaginable. The smartest will be designing for voice-first workflows before their competitors do.”

According to research from Jabra, employees want this type of technology, with more than a third (36%) believing that the best way to communicate with AI is through voice; only 15% said typing was best.

This could stem, Jabra suggests, from speech outpacing typing: the average person speaks 125-150 words per minute, compared with an average typing speed of 40-50 words per minute.

Bias in speech recognition

However, any leap forward for speech recognition must mitigate bias and inaccuracy, suggests Martin Harper, UKI innovation lead at Avanade. Harper is part of the Innovate UK-funded DeepMyna project, led by Habitat Learn and Southampton University, which aims to develop more trustworthy AI.

“By minimizing transcription errors and mitigating biases in training data, this research will help empower users of speech-to-text technologies to improve both the accuracy and reliability of their tools,” Harper says.

“Once fully evaluated, the project’s findings will inform the development of high impact use cases, with applications spanning industries like healthcare, education, finance, and legal services.”

Like Monteiro, Harper also sees a boost for inclusion, with real-time transcription and captioning assisting those with hearing impairments and helping people with disabilities enter the workplace, even if typing is impossible for them. “We can ensure everyone has an equal opportunity to participate,” he adds.

Looking back, Peter van der Putten, director of the AI Lab at Pegasystems and assistant professor of AI at Leiden University, explains that speech recognition’s previous limitations were tied to the fact that it could only be used in channels where speech is the major modality. Now, medical professionals use it to transcribe notes, social care workers use it to summarise cases, and voice AI might listen to a customer service call to suggest next actions.

There are, however, legal and security ramifications when capturing and storing people’s voices, with van der Putten warning: “This will, of course, require customer and citizen consent, and businesses must still be able to offer good service if this is declined. There should be capabilities to support customers exercising their rights to information or rights to be forgotten.

“More fundamentally, it is important speech data is used to the clear benefit of customers, so they can demand the use of speech AI as opposed to being apprehensive about it.”

James Tumbridge, partner at Keystone Law and a specialist in intellectual property and data protection, agrees. “The gathering, processing, and storage of a voice which can identify a person is personal data and use of it must be in a lawful way,” he explains.

“Then you have to respect the individual’s right to privacy, and the loss of control of their voice is a risk to that. Finally, if AI can replicate a voice, you are risking a cyber-security incident and fraud. Be careful whose voice you mimic – it could be a costly exercise.”

Jonathan Weinberg is a freelance journalist and writer who specialises in technology and business, with a particular interest in the social and economic impact on the future of work and wider society. His passion is for telling stories that show how technology and digital improves our lives for the better, while keeping one eye on the emerging security and privacy dangers. A former national newspaper technology, gadgets and gaming editor for a decade, Jonathan has been bylined in national, consumer and trade publications across print and online, in the UK and the US.