Why Your AI Voice Agent Mishears Callers and Exactly How to Fix It

May 12

With every AI voice agent, one thing needs to work correctly before anything else can happen. Before the logic. Before the integrations. Before the data reaches your backend systems.

The agent needs to hear what the caller actually said.

If the speech recognition layer gets the input wrong, everything that follows is built on a false foundation. The AI processes the wrong words, generates a response to a question the caller never asked, and the conversation breaks down. The caller repeats themselves. The agent misunderstands again. The caller asks to speak to a human.

This is not a hypothetical edge case. It happens in production, across industries, every day. And for most businesses running AI voice agents, it is the single most underestimated source of call failure.

We have used ASR hints for years, and they are now available within the Flow Designer. This article explains what ASR is, why it produces errors in real business environments, and how hints solve the problem in a way that requires no technical expertise to configure.

What ASR Is and What It Actually Does

ASR stands for Automatic Speech Recognition. It is the technology layer that converts what a caller says out loud into text. That text is then processed by the language model, which uses it to understand intent and generate a response.

ASR is not the AI that thinks; it is the AI that listens.

Every AI voice agent, regardless of the platform it runs on, has an ASR engine working in real time underneath the conversational layer. The quality of everything the agent says and does depends on whether the ASR engine correctly captured what the caller said first.

General-purpose ASR engines are trained on large datasets of recorded human speech. They perform well on common words spoken in standard accents under reasonable audio conditions. They were built to handle everyday language across a wide range of topics.

They were not built to know that your callers are going to say the name of a specific insurance product, a Dutch medication name, a logistics carrier code, or a scooter brand that appears nowhere in standard speech training data.

When a caller uses vocabulary that the ASR engine has rarely or never encountered, the engine makes its best guess based on phonetic similarity. Sometimes the guess is correct. Often it is not. And when it is not, the caller's actual words never reach the language model at all.

Why ASR Errors Are More Damaging Than They Look

A single ASR error in a call creates a chain reaction.

The language model receives a transcription that does not match what the caller said. It generates a response to the wrong input. The caller hears an answer to a question they did not ask. They correct themselves. The agent processes the correction. If the same word is misrecognised again, the caller loses confidence in the system.

Beyond the individual call experience, ASR errors at scale have measurable operational consequences.

The percentage of calls that the AI can handle independently drops, because more calls end in escalation to a human agent. Average handle time increases because calls require more turns to reach resolution. First-call resolution decreases because the agent cannot act correctly on misheard data. And data quality in connected systems suffers because the output of a misheard input is never the data your backend expected.

In industries where precision matters, the consequences become even more concrete. A healthcare provider whose voice agent mishears a patient's medication name is not dealing with an inconvenience; it is dealing with a risk. An insurance company whose agent mishears a policy type routes the caller to the wrong workflow. A logistics company whose agent mishears a carrier code fails to retrieve the correct shipment record.

The common thread is the same in every case. The problem is not that the language model is unintelligent. The problem is that it never received the correct input.

Where ASR Specifically Struggles

Understanding which categories of words cause the most ASR errors helps you identify exactly where hints will have the greatest impact in your own call flows.

Brand names and product names

These are the highest-risk category for most businesses. Brand names, product lines, and service names are typically not common in general speech datasets. A caller who says the name of a specific insurance product, a specific scooter brand, or a specific software tier is using vocabulary the ASR engine has likely encountered very few times.

Proper nouns and place names

City names, street names, neighbourhood names, and company names cause consistent errors in general-purpose ASR, particularly for Dutch-language flows, where the name pool differs significantly from English training data.

Medical and pharmaceutical terminology

Medication names, specialist designations, procedure names, and clinical terminology are highly domain-specific. A general ASR engine encountering a Dutch medication name for the first time will typically produce a phonetically similar but meaningless transcription.

Industry-specific codes and identifiers

This is one of the areas where ASR struggles most, and where hints make the biggest difference. Think of license plates, dates of birth, postal codes, policy numbers, order references, carrier codes, and product SKUs. These follow patterns that general training data does not prepare an ASR engine to handle. Short alphanumeric identifiers are particularly vulnerable, because there is little phonetic context to anchor the recognition. A license plate such as 47 XBP 9, or a postal code such as 1234 AB, is often misrecognised without hints, even though this is precisely the input businesses rely on to retrieve customer data, vehicle information, or delivery addresses.

These are exactly the types of input that companies depend on most for automated verification and routing, and where an ASR error directly leads to a failed transaction.
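
Because identifiers like these follow a fixed pattern, a transcription can at least be checked against that pattern before it is used for a lookup. The sketch below is a minimal illustration in Python and not part of the AssistYou platform: the function name normalize_postal_code is hypothetical, and it covers only the Dutch postal code format (four digits, the first never zero, followed by two letters). A check like this does not replace hints. It only catches output that cannot be a valid code, so the agent can ask again instead of passing bad data downstream.

```python
import re
from typing import Optional

# Dutch postal codes: four digits (the first is never 0) followed by two letters.
POSTAL_CODE = re.compile(r"[1-9][0-9]{3}[A-Z]{2}")


def normalize_postal_code(transcript: str) -> Optional[str]:
    """Return the code formatted as '1234 AB', or None if it cannot be one.

    Hypothetical helper for illustration only.
    """
    # Drop spaces, dashes and dots that a transcription may insert, then uppercase.
    compact = re.sub(r"[\s\-.]", "", transcript).upper()
    if not POSTAL_CODE.fullmatch(compact):
        return None
    return f"{compact[:4]} {compact[4:]}"


if __name__ == "__main__":
    for heard in ["1234 ab", "1234 a bee", "0234 AB"]:
        print(repr(heard), "->", normalize_postal_code(heard))
```

A transcript that fails the check is a signal to re-prompt the caller. A transcript that passes is well-formed, but it can still be the wrong code, which is why recognition accuracy at the source still matters.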

The Solution: ASR Hints in the AssistYou Flow Builder

AssistYou now makes it possible to add ASR hints directly at any node in your conversational flow.

When the flow reaches a node that has hints configured, the ASR engine receives those words as additional context before the caller speaks. It uses that context to weight its transcription decisions toward the expected vocabulary. A scooter brand name that would otherwise be transcribed as a random phonetic match is now far more likely to be recognised correctly, because the engine was told to expect it.

This is how hint-based vocabulary boosting works in professional speech recognition systems. Custom vocabulary features let you tell the system to expect specific terms, which can substantially improve accuracy for domain-specific content.
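
To make the idea of weighting toward expected vocabulary concrete, here is a deliberately simplified sketch in Python. It assumes the engine has produced a few candidate transcriptions with confidence scores, and it adds a small bonus for every hinted word a candidate contains. The function name pick_transcript, the scores, and the boost value are all made up for illustration. Real engines apply this kind of biasing inside the recognition process itself rather than on a finished list of candidates, and this is not a description of how any specific provider implements it.

```python
from typing import List, Sequence, Tuple


def pick_transcript(
    candidates: Sequence[Tuple[str, float]],
    hints: Sequence[str],
    boost: float = 0.15,
) -> str:
    """Return the candidate text with the highest score after hint boosting.

    candidates: (text, score) pairs; the scores here are illustrative only.
    hints: words the caller is expected to say at this step.
    """
    hint_set = {h.lower() for h in hints}

    def boosted(item: Tuple[str, float]) -> float:
        text, score = item
        # Add the bonus once for every word in the candidate that is a hint.
        matches = sum(1 for word in text.lower().split() if word in hint_set)
        return score + boost * matches

    return max(candidates, key=boosted)[0]


if __name__ == "__main__":
    # Two guesses for the same audio; the wrong one scores slightly higher.
    guesses: List[Tuple[str, float]] = [("in model", 0.52), ("inboedel", 0.47)]
    print(pick_transcript(guesses, hints=[]))
    print(pick_transcript(guesses, hints=["Inboedel"]))
```

Without hints the higher-scoring but wrong guess wins. With the product name supplied as a hint, the correct candidate overtakes it.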

The implementation in the AssistYou platform requires no technical expertise. You navigate to the node where callers are likely to say specific words, open the node settings, and add the relevant hints as a list. The hints apply to that node only, which means you are giving the ASR engine precise, contextual guidance at exactly the right moment in the flow, rather than a blanket vocabulary list that applies indiscriminately.
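
Configuration happens in the node settings and involves no code, but the per-node scoping is easy to picture as a data structure. The Python sketch below is a hypothetical model, not the platform's actual schema: the class FlowNode, its field names, and the helper hints_for are assumptions made for illustration. It shows only the key property described above, namely that each node carries its own hint list and other nodes are unaffected.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlowNode:
    """Hypothetical model of one step in a call flow."""
    node_id: str
    prompt: str
    asr_hints: List[str] = field(default_factory=list)  # applies to this node only


def hints_for(flow: Dict[str, FlowNode], node_id: str) -> List[str]:
    """Return a copy of the hints that apply while the given node is active."""
    return list(flow[node_id].asr_hints)


# Illustrative two-step flow: only the product question carries hints.
flow = {
    "greeting": FlowNode("greeting", "How can I help you today?"),
    "ask_product": FlowNode(
        "ask_product",
        "Which product does your claim relate to?",
        asr_hints=["Rechtsbijstand", "Aansprakelijkheid", "Inboedel"],
    ),
}

if __name__ == "__main__":
    print(hints_for(flow, "greeting"))
    print(hints_for(flow, "ask_product"))
```

Scoping hints to the node keeps each list short and relevant, so the engine is biased toward product names only at the moment a product name is expected.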

A Practical Example: Insurance Policy Name Recognition

Consider an insurance company running an inbound AI voice agent for claims intake. At one step in the flow, the agent asks the caller to confirm which product their claim relates to.

The company offers twelve distinct products, each with a specific name that uses a combination of the brand name and a descriptor: words like "Rechtsbijstand", "Aansprakelijkheid", and "Inboedel" that are clear to a native Dutch speaker but phonetically unusual for a general ASR engine, particularly under the audio quality conditions of a standard phone line.

Without ASR hints, the engine produces its best guess. Some product names are recognised correctly. Others are not. The agent receives the wrong product name, routes the caller to the wrong claims workflow, and the error is only discovered when a human agent reviews the ticket.

With ASR hints configured at that node, all twelve product names are provided to the engine before the caller speaks. The engine now transcribes those names correctly at a significantly higher rate. The claims workflow receives the correct product name. The caller is routed correctly. The data in the connected CRM is accurate.

This is the difference between a voice agent that requires regular human correction and one that operates reliably at scale.

ASR Hints and the Dutch Language Specifically

For Dutch-language call flows, ASR hints are particularly valuable.

Dutch is a well-supported language in commercial ASR systems, but Dutch-specific proper nouns, place names, and domain vocabulary remain underrepresented in most training datasets. An ASR engine that handles standard Dutch conversational phrases reliably may still struggle with a neighbourhood name in Utrecht or a medication name that uses Dutch phonology in an unexpected way.

At AssistYou, we combine the strengths of leading ASR providers, including Speechmatics and Deepgram, into one integrated system that natively supports hint configuration. By bringing these engines together, we use the strong points of each provider and reinforce recognition accuracy in a way that no single engine can deliver on its own. The hint system works together with the vocabulary boosting features of the underlying ASR provider, which means the improvement takes place at the recognition level itself, and not as a correction in post-processing.

This matters because post-processing corrections, which some platforms apply after the ASR output is produced, can only fix errors the system is already trained to detect. Hint-based boosting works earlier in the process, before the error is made.
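
For readers curious what recognition-level boosting looks like at the provider level, the Python sketch below turns a hint list into the shape each provider's vocabulary feature expects. It rests on assumptions that should be checked against current provider documentation: Speechmatics accepts an additional_vocab list inside its transcription configuration, and Deepgram accepts repeated keywords query parameters in term:intensifier form on models that support that feature (newer models use a different parameter). The function names and the intensifier value are hypothetical, and how the AssistYou platform maps hints onto these features internally is not described here.

```python
from typing import Dict, List, Sequence
from urllib.parse import urlencode


def speechmatics_config(hints: Sequence[str], language: str = "nl") -> Dict:
    """Build a transcription config with custom vocabulary (assumed field names)."""
    return {
        "language": language,
        "additional_vocab": [{"content": hint} for hint in hints],
    }


def deepgram_query(hints: Sequence[str], language: str = "nl",
                   intensifier: int = 2) -> str:
    """Build a query string with one 'keywords' entry per hint (assumed format)."""
    params: List = [("language", language)]
    params += [("keywords", f"{hint}:{intensifier}") for hint in hints]
    return urlencode(params)


if __name__ == "__main__":
    hints = ["Rechtsbijstand", "Inboedel"]
    print(speechmatics_config(hints))
    print(deepgram_query(hints))
```

In both cases the hints travel with the recognition request itself, which is what allows the engine to take them into account while it is transcribing rather than afterwards.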

What ASR Hints Do Not Do

It is important to be precise about the scope of this feature so you configure it correctly.

ASR hints improve transcription accuracy for expected vocabulary at specific points in a flow. They are not a replacement for a well-trained ASR engine. They do not fix audio quality issues caused by poor phone connections or significant background noise. They do not improve recognition of spontaneous, unpredictable vocabulary that you cannot anticipate in advance.

Hints are most effective when you know what a caller is likely to say at a specific step. If a step asks an open-ended question with a wide range of possible answers, hints are less applicable. If a step asks for a specific type of input, such as a product name, a brand, a policy type, or a specialist category, hints deliver a clear and measurable improvement.

Use them where the vocabulary is known and specific. That is where they produce the greatest impact.

Frequently Asked Questions

What is ASR in the context of an AI voice agent?

ASR stands for Automatic Speech Recognition. It is the technology that converts spoken caller input into text that the AI language model can process. Every AI voice agent depends on an ASR engine working correctly before any conversational logic can function.

Why does ASR produce errors on specific words?

General-purpose ASR engines are trained on broad speech datasets. They perform well on common vocabulary but struggle with domain-specific terms, brand names, proper nouns, and industry-specific language that appears rarely or not at all in standard training data.

What are ASR hints?

ASR hints are a list of words or phrases you provide to the ASR engine at a specific node in your call flow. They tell the engine what vocabulary to expect at that moment, which increases the probability of correct transcription for that specific input.

Do ASR hints require coding or technical setup?

No. Hints are configured directly in the AssistYou flow builder through the node settings interface. No coding, no infrastructure changes, and no involvement from a technical team are required.

Which use cases benefit most from ASR hints?

Any flow where callers provide domain-specific vocabulary benefits from ASR hints. Insurance, healthcare, mobility, logistics, and legal services are the sectors where the improvement is most measurable, because these industries use terminology that is underrepresented in general ASR training data.

Do ASR hints work for Dutch-language flows?

Yes. AssistYou supports hint configuration across Dutch-language flows through integrations with Speechmatics and Deepgram, both of which support native vocabulary boosting for Dutch.