What is the full form of ASR

ASR: Automated Speech Recognition

ASR stands for Automated Speech Recognition, also widely written as Automatic Speech Recognition. It refers to a technology that converts spoken words into written text. This technology allows a computer to identify and process the words a person speaks into a microphone or other input device connected to it.

Types

ASR is independent transcription software designed to convert spoken language into readable text. It comes in two types:

Directed dialogue conversations: This is the basic version of ASR. It consists of a machine interface that interacts with people by voice. The machine asks you to respond with a specific word from a list of words and then provides a response or answer to your request accordingly. Automated telephone banking uses this technology to let customers perform a wide range of financial transactions over the telephone.
Natural language conversations: This is a more advanced and sophisticated version of ASR. It understands the user's speech or written material and responds on the basis of the content it has understood. It enables people to interact with the computer using everyday language.
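
A directed dialogue can be pictured as a lookup against a fixed list of allowed words. The following is only a minimal sketch: the menu words, the replies, and the function name directed_dialogue are all hypothetical, and the speech-to-text step is assumed to have already produced a single recognized word.

```python
# Hypothetical menu for a telephone-banking prompt; the words and replies
# are illustrative assumptions, not taken from any real system.
MENU = {
    "balance": "Your balance is being retrieved.",
    "transfer": "Starting a funds transfer.",
    "bills": "Opening bill payment.",
}


def directed_dialogue(recognized_word: str) -> str:
    """Return the reply for one recognized word, or re-prompt the caller."""
    word = recognized_word.strip().lower()
    if word in MENU:
        return MENU[word]
    # Anything outside the fixed list is rejected and the options are repeated.
    options = ", ".join(MENU)
    return f"Sorry, please say one of: {options}."
```

Because the system only has to tell a handful of words apart, recognition is far simpler than in a natural language conversation, at the cost of forcing the caller to use the listed words.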

How ASR works

The basic sequence of events in ASR is as follows:

A person speaks to the software using an input device such as a microphone.
The input device creates a wave file of the spoken words.
The volume of the wave file is normalized, and background noise is removed.
The cleaned wave file is broken down into phonemes, which are the smallest units of sound. There are around 44 phonemes in English.
The ASR software analyzes the phonemes in order, starting from the first. It uses statistical probability analysis to work out whole words and then assembles them into a complete sentence.
Having worked out the words, the ASR software responds in a meaningful way.
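
The volume normalization and noise removal step can be sketched roughly as follows. This is a simplified illustration, not how any particular ASR product works: it assumes the wave file has already been decoded into a list of samples between -1.0 and 1.0, uses peak normalization, and stands in for real noise removal with a crude noise gate. The function names and the threshold value are hypothetical.

```python
def normalize_peak(samples: list[float], target_peak: float = 1.0) -> list[float]:
    """Scale samples so the loudest one has magnitude target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]


def noise_gate(samples: list[float], threshold: float = 0.05) -> list[float]:
    """Zero out samples quieter than threshold (a very crude noise filter)."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


# Normalize first, then gate, mirroring the order of the steps above.
cleaned = noise_gate(normalize_peak([0.01, -0.25, 0.5]))
```

Production systems typically use much more capable techniques for noise removal; the gate here only shows where that step sits in the sequence.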

What Is the Process of Natural Language Processing?

Natural language processing (NLP) matters more to the advancement of speech recognition systems than directed dialogue does, because it is the direction that ASR technology is expected to take in the future.

Its operation is intended to roughly mirror how people understand speech and respond to it.

An NLP ASR system typically has a vocabulary of at least 60,000 words. If you say just three words in a row, there are at least 216 trillion possible word combinations (60,000 × 60,000 × 60,000)!
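
As a quick check of the three-word figure, the number of possible sequences is the vocabulary size raised to the power of the sequence length. The variable names below are illustrative only.

```python
vocabulary_size = 60_000   # vocabulary figure quoted in the text
sequence_length = 3        # three words in a row

# Each position can hold any word, so the counts multiply.
combinations = vocabulary_size ** sequence_length
print(f"{combinations:,}")  # 216,000,000,000,000
```

That is 216 trillion sequences for a 60,000-word vocabulary, and the count grows by another factor of 60,000 with every additional word.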

It would therefore be completely unworkable for an NLP ASR system to search through its full vocabulary for each word and process each word separately. Instead, the natural language processing system is designed to respond to a much smaller list of carefully chosen "tagged" keywords that provide context for longer requests.

By leveraging these contextual cues, the system can narrow down what you are saying and identify the words being used much more quickly, and so reply appropriately.

When you say things like "weather forecast," "check my balance," and "I'd like to pay my bills," for instance, the NLP system may focus on the tagged keywords "forecast," "balance," and "bills." These terms would then be utilised to determine the context of the other words you used, preventing mistakes like conflating "weather" and "whether."
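
The idea of using a tagged keyword to settle the meaning of its neighbours can be sketched as below. This is a toy illustration under stated assumptions: the keyword table, the context labels, the PREFERRED_SPELLING table, and the function names are all hypothetical, and the input is assumed to be a list of already-recognized candidate words.

```python
# Hypothetical tagged keywords, each mapped to a context label.
TAGGED_KEYWORDS = {"forecast": "weather", "balance": "banking", "bills": "banking"}

# For an ambiguous word, the spelling preferred in a given context.
PREFERRED_SPELLING = {
    "whether": {"weather": "weather"},
}


def detect_context(words):
    """Return the context of the first tagged keyword found, else None."""
    for word in words:
        if word in TAGGED_KEYWORDS:
            return TAGGED_KEYWORDS[word]
    return None


def resolve(words):
    """Rewrite ambiguous words using the context set by a tagged keyword."""
    context = detect_context(words)
    return [PREFERRED_SPELLING.get(w, {}).get(context, w) for w in words]
```

With this sketch, resolve(["whether", "forecast"]) yields ["weather", "forecast"], while a phrase with no tagged keyword is left unchanged.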

The Tuning Test: How ASR is "Made to Learn" from Humans

Whether they are NLP or directed dialogue systems, ASR systems are trained using two fundamental processes. Human "tuning" is the first and simpler of the two, and "active learning" is the second and significantly more complex.

Human tuning: This is a fairly simple form of ASR training. Human programmers go through the conversation records of a particular ASR software interface and look for common terms that it had to hear but did not already have in its pre-programmed vocabulary. The programme is then updated with those terms to increase its capacity for speech comprehension.
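
The log review behind human tuning can be approximated with a simple frequency count. This is only a sketch under assumptions: it supposes the conversation records have already been transcribed correctly by a person, and the function name missing_terms and the min_count parameter are hypothetical.

```python
from collections import Counter


def missing_terms(transcripts, vocabulary, min_count=2):
    """Return out-of-vocabulary words seen at least min_count times."""
    counts = Counter(
        word
        for line in transcripts
        for word in line.lower().split()
        if word not in vocabulary
    )
    # most_common() lists the most frequent candidates first.
    return [word for word, n in counts.most_common() if n >= min_count]


vocabulary = {"check", "my", "balance", "pay"}
transcripts = ["check my overdraft", "pay my overdraft", "check my mortgage"]
candidates = missing_terms(transcripts, vocabulary)
```

Here candidates contains only "overdraft", because "mortgage" appears just once. A programmer would review such a list and decide which terms to add to the vocabulary.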

Active learning: This much more advanced form of ASR training is currently being tested, especially with NLP voice recognition software. With active learning, the software is designed to autonomously pick up, remember, and use new words, constantly growing its vocabulary as it is exposed to novel linguistic constructions.

Theoretically, this enables the programme to recognise the specific speech patterns of individual users so that it can interact with them more effectively.

So, for instance, if a user repeatedly rejects an autocorrect for a certain phrase, the NLP software eventually learns to treat that user's own form of the phrase as the "correct" one.
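
The autocorrect example can be sketched as a per-user counter of rejections. This is a hypothetical illustration rather than a description of how any real system learns: the class name UserVocabulary, its methods, and the default threshold of three rejections are all assumptions.

```python
from collections import defaultdict


class UserVocabulary:
    """Learns a user's preferred form after repeated rejected corrections."""

    def __init__(self, rejections_needed: int = 3):
        self.rejections_needed = rejections_needed
        self._rejections = defaultdict(int)  # (typed, suggestion) -> count
        self._accepted = set()               # user forms now treated as correct

    def suggest(self, typed: str, suggestion: str) -> str:
        """Return the user's own form once learned, else the suggestion."""
        return typed if typed in self._accepted else suggestion

    def reject(self, typed: str, suggestion: str) -> None:
        """Record that the user rejected `suggestion` in favour of `typed`."""
        self._rejections[(typed, suggestion)] += 1
        if self._rejections[(typed, suggestion)] >= self.rejections_needed:
            self._accepted.add(typed)


vocab = UserVocabulary(rejections_needed=2)
before = vocab.suggest("gonna", "going to")
vocab.reject("gonna", "going to")
vocab.reject("gonna", "going to")
after = vocab.suggest("gonna", "going to")
```

After two rejections the sketch stops offering "going to" and keeps the user's "gonna". A real system would need to guard against learning genuine mistakes, which a fixed threshold does not do.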
