- Legal Futures - https://www.legalfutures.co.uk -

An introduction to speech recognition

By Legal Futures Associate Philips Dictation

Speech recognition, also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text, is a capability which enables a program to identify human speech and convert it into readable text.

Whilst the more basic speech recognition software has a limited vocabulary, we are now seeing the emergence of more sophisticated software that can handle natural speech, different accents and various languages, whilst also achieving much higher accuracy rates. We are also using speech recognition technology much more in our everyday lives, with an increasing number of people taking advantage of digital assistants like Google Home, Siri, and Amazon Alexa.

So, how has the technology evolved, how does it work, and what are the opportunities for businesses and professionals across numerous industries and sectors to exploit speech recognition in their everyday work?

History

Here’s a quick overview of how speech recognition has developed from the early prototypes:

How it works

A wide range of speech recognition applications and devices are available, with the more advanced solutions now using Artificial Intelligence (AI) and machine learning. They are typically based on the following models:

Initially, the Hidden Markov Model (HMM) was widely adopted as an acoustic modelling approach. However, it has largely been replaced by deep neural networks. The use of deep learning in speech recognition has significantly lowered the word error rate.

Word error rate

A key factor in speech recognition technology is its accuracy, commonly measured by the word error rate (WER). A number of factors can affect the WER, for example different speech patterns, speaking styles, languages, dialects, accents and phrasings. The challenge for the software algorithms that process and organise audio into text is to address these effectively, whilst also being able to separate the spoken audio from the background noise that often accompanies the signal.
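To make the measure concrete, WER is conventionally calculated as (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words in the software's output, and N is the number of words in a correct reference transcript. The short Python sketch below illustrates that calculation using a standard word-level edit distance. It is an illustration only: the function name and the simple lower-casing and whitespace splitting are assumptions made for this example, not the method of any particular speech recognition product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Illustrative sketch: text is lower-cased and split on whitespace,
    which is a simplifying assumption (real tools normalise text more carefully).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word ("the" -> "a") and one dropped word ("today").
reference = "the client signed the contract today"
hypothesis = "the client signed a contract"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

In this made-up example there are two errors against a six-word reference, giving a WER of roughly 33%. Because inserted words count as errors, the figure can exceed 100% for very poor transcripts.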

The application of speech recognition

Thanks to laptops, tablets and smartphones, together with the rapid development of AI, speech recognition software has entered all aspects of our everyday life. Examples include:

Virtual assistants

These integrate with a range of different platforms and enable us to command our devices just by talking. At the personal level, examples include Siri, Alexa and Google Assistant. In the office, they can be used to complement the work of human employees by taking responsibility for repetitive, time-consuming tasks, allowing employees to focus their energy on higher-priority activities.

Voice search

Speech recognition technology is not only impacting the way businesses perform daily tasks but also how their customers are able to reach them. Voice search is typically used on devices such as smartphones, laptops and tablets, allowing users to speak a search query instead of typing it into a search engine. Spoken and typed queries can produce different search engine results pages (SERPs), since the way we speak creates new voice search keywords that are more conversational than typed keywords.

Speech-to-text solutions

And finally, the most significant area as far as business users are concerned is speech-to-text software. This area is growing rapidly, due in no small part to the availability of cloud-based solutions that enable users to access fully featured versions of speech-to-text apps from their smartphones or tablets, irrespective of their location. Furthermore, speech recognition technology can reduce repetitive tasks and free up professionals to use their time more productively, whilst also allowing businesses to save money by automating processes and completing administrative tasks more quickly.