A Solution to the 'Cocktail Party' Problem

Industry Insight

Published: October 1, 2018

Ruairi J Mackenzie

Loud background noise doesn’t just annoy people by getting in the way of good conversation – it also affects voice-activated devices. AudioTelligence, a start-up based in Cambridge, says it has developed the world’s first digital solution to this ‘cocktail party problem’: picking out a single voice from a mix of competing sounds.

AudioTelligence’s technology is backed by the expertise of Professor Peter Rayner, who founded the Signal Processing and Communications Laboratory at the University of Cambridge and sits on AudioTelligence’s Technical Board. The start-up has raised £3.1 million from Cambridge Innovation Capital and Cambridge Enterprise to grow its team and meet demand for the technology in multiple applications, including hearing aids and improving voice recognition in digital assistants. We caught up with AudioTelligence’s CEO, Ken Roberts, and Andrew Williamson, Investment Director at Cambridge Innovation Capital, to find out more about the technology.

Ruairi Mackenzie (RM): Could you tell us more about how AudioTelligence uses blind signal separation to improve listening ability?

Ken Roberts (KR): AudioTelligence takes a data-driven approach, using a combination of Bayesian statistics and elements of machine learning to identify individual sound sources. Having identified them, we can automatically reject echoes of those sources, and by actively eliminating individual sources we can achieve excellent interference rejection.
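AudioTelligence has not published its algorithm, and Roberts describes it as Bayesian, which is not the technique shown here. To illustrate what "blind" separation means in the simplest possible setting, here is a minimal sketch of a different, classic technique, independent component analysis (ICA) with a kurtosis contrast, under loud assumptions: two microphones, two sources, and an instantaneous mixture with no delays or echoes (real rooms violate this). The function names and the `steps` grid parameter are hypothetical, chosen only for illustration.

```python
import math


def _center(x):
    """Return a zero-mean copy of the signal."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def _rotate(a, b, theta):
    """Rotate the paired samples (a[i], b[i]) by the angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * p + s * q for p, q in zip(a, b)],
            [-s * p + c * q for p, q in zip(a, b)])


def _excess_kurtosis(y):
    """Fourth standardised moment minus 3; zero for a Gaussian signal."""
    m2 = sum(v * v for v in y) / len(y)
    m4 = sum(v ** 4 for v in y) / len(y)
    return m4 / (m2 * m2) - 3.0


def separate_two_sources(x1, x2, steps=180):
    """Hypothetical ICA sketch: unmix two sources from two microphones.

    Assumes an instantaneous mixture (no delays or echoes). Sources are
    recovered only up to order, sign and scale.
    """
    x1, x2 = _center(x1), _center(x2)
    n = len(x1)
    a = sum(v * v for v in x1) / n               # var(x1)
    c = sum(v * v for v in x2) / n               # var(x2)
    b = sum(p * q for p, q in zip(x1, x2)) / n   # cov(x1, x2)

    # Whitening, step 1: rotate onto the principal axes so the two
    # channels become uncorrelated.
    phi = 0.5 * math.atan2(2.0 * b, a - c)
    u1, u2 = _rotate(x1, x2, phi)
    # Whitening, step 2: scale each channel to unit variance.
    s1 = math.sqrt(sum(v * v for v in u1) / n)
    s2 = math.sqrt(sum(v * v for v in u2) / n)
    w1 = [v / s1 for v in u1]
    w2 = [v / s2 for v in u2]

    # The remaining mixing is a pure rotation. Mixtures tend to look more
    # Gaussian than their sources, so keep the angle whose outputs are
    # least Gaussian (largest total |excess kurtosis|).
    best_theta, best_score = 0.0, -1.0
    for k in range(steps):
        theta = (math.pi / 2) * k / steps
        y1, y2 = _rotate(w1, w2, theta)
        score = abs(_excess_kurtosis(y1)) + abs(_excess_kurtosis(y2))
        if score > best_score:
            best_theta, best_score = theta, score
    return _rotate(w1, w2, best_theta)
```

Fed, say, a sine wave and a sawtooth mixed through any invertible 2×2 matrix, the function should return two signals that each track one source, possibly swapped or sign-flipped. Note that nothing about the microphone positions is needed; the separation relies only on the statistics of the signals.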

Existing technologies use a model-based approach designed around a specific microphone array geometry, combining beam-forming, echo cancellation and noise suppression to remove unwanted background noise.

The beam-forming algorithm attempts to focus the array in the direction of the desired sound source; however, it is difficult to focus the array so that it fully excludes unwanted sources. Traditional echo cancellation is also sensitive to acoustic reflections, so together they offer only a limited improvement.
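Why a beam-former struggles to exclude unwanted sources can be seen in the simplest textbook form, a narrowband delay-and-sum beamformer (not necessarily what any particular product uses). The sketch below assumes a uniform linear array, a far-field source and a single frequency; the function name, the constant name and the example parameter values are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly air at room temperature


def delay_and_sum_gain(num_mics, spacing_m, freq_hz, steer_deg, source_deg):
    """Narrowband amplitude gain of a uniform linear delay-and-sum array.

    The array is steered towards `steer_deg`; the result is the gain
    (1.0 = passed unchanged) applied to a plane wave arriving from
    `source_deg`. Angles are measured from broadside.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber (rad/m)
    delta = math.sin(math.radians(source_deg)) - math.sin(math.radians(steer_deg))
    # Each microphone sees the wave with a phase offset proportional to its
    # position; after steering delays, the outputs are averaged.
    total = sum(cmath.exp(1j * k * m * spacing_m * delta) for m in range(num_mics))
    return abs(total) / num_mics
```

For example, with four microphones 5 cm apart steered straight ahead, a 1 kHz interferer arriving from 30° still comes through at roughly 87% of its amplitude, only about 1 dB quieter. The array "focuses" in the sense that on-axis sound passes at full gain, but it is far from excluding the off-axis source.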

In addition, noise suppression typically attempts to remove all noise sources together, which results in relatively poor suppression of interfering sources. It can also eliminate certain frequencies entirely, producing artefacts that can make the output difficult to understand.
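Both effects, lumping all noise together and wiping out whole frequencies, can be illustrated with basic magnitude spectral subtraction, a classic single-channel noise suppressor used here as an assumed stand-in for "traditional noise suppression". The sketch operates on one frame of per-frequency-bin magnitudes (computing the spectrum itself, for example with an FFT, is omitted). The function name and the `floor` parameter are hypothetical.

```python
def spectral_subtract(frame_mag, noise_mag, floor=0.0):
    """Magnitude spectral subtraction for a single frame (illustrative sketch).

    One noise estimate `noise_mag`, covering all interferers at once, is
    subtracted bin by bin. Any bin that would go negative is clamped to
    `floor`; with floor=0.0 those frequencies vanish from the output.
    """
    if len(frame_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    return [max(x - n, floor) for x, n in zip(frame_mag, noise_mag)]
```

With a flat noise estimate of 0.6, a frame of `[1.0, 0.5, 2.0, 0.3]` becomes `[0.4, 0.0, 1.4, 0.0]` (up to floating-point rounding): the two weak bins disappear outright, which is the kind of missing-frequency artefact described above. Practical systems typically use a small non-zero `floor` and smoothing to soften this, at the cost of leaving some residual noise.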

Our technology is independent of the array geometry and uses low-cost, uncalibrated microphones. Because the algorithm automatically adapts as the soundscape changes, it can also compensate for changes caused by microphone occlusion due to ageing.

Our technology has been developed over 10 years, and the original target was hearing assistance, so natural-sounding output and low latency have always been essential metrics for us.

RM: The technology has many potential uses in everything from AI assistants to hearing aids – what applications are you most excited about?

KR: Our recent announcement has generated a huge amount of interest in the hearing assistance application, and a significant number of individuals have contacted us to express interest in a technology that can improve their ability to hear in noisy situations.

A common complaint is that traditional hearing aids can’t help with this because they simply amplify all sounds equally, so background noise gets louder along with the source of interest. In fact, many people say they turn off their hearing aids in noisy situations because they make it even more difficult to listen to an individual speaker.

As reported previously, over 400 million people, 5% of the world’s population, have hearing difficulty, so the potential to deliver something that can change people’s lives is a huge opportunity, and also a huge responsibility.

Although we set out 10 years ago to address the hearing assistance problem, speech recognition and communications in general have since become widespread. Our ability to improve speech recognition in real-world noise is therefore a huge opportunity, as is making it possible to use Siri or Google Assistant on a noisy street or to make a Skype call in a noisy internet café.

We have recently deployed our IP in the cloud, so one of the great opportunities for us is to offer a solution that works equally well in three settings: off-board the device in a software-as-a-service (SaaS) model, where latency is not an issue (as in speech recognition); in software on-board a device that has the power and performance for low-latency applications; or in silicon for mobile applications, where power and performance are limited.

RM: I’ve noticed other companies integrating AI into their devices. What makes AudioTelligence’s machine learning capability stand out from the crowd?

KR: Our approach uses only elements of machine learning; the core technology is based on Bayesian statistics. Our engineering team have been developing professional audio solutions for more than 30 years (two of the developers have technical Oscars for their contribution to the movie industry!), and it is their extensive experience and understanding of the problem that really makes the difference and accounts for our performance. It’s important to state that, unlike AI solutions, our technology doesn’t rely on training.

RM: AudioTelligence is based on the West Cambridge site alongside various university departments, and Peter Rayner, Emeritus Professor at Cambridge, sits on your technical board. What other advantages have you found from working in the Cambridge area?

KR: Outside of Silicon Valley, Cambridge is one of very few areas in the world with such a concentration of technology development, and that brings people to the town to engage with companies like AudioTelligence.

Andrew Williamson (AW): Cambridge is the centre of the universe for speech technology development. Its track record includes VocalIQ, which was acquired by Apple; Evi, which was acquired by Amazon; and Entropic, which was acquired by Microsoft. So today all of the major players have speech technology development teams in Cambridge and actively scout for capabilities that can give them a lead over the competition.

The combination of cutting-edge IP developed at the University of Cambridge and a deep pool of talented scientists and engineers provides a unique environment for growing category-leading technology businesses such as AudioTelligence.

Ken Roberts, CEO of AudioTelligence (left), with Andrew Williamson, Investment Director of CIC.

Ken Roberts and Andrew Williamson were speaking to Ruairi J Mackenzie, Science Writer for Technology Networks.

Meet the Author

Ruairi J Mackenzie

RJ is a freelance science writer based in Glasgow. He covers biological and biomedical science, with a focus on the complexities and curiosities of the brain and emerging AI technologies. RJ was a science writer at Technology Networks for six years. RJ has a Master’s degree in Clinical Neurosciences from the University of Cambridge.

Informatics