A computer system that translates brain activity into synthesized speech by decoding the movements of the muscles involved in vocalization has shown its potential in a proof-of-concept experiment conducted by researchers at the University of California San Francisco (UCSF). Whilst this initial study involved only volunteers without a speech impairment, there is hope that such technology could one day help patients with neurological damage regain their speech, as an alternative to relying on slow and unwieldy nonverbal spelling tools.
Many patients with neurological conditions have trouble with speech. A sudden neurological event such as a stroke can rob people of clear speech overnight, whilst degenerative conditions like amyotrophic lateral sclerosis (ALS) can leave patients paralyzed and gradually cause them to lose control of their vocal cords over an extended period. Currently available solutions generally exploit muscle movements to generate sentences letter by letter; Stephen Hawking famously controlled his computer interface with his cheek. Recent trials using brain-controlled interfaces, where patients' electrophysiological activity is read and used to control a cursor, have achieved rates of up to 8 words per minute. UCSF researchers Gopala K. Anumanchipalli, Josh Chartier and Edward F. Chang had a loftier target: to design an interface that could match the rate of natural speech, which is anywhere from 130 to 150 words per minute.
Two steps to synthetic speech
Creating the system, which the team describes as a virtual vocal tract, involved two steps. Firstly, an electrocorticography device (which measures electrical activity directly from the exposed cortex) was used to record the brain signals produced by five volunteers’ sensorimotor cortices as they read several hundred sentences aloud. The aim was to see how these signals corresponded to the motor movements that produce sound. Chang’s team were not able to directly measure the movement of their volunteers’ articulators, and so turned to a neural network, a kind of artificially intelligent computational system, to lend a hand. This network was trained using a library of data from previous experiments that measured speech and vocal tract movements together. The network learned what those movements looked like, and the inferred movements were then matched to the brain activity that Chang’s team had measured.
An example array of intracranial electrodes of the type used to record brain activity in the current study. Credit: UCSF
With the activity decoded into vocal tract movements, a second neural network was then trained to transform the movements into synthesized speech. This resulted in clearer speech than was previously achievable using one-step methods. The clarity of the speech was measured using the crowdsourcing platform Amazon Mechanical Turk, where users were tasked with recognizing the words spoken by the synthetic vocal tract from banks of possible words. Users achieved a success rate of roughly 70% for three-syllable words chosen from a bank of ten possible words, although this rate fell to under 40% when they chose from a bank of 50 words.
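The two-step idea described above can be illustrated with a toy sketch. This is not the UCSF team's code or model: the study used neural networks on multichannel recordings, whereas the sketch below swaps in simple straight-line fits on made-up scalar data purely to show how one decoder's output (movements) feeds a second decoder (sound). All names, numbers and the synthetic data are assumptions for illustration only.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b for scalar data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

def make_stage(a, b):
    """Wrap fitted coefficients as a callable decoding stage."""
    return lambda x: a * x + b

# Synthetic stand-ins (hypothetical): one "neural" value drives one
# "articulator movement" value, which in turn drives one "acoustic" value.
random.seed(0)
neural = [random.uniform(-1.0, 1.0) for _ in range(200)]
movement = [2.0 * x + 0.5 + random.gauss(0.0, 0.01) for x in neural]
acoustic = [-3.0 * m + 1.0 + random.gauss(0.0, 0.01) for m in movement]

# Step 1: brain activity -> vocal tract movement.
stage1 = make_stage(*fit_line(neural, movement))
# Step 2: vocal tract movement -> acoustic feature.
stage2 = make_stage(*fit_line(movement, acoustic))

def decode(neural_value):
    """Chain the two stages: neural signal -> movement -> sound feature."""
    return stage2(stage1(neural_value))

print(decode(0.2))
```

The point of the sketch is the chaining: the second stage never sees brain activity directly, only the movement estimate produced by the first stage.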
Voice to the voiceless?
Whilst the ultimate goal of this research is to restore voice to those with paralysis, this study did not include any volunteers with speech impairment. Would this system be usable by those who need it most? Chang’s team produced some compelling results in a second experiment that point the way towards this goal: volunteers were asked to silently mime words, and the system was tasked with turning those mimed movements into speech. Whilst there was a loss in fidelity, the system was able to produce intelligible words from silent speech. But would the system work for patients without any mouth movements to map? And how would the system work for patients who have never been able to speak, such as people living with cerebral palsy? One promising result from the study showed that elements of the virtual vocal tract could be shared between speakers, meaning that the brain activity of one person could be applied to a vocal tract created from the speech of another person. The system will, however, require further improvement and testing before it can be widely used in this way.
Senior author Chang is optimistic about the potential of the device to eventually help patients unable to speak: “We are hopeful, of course, in that particular case, it’s not so much about tapping into speech, but is really about learning to speak through a device.”
Nonetheless, he is clear that there is more work to be done before such a decoder could work clinically: “Everything that we have described is trying to more or less plug and play where essentially, you have a decoder and you try and decode those intact representations of speech. But I think that for certain people in the future who are candidates, speech may have to be learned from the bottom up (…) It will be very exciting to see if this virtual vocal tract that we have created, whether that can help people have the ability to speak who have never spoken before.”