Paralyzed Patients Speak Again Thanks to AI-Powered Brain Implants
Efforts to restore speech to people silenced by brain injuries and diseases have taken a significant step forward.
Efforts to restore speech to people silenced by brain injuries and diseases have taken a significant step forward with the publication of two new papers in the journal Nature.
In the work, two multidisciplinary teams demonstrated new records of speed and accuracy for state-of-the-art, AI-assisted brain-computer interface (BCI) systems. The advances point toward giving people who can no longer speak the ability to communicate at near-conversational pace, and even show how the decoded text can be converted back into speech using computer programs that mimic the patient’s voice. One group developed a digital avatar that a paralyzed patient used to communicate with accurate facial gestures.
Breaking the silence
People living with neurodegenerative disorders, such as amyotrophic lateral sclerosis (ALS), or those who have been affected by brain damage or stroke, can lose the ability to use the muscles required to speak, a condition called anarthria. Communication with loved ones can be a priceless solace in the wake of such complex medical diagnoses.
Jaimie Henderson, a professor of neurology at Stanford University, has a personal connection to this research. “When I was five years old, my dad was involved in a devastating car accident,” Henderson said in a press conference. Henderson’s father was left with very impaired movement and speech. “I grew up wishing that I could know and communicate with him.” Now, decades later, Henderson’s team has taken steps toward making communication for people like his father far easier.
This isn’t Henderson’s first foray into the area; in a 2021 study, patients were asked to imagine writing out characters, a process called “mindwriting”. By translating the resulting brain activity, the researchers were able to train a typing program to output 90 characters per minute, a record for such software at the time. The new research shatters that record.
Brain implant
One paper, with Henderson as senior author, describes work with a patient, Pat Bennett, who had lost the ability to produce intelligible speech due to ALS. A brain implant was inserted into part of Bennett’s sensorimotor cortex, the brain area responsible for moving the muscles in the mouth and face that produce speech. Here, electrical signals representing tiny movements of the jaw, mouth and tongue were still present, even though the muscles required to act on them no longer worked properly.
Henderson and his team extracted information from these signals by training an AI algorithm on the activity recorded while Bennett attempted to speak preset sentences. The model estimated the statistical likelihood that particular phonemes (word sounds) had been intended based on Bennett’s brain activity, in much the same way that ChatGPT guesses what to write in response to a prompt. The output was then run through a language model, which inferred which words the phonemes were most likely to have formed. The training sessions continued twice a week for four months. By the time the training was complete, Bennett’s attempted sentences could be decoded into on-screen text at an average pace of 62 words per minute, more than triple the speed attained in Henderson’s mindwriting paper.
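To make that two-stage idea concrete, the purely illustrative sketch below maps made-up neural features to phoneme probabilities and then lets a tiny language-model prior choose between candidate words. The phoneme set, lexicon, priors and array shapes are all invented for the example and are not taken from either study.

```python
import numpy as np

# Toy phoneme set, lexicon and language-model prior -- all invented for the example.
PHONEMES = ["HH", "EH", "L", "OW", "W", "ER", "D"]
VOCAB = {"hello": ["HH", "EH", "L", "OW"], "word": ["W", "ER", "D"]}
WORD_PRIOR = {"hello": 0.6, "word": 0.4}

def decode_phoneme_probs(neural_features):
    """Stand-in for a trained neural decoder: returns one row of phoneme
    probabilities per time step (here just a random softmax)."""
    rng = np.random.default_rng(0)
    logits = neural_features @ rng.standard_normal((neural_features.shape[1], len(PHONEMES)))
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def score_word(phoneme_probs, word):
    """Combine per-time-step phoneme evidence with the language-model prior."""
    log_p = 0.0
    for t, ph in enumerate(VOCAB[word]):
        log_p += np.log(phoneme_probs[t, PHONEMES.index(ph)] + 1e-9)
    return log_p + np.log(WORD_PRIOR[word])

features = np.random.default_rng(1).standard_normal((4, 128))  # 4 time steps x 128 invented features
probs = decode_phoneme_probs(features)
best = max(VOCAB, key=lambda w: score_word(probs, w))
print("decoded word:", best)
```

In the real systems, the random stand-in decoder would be a network trained on months of recordings, and the word search would run over a full vocabulary with a far more capable language model; the structure of the pipeline, however, follows the same evidence-plus-prior pattern sketched here.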
Leading AI systems used to transcribe unimpaired speech have an error rate of 4–5%. When working from a set of 50 words, chosen to cover the everyday needs of someone receiving care for ALS, Bennett was able to communicate successfully just over 90% of the time. When the algorithm was assessed using a massive 125,000-word vocabulary, the error rate jumped to 23.8%. Nevertheless, this represented the first time such a large vocabulary had been successfully decoded using a BCI, and it opens the door to more naturalistic speech.
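For context, word error rate is the standard speech-recognition metric: the number of word substitutions, insertions and deletions needed to turn the decoded sentence into the intended one, divided by the number of intended words. A minimal sketch of the calculation, using invented example sentences:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein edit distance over words, divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# Invented example: one wrong word out of five -> 20% word error rate.
print(word_error_rate("i would like some water", "i would like some walker"))  # 0.2
```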
Working with the data after the sessions had concluded, the team was able to cut the word error rate using the 125,000-word vocabulary to just 11.7%. "This breakthrough marks a new generation of brain-computer interfaces, where machine learning enhances neural probing to provide real value for patients," said Laurent Itti, professor of computer science, psychology and neuroscience at the University of Southern California, who was not involved in the study.
Digital avatar
The second paper, led by researchers and clinicians at the University of California, San Francisco (UCSF), including neurosurgeon Edward Chang, worked with another patient, unnamed in the study, who had lost the ability to speak after a stroke more than a decade earlier. As in Henderson’s study, Chang’s team targeted brain regions that would normally have moved muscles in the face and mouth. The UCSF researchers, however, specialize in a technique called electrocorticography (ECoG), which records signals from the surface of the brain without inserting electrodes into it. There are many more similarities between the two approaches than differences, says Frank Willett, first author on Henderson’s paper.
Once again using a phoneme-based learning approach, Chang’s team produced a system that could quickly and accurately translate attempted speech into text, reaching a rate of 78 words per minute with a 25.5% word error rate on a 1,000-word vocabulary. Based on conversations with the patient, however, the team decided to take their work further, converting the text back into speech.
As anyone who has listened to robotic voiceovers on TikTok knows well, there is far more to human communication than accurate word replication. “Using a clip from her wedding video, we were able to decode these sounds into a voice that sounded just like [the patient’s] prior to the stroke,” said Sean Metzger, first author on Chang’s paper. The team also wanted to give the patient back her ability to communicate using facial movements. To this end, they created a “personalized avatar”, which moved its digital face in response to the brain signals that the patient would have used to move her own face before the stroke.
Clinical advance
The papers represent a major advance in BCI technology but remain some way from being rolled out to the general public. Henderson and Willett’s BCI required extensive training to cope with the inherent variability of brain-cell signals between test days, which will limit the technology’s application – for now. “As we do more of these recordings and get more of this data, we should be able to transfer what the algorithms learn from other people to a new person, so we don’t have to do as much training,” speculated Willett.
Chang and Metzger’s study, which drew from wider populations of neurons, required less calibration. In one demonstration of the stability of their data, the team tried to decode one of the 26 NATO code words from the brain data. Metzger explained that the team could “freeze” the decoder, meaning that it stopped learning from new data, and still produce perfect decoding nearly 80 days later.
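In machine-learning terms, “freezing” a decoder simply means holding its learned parameters fixed, so that later sessions reuse the same weights rather than recalibrating. A minimal PyTorch-style sketch of the idea, with a placeholder model and invented feature size rather than the study’s actual decoder:

```python
import torch
import torch.nn as nn

# Placeholder decoder standing in for a trained BCI model -- not the study's
# architecture; the 128-feature input size is invented for the example.
decoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 26))

# "Freezing": stop gradient updates and switch to inference mode, so the same
# weights are reused on later test days without further recalibration.
for param in decoder.parameters():
    param.requires_grad = False
decoder.eval()

with torch.no_grad():
    neural_features = torch.randn(1, 128)          # one window of invented neural features
    code_word_index = decoder(neural_features).argmax(dim=1)
    print(code_word_index.item())                  # index into the 26 NATO code words
```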
“Reaching 60-70 words per minute,” said Chang, “is a real milestone for our field and we are really excited about it.”
“The most important message is that there is hope that this is going to continue to improve and provide a solution in coming years,” he concluded.
Of course, the people most impacted by this research are the patients. Asked for her feedback on the system, the participant in Chang’s study said, “First, the simple fact of hearing a voice similar to your own is emotional. Being able to have the ability to speak aloud is very important.”
References:
Willett FR, Kunz EM, Fan C et al. A high-performance speech neuroprosthesis. Nature. 2023. doi: 10.1038/s41586-023-06377-x
Metzger SL, Littlejohn KT, Silva AB et al. A high-performance neuroprosthesis for speech decoding and avatar control. Nature. 2023. doi: 10.1038/s41586-023-06443-4