Why Do We Keep Doing Things That Don’t Feel Rewarding?
Researchers identify a dopamine signal that reinforces repeated actions without tracking reward.

Why do we keep doing the same thing, even when it stops being rewarding? Neuroscientists at the Sainsbury Wellcome Centre, UCL, have identified a second dopamine-based learning system in the brain that reinforces repeated actions independently of their outcome.
Published in Nature, the work helps explain how habits form and why bad ones can be so difficult to break.
The role of dopamine in habit formation
We form habits by repeating actions that once helped us achieve something useful. But what happens in the brain when those actions become automatic? When we stop thinking about their outcomes and just do them?
Previous research has focused on how the brain uses dopamine to signal surprise about rewards, called reward prediction errors (RPEs). These help us learn which actions are worth repeating. But RPEs don’t explain why we keep doing things when they’re no longer rewarding, or how habits take hold in the first place.
Some dopamine neurons don't seem to care about reward at all. Instead, they fire before or during movement, raising the question of whether they are involved in controlling movement or are doing something else entirely.
Computer models have suggested the brain uses a second kind of learning signal called an action prediction error (APE). Unlike RPEs, APEs track how expected an action is in a given situation. When an action is more or less likely than predicted, the signal updates the tendency to repeat it.
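The contrast between the two signals can be sketched in a few lines of code. This is a minimal illustration of the general prediction-error logic described above, not the paper's actual model; the function names and the learning rate are illustrative assumptions.

```python
# Minimal sketch of an action prediction error (APE) update, by analogy
# with the familiar reward prediction error (RPE) rule. Names and the
# learning rate are illustrative assumptions, not taken from the paper.

def rpe_update(value, reward, lr=0.1):
    """RPE: surprise about the outcome of an action."""
    rpe = reward - value              # better or worse than expected?
    return value + lr * rpe

def ape_update(action_prob, action_taken, lr=0.1):
    """APE: surprise about the action itself.
    action_taken is 1 if the action was chosen, 0 otherwise."""
    ape = action_taken - action_prob  # more or less likely than predicted?
    return action_prob + lr * ape    # repeated actions become expected

# Repeating the same action drives the APE toward zero while the
# tendency to repeat that action grows - with no reward term anywhere.
p = 0.5
for _ in range(50):
    p = ape_update(p, 1)
```

Note that the APE update never mentions reward: it only tracks how predictable the action has become, which is exactly the property that distinguishes it from an RPE.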
The new study set out to test whether this second system exists in the brain. The team wanted to know whether movement-related dopamine signals act as APEs – and if so, how this second system supports the formation of stable, automatic behaviours.
A second dopamine learning system
To test the idea of a second learning system, researchers trained mice to make choices based on sound. In this task, mice heard tones of different frequencies and learned to choose a left or right port to get a reward. The researchers focused on the tail of the striatum (TS) – a brain region linked to movement, not reward.
They found mice with damage to the TS could still learn the task, but they stalled partway through.
“We observed that lesioned mice and control mice initially learn in the same way, but once they get to about 60-70% performance, i.e. when they develop a preference, then the control mice rapidly learn and develop expert performance, whereas the lesioned mice only continue to learn in a linear fashion,” said lead author Dr. Marcus Stephenson-Jones, a group leader at the Sainsbury Wellcome Centre.
“This is because the lesioned mice can only use RPE, whereas the control mice have two learning systems, RPE and APE, which contribute to the choice,” he explained.
The team used a fluorescent sensor to track dopamine activity. In the TS, dopamine release was tied to movement, not reward. The signal got smaller as actions became predictable, matching what would be expected from an APE. Even when the actual reward was larger or smaller than expected, TS dopamine signals did not change.
When the familiar auditory cue was replaced with a novel sound, dopamine release in the TS increased, as predicted by the APE model, confirming that the signal reflected prediction error rather than mere movement.
When the researchers artificially boosted TS dopamine at decision points, the mice became more likely to repeat the same action next time, even without a reward change.
Stimulation of TS dopamine in open field experiments had no effect on how often or how fast mice moved, ruling out the idea that it simply triggers movement.
Computational models also found that APE alone didn’t help learning, but when added to the reward-based system, it made the learning faster and more stable.
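The two-system idea can be illustrated with a toy simulation in which a value-based learner (driven by RPEs) is combined with a value-free habit system (driven by APEs). The task, the way the two drives are summed, and all learning rates here are simplifying assumptions for illustration, not the study's computational model.

```python
# Toy two-system agent on a task where the left port is always rewarded.
# A value system (RPE) and a habit system (APE) jointly drive choice.
# All parameters are illustrative assumptions, not the paper's model.
import math
import random

def softmax_choice(drive_left, drive_right, beta=3.0):
    """Pick left (0) or right (1) with probability set by the drives."""
    p_left = 1 / (1 + math.exp(-beta * (drive_left - drive_right)))
    return 0 if random.random() < p_left else 1

def run(n_trials=2000, use_ape=True, seed=0):
    random.seed(seed)
    q = [0.0, 0.0]   # value estimates, updated by RPE
    h = [0.5, 0.5]   # habit strengths, updated by APE
    correct = 0
    for _ in range(n_trials):
        # combined drive: value plus (optionally) habit strength
        drive = [q[a] + (h[a] if use_ape else 0.0) for a in (0, 1)]
        a = softmax_choice(drive[0], drive[1])
        reward = 1.0 if a == 0 else 0.0
        q[a] += 0.05 * (reward - q[a])           # RPE update
        for b in (0, 1):                         # APE update
            h[b] += 0.05 * ((1.0 if b == a else 0.0) - h[b])
        correct += (a == 0)
    return correct / n_trials
```

Once a preference develops, the habit term keeps pushing the agent toward its default choice even on trials where the value system would be indifferent, which is one way a value-free signal can stabilise behaviour.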
“Essentially, we have found a mechanism that we think is responsible for habits. Once you have developed a preference for a certain action, then you can bypass your value-based system and just rely on your default policy of what you’ve done in the past. This might then allow you to free up cognitive resources to make value-based decisions about something else,” said Stephenson-Jones.
Implications for addiction, Parkinson’s and habit change
Identifying a second dopamine-based learning system – one that reinforces repeated actions without tracking reward – suggests a mechanism for how behaviors can become automatic.
“Imagine going to your local sandwich shop. The first time you go, you might take your time choosing a sandwich and, depending on which you pick, you may or may not like it. But if you go back to the shop on many occasions, you no longer spend time wondering which sandwich to select and instead start picking one you like by default. We think it is the APE dopamine signal in the brain that is allowing you to store this default policy,” said Stephenson-Jones.
The finding also helps reframe how we think about compulsive behavior and addiction. If repeated actions are reinforced by a value-free signal, then those behaviors may persist even when they stop being rewarding. Interventions might work better if they focus on replacing a habit, rather than just suppressing it.
“Now that we know this second learning system exists in the brain, we have a scientific basis for developing new strategies to break bad habits,” said Stephenson-Jones.
“Up until now, most research on addictions and compulsions has focused on the nucleus accumbens. Our research has opened up a new place to look in the brain for potential therapeutic targets,” said Stephenson-Jones.
The movement-related dopamine neurons that signal APE are also the same type that degenerate in Parkinson’s disease, which could explain why habitual behaviors like walking are disrupted, while more flexible actions can remain intact.
“Suddenly, we now have a theory for paradoxical movement in Parkinson’s. This gives us a new place to look in the brain and a new way of thinking about Parkinson’s,” said Stephenson-Jones.
The team is now testing whether disrupting the APE signal prevents habits from forming and investigating how the two learning systems coordinate over time to shape behaviour.
Reference: Greenstreet F, Vergara HM, Johansson Y, et al. Dopaminergic action prediction errors serve as a value-free teaching signal. Nature. 2025. doi: 10.1038/s41586-025-09008-9
This article is a rework of a press release issued by the Sainsbury Wellcome Centre. Material has been edited for length and content.