Our understanding of the brain's dopamine mechanism may be wrong! The technology behind AlphaGo's strongest successors is inspiring brain science, and DeepMind's latest results have been published in Nature
Lai Keqian Ming Shisan sent from Aofei Temple
Quantum Bit Report | Public Account QbitAI
Artificial intelligence often draws inspiration from the way humans think.
But now it’s the other way around!
Advances in artificial intelligence are already providing insights into how the brain learns.
This is the latest research from DeepMind, just published in Nature, which proves:
Distributional reinforcement learning, a core technology behind AlphaGo's successors AlphaZero and AlphaStar, provides a new explanation for how the reward pathways in the brain work.
The finding also delighted DeepMind founder Demis Hassabis, who tweeted:
Our research in machine learning can give us a new understanding of how the brain works, which is very exciting!
He certainly had reason to be excited.
In the long run, it also suggests that the algorithms DeepMind has proposed operate on logic similar to the brain's, which means they may scale better to complex real-world problems.
And Hassabis' goal has always been to create general artificial intelligence.
The weapon behind the Alpha series: distributional reinforcement learning
In reinforcement learning, an agent takes actions in an unknown environment, receives rewards, and moves to the next state.
The temporal difference (TD) learning algorithm is arguably at the core of reinforcement learning.
It learns to predict a state's value using an estimate of the value of the states that follow it.
The algorithm compares the new prediction with the old expectation.
If the two differ, this "temporal difference" is used to adjust the old prediction toward the new one, making the estimate more accurate.
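The update rule described above can be sketched in a few lines of Python (an illustrative toy with made-up state names, not DeepMind's code):

```python
# TD(0) value update: compare the new bootstrapped estimate with the old
# prediction, and move the old prediction toward it by a learning rate.

def td_update(V, s, reward, s_next, alpha=0.1, gamma=0.99):
    """Update V[s] toward the target reward + gamma * V[s_next]."""
    delta = reward + gamma * V[s_next] - V[s]  # the temporal difference (TD error)
    V[s] += alpha * delta
    return delta

# Toy example: two states, one observed transition A -> B with reward 1.
V = {"A": 0.0, "B": 0.0}
delta = td_update(V, "A", reward=1.0, s_next="B")
```

Repeating this update over many transitions gradually drives each value estimate toward the expected future reward.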
△ When the future is uncertain, future returns can be represented as a probability distribution. Some outcomes may be good (cyan), and some outcomes may be bad (red).
The future reward that follows a particular action is usually unknown and random. In this case, the standard TD algorithm learns to predict the average future reward.
Distributional reinforcement learning makes a richer prediction: the probability distribution over all future rewards.
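The difference can be seen with a toy reward stream (my illustration, assuming three equally likely outcomes): a standard learner keeps only the mean, while a distributional learner keeps the whole histogram.

```python
import random
from collections import Counter

random.seed(1)
# Toy reward stream: three equally likely outcomes, one much larger.
outcomes = [random.choice([0.0, 1.0, 10.0]) for _ in range(3000)]

# What standard TD converges to: a single number, the average reward.
mean_prediction = sum(outcomes) / len(outcomes)

# What distributional RL represents: the probability of each outcome.
counts = Counter(outcomes)
probs = {r: n / len(outcomes) for r, n in counts.items()}
# mean_prediction is near 3.67, yet 3.67 is never an actual outcome;
# probs keeps the information that rewards are really 0, 1, or 10.
```

The mean alone hides the fact that the "average" reward never actually occurs, which is exactly the information a distributional prediction preserves.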
So what is the dopamine reward mechanism in the human brain?
That question planted the seed of this study in the minds of DeepMind's researchers.
And once they looked, what they found was genuinely surprising.
In the past, people believed that dopamine neurons should respond in the same way.
It's a bit like being in a choir, where everyone sings the exact same notes.
But the team found that individual dopamine neurons seemed to differ, displaying varying degrees of optimism.
So the researchers trained mice to perform a task and gave them rewards of varying and unpredictable sizes.
They found evidence of distributional reinforcement learning in the mice's ventral tegmental area, a midbrain structure that controls the release of dopamine to limbic and cortical areas.
This evidence suggests that predictions of multiple future outcomes are represented simultaneously and in parallel.
Isn't that strikingly similar to the principle of distributional reinforcement learning in machines?
Explaining the brain's dopamine system
The experiment used optogenetic identification to record the responses of single dopamine neurons in the ventral tegmental area of the mouse brain.
The ventral tegmental area is rich in dopamine and serotonin neurons and is part of two major dopamine neural pathways.
Based on reinforcement learning theory, the study hypothesizes that dopamine neurons carry a reward prediction error (RPE) signal.
In standard reinforcement learning, a reward below the mean of the predicted distribution induces a negative RPE, while a larger reward induces a positive RPE (as shown in panel a, left, above).
In distributional reinforcement learning, each channel carries a different value prediction, and different channels have different degrees of optimism.
These value predictions in turn serve as reference points for different RPE signals, so a single reward outcome can elicit positive RPEs in some channels and negative RPEs in others (as shown in the right panel above).
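This asymmetry can be sketched with a quantile-style update (a simplified illustration under my own assumptions, not the paper's model): each "channel" scales positive and negative errors differently, so its prediction settles at a different point of the reward distribution.

```python
import random

def asymmetric_update(estimate, reward, optimism, lr=0.01):
    """Scale positive errors by `optimism` and negative ones by 1 - optimism."""
    if reward > estimate:
        return estimate + lr * optimism          # positive RPE, scaled by optimism
    return estimate - lr * (1.0 - optimism)      # negative RPE, scaled oppositely

random.seed(2)
optimisms = [0.1, 0.5, 0.9]       # pessimistic, neutral, optimistic channels
estimates = [0.5, 0.5, 0.5]
for _ in range(20000):
    r = random.random()           # rewards drawn uniformly from [0, 1)
    estimates = [asymmetric_update(e, r, o) for e, o in zip(estimates, optimisms)]
# Each estimate converges near its own quantile of the reward distribution,
# so the same reward r can sit above some estimates (positive RPE) and
# below others (negative RPE) at the same time.
```

The point where a channel's RPE flips from negative to positive is its reversal point, which is exactly the quantity measured in the recordings.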
The recordings show that the reversal points of dopamine neurons in the mouse brain vary with each neuron's degree of optimism, consistent with distributional reinforcement learning (as shown in panel b above).
To verify that this diversity of neuronal responses is not random noise, the researchers ran a further check.
They randomly split the data into two halves and estimated the reversal points independently in each half. The reversal points in one half turned out to be correlated with those in the other.
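On synthetic data, the check looks like this (my sketch: I invent per-neuron "true" reversal points plus measurement noise, since the recorded data is not reproduced here). If the diversity were random noise, the two halves would be uncorrelated; a strong correlation means each neuron has its own stable reversal point.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(3)
true_points = [random.uniform(0.0, 1.0) for _ in range(40)]  # one per "neuron"
# Estimate each neuron's reversal point independently from each half of its
# trials; measurement noise makes the two estimates differ slightly.
half_a = [p + random.gauss(0.0, 0.05) for p in true_points]
half_b = [p + random.gauss(0.0, 0.05) for p in true_points]
correlation = pearson(half_a, half_b)  # high => diversity is systematic
```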
To further probe how the neurons process reward predictions, the researchers presented three different cues.
The cues signaled reward probabilities of 10%, 50%, and 90%, and the responses of four dopamine neurons were recorded simultaneously.
Each trace is the average response to one of the three cues, with time zero marking cue onset.
The results showed that some cells responded to the 50% cue almost as strongly as to the 90% cue, while other cells responded to it more as they did to the 10% cue.
Finally, the researchers attempted to decode the reward distribution from the firing rates of dopamine cells.
Through inference, they reconstructed a distribution that closely matched the actual distribution of rewards in the task the mice were performing.
This preliminary evidence for a distributional reinforcement learning mechanism in the mouse brain left the researchers with further questions:
What circuit or cellular-level mechanisms account for the asymmetric diversity?
How are different RPE pathways anatomically coupled to corresponding reward predictions?
These mysteries of the brain remain to be further understood.
Moreover, the result lends support to earlier hypotheses that the dopamine system is involved in mental disorders such as addiction and depression.
It has been theorized that both depression and bipolar disorder may involve negative feelings about the future.
Such feelings are associated with negative prediction biases, which could arise from asymmetries in RPE coding [28, 29].
But the greater significance may lie in the encouragement it gives to current machine learning research.
Matt Botvinick, head of neuroscience research at DeepMind, said: “When we can show that the brain uses algorithms similar to the ones we use in our AI work, it gives us more confidence.”
The results of an interdisciplinary research team
The paper has three co-first authors and is the product of an interdisciplinary team.
First on the list is Will Dabney, a senior research scientist at DeepMind.
△ Will Dabney
He received his undergraduate degree from the University of Oklahoma and his Ph.D. from the University of Massachusetts, Amherst.
Before joining DeepMind in 2016, he worked on Amazon's Echo team.
The second co-first author is Zeb Kurth-Nelson, a research scientist at DeepMind.
△ Zeb Kurth-Nelson
He received his Ph.D. from the University of Minnesota and joined DeepMind in 2016.
The third co-first author is Naoshige Uchida, a professor of molecular and cell biology at Harvard University.
△ Naoshige Uchida
In addition, DeepMind founder Hassabis is also among the authors.
He has long hoped that breakthroughs in artificial intelligence will also help us crack fundamental scientific problems.
This study shows that the direction they have pursued can in turn inspire research on the brain, which undoubtedly strengthens their confidence.
One More Thing
At the same time as this paper was published in Nature, another study by DeepMind appeared in the same journal.
That study is AlphaFold, a system DeepMind first unveiled in December 2018 that uses artificial intelligence to accelerate scientific discovery.
From a protein's genetic sequence alone, it predicts the protein's 3D structure, with results more accurate than any previous model.
DeepMind called it its first major milestone in scientific discovery, marking significant progress on one of biology's core challenges.
So far, DeepMind's Alpha series has gone from AlphaGo to AlphaZero, AlphaStar, and now AlphaFold, with all four published in Nature.
Ah... the happiness of a top research lab is just that plain and boring.
Portal
https://www.nature.com/articles/s41586-019-1924-6
-over-