Metalearning (neuroscience)

Metalearning is a neuroscientific term proposed by Kenji Doya^[1] as a theory of how neurotransmitters facilitate distributed learning mechanisms in the basal ganglia. The theory primarily concerns the role of neurotransmitters in dynamically adjusting the way computational learning algorithms^[2] interact to produce the kind of robust learning behaviour currently unique to biological life forms.^[3] The term "metalearning" has previously been used in social psychology and computer science, but in this context it denotes an entirely new concept.

The theory of metalearning builds on earlier work by Doya on the learning algorithms of supervised learning, reinforcement learning and unsupervised learning in the cerebellum, basal ganglia and cerebral cortex respectively.^[2] The theory emerged from efforts to unify the dynamic selection process for these three learning algorithms under a regulatory mechanism reducible to individual neurotransmitters.

Roles of Neuromodulators

Dopamine

Dopamine is proposed to act as a "global learning" signal, critical to the prediction of rewards and the reinforcement of actions. In this way, dopamine is involved in a learning algorithm in which Actor, Environment and Critic are bound in a dynamic interplay that ultimately seeks to maximise the sum of future rewards by producing an optimal action selection policy. In this context, Critic and Actor are characterised as independent network edges that together form a single complex Agent. This Agent collectively influences the information state of the Environment, which is fed back to the Agent for future computations. Through a separate pathway, the Environment also feeds back to the Critic the reward gained through the given action, so that an equilibrium can be reached between the predicted reward of a given policy for a given state and the evolving prospect of future rewards.
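The Actor-Critic loop described above is commonly written in reinforcement learning as a temporal-difference (TD) update, with the reward prediction error playing the role of the global learning signal. The following is a minimal sketch under that assumption, not the model's definitive form: the function name `actor_critic_step`, the tabular `values` and `prefs` dictionaries, the parameters `alpha` and `gamma`, and the two-state episode are all illustrative choices that do not appear in the source.

```python
def actor_critic_step(values, prefs, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """One tabular actor-critic update; returns the TD error (the assumed dopamine-like signal)."""
    # Critic: compare the predicted value of `state` with what actually followed.
    delta = reward + gamma * values[next_state] - values[state]
    # Critic update: nudge the value prediction toward the observed outcome.
    values[state] += alpha * delta
    # Actor update: make the chosen action more (or less) preferred in this state.
    key = (state, action)
    prefs[key] = prefs.get(key, 0.0) + alpha * delta
    return delta


# Hypothetical two-state episode: a cue followed by a rewarded outcome.
values = {"cue": 0.0, "outcome": 0.0}
prefs = {}
first = actor_critic_step(values, prefs, "cue", "press", 1.0, "outcome")
second = actor_critic_step(values, prefs, "cue", "press", 1.0, "outcome")
```

In this toy run the second error is smaller than the first, because the Critic has begun to predict the reward; the error shrinks toward zero as prediction and outcome come into equilibrium.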

Serotonin

Serotonin is proposed to control the balance between short- and long-term reward prediction, essentially by variably "discounting" expected future reward sums that may require too much expenditure to achieve. In this way, serotonin may facilitate the expectation of reward at a quasi-emotional level, and thus either encourage or discourage persistence in reward-seeking behaviour depending on the demands of the task and the duration of persistence required. Because global reward prediction would theoretically result from serotonin-modulated computations reaching a steady state with the computations similarly modulated by dopamine, high serotonergic signalling may override the dopamine-modulated computations and produce a divergent paradigm of reward that is not mathematically viable through the dopamine-modulated computations alone.
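The effect of discounting can be illustrated with the discounted sum of rewards used in reinforcement learning. This is a sketch only: treating the serotonin level as a single discount factor `gamma` between 0 and 1 is an assumption made for illustration, and the function name and the two reward sequences are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma raised to its delay in time steps."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# Two hypothetical options: a small immediate reward or a larger delayed one.
small_soon = [1.0, 0.0, 0.0, 0.0, 0.0]
large_late = [0.0, 0.0, 0.0, 0.0, 3.0]

# Steep discounting (low gamma) favours the immediate reward.
prefers_soon = discounted_return(small_soon, 0.5) > discounted_return(large_late, 0.5)
# Shallow discounting (high gamma) favours waiting for the larger reward.
prefers_late = discounted_return(large_late, 0.9) > discounted_return(small_soon, 0.9)
```

With `gamma = 0.5` the delayed reward is worth 3 × 0.5⁴ = 0.1875, less than the immediate reward of 1; with `gamma = 0.9` it is worth about 1.97, so persistence pays off.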

Norepinephrine

Norepinephrine is proposed to facilitate "wide exploration" by stochastic action selection. The choice between focusing on known, effective strategies and selecting new, experimental ones is known in probability theory as the exploration-exploitation problem.^[4] An interplay between situational urgency and the effectiveness of known strategies thus influences the dilemma between reliable selection for the largest predicted reward and exploratory selection outside known parameters. Since neuronal firing cascades (such as those required to perfectly swing a golf club) are unstable and prone to variation, norepinephrine is proposed to select for the most reliable known execution pattern at higher levels, and to allow for more random and unreliable selection at lower levels, with the purpose of potentially discovering more efficient strategies in the process.
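One common way to express this trade-off is softmax (Boltzmann) action selection, in which a single parameter sets how strongly choices favour the highest-valued action. The sketch below assumes that the norepinephrine level can be stood in for by an inverse temperature `beta`; the function names and the example action values are illustrative and are not taken from the source.

```python
import math
import random


def softmax_probs(action_values, beta):
    """Selection probabilities; beta = 0 is uniform, large beta is nearly greedy."""
    best = max(action_values)
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = [math.exp(beta * (v - best)) for v in action_values]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(action_values, beta, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(action_values, beta)
    return rng.choices(range(len(action_values)), weights=probs, k=1)[0]


# Hypothetical predicted rewards for three candidate strategies.
predicted = [1.0, 0.5, 0.0]
exploratory = softmax_probs(predicted, beta=0.0)   # every strategy equally likely
reliable = softmax_probs(predicted, beta=10.0)     # best-known strategy dominates
```

Low `beta` spreads choices across all strategies, including ones with low predicted reward, while high `beta` concentrates almost all probability on the best-known strategy.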

Acetylcholine

Acetylcholine is proposed to facilitate the balance between memory storage and memory renewal,^[5] finding an optimal balance between the stability and the effectiveness of learning algorithms for the specific environmental task. Acetylcholine thus modulates plasticity in the hippocampus, cerebral cortex and striatum to facilitate ideal learning conditions in the brain. High levels of acetylcholine would allow for very rapid learning and remodelling of synaptic connections, with the consequence that existing learning may be undone. Likewise, learning that must take place over an extended period may be overwritten before it reaches a functional level, so that learning occurs too quickly to be effective. At lower levels of acetylcholine, plastic changes are proposed to occur much more slowly, which may protect against unhelpful learning conditions or allow changes in stored information to reflect a much broader span of time.
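The trade-off between rapid learning and stable memory can be illustrated with a running estimate whose update size is set by a learning rate. This is a sketch under the assumption that the acetylcholine level corresponds to a single learning rate `alpha`; the function `track` and the observation sequence are hypothetical.

```python
def track(observations, alpha, estimate=0.0):
    """Running estimate updated toward each observation; alpha is the learning rate."""
    history = []
    for obs in observations:
        # Move the stored estimate a fraction alpha of the way to the new observation.
        estimate += alpha * (obs - estimate)
        history.append(estimate)
    return history


# Ten consistent observations followed by a single misleading one.
observations = [1.0] * 10 + [0.0]
fast = track(observations, alpha=0.9)   # learns quickly, but one outlier nearly erases it
slow = track(observations, alpha=0.1)   # learns slowly, but is barely disturbed
fast_drop = fast[-2] - fast[-1]
slow_drop = slow[-2] - slow[-1]
```

The high learning rate reaches the correct value within a few observations and then loses most of it to the single outlier, whereas the low learning rate is still converging after ten observations but changes little when the outlier arrives.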

Metalearning

Central to the idea of metalearning is that global learning can be modelled as a function of the efficient selection of these four neuromodulators. While no mechanistic model is put forward for where metalearning ultimately exists in the hierarchy of agency, the model has thus far demonstrated the dynamics necessary to infer the existence of such an agent in biological learning as a whole. While computational models and information systems are still far from approaching the complexity of human learning, metalearning provides a promising path forward for the future evolution of such systems as they increasingly approach the complexity of the biological world.

Potential Applications

The investigation of metalearning as a neuroscientific concept has potential benefits for both the understanding and the treatment of psychiatric disease, as well as for bridging the gaps between neural networks, computer science and machine learning.^[1]

References

[1] Doya, K. (2002). "Metalearning and neuromodulation". Neural Networks. 15 (4–6): 495–506. doi:10.1016/S0893-6080(02)00044-8. PMID 12371507. Retrieved 2013-08-04.
[2] Doya, K. (1999). "What are the computations of the cerebellum, the basal ganglia and the cerebral cortex?". Neural Networks. 12 (7–8): 961–974. doi:10.1016/S0893-6080(99)00046-5. PMID 12662639.
[3] Doya, K. (2000). "Metalearning, neuromodulation, and emotion" (PDF). Affective Minds. Archived from the original (PDF) on 2007-02-21. Retrieved 2013-08-04.
[4] Usher; et al. (1999). "The Role of Locus Coeruleus in the Regulation of Cognitive Performance". Science. 283 (5401): 549–554. Bibcode:1999Sci...283..549U. doi:10.1126/science.283.5401.549. PMID 9915705. Retrieved 2013-08-04.
[5] Hasselmo, Michael (1993). "Acetylcholine and memory". Trends in Neurosciences. 16 (6): 218–222. doi:10.1016/0166-2236(93)90159-J. PMID 7688162. S2CID 3957170.

External links

Neural Computation Unit at the Okinawa Institute of Science and Technology
Neural Computation Project at the ATR Brain Information Communication Research Laboratory Group
