Learning requires the sophisticated ability to constantly update expectations in order to make accurate predictions about the changing environment. Although a full characterization of how this is orchestrated by the brain remains elusive, a new study published by Cell Press in the May 27 issue of the journal Neuron provides insight into how the human brain may use a combination of two distinct strategies to guide behavior.
One accepted learning strategy, called model-free learning, relies on trial and error comparisons and is associated with the generation of a "reward prediction error" that corresponds to the difference between an actual and expected reward. A second mechanism, called model-based learning, involves generation of a cognitive map of the environment that describes the relationship between different situations. Model-based learning is associated with a "state prediction error," which measures surprise in a new situation given the current estimate of the environment.
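The two error signals described above can be illustrated with a minimal sketch. This is not the study's actual task or model; the states, transition probabilities, and learning rate below are illustrative assumptions, and the update rules are the standard textbook forms of temporal-difference and transition-model learning.

```python
ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)

# --- Model-free learning: reward prediction error ---
V = {"s0": 0.0, "s1": 0.0}  # estimated value of each state

def td_update(state, reward, next_state):
    """Update V(state) using the reward prediction error:
    rpe = (actual) - (expected) = (r + gamma * V(s')) - V(s)."""
    rpe = reward + GAMMA * V[next_state] - V[state]
    V[state] += ALPHA * rpe
    return rpe

# --- Model-based learning: state prediction error ---
# T[s][a][s'] is the learned transition model (the "cognitive map"),
# here initialized to a uniform guess over two possible outcomes.
T = {"s0": {"go": {"s1": 0.5, "s2": 0.5}}}

def transition_update(state, action, next_state):
    """Update the cognitive map using the state prediction error:
    spe = 1 - T(s, a, s'), i.e. how surprising the observed
    transition is under the current model."""
    spe = 1.0 - T[state][action][next_state]
    for s_next in T[state][action]:
        if s_next == next_state:
            T[state][action][s_next] += ALPHA * spe
        else:
            # renormalize the unobserved outcomes downward
            T[state][action][s_next] *= (1.0 - ALPHA)
    return spe
```

In this sketch, an unexpected reward produces a large reward prediction error that nudges the state's value upward, while an unexpected transition produces a large state prediction error that reshapes the cognitive map, mirroring the two signals the study localizes to different brain regions.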
"Think about a situation in which you always take the same route when driving home after work. Then, on a particular day, the usual way is blocked due to construction work. A model-free learning system would be helplessly lost and couldn't decide where to go next. But a model-based system would be able to query its cognitive map and figure out an efficient detour," explains lead study author Dr. Jan Gläscher from the Computation and Neural Systems Program at the California Institute of Technology.
"The simpler model-free learning has been well studied, and its basic learning mechanism driven by reward prediction error is relatively well understood," continues Dr. Gläscher. "In contrast, the more sophisticated model-based learning system, with its rich adaptability and flexibility, has been sparsely studied." The researchers were interested in determining whether the human brain computes both error signals and, if so, what their different neural signatures are.
Using a specially designed decision task combined with functional magnetic resonance imaging, the scientists observed the previously well-characterized reward prediction error signal associated with model-free learning in a part of the brain called the ventral striatum. During model-based learning, they observed a neural signature for the state prediction error in different areas of the brain: the intraparietal sulcus and the lateral prefrontal cortex.
These observations suggest that two distinct learning signals occur in the human brain. The authors propose that these signals may form the basis of separate computational strategies for guiding behavior. "Taken together, our findings reveal that two different error signals are computed in distinct brain areas and illustrate how human choice behavior may emerge through a combination of two unique forms of learning," concludes Dr. Gläscher.
The researchers include Jan Gläscher, California Institute of Technology, Pasadena, CA, and University Medical Center Hamburg-Eppendorf, Hamburg, Germany; Nathaniel Daw, New York University, NY; Peter Dayan, Gatsby Computational Neuroscience Unit, University College London, London, UK; and John P. O'Doherty, California Institute of Technology, Pasadena, CA, and Trinity College Institute of Neuroscience and School of Psychology, Trinity College Dublin, Ireland.