REVIEWS

MULTIPLE REWARD SIGNALS IN THE BRAIN

Wolfram Schultz

The fundamental biological importance of rewards has created an increasing interest in the neuronal processing of reward information. The suggestion that the mechanisms underlying drug addiction might involve natural reward systems has also stimulated interest. This article focuses on recent neurophysiological studies in primates that have revealed that neurons in a limited number of brain structures carry specific signals about past and future rewards. This research provides the first step towards an understanding of how rewards influence behaviour before they are received and how the brain might use reward information to control learning and goal-directed behaviour.

GOAL-DIRECTED BEHAVIOUR
Behaviour controlled by representation of a goal or an understanding of a causal relationship between behaviour and attainment of a goal.

REINFORCERS
Positive reinforcers (rewards) increase the frequency of behaviour leading to their acquisition. Negative reinforcers (punishers) decrease the frequency of behaviour leading to their encounter and increase the frequency of behaviour leading to their avoidance.

Institute of Physiology and Program in Neuroscience, University of Fribourg, CH-1700 Fribourg, Switzerland. e-mail: Wolfram.Schultz@unifr.ch

The fundamental role of reward in the survival and wellbeing of biological agents ranges from the control of vegetative functions to the organization of voluntary, GOAL-DIRECTED BEHAVIOUR. The control of behaviour requires the extraction of reward information from a large variety of stimuli and events. This information concerns the presence and values of rewards, their predictability and accessibility, and the numerous methods and costs associated with attaining them. Various experimental approaches, including brain lesions, psychopharmacology, electrical self-stimulation and the administration of addictive drugs, have helped to determine the crucial structures involved in reward processing1–4. In
addition, physiological methods such as in vivo microdialysis, voltammetry5–9 and neural imaging10–12 have been used to probe the structures and neurotransmitters that are involved in processing reward information in the brain. However, I believe that the temporal constraints imposed by the nature of the reward signals themselves might be best met by studying the activity of single neurons in behaving animals, and it is this approach that forms the basis of this article. Here, I describe how neurons detect rewards, learn to predict future rewards from past experience, and use reward information to learn, choose, prepare and execute goal-directed behaviour (FIG. 1). I also attempt to place the processing of drug rewards within a general framework of neuronal reward mechanisms.

Behavioural functions of rewards

Given the dynamic nature of the interactions between complex organisms and the environment, it is not surprising that specific neural mechanisms have evolved that not only detect the presence of rewarding stimuli but also predict their occurrence on the basis of representations formed by past experience. Through these mechanisms, rewards have come to be implicit or explicit goals for increasingly voluntary and intentional forms of behaviour that are likely to lead to the acquisition of goal objects.

Rewards have several basic functions. A common view is that rewards induce subjective feelings of pleasure and contribute to positive emotions. Unfortunately, this function can only be investigated with difficulty in experimental animals. Rewards can also act as positive REINFORCERS by increasing the frequency and intensity of behaviour that leads to the acquisition of goal objects, as described in CLASSICAL and INSTRUMENTAL CONDITIONING PROCEDURES. Rewards can also maintain learned behaviour by preventing EXTINCTION13. The rate of learning depends on the discrepancy between the occurrence of reward and the predicted occurrence of reward, the so-called 'reward prediction error'14–16 (BOX 1).
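The dependence of learning on the discrepancy between obtained and predicted reward can be illustrated with a minimal sketch. It assumes a simple delta rule of the Rescorla–Wagner type, which is one common formalization from learning theory and is not specified by the text at this point. The function names, the learning rate of 0.2 and the coding of reward as 1.0 (delivered) or 0.0 (omitted) are all illustrative assumptions.

```python
def prediction_error(reward, predicted_reward):
    """Discrepancy between the reward that occurred and the reward that was predicted."""
    return reward - predicted_reward


def update_prediction(predicted_reward, reward, learning_rate=0.2):
    """Move the prediction towards the obtained reward in proportion to the error.

    Assumed delta rule: new prediction = old prediction + learning_rate * error.
    """
    return predicted_reward + learning_rate * prediction_error(reward, predicted_reward)


def simulate(rewards, learning_rate=0.2, initial_prediction=0.0):
    """Return the prediction error on each trial for a sequence of rewards."""
    prediction = initial_prediction
    errors = []
    for reward in rewards:
        # The error is computed before the prediction is updated for the next trial.
        errors.append(prediction_error(reward, prediction))
        prediction = update_prediction(prediction, reward, learning_rate)
    return errors


if __name__ == "__main__":
    # Ten rewarded trials, then five trials in which the reward is omitted.
    trials = [1.0] * 10 + [0.0] * 5
    for trial, error in enumerate(simulate(trials), start=1):
        print(f"trial {trial:2d}: prediction error = {error:+.3f}")
```

Under these assumptions, the error is largest when the reward is fully unpredicted, shrinks as the reward becomes predicted, and turns negative when a predicted reward is omitted, which corresponds to the negative prediction error associated with extinction.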
Rewards can also act as goals in their own right and can therefore elicit approach and consummatory behaviour. Objects that signal rewards are labelled with positive MOTIVATIONAL VALUE because they will elicit effortful behavioural responses. These motivational values arise either through innate mechanisms or, more often, through learning. In this way, rewards help to establish value systems for behaviour and serve as key references for behavioural decisions.

NATURE REVIEWS | NEUROSCIENCE, VOLUME 1 | DECEMBER 2000 | 199
© 2000 Macmillan Magazines Ltd

PAVLOVIAN (CLASSICAL) CONDITIONING
Learning a predictive relationship between a stimulus and a reinforcer does not require an action by the agent.

OPERANT (INSTRUMENTAL) CONDITIONING
Learning a relationship between a stimulus, an action and a reinforcer, conditional on an action by the agent.

EXTINCTION
Reduction and cessation of a predictive relationship and behaviour following the omission of a reinforcer (negative prediction error).

MOTIVATIONAL VALUE
A measure of the effort an agent is willing to expend to obtain an object signalling reward or to avoid an object signalling punishment.

Figure 1 | Reward processing and the brain. Many reward signals are processed by the brain, including those that are responsible for the detection of past rewards, the prediction and expectation of future rewards, and the use of information about future rewards to control goal-directed behaviour. SNpr, substantia nigra pars reticulata; GP, globus pallidus.
[Figure 1 labels. Structures: medial temporal cortex; dorsolateral prefrontal, premotor and parietal cortex; orbitofrontal cortex; thalamus; striatum; dopamine neurons; amygdala; SNpr; GP. Functions: reward detection; reward prediction; goal representation; relative reward value; reward expectation; error detection.]

Box 1 | Reward prediction errors

[Box 1 flow diagram labels: Perform behavioural reaction; Error; No: Maintain current connectivity; Yes: Generate error signal; Modify current connectivity.]

Behavioural studies show that reward-directed learning depends on the predictability of the reward14–16. The occurrence of the reward has to be surprising or unpredicted for a stimulus or action to be learned. The degree to which a reward cannot be predicted is indicated by the discrepancy between the reward obtained for a given behavioural action and the reward that was predicted to occur as a result of that action. This is known as the prediction error, and underlies a class of error-driven learning mechanisms33. In simple terms, if a reward occurs unpredictably