Reward and Decision Making

Date 2014/9/11
Time 16:00 - 17:00
Venue Poster / Exhibition (Event Hall B)

Heterogeneous reward signals of midbrain dopamine neurons in over-trained monkeys

  • P1-238
  • 榎本 一紀 / Kazuki Enomoto:1 松本 直幸 / Naoyuki Matsumoto:2 春野 雅彦 / Masahiko Haruno:3 木村 實 / Minoru Kimura:1 
  • 1:玉川大脳研 / Brain Science Institute, Tamagawa Univ, Tokyo, Japan 2:熊本県大環境共生 / Fac of Environmental and Symbiotic Sci, Pref Univ of Kumamoto, Kumamoto, Japan 3:情報通信研 情報通信融合研究センター / CiNet, NICT, Osaka, Japan 

In a stable environment, it is economical to treat each single event indifferently, relying on long-term prediction of future actions and rewards. Midbrain dopamine (DA) neurons have been proposed to play critical roles in learning by encoding reward value and its prediction error in anatomically different manners. In a previous study, we reported that DA neurons learned to encode the long-term value of multiple future rewards. However, how DA signals behave once monkeys are over-trained remains unclear.
In this study, monkeys learned, over 2-3 months, a choice task consisting of sub-blocks of multiple rewarding steps. We recorded DA neuron activity in the pars compacta of the substantia nigra (SNc) and the ventral tegmental area (VTA) of the midbrain. At an advanced stage of learning, the duration of anticipatory licking of the reward spout, an index of reward expectation, differed between steps in a manner consistent with the long-term reward value within a block of the task, as estimated by a temporal-difference (TD) learning algorithm. We found that responses of DA neurons to the task-start cue in the dorsolateral midbrain were well estimated by the expected values of multiple future rewards with smaller discount factors than those in the ventromedial midbrain. After monkeys had been trained on the steps and blocks for more than 60 days, licking no longer differed between steps. Consistently, DA responses to the task-start cue evolved to reduce differences between steps: response magnitude in the step with the lowest reward probability was not significantly different from that in the step with the highest probability. These responses were well estimated by a TD learning algorithm in which monkeys are assumed to estimate future rewards across blocks of the task. We observed the same tendencies in DA responses to conditioned stimuli in a classical conditioning paradigm in which reward probability increased step by step and future rewards were thus easily predictable.
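As a rough illustration of why values estimated across blocks with little discounting become nearly indifferent to the step, the following Python sketch compares discounted sums of expected future rewards within a single block and across repeating blocks. It is not the model fitted in this study: the reward probabilities, the discount factors, and all function names are hypothetical, chosen only to show the qualitative effect.

```python
# Illustrative sketch only. The probabilities and discount factors below are
# hypothetical and are not taken from the study.

def within_block_values(probs, gamma):
    """Discounted expected reward from each step to the end of one block."""
    values = []
    for k in range(len(probs)):
        # Sum gamma^t * p over the remaining steps of the current block.
        values.append(sum(gamma ** t * p for t, p in enumerate(probs[k:])))
    return values


def across_block_values(probs, gamma):
    """Discounted expected reward from each step when blocks repeat forever."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    n = len(probs)
    values = []
    for k in range(n):
        # Discounted reward over one full cycle starting at step k ...
        one_cycle = sum(gamma ** t * probs[(k + t) % n] for t in range(n))
        # ... repeated indefinitely (geometric series in gamma^n).
        values.append(one_cycle / (1.0 - gamma ** n))
    return values


def relative_spread(values):
    """Difference between the largest and smallest value, relative to the mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


if __name__ == "__main__":
    probs = [0.25, 0.5, 0.75]  # hypothetical reward probability per step
    for gamma in (0.5, 0.9, 0.99):
        within = relative_spread(within_block_values(probs, gamma))
        across = relative_spread(across_block_values(probs, gamma))
        print(f"gamma={gamma}: within-block spread={within:.3f}, "
              f"across-block spread={across:.3f}")
```

Under these assumptions, the relative difference between steps shrinks toward zero in the across-block case as the discount factor approaches 1, which mirrors the reduced step-to-step differences described above; within a single block the differences remain large.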
These results suggest a heterogeneous role of DA in a well-learned environment: signaling event-indifferent, hyper-long-term (minimally discounted) reward information that facilitates long-term decisions, in anatomically different manners.

Copyright © Neuroscience2014. All Rights Reserved.