Presentation Details
Poster
Reward and Decision Making
Date | 2014/9/12 |
---|---|
Time | 14:00 - 15:00 |
Venue | Poster / Exhibition (Event Hall B) |
Exploring models of reinforcement learning to explain the ramping DA signal in the striatum
- P2-234
- 加藤 郁佳 / Ayaka Kato:1 森田 賢治 / Kenji Morita:2
- 1: Dept Biological Sciences, School of Science, Univ of Tokyo, Tokyo, Japan; 2: Physical & Health Educ, Grad Sch of Educ, Univ of Tokyo, Tokyo, Japan
Based on the results of many experiments, midbrain dopamine (DA) neurons are thought to (approximately) compute the reward prediction error (RPE) defined in reinforcement learning theory: the difference between the reward obtained or newly expected and the reward that had previously been expected. These reward expectations have been suggested to be stored in cortico-basal ganglia (corticostriatal) synapses and updated according to RPE through synaptic plasticity induced by released DA. This "DA = RPE" hypothesis has been widely appreciated in studies of reward learning and value-based decision-making. However, recent work (Howe et al., Nature, 2013) has revealed a new type of DA signal that appears not to represent RPE. Specifically, striatal DA concentration was found to increase gradually toward the goal in a maze task that required rats to travel long distances to obtain reward, and it was suggested that this DA signal may account for a persistent motivational state. We explored whether the ramping DA signal could still be consistent with the "DA = RPE" hypothesis if the hypothesis is extended to take into account some biological properties of the relevant neural elements. In particular, we examined the effects of a possible decay of DA-dependent plastic changes in synaptic strength, with reference to the neural implementation of reinforcement learning. Through modeling and simulation of reward learning tasks, we found that incorporating decay that becomes prominent within a few trials dramatically changes the model's behavior, causing a gradual ramping of RPE (Morita & Kato, Front Neural Circuits, 2014). In terms of functional relevance, given that the baseline activity of DA neurons is not high and their encoding of negative RPE can therefore be limited, synaptic decay could instead be used for flexibly reversing and updating learned reward associations.
There are, however, a number of limitations in our simplified model, including the omission of the cue stimulus that was presented in the experiments. We will consider how recognition of the cue affects behavior and DA ramping.
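The effect of decay described in the abstract can be illustrated with a minimal sketch. It assumes a simple temporal-difference (TD) learner on a linear track with reward only at the goal, and a multiplicative decay applied to all stored values at every time step. The function name `simulate`, the number of states, and the values of `alpha`, `gamma`, and `decay` are illustrative assumptions, not parameters taken from the abstract or from the cited paper, and the code is not the authors' implementation.

```python
def simulate(n_states=7, n_trials=500, alpha=0.5, gamma=0.97,
             decay=1.0, reward=1.0):
    """Return the RPE at each state on the final trial.

    Hypothetical setup: states 0..n_states-1 are visited in order on each
    trial, and reward is delivered only at the last (goal) state.
    decay=1.0 means no decay of learned values.
    """
    values = [0.0] * n_states   # learned reward expectations per state
    rpe = [0.0] * n_states      # RPE observed at each state (last trial)
    for _ in range(n_trials):
        for s in range(n_states):
            is_goal = (s == n_states - 1)
            r = reward if is_goal else 0.0
            next_value = 0.0 if is_goal else values[s + 1]
            # TD error: obtained reward plus discounted expectation of the
            # next state, minus the expectation held for the current state
            rpe[s] = r + gamma * next_value - values[s]
            # RPE-dependent update of the stored expectation
            values[s] += alpha * rpe[s]
            # Assumed decay: all stored values shrink at every time step
            values = [decay * v for v in values]
    return rpe


# Without decay, learning converges and the RPE vanishes along the track.
print("no decay:  ", [round(x, 3) for x in simulate(decay=1.0)])
# With decay, a positive RPE persists and grows toward the goal.
print("with decay:", [round(x, 3) for x in simulate(decay=0.97)])
```

In this sketch, decay keeps the stored expectations below the values that would cancel the RPE, so a positive error remains at every state and is largest near the goal. Whether the same holds with a cue stimulus at the start of the trial is not addressed here.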