Reward and Decision Making

Date: 2014/9/12
Time: 11:00 - 12:00
Venue: Poster / Exhibition (Event Hall B)

Dissociation of working memory-based and value-based strategies in a free-choice task

  • P2-229
  • 伊藤 真 / Makoto Ito: 1; 吉澤 知彦 / Tomohiko Yoshizawa: 1,2; 銅谷 賢治 / Kenji Doya: 1,2
  • 1: 沖縄科学技術大学院大学 / Neural Computation Unit, OIST, Okinawa, Japan; 2: 奈良先端科学技術大学院大学 / NAIST, Nara, Japan

Value-based decision strategies, such as Q-learning, have been used to analyze the neuronal basis of decision making. The hypothesis that value-based strategies are implemented in the cortico-basal ganglia loops has been supported by reports of neural activity correlated with action values. However, animals often show different strategies, such as the win-stay, lose-switch strategy (WSLS): after a rewarded trial the same action is selected; otherwise the other action is selected.
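The two strategy families named above can be sketched in a few lines of Python. This is a minimal illustration only, not the model fitted in the study: the function names, the coding of actions as 0 (left) and 1 (right), the stateless form of the Q-learning update, and the softmax choice rule with inverse temperature `beta` are all assumptions made for the example.

```python
import math
import random


def wsls_choice(prev_action, prev_rewarded, noise=0.0, rng=random):
    """Win-stay, lose-switch over two actions: 0 (left) and 1 (right)."""
    # Stay after a rewarded trial, switch to the other action otherwise.
    action = prev_action if prev_rewarded else 1 - prev_action
    # With probability `noise`, deviate from the WSLS rule (assumed noise model).
    if rng.random() < noise:
        action = 1 - action
    return action


def q_update(q, action, reward, alpha):
    """Return updated action values after one stateless Q-learning step."""
    new_q = list(q)
    # Move the chosen action's value toward the obtained reward.
    new_q[action] += alpha * (reward - q[action])
    return new_q


def softmax_p_right(q, beta):
    """Probability of choosing action 1 (right) under a softmax rule."""
    return 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
```

WSLS depends only on the outcome of the immediately preceding trial, whereas the value-based rule accumulates evidence over many trials at a rate set by `alpha`.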
In this study, we hypothesized that WSLS is employed when working memory (WM) is readily usable, and that a value-based strategy is employed when WM is hard to use.
To test this hypothesis, we examined rats' choice behavior and action signals in a cortical motor-output area, the primary motor cortex (M1), in a free-choice task with WM interference.
Each trial started with the presentation of a "choice tone" or a "no-choice tone". After the choice tone, the rat was required to perform a nose poke into either the left or the right hole (choice trials), and a food pellet was then delivered probabilistically depending on its choice (e.g., left = 75%, right = 25%). After the no-choice tone, the rat was required not to perform any nose poke (no-choice trials).
We compared the rats' choice strategies in the interference condition (IC), in which no-choice trials were inserted between consecutive choice trials, and the control condition (CC), which consisted of choice trials only. The reward probabilities were reversed after several tens of choice trials. In IC, the rats needed several choice trials to adapt to the reversed reward probabilities, whereas adaptation in CC required only a single trial. The strategy in CC could be explained by WSLS with noise, while action probabilities in IC changed gradually with past experience, consistent with a value-based strategy with a small learning rate.
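The contrast between single-trial and multi-trial adaptation can be illustrated with a small idealized calculation. The sketch below is not the analysis used in the study. It assumes that, before the reversal, the action values equal the example reward probabilities (0.75 and 0.25), that every post-reversal choice of the formerly better side goes unrewarded, and that the learning rate `alpha = 0.1` is a hypothetical value chosen for illustration.

```python
def trials_until_flip_value_based(q_old_best, q_other, alpha):
    """Count unrewarded trials until the formerly better action's value
    drops below the other action's value (idealized, deterministic)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    n = 0
    while q_old_best >= q_other:
        # Unrewarded trial: the value decays toward a reward of 0.
        q_old_best += alpha * (0.0 - q_old_best)
        n += 1
    return n


def trials_until_flip_wsls():
    """Noise-free WSLS switches after a single unrewarded trial."""
    return 1


# Illustrative numbers only: values start at the example reward probabilities.
print(trials_until_flip_value_based(0.75, 0.25, alpha=0.1))  # prints 11
print(trials_until_flip_wsls())                              # prints 1
```

Under these assumptions a small learning rate yields a preference that shifts only after many unrewarded trials, while WSLS shifts after one. In the actual task rewards are probabilistic, so real adaptation times would vary from trial block to trial block.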
Before the start of the choice action, 39% of M1 neurons (66/169) coded the upcoming action. In 32% of these (21/66), the firing rate immediately before action execution differed significantly between IC and CC. These results support our hypothesis and suggest that the action command signals generated by WM-based and value-based strategies are differentially coded in M1.

Copyright © Neuroscience2014. All Rights Reserved.