Instructional control of reinforcement learning: A behavioral and neurocomputational investigation

Bradley B. Doll, William J. Jacobs, Alan G. Sanfey, Michael J. Frank

Research output: Contribution to journal › Article

122 Citations (Scopus)

Abstract

Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
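The "confirmation bias" mechanism described in the abstract can be sketched as a Q-learning update in which prediction errors for an instructed stimulus are amplified when consistent with the instruction and diminished when inconsistent. This is a minimal illustration only; the function, parameter names, and values below are assumptions for exposition, not the models or fitted parameters reported in the paper.

```python
import random

def biased_q_update(q, reward, alpha, instructed, amp=1.5, damp=0.5):
    """One delta-rule update with an instruction-driven confirmation bias.

    For an instructed ("told it's good") stimulus, positive prediction
    errors are amplified and negative ones diminished before learning.
    """
    delta = reward - q                        # reward prediction error
    if instructed:
        delta *= amp if delta > 0 else damp   # bias toward the instruction
    return q + alpha * delta

# Simulate a stimulus that is (incorrectly) instructed to be good
# but is actually rewarded only 40% of the time.
random.seed(0)
q_plain, q_instr = 0.5, 0.5
for _ in range(1000):
    r = 1.0 if random.random() < 0.4 else 0.0
    q_plain = biased_q_update(q_plain, r, 0.1, instructed=False)
    q_instr = biased_q_update(q_instr, r, 0.1, instructed=True)
# q_plain tracks the true rate (~0.4); q_instr stays inflated above it,
# mirroring instruction-consistent choice despite contrary experience.
```

With these illustrative settings the biased estimate settles near amp·p / (amp·p + damp·(1−p)) ≈ 0.67 rather than the true 0.4, so the instructed stimulus keeps being chosen.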

Original language: English (US)
Pages (from-to): 74-94
Number of pages: 21
Journal: Brain Research
Volume: 1299
DOI: 10.1016/j.brainres.2009.07.007
State: Published - Nov 3 2009

Keywords

  • Basal ganglia
  • Dopamine
  • Reinforcement learning
  • Reward
  • Rule-governance

ASJC Scopus subject areas

  • Neuroscience (all)
  • Clinical Neurology
  • Developmental Biology
  • Molecular Biology

Cite this

Doll, B. B., Jacobs, W. J., Sanfey, A. G., & Frank, M. J. (2009). Instructional control of reinforcement learning: A behavioral and neurocomputational investigation. Brain Research, 1299, 74-94.
@article{efc9caad81aa4748806b1918d5478e11,
title = "Instructional control of reinforcement learning: A behavioral and neurocomputational investigation",
abstract = "Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is {"}overridden{"} at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract {"}Q-learning{"} and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a {"}confirmation bias{"} in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.",
keywords = "Basal ganglia, Dopamine, Reinforcement learning, Reward, Rule-governance",
author = "Doll, {Bradley B.} and Jacobs, {William J} and Sanfey, {Alan G.} and Frank, {Michael J.}",
year = "2009",
month = "11",
day = "3",
doi = "10.1016/j.brainres.2009.07.007",
language = "English (US)",
volume = "1299",
pages = "74--94",
journal = "Brain Research",
issn = "0006-8993",
publisher = "Elsevier",

}

TY  - JOUR
T1  - Instructional control of reinforcement learning
T2  - A behavioral and neurocomputational investigation
AU  - Doll, Bradley B.
AU  - Jacobs, William J
AU  - Sanfey, Alan G.
AU  - Frank, Michael J.
PY  - 2009/11/3
Y1  - 2009/11/3
N2  - Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
AB  - Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, S. (Ed.). 1989. Rule-governed behavior: cognition, contingencies, and instructional control. Plenum Press.). Here we examine the control of behavior through instructions in a reinforcement learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
KW  - Basal ganglia
KW  - Dopamine
KW  - Reinforcement learning
KW  - Reward
KW  - Rule-governance
UR  - http://www.scopus.com/inward/record.url?scp=70449715719&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=70449715719&partnerID=8YFLogxK
U2  - 10.1016/j.brainres.2009.07.007
DO  - 10.1016/j.brainres.2009.07.007
M3  - Article
C2  - 19595993
AN  - SCOPUS:70449715719
VL  - 1299
SP  - 74
EP  - 94
JO  - Brain Research
JF  - Brain Research
SN  - 0006-8993
ER  -