Reinforcement learning
Reinforcement learning is a paradigm in which situations are mapped to
actions with the goal of maximizing a numerical reward
(Sutton & Barto 2017). Learners are not given explicit rules; instead,
they must use repeated trials to discover the relationships between
actions and rewards. This framework has strong parallels to experience-based
frameworks for animal learning. Indeed, a schematic of the reinforcement
optimizer for a computer learning to play the game Go is broadly similar
to schematics of animal behavior and learning (Table 2). In both
frameworks, an agent takes actions (movements) in the environment, and
the outcomes of those actions are processed by an interpreter (cognitive
model), which either “rewards” or “punishes” the agent, thereby
modifying its internal state and, in turn, its subsequent actions.
Additional aspects of realism are that rewards can be immediate or
delayed, and that the appropriateness of actions is not known
initially but must be learned through exploration.
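The loop described above can be sketched concretely. The following is a minimal, hypothetical illustration (not a method from the source): tabular Q-learning on a toy five-state corridor in which the only reward comes at the far end, so the agent must discover, through epsilon-greedy trial and error, that early rightward moves are worthwhile even though their reward is delayed.

```python
import random

N_STATES = 5          # states 0..4; an episode ends when state 4 is reached
ACTIONS = [-1, +1]    # move left or right along the corridor
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Environment: deterministic move; reward 1 only on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    # The agent's internal state: estimated value of each (state, action) pair
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # Epsilon-greedy: mostly exploit current estimates, occasionally explore
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(s, x)])
            nxt, r = step(s, a)
            # The "interpreter": the reward signal updates the agent's estimates,
            # propagating delayed reward back toward earlier actions
            best_next = max(q[(nxt, x)] for x in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
            s = nxt
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # the learned policy should move right, toward the delayed reward
```

No appropriateness of actions is supplied in advance; the rightward policy emerges solely from the interplay of exploration and the reward signal, mirroring the trial-and-error character of animal learning described above.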