Reinforcement learning
Reinforcement learning is a paradigm in which a learner iteratively maps situations to actions with the goal of maximizing a numerical reward (Sutton & Barto 2017). Learners are not given explicit rules, but must instead use repeated trials to discover the relationships between actions and rewards. This framework has strong parallels to experience-based frameworks for animal learning. Indeed, a schematic of the reinforcement optimizer for a computer learning to play the game Go is broadly similar to schematics of animal behavior and learning (Table 2). In both frameworks, an agent takes actions (movements) in the environment, and the outcomes of those actions are processed by an interpreter (cognitive model), which either “rewards” or “punishes” the agent, thereby updating its internal state and altering its subsequent actions. Additional aspects of realism are that rewards can be immediate or delayed, and that the appropriateness of actions is not specified in advance but must be learned through exploration.
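The trial-and-error loop described above can be sketched with a minimal epsilon-greedy learner on a multi-armed bandit task, in which the agent is never told which action pays best but must discover it through repeated trials. This is a hypothetical illustration, not from the source; the arm payoffs, step count, and epsilon value are all assumptions chosen for the example:

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy action-value learning on a simple bandit task.

    The agent begins with no knowledge of the action-reward mapping and
    improves its estimates from the numerical rewards it receives.
    """
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n  # learned value of each action
    counts = [0] * n       # how often each action has been tried

    for _ in range(steps):
        # Explore occasionally; otherwise exploit the current best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(range(n), key=lambda a: estimates[a])

        # The environment returns a noisy numerical reward for that action.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental sample-average update of the action's value estimate.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates

# Three actions with hidden mean payoffs; the learner should come to
# prefer the middle action (mean 1.0) without ever being told the means.
est = run_bandit([0.2, 1.0, 0.5])
```

After enough trials, the estimate for the best-paying action dominates, so the greedy choice converges on it even though no rule identifying that action was ever provided.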