LEARNING AND SPACE USE: CONNECTIONS TO OTHER DISCIPLINES
We distinguish two fundamental constructs for learning in conjunction with animal movement: updating the world model and building a new world model. To understand the difference between these, it helps to assume that the animal has a cognitive model of the world (\(\hat{Q}\)) and a set of “policy rules” (\(\beta\)) for mapping conditions, including the current snapshot of that cognitive model and the state or priorities of the animal, into outcomes, in particular movement decisions. The policy rules can be thought of as the coefficients of a function governing outcomes in terms of conditions. Within this construct, updating the world model refers to the process of moving through a world, acquiring and storing information about it, updating the world model \(\hat{Q}\), and acting upon that knowledge according to the fixed set of policy rules \(\beta\). The learning process itself is limited to updating the world model. Note that this kind of learning is only meaningful if the world itself is dynamic, with resources or threats moving, depleting, and regenerating in a way that makes it beneficial, or even necessary, to update expectations rather than navigate with an essentially fixed map. When confronted with a new world, whether via dispersal, translocation, or a significant perturbation of the existing world, the very structure of the world model and the policy rules both need to be adjusted by building a new world model. These two fundamental kinds of learning are schematized in Figure 2, where an elk’s movement among three dynamic patches requires constant updating of information (updating the world model), a process that relies on moving between those patches. But when a patch is significantly perturbed, or becomes unusable in a novel way, the fundamental structure of the world model needs to be altered (building a new world model), and novel policy rules governing interaction with novel elements need to be developed.
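The "updating the world model" construct can be sketched computationally. In this minimal illustration (not from the source; the patch names, learning rate, and quality values are all hypothetical), the cognitive model \(\hat{Q}\) is a table of estimated patch qualities that is revised with each visit, while the policy rules \(\beta\) remain a fixed mapping from beliefs to a movement decision:

```python
LEARNING_RATE = 0.5  # hypothetical update weight for new observations

def update_world_model(q_hat, patch, observed_quality):
    """Updating the world model: blend a new observation into Q_hat."""
    q_hat[patch] += LEARNING_RATE * (observed_quality - q_hat[patch])

def policy(q_hat):
    """Fixed policy rules (beta): move to the patch believed to be best."""
    return max(q_hat, key=q_hat.get)

# A dynamic world: true patch qualities change partway through,
# which is what makes updating the model worthwhile at all.
true_quality = {"A": 1.0, "B": 0.2, "C": 0.5}
q_hat = {p: 1.0 for p in true_quality}   # optimistic initial world model

for step_i in range(20):
    patch = policy(q_hat)                          # act on current beliefs
    update_world_model(q_hat, patch, true_quality[patch])
    if step_i == 10:                               # the world changes:
        true_quality["A"], true_quality["B"] = 0.1, 0.9
```

After the shift in patch qualities, the agent's revised \(\hat{Q}\) redirects it to the newly best patch even though \(\beta\) never changes; only a structurally novel world would require altering the policy rules themselves.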
The main distinction between updating the world model and building a new world model appears in a slightly different form in the machine-learning literature, where the two kinds of learning are labelled as base-level and meta-level. Specifically, “The base-level learning problem is the problem of learning functions, just like regular supervised learning. The meta-level learning problem is the problem of learning properties of functions, i.e., learning entire function spaces” (Thrun and Pratt 1998). The function spaces in our analogy comprise \(\hat{Q}\), whereas the learning functions are the coefficients \(\beta\). In the neurosciences, the terms model-based and model-free reinforcement learning are used in analogy with base-level and meta-level learning (Doll et al. 2012).
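The base-/meta-level distinction can be made concrete with a toy regression example (an illustrative stand-in, not the source's formulation; the data and the residual-based selection rule are assumptions). Base-level learning fits coefficients within one fixed function space; meta-level learning chooses among function spaces themselves:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 40)
y = 2.0 * x**2 + 0.1 * rng.standard_normal(40)  # the world is quadratic

def base_level_fit(degree):
    """Base-level: learn coefficients within the fixed space of
    degree-d polynomials (the analogue of adjusting beta)."""
    coeffs = np.polyfit(x, y, degree)
    residual = np.mean((np.polyval(coeffs, x) - y) ** 2)
    return coeffs, residual

def meta_level_select(candidate_degrees):
    """Meta-level: learn which function space is appropriate
    (here judged, simplistically, by in-sample residual error)."""
    return min(candidate_degrees, key=lambda d: base_level_fit(d)[1])

best_degree = meta_level_select([1, 2])
```

Because the data are quadratic, the meta-level step selects the degree-2 function space; the base-level step then only tunes coefficients inside it. (In practice, meta-level selection would use held-out data rather than training residuals.)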
It is also interesting to note that complex behaviors that appear to involve decision-making can arise from other mechanisms of self-organized behavior. Self-organization occurs when simple rules lead to complex behavior (Gros 2015). A prominent theoretical example is cellular automata, whereby a specific rule set, such as “the game of life,” gives rise to agent-like configurations that may travel, replicate, and combine. Self-organized robots (Box 3) can exhibit emergent behavior, such as autonomous direction reversal, which an external observer could mistakenly interpret as decision-making (Kubandt et al. 2019). Because self-organization is not purposeful, an agent based solely on self-organizational principles will not be able to improve, or “learn,” its score in a given task.
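The Game of Life example can be shown in a few lines. In this sketch, the same two local rules (a dead cell with exactly three live neighbours is born; a live cell with two or three live neighbours survives) are applied uniformly everywhere, yet a "glider" configuration travels diagonally across the grid, agent-like movement with no agent, no purpose, and no learning:

```python
from collections import Counter

def step(live):
    """One Game of Life update on a set of live (row, col) cells."""
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0))
    # Birth: exactly 3 neighbours. Survival: 2 or 3 neighbours.
    return {cell for cell, n in neighbour_counts.items()
            if n == 3 or (n == 2 and cell in live)}

# The classic glider pattern.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
state = glider
for _ in range(4):   # after 4 steps the glider reappears, shifted
    state = step(state)
```

After four updates the glider has translated itself one cell down and one cell right, emergent "travel" produced entirely by fixed local rules, with nothing in the system that could improve at any task.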