LEARNING AND SPACE USE: CONNECTIONS TO OTHER DISCIPLINES
We distinguish two fundamental constructs for learning in conjunction
with animal movement: updating the world model and building
a new world model. To understand the difference between these, it helps
to assume that the animal has a cognitive model of the world
(\(\hat{Q}\)) and a set of “policy rules” (\(\beta\)) for mapping
conditions, including the snapshot of that cognitive model and the state
or priorities of the animal, into outcomes, in particular movement
decisions. The policy rules can be thought of as the coefficients of a
function governing outcomes in terms of conditions. Within this
construct, updating the world model refers to the process of
movement through a world, acquiring and storing information about the
world, updating the world model \(\hat{Q}\), and acting upon that
knowledge according to the fixed set of policy rules \(\beta\). The
learning process itself is limited to updating the world model. Note
that this kind of learning is only meaningful if the world itself is
dynamic, with resources or threats moving or depleting and regenerating
in a way that makes it beneficial, or even necessary, to update
expectations rather than navigate with an essentially fixed map. When
confronted with a new world, either via dispersal, translocation, or a
significant perturbation to the existing world, the very structure of
the world model and the policy rules both need to be adjusted by
building a new world model. These two fundamental kinds of
learning are schematized in Figure 2, where an elk’s movement among three
dynamic patches requires constant updating of information (updating the
world model), a process that relies on moving between those patches.
But when a patch is significantly perturbed, or becomes unusable in a
novel way, the fundamental structure of the world needs to be altered
(building a new world model), and novel policy rules to govern
interaction with novel elements need to be developed.
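The interplay between a fixed set of policy rules \(\beta\) and an updated world model \(\hat{Q}\) can be sketched in code. The toy simulation below is an illustrative assumption, not part of the framework itself: the patch names, the softmax form of the policy, and the drifting patch qualities are all invented for the example. Only the estimates in \(\hat{Q}\) change; \(\beta\) stays fixed throughout.

```python
import math
import random

random.seed(0)

# Illustrative toy model (all names and parameter values assumed):
# the cognitive model Q_hat holds quality estimates for three patches;
# beta is a fixed "policy rule" (here, a softmax coefficient).
# Only Q_hat is updated -- this is "updating the world model".
Q_hat = {"A": 0.5, "B": 0.5, "C": 0.5}  # cognitive model of the world
beta = 3.0                               # fixed policy coefficient
alpha = 0.2                              # update (learning) rate

def choose_patch(q_hat, beta):
    """Map the current world model into a movement decision (softmax)."""
    weights = {p: math.exp(beta * q) for p, q in q_hat.items()}
    r = random.uniform(0, sum(weights.values()))
    for patch, w in weights.items():
        r -= w
        if r <= 0:
            return patch
    return patch

def true_quality(patch, t):
    """A dynamic world: patch qualities drift, so updating pays off."""
    phase = {"A": 0.0, "B": 2.0, "C": 4.0}[patch]
    return 0.5 + 0.4 * math.sin(0.1 * t + phase)

for t in range(200):
    patch = choose_patch(Q_hat, beta)                # act via fixed rules
    reward = true_quality(patch, t)                  # experience the world
    Q_hat[patch] += alpha * (reward - Q_hat[patch])  # update Q_hat only
```

Because the patch qualities drift, an animal that never updated \(\hat{Q}\) would act on a stale map; continual updating is what keeps the fixed policy effective in a dynamic world.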
The main distinction between updating the world model and
building a new world model appears in a slightly different form
in the machine-learning literature, where the two kinds of learning are
labelled as base-level and meta-level. Specifically, “The base-level learning problem is the problem of learning
functions, just like regular supervised learning. The meta-level
learning problem is the problem of learning properties of functions,
i.e., learning entire function spaces” (Thrun and Pratt 1998). The
function spaces in our analogy comprise \(\hat{Q}\), whereas the
learning functions are the coefficients \(\beta\). In the neurosciences,
the terms model-based and model-free reinforcement learning are used in
analogy with base-level and meta-level learning (Doll et al. 2012).
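A minimal sketch of base-level learning, in the sense quoted above, is tabular Q-learning: values are adjusted within a fixed function space (the table itself), whereas meta-level learning would restructure that space. The two-state toy world and all parameter values below are assumptions for illustration only.

```python
import random

random.seed(1)

# Toy two-state world (assumed for illustration): "move" flips the
# state, and arriving in state 1 yields reward 1.
states, actions = (0, 1), ("stay", "move")

def step(state, action):
    new_state = 1 - state if action == "move" else state
    return new_state, float(new_state == 1)

# Base-level learning: adjust values within a fixed function space
# (this table); the space itself is never restructured.
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration

s = 0
for _ in range(500):
    if random.random() < eps:                  # occasional exploration
        a = random.choice(actions)
    else:                                      # otherwise act greedily
        a = max(actions, key=lambda x: Q[(s, x)])
    s2, r = step(s, a)
    best_next = max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # Q-learning update
    s = s2
```

After enough steps the table reflects the toy world’s structure (moving out of state 0 and staying in state 1 are valued highest), yet the learner has only ever filled in values of a function whose form was given in advance.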
It is also interesting to note that complex behaviors that appear to
involve decision-making can arise from other mechanisms of
self-organized behavior. Self-organization occurs when simple rules lead
to complex behavior (Gros 2015). A prominent theoretical example is
the cellular automaton, in which a specific rule set, such as that of the
“game of life,” gives rise to agent-like configurations that may travel,
replicate, and combine. Self-organized robots (Box 3) can exhibit
emergent behavior, such as autonomous direction reversal, which an
external observer could mistakenly interpret as decision-making (Kubandt
et al. 2019). Because self-organization is not purposeful, an
agent based solely on self-organizational principles will not be able to
improve, or to “learn,” its score in a given task.
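The “game of life” example can be made concrete. The sketch below (coordinate conventions assumed) applies the automaton’s fixed local rule to a “glider”: after four rule applications the same configuration reappears shifted one cell diagonally, so it appears to travel across the grid even though no cell makes any decision.

```python
from collections import Counter

def life_step(cells):
    """Apply the fixed local rule: a dead cell with exactly 3 live
    neighbours is born; a live cell with 2 or 3 live neighbours survives."""
    counts = Counter((x + dx, y + dy)
                     for (x, y) in cells
                     for dx in (-1, 0, 1)
                     for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in cells)}

# A glider: an agent-like configuration that "travels" with no
# purposeful decision-making anywhere in the system.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
cells = glider
for _ in range(4):
    cells = life_step(cells)
assert cells == {(x + 1, y + 1) for (x, y) in glider}  # shifted diagonally
```

The rule is identical for every cell and never changes, which is precisely why such a system can exhibit striking emergent behavior yet cannot improve its performance on any task.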