Keywords: self-generation, multisensory associations, active learning, gaze-controlled interface

Introduction

We learn faster when we are actively engaged with the material – this is not just folk wisdom, but a finding reproduced in a plethora of experimental settings (Markant et al., 2016). The memory benefits of active learning are, however, a very diverse phenomenon, and in order to study their cognitive and neural underpinnings, we have to choose a focus. In this study, we asked whether memory gains through active learning could be linked to the previously established differences in the neural and perceptual processing of self- versus externally generated stimuli (Baess et al., 2011; Blakemore et al., 1998, 2000; Schäfer & Marcus, 1973). Specifically, we studied the effects of being in control over auditory stimuli on the learning of visuo-auditory contingencies, in an experimental setup that controls for the confounding factors of predictability and movement. We designed a variation of the classic self-generation paradigm (Schäfer & Marcus, 1973) using eye movement sonification. In a memory task, participants learned associations between movements and sounds, and their learning progress was tracked across several stages of learning at the behavioural and neural level.
Our first aim was to test whether actively controlling stimuli, beyond effects of movement and predictability, would lead to associative learning benefits on a behavioural level. Previous studies have shown that active control during complex tasks such as spatial navigation, as well as in simpler experimental setups such as recognition memory tasks, can facilitate learning (Harman et al., 1999; James et al., 2002; Plancher et al., 2013). A related, somewhat more clearly defined phenomenon is the “production effect”: stimuli produced by oneself are remembered better than externally produced stimuli (Brown & Palmer, 2012; MacLeod et al., 2010). Even minimal amounts of control, such as controlling the pacing of information, have been found to improve memory (Markant et al., 2014). The effects of control are easily conflated with the effects of movement during learning, since in most studies on this question participants use hand movements to control stimuli in the active condition while not moving at all in the passive condition (Craddock et al., 2011; Harman et al., 1999; Liu et al., 2007; Luursema & Verwey, 2011; Meijer & Van der Lubbe, 2011). Nevertheless, some studies have found memory benefits for active learning even when controlling for the factor of movement (Plancher et al., 2013; Trewartha et al., 2015). Theoretical and experimental accounts of the role of choice during learning suggest that controlling the flow – the pacing, the order – of information is crucial for the memory gain, since the learner is able to develop hypotheses and test them, or to revisit items that they feel unsure about (Gureckis & Markant, 2012; Kruschke, 2008; Markant et al., 2016; Markant & Gureckis, 2010; Schulze et al., 2012). This is corroborated by the finding that motor activity unrelated to strategic control over learning does not improve memory performance (Voss et al., 2011). To better characterise the role of control in the learning of arbitrary motor-auditory contingencies, we developed a learning paradigm in which participants returned several times to the same set of stimuli and were tested on their memory performance between rounds of learning. We hypothesised a memory advantage for stimuli learned under active exploration, expressed in participants learning the associations faster in the active condition.
A second aim of this study was to investigate the neural mechanisms underlying the putative memory benefits of active learning. To that end, we isolated neurophysiological effects of control over acoustic stimuli from unspecific neuromodulatory effects of movement and from effects of stimulus predictability. Control during stimulus generation could modulate brain responses at different levels of learning, and we probed a series of possible mechanisms.
Control over stimuli, accompanied by a Sense of Agency (SoA), is known to impact stimulus processing. Electrophysiological responses to self-generated stimuli tend to be attenuated relative to externally generated stimuli, even when the stimuli evoking the response are physically identical (Blakemore et al., 2000; Gentsch & Schütz-Bosbach, 2011; Hughes et al., 2013b; Hughes & Waszak, 2011; Kilteni et al., 2020; Mifsud et al., 2018; SanMiguel et al., 2013). Although they can be observed in all sensory modalities (auditory: Baess et al., 2011; visual: Hughes & Waszak, 2011; tactile: Kilteni et al., 2020, for some examples), attenuation effects on sensory processing have been most extensively studied in the auditory domain, often by comparing evoked electrophysiological responses to self-generated and externally generated acoustic stimuli (Horváth, 2015; Schäfer & Marcus, 1973). Using electroencephalography (EEG), a number of electrophysiological markers of self-generation have been established, namely the attenuation of certain event-related potentials (ERPs): a diminished amplitude of different peaks that characterise the early cortical processing of self- as opposed to externally generated sounds. Attenuation for self-generated sounds has been observed in the N1 component (Bäß et al., 2008; Elijah et al., 2018; Mifsud et al., 2016; Neszmélyi & Horváth, 2017; Oestreich et al., 2016; Pinheiro et al., 2019; van Elk et al., 2014), the P2 component (Horváth & Burgyán, 2013; Knolle et al., 2012), and the Tb component (Paraskevoudi & SanMiguel, 2022; SanMiguel et al., 2013; Saupe et al., 2013). The nature of these effects is often assumed to be predictive, since efference copies of motor commands are thought to serve as a basis for the precise anticipation of sensory stimulation (Miall & Wolpert, 1996). Correctly predicted sensory stimulation is thought to elicit smaller neural responses than wrongly predicted or surprising input, in line with the predictive coding theory of neural processing (Blakemore et al., 1998; Kilner et al., 2007). However, previous research has shown that motor activity during sensory processing also has unspecific modulatory effects that are not related to predictability – just being in motion affects the way we perceive stimuli (Horváth et al., 2012), and movement effects can be a confounding factor when trying to study the effects of predictability and control (Hazemann et al., 1975; Horváth, 2013; Paraskevoudi & SanMiguel, 2021; Press & Cook, 2015). Recent studies that specifically investigated effects of agency, controlling for predictability and movement, have found both attenuation and enhancement effects on the P2 component (Bolt & Loehr, 2021; Han et al., 2021), and modulations of the P3 component have also been observed (Burnside et al., 2019; Kühn et al., 2011). We hypothesised that if the effects typically observed in self-generation paradigms on the N1, P2 and P3 components are indeed related to agency and control, we should be able to reproduce them with our design, even though we used an unconventional experimental paradigm: instead of hand or finger movements, participants used their eye movements to generate sounds. By using a gaze-controlled interface, we were able to compare an experimental condition in which participants controlled a cursor with their eye movements (“agent condition”) with a condition in which participants followed a cursor with their gaze (“observer condition”), minimising the motor differences between conditions.
Eye movements are mostly automatic and usually directed towards visual goals, and we ordinarily have no expectation of auditory consequences of our eye movements (Mifsud & Whitford, 2017; Slobodenyuk, 2016). Importantly, two self-generation studies have used saccades to generate sounds and found either no attenuation for eye-movement-initiated sounds (Mifsud & Whitford, 2017) or weakened attenuation of the N1, but not of the P2 component (Mifsud et al., 2016). Electrophysiological responses to gaze fixations have also been measured in the context of brain-computer interfaces and gaze-controlled games (Ihme & Zander, 2011; Protzak et al., 2013), and certain markers of voluntary gaze control have been established: gaze fixations made consciously in order to control an interface were characterised by a slow negative parieto-occipital wave evoked by the fixation, which was absent or strongly reduced for fixations that did not control the interface (Protzak et al., 2013; Shishkin et al., 2016).
Beyond control affecting stimulus processing at a basic level, we also considered the possibility that control specifically modulates learning processes. Repeated presentation of a given movement-sound pair, as was the case in our paradigm, leads to neural changes over time related to the learning progress – we develop internal models of the associations that we have learned (Kilner et al., 2007), and the sound’s predictability based on the preceding movement increases gradually. Effects of predictability on ERP components strongly resemble those of self-generation: predictability often leads to sensory attenuation (Alink et al., 2010; Grotheer & Kovács, 2016; Kaiser & Schütz-Bosbach, 2018; Summerfield et al., 2008), and in fact sensory attenuation for self-generated stimuli is more pronounced when the outcome of the self-generated action matches the agent’s expectation (Hughes et al., 2013a; Stenner et al., 2014). Controlling for temporal predictability can help us to understand the functional separation of modulations of established ERP components by self-generation (Klaffehn et al., 2019). By studying the evolution of ERP components in relation to learning, we can shed light on the effects of increased predictability over and above the self-generation effects, which should be observable from the start of the learning process. In line with previous studies, we expected to find increased attenuation of the N1 (Kaiser & Schütz-Bosbach, 2018) and P2 components during late stages of learning. Furthermore, modulations of the P3 component – with less clear directionality – have been observed as a function of learning (Polich, 2007; Turk et al., 2018). If control were to facilitate learning progress, we would expect stronger or earlier effects of learning when participants have control over the stimuli.
Further insight into the neural mechanisms behind the active learning memory advantage can be gained by studying evoked responses to incongruent sounds. In our paradigm, participants are regularly tested on their memory of the movement-sound pairs; in those test trials, they passively observe a cursor movement, listen to a sound, and judge whether the two are a matching pair or not, based on their previously learned associations. We hypothesised that control during acquisition strengthens the internal representation of the movement-sound association, so violations of the latter should elicit larger prediction error signals (Knolle et al., 2013; Mathias et al., 2015). Based on the previous literature, we expected incongruent stimuli to elicit mismatch responses such as the N200 or an orienting response such as the P3a (Knolle et al., 2013; Winkler et al., 2009). Conversely, sounds congruent with learned associations can elicit “matching” responses: the P3b component in particular is thought to reflect the matching of a stimulus with a predicted item, and has been found to be larger with increased predictability (Molinaro & Carreiras, 2010; Roehm et al., 2007; Vespignani et al., 2010). This component is also referred to as the “late positive component” (LPC), which is believed to reflect an explicit recollective process (Friedman & Johnson Jr., 2000), typically elicited by designs in which participants have to make a response related to the stimulus (Yang et al., 2019). It is considered part of the classical “old/new” effect: stimuli presented in a test phase which appear familiar to the participant elicit a stronger LPC (Woodruff et al., 2006). The LPC has also been found to predict learning outcomes (Turk et al., 2018). We expected the strength of either the matching responses to congruent sounds or the mismatch responses to incongruent sounds to be modulated by the factor of control during the learning phase.
Once a motor-auditory association is established, the increased predictability of sounds that comes with learning should affect sound processing similarly regardless of whether sounds are presented during learning or during a test trial: we expected sensory responses – specifically the N1 and P2 components – to be attenuated during late stages of learning, and we hypothesised that this effect could be modulated by the mode of acquisition of the motor-auditory associations. Previous studies have shown that during memory tests, stimuli that were previously self-generated can cause motor reactivation even in the absence of movement (Butler et al., 2011). The distinctiveness account of the production effect (Hommel, 2005) suggests that motor activation during learning builds stronger, more distinctive memory traces, which is thought to be reflected in more efficient learning; how we learned something affects how we will process it in the future. Movement during sound processing affects our memory of the sound, but does the movement need to be causally linked to the sound for this effect to come into play? If it did, we would expect to see an effect of agency – rather than movement – on the neural processing of the stimulus or on the strength of the memory trace. Alternatively, if we do not find modulations by agency, this would support the idea that movement does not need to be causal to the stimulus in order to affect its processing or memory encoding (Horváth et al., 2012).
In the present study, our goal was to improve our understanding of how active control over sound stimuli affects their immediate sensory processing and their encoding in memory. Towards these aims, we studied whether and how control during learning improves memory, and how it modulates neural responses during sound processing and memory encoding. Finally, we aimed to reveal a link between self-generation effects during sound processing and the memory benefits of active control.

Methods

Participants

Twenty-five healthy undergraduate students from the University of Barcelona volunteered for the study. Two participants were excluded from the analysis due to low behavioural performance, based on a cut-off determined by simulating the responses of 25 randomly responding individuals and taking the performance of the best of these simulated responders as the threshold (56% correct in the behavioural task). The final sample included twenty-three participants (14 women; mean age = 21 years, range: 18–31). No participant reported any hearing impairment or psychiatric disorder, and none reported use of substances affecting the nervous system in the 48 hours prior to the experiment. All participants gave written informed consent after the nature of the study was explained to them, and they were monetarily compensated (10 …). The study complied with the guidelines of the World Medical Association (Declaration of Helsinki), with the exception of pre-registration, and was approved by the Bioethics Committee of the University of Barcelona.
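For illustration, the following sketch shows one way such a chance-level cut-off can be derived by simulating random responders. The total number of test trials per participant is a placeholder (not taken from the study) and would need to be set to the actual value of the design.

```python
import numpy as np

rng = np.random.default_rng(0)

n_simulated = 25   # one simulated random responder per recruited participant
n_trials = 336     # placeholder: total number of congruent/incongruent judgements per participant
p_correct = 0.5    # with 50% congruent test trials, random guessing is correct half the time

# Accuracy of each simulated participant who guesses "congruent"/"incongruent" at random.
accuracies = rng.binomial(n_trials, p_correct, size=n_simulated) / n_trials

# The exclusion cut-off is the accuracy achieved by the best-performing random responder.
threshold = accuracies.max()
print(f"Chance-level exclusion threshold: {threshold:.1%}")
```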

Experiment design

Experiment structure
The experiment consisted of two types of trials: acquisition trials and test trials. During acquisition trials, participants had 20 s to learn associations between movement directions of a white cursor over a grid of 9 red squares (Fig. 1), and 8 different sounds that were played depending on the cursor movements (see section on sound generation below). During test trials, participants were tested on their memory for the movement-sound associations.
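As a purely illustrative sketch of this trial logic, a cursor displacement could be binned into one of the 8 movement directions and mapped to a sound as shown below; the direction labels, sound file names, and classification rule are hypothetical placeholders, not the materials or code used in the study.

```python
import math

# Hypothetical one-to-one mapping of the 8 movement directions to sound files.
DIRECTION_TO_SOUND = {
    "up": "sound_1.wav", "up_right": "sound_2.wav",
    "right": "sound_3.wav", "down_right": "sound_4.wav",
    "down": "sound_5.wav", "down_left": "sound_6.wav",
    "left": "sound_7.wav", "up_left": "sound_8.wav",
}

def classify_direction(dx: float, dy: float) -> str:
    """Bin a cursor displacement (grid units, y increasing upwards) into one of 8 directions."""
    angle = math.degrees(math.atan2(dy, dx)) % 360
    labels = ["right", "up_right", "up", "up_left",
              "left", "down_left", "down", "down_right"]
    return labels[int(((angle + 22.5) % 360) // 45)]

# Example: a move one square up and one square to the right triggers "sound_2.wav".
print(DIRECTION_TO_SOUND[classify_direction(1, 1)])
```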
The movement-sound associations were learned either as agents or as observers. Agent and observer experimental conditions differed only during acquisition trials. During acquisition trials in the agent condition, the cursor was controlled by the participant’s gaze, while in the observer condition the cursor was animated by the computer. Thus, in the agent condition, the acquisition process required active exploration. Participants were instructed to perform saccades over the squares and generate as many different sounds as possible. In the observer condition, the cursor was animated using previously recorded eye movements from the same participant, and participants were asked to follow the cursor’s movements and memorise the relationships between movements and sounds.
Following each acquisition trial, participants were tested on their memory of the movement-sound associations in a series of 6 test trials. During test trials, participants were presented with a short animation of the cursor moving from one square to another in a straight line (executing one of the 8 possible movements). After a delay of 750 ms (matching the timing of acquisition trials, see section “Visual stimulation and gaze-controlled sound generation”), one of the 8 sounds familiar from acquisition, either congruent or incongruent with the previously learned associations, was presented. 50% of test trials presented congruent movement-sound pairs. The order of animations and sounds was based on a computer-generated, randomised list. At the end of each test trial, participants indicated whether the movement and sound were a congruent pair by pressing one of two buttons on a MIDI keyboard placed in front of them.
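The sketch below illustrates how such a randomised test list with 50% congruent pairs could be generated for one learning block; the movement and sound indices are arbitrary placeholders, and the actual list-generation code of the study may differ.

```python
import random

MOVEMENTS = list(range(8))   # the 8 possible cursor movements
SOUNDS = list(range(8))      # the 8 sounds of the current contingency block

def make_test_trials(mapping, n_trials=6, seed=None):
    """Build one learning block's test list: half congruent, half incongruent pairs."""
    rng = random.Random(seed)
    congruent_flags = [True] * (n_trials // 2) + [False] * (n_trials - n_trials // 2)
    rng.shuffle(congruent_flags)
    trials = []
    for congruent in congruent_flags:
        movement = rng.choice(MOVEMENTS)
        if congruent:
            sound = mapping[movement]
        else:
            # pick any sound other than the one associated with this movement
            sound = rng.choice([s for s in SOUNDS if s != mapping[movement]])
        trials.append({"movement": movement, "sound": sound, "congruent": congruent})
    return trials

# Example with an arbitrary one-to-one movement-to-sound mapping.
example_mapping = dict(zip(MOVEMENTS, SOUNDS))
for trial in make_test_trials(example_mapping, seed=0):
    print(trial)
```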
One acquisition trial followed by 6 test trials constituted a “learning block”. Across 7 consecutive learning blocks, participants were presented with the same movement-sound associations. Groups of 7 learning blocks sharing the same movement-sound associations are referred to here as “contingency blocks”. After the 7th learning block, the contingency block ended and new sounds were loaded, so participants had to start the learning process anew. Contingency blocks alternated between the agent and observer conditions, and the order of conditions was counterbalanced across participants.
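A minimal sketch of the resulting session structure is given below; the number of contingency blocks shown here is an arbitrary placeholder, and only the nesting (contingency block → 7 learning blocks → 1 acquisition + 6 test trials) and the alternation of conditions follow the design described above.

```python
def build_session(first_condition="agent", n_contingency_blocks=8, n_learning_blocks=7):
    """Illustrative session schedule: contingency blocks alternate between conditions,
    and each contains 7 learning blocks (1 acquisition trial + 6 test trials)."""
    conditions = ["agent", "observer"] if first_condition == "agent" else ["observer", "agent"]
    schedule = []
    for block_idx in range(n_contingency_blocks):
        schedule.append({
            "contingency_block": block_idx + 1,
            "condition": conditions[block_idx % 2],
            "learning_blocks": [
                {"acquisition_trials": 1, "test_trials": 6}
                for _ in range(n_learning_blocks)
            ],
        })
    return schedule

# Counterbalancing across participants: half start with the agent condition.
schedule_agent_first = build_session("agent")
schedule_observer_first = build_session("observer")
```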