Design process

Since the first prototypes, LANGA has been iteratively improved based on feedback from both researchers and users. For example, one of the most important aspects modified in response to user feedback was the threshold used by the SRE to determine whether a word was pronounced correctly. In particular, words containing phonemes that are not present in English proved hard to pronounce well enough to receive positive feedback from the SRE, because the threshold was originally too high. This consistently slowed users’ progress through the games even when they had learned the vocabulary items with an acceptable degree of proficiency. Another fundamental change was the removal of cognates, i.e., words that have a high degree of similarity between the native and target languages (in this case, because our participants were Canadians who had learned basic French vocabulary in the public school system, Spanish-French cognates were also excluded). The motivation is that cognates are too easy to learn, and performance on them might reflect not so much the effectiveness of the software as the existence of an already established mapping between the phonetic and semantic forms of the word in a known language. We also introduced the possibility for users to see and hear the written and spoken forms of a word after the teaching game, so that they did not have to reload the teaching game to refresh an item. In addition, we introduced a preliminary session before the beginning of training in which users could practice the games and become familiar with their mechanics, allowing them to focus their attention solely on the material once training started. Finally, we accelerated development of the Authoring Tool, which allows researchers to add new curricula and teaching items (both novel words and entire new languages, by adding phonetic forms and spoken examples corresponding to items already in the training database).

Empirical validation of effectiveness

In this section we present an experiment in which we tested the effectiveness of LANGA in teaching L2 vocabulary through two different training strategies. Participants followed the 10-day curriculum detailed in Timeline of the general training plan (top panel) ..., and their learning was assessed using both behavioral and EEG neuroimaging methods. In addition to two well-established behavioral tasks, naming and forced-choice recognition, we examined changes in the amplitude of the N400. The N400 is an ERP component that is thought to reflect specific dynamics of semantic processing, in particular accessing the meaning of a word and/or integrating that meaning into context (Lau 2008, Kutas 2011). For example, in a semantic-priming task (the one used in this study as part of the forced-choice task), a word presented visually may or may not be followed by the corresponding (congruent) auditory version. The ERPs elicited by incongruent and congruent stimuli differ in amplitude, and this difference is generally observable between 300 and 500 ms, peaking around 400 ms after the onset of the auditory stimulus. The difference is thought to reflect a facilitatory effect wherein the visual stimulus primes the lexical-semantic representation of the corresponding word. Under such conditions, decoding the auditory features and phonological structure of the auditory stimulus to retrieve its semantic form will be faster and less effortful when the stimulus is congruent than when it is incongruent, and the degree of cognitive effort is proportional to the difference in the amplitude of the ERP component (Lau 2008, Kutas 2011).
For these reasons, analysis of changes in the N400 component across different stages of learning offers a unique perspective on the dynamics underlying the acquisition and maintenance of L2 vocabulary under different teaching strategies.
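As an illustration of how the N400 effect described above is typically quantified, the following minimal sketch computes the incongruent-minus-congruent difference in mean amplitude within the 300-500 ms window. The function names and the toy waveforms are assumptions made for this example; they are not taken from the analysis pipeline used in the study.

```python
def mean_amplitude(erp, times_ms, start_ms=300, end_ms=500):
    """Mean voltage of an averaged ERP within [start_ms, end_ms]."""
    window = [v for v, t in zip(erp, times_ms) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples fall inside the requested window")
    return sum(window) / len(window)


def n400_effect(incongruent_erp, congruent_erp, times_ms):
    """Incongruent-minus-congruent mean amplitude in the N400 window."""
    return (mean_amplitude(incongruent_erp, times_ms)
            - mean_amplitude(congruent_erp, times_ms))


# Toy averaged waveforms in microvolts (hypothetical values),
# sampled every 100 ms from 0 to 700 ms after word onset.
times_ms = [0, 100, 200, 300, 400, 500, 600, 700]
congruent = [0.0, 0.5, 0.2, -1.0, -1.5, -1.0, 0.0, 0.3]
incongruent = [0.0, 0.5, 0.1, -3.0, -4.5, -3.0, -0.5, 0.2]

effect = n400_effect(incongruent, congruent, times_ms)
```

A negative value of `effect` indicates that the incongruent waveform is more negative than the congruent one in the window, which is the direction expected for an N400 effect.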

Methods

Subjects

Twelve subjects (four males; mean age = 23.5 years, SD = 3.57) took part in the experiment and received a small financial compensation. All participants except one showed strong right-hand dominance as determined by the Handedness Inventory (Oldfield 1971). All participants had normal hearing and normal or corrected-to-normal vision, and no known neurodevelopmental, neurological or psychiatric disorders. Participants were all native English speakers and reported little or no prior exposure to Spanish; all participants were given a pre-test for knowledge of the Spanish words to be taught in the study, and anyone who correctly named more than 3 items was excluded. Nine participants reported speaking a second language fluently (four French; one Urdu; one Polish; one American Sign Language); four reported knowledge of a third language (French, German, Punjabi and Korean); two reported knowledge of a fourth language (Italian and Arabic); and one participant reported knowledge of a fifth language (French). All participants gave informed consent according to the Declaration of Helsinki; all procedures were approved by the Dalhousie University Research Ethics Board.

Stimuli

Participants were taught 72 Spanish words (48 nouns and 24 verbs). Half the words (24 nouns and 12 verbs) were taught using the rote method and the other half via the inferential method. Words were divided into two sets so that half of the participants learned one set via the rote method and the other half learned that set via the inferential method, and vice versa. As mentioned above, Spanish-English and Spanish-French cognate words (e.g., el león, the lion) were excluded from the curricula. Both transitive verbs (e.g., sacude, to shake) and intransitive verbs (e.g., salta, to jump) were included in the training material. Spoken versions of the training words were recorded from a male native Spanish speaker. Each noun was preceded by a definite determiner (el for masculine nouns and la for feminine nouns; 26 masculine and 22 feminine nouns in total). Verbs were produced in the third person singular, present tense.
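The counterbalancing of word sets across participants can be sketched as follows. The set labels "A" and "B" and the rule of alternating by participant index are assumptions made for illustration; the text does not specify how participants were allocated to the two mappings.

```python
def assign_methods(participant_index):
    """Return the teaching method applied to each word set for one participant.

    Even-indexed participants learn set A by rote and set B inferentially;
    odd-indexed participants receive the reverse mapping (hypothetical rule).
    """
    if participant_index % 2 == 0:
        return {"A": "rote", "B": "inferential"}
    return {"A": "inferential", "B": "rote"}


# Twelve participants, as in the study.
assignments = [assign_methods(i) for i in range(12)]
rote_a = sum(1 for a in assignments if a["A"] == "rote")
```

With this scheme each word set is taught by each method to six of the twelve participants, so that item difficulty is not confounded with teaching method.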
Pictures used during training were drawn by a professional artist and all followed the same “cartoon” style. These were developed with iterative feedback from the research team so that each picture was deemed to readily illustrate the desired concept. For testing, a different set of pictures was used, consisting of freely available black-and-white line drawings corresponding to each training item. The motivation for using separate items for training and testing was to ensure that learners were associating words with semantic concepts, rather than simply with the particular visual stimuli used during training.

Procedure

Spanish proficiency was assessed three times: prior to learning, the day after the last training session was completed, and approximately 10 days after the last training session. Proficiency was assessed using two tests: a naming test and a forced-choice recognition task; EEG was recorded during the latter task. In both tests, the order of items was randomized, and items taught via the rote and inferential methods were intermixed. All testing was conducted in a room insulated from external acoustic and electrical noise. Subjects sat on a chair positioned in front of a computer monitor for all testing.
During the naming task, participants wore a set of headphones with a microphone and held a gamepad with both hands, so that they could press a right or left button with the respective fingers. The task consisted of 72 trials, one for each noun or verb in the training set. During each trial, a picture corresponding to one of the words taught during training appeared on the screen. Pictures were initially surrounded by a red line, indicating that the recording function was off. If participants believed they knew the corresponding Spanish word, they were instructed to press one of the buttons on the gamepad to activate the recording function. There was no constraint on how much time subjects could take before deciding to say the word. Once this button was pressed, the line surrounding the picture changed from red to green, signalling that the recording function was on, and from that moment subjects had three seconds to say the word. Recorded files were later scored for accuracy. If they did not know the word, participants pressed the other button to advance directly to the next trial.
During the forced-choice recognition task (during which EEG was also recorded; see below), participants completed a series of 144 trials. On each trial, a picture was shown and, after 1 s, a spoken word from the training set was played. On half the trials, the word matched the picture; on the other half, the word was a different word that had been taught using the same approach (rote or inferential). Upon hearing the word, participants had to indicate via button press whether or not the word matched the picture.
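One way to construct such a trial list is sketched below. It assumes that each of the 72 pictures appears twice, once with its matching word and once with a mismatching word drawn at random from the items taught by the same method; this pairing rule, the placeholder item labels, and the function name are assumptions for illustration rather than a description of the software actually used.

```python
import random


def build_trials(words_by_method, rng):
    """Create one match and one mismatch trial per picture.

    words_by_method maps a method name to the list of words taught that way.
    The mismatching word is always drawn from the same method as the picture.
    """
    trials = []
    for method, words in words_by_method.items():
        for picture in words:
            foils = [w for w in words if w != picture]
            trials.append({"picture": picture, "word": picture,
                           "method": method, "match": True})
            trials.append({"picture": picture, "word": rng.choice(foils),
                           "method": method, "match": False})
    rng.shuffle(trials)  # intermix rote and inferential items
    return trials


# Placeholder labels stand in for the 72 Spanish training words.
words_by_method = {
    "rote": [f"rote_{i}" for i in range(36)],
    "inferential": [f"inf_{i}" for i in range(36)],
}
trials = build_trials(words_by_method, random.Random(0))
```

Seeding the random number generator makes the trial order reproducible; a different seed per participant and session would give each test a different randomized order.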

EEG/ERP recording

EEG/ERPs were recorded during the forced-choice recognition task. We used a Brain Products V-Amp/ActiCap 16-channel EEG system, with a sampling rate of 512 Hz and online 100 Hz lowpass filtering; electrode impedance was kept below 20 kOhm during the recording. Electrodes were placed at the following locations according to the International 10-10 System: Fp1, Fp2, F7, F3, Fz, F4, F8, C3, Cz, C4, TP9, P3, Pz, P4, TP10, Oz. EEG data were processed offline using the EEGLAB toolbox, version XXX (Delorme 2004), running in Matlab (Mathworks), version 2015a. Processing included applying a 0.1-30 Hz bandpass filter with a 6 dB per octave rolloff. The continuous EEG was then epoched from 200 ms before to 1000 ms after the onset of each auditory cue. Individual trials were inspected for artifacts, and any containing excessive noise were manually removed. Channels that were excessively noisy throughout the recording were also removed. Independent Component Analysis (ICA; Jung 2000) was then used to remove ocular and other artifacts. Any bad channels previously removed were interpolated after ICA using spherical splines.
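The epoching step can be illustrated with a minimal single-channel sketch that converts the -200 to 1000 ms window into sample indices at the 512 Hz sampling rate. This is not the EEGLAB implementation used in the study; the function names and the choice to drop events too close to the edges of the recording are assumptions made for the example.

```python
def epoch_bounds(onset_sample, sfreq=512, tmin=-0.2, tmax=1.0):
    """Return (start, stop) sample indices of an epoch around an event onset."""
    start = onset_sample + round(tmin * sfreq)
    stop = onset_sample + round(tmax * sfreq)
    return start, stop


def extract_epochs(signal, onset_samples, sfreq=512, tmin=-0.2, tmax=1.0):
    """Cut a continuous single-channel signal into epochs around each onset."""
    epochs = []
    for onset in onset_samples:
        start, stop = epoch_bounds(onset, sfreq, tmin, tmax)
        if start < 0 or stop >= len(signal):
            continue  # skip events too close to the edges of the recording
        epochs.append(signal[start:stop + 1])  # include the final sample
    return epochs


# Toy continuous signal whose value equals its sample index.
signal = list(range(5000))
onsets = [50, 1000, 3000, 4900]  # hypothetical auditory-cue onsets (samples)
epochs = extract_epochs(signal, onsets)
```

At 512 Hz the window spans 102 samples before and 512 samples after the onset, so each epoch contains 615 samples and the onset itself sits at index 102 within the epoch.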

Results

Naming task

Behavioral results were analysed using a mixed-effects logistic regression model. The fixed-effects part of the model included stage (three levels: pre, post and follow-up) and training strategy (two levels: rote and inferential) as predictors. The random-effects structure modelled between-subject and between-item variability. The dependent variable was naming accuracy (two levels: correct and incorrect). The main effects and the interaction term were assessed with a stepwise regression in which the baseline model contained only the intercept. Results showed a main effect of stage, \(\chi^2(2) = 1458.4\), \(p < 10^{-15}\), and of training strategy, \(\chi^2(2) = 6.84\), \(p = .024\). The interaction was not significant, \(\chi^2 = 6.10\), \(p = .15\) (Bonferroni correction for 3 comparisons). Post hoc comparisons showed significant differences between levels of stage. Specifically, there was a significant increase from pre- to post-training, \(z = -16.31\), \(p < 10^{-5}\), and also from pre-training to follow-up, \(z = -15.30\), \(p < 10^{-5}\). Interestingly, there was also a significant decrease in performance from post-training to follow-up, \(z = 3.55\), \(p < .005\).
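The model-comparison step behind such \(\chi^2\) statistics can be sketched as a likelihood-ratio test between nested models, followed by a Bonferroni adjustment. The sketch below does not fit the mixed-effects models themselves: the log-likelihood values are hypothetical, the helper names are assumptions, and the chi-square tail probability is implemented only for 1 or 2 degrees of freedom, where a closed form exists. It is not intended to reproduce the values reported above.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("the chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("closed form implemented only for df = 1 or 2")


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Compare nested models; df is the number of added parameters."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_comparisons)


# Hypothetical log-likelihoods of a reduced model and of a model that
# adds a three-level factor (two extra parameters).
stat, p = likelihood_ratio_test(-520.0, -515.0, df=2)
p_adjusted = bonferroni(p, 3)
```

In a stepwise procedure, each predictor is added in turn and retained only if the adjusted p-value of the corresponding likelihood-ratio test falls below the chosen significance level.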