4. Discussion

     The present study aimed to determine whether the electrophysiological dynamics underlying the perceptual processing of orofacial movements are modulated by the linguistic content of visual speech cues, and to what extent interfering with automatic mimicry affects this process. We reported two main electrophysiological findings. First, early ERP amplitudes were clearly modulated by linguistic content, and more specifically by the place of articulation (PoA) of the syllables. Second, the effect of PoA was significantly reduced when the speech effectors were suppressed.
    The first component modulated by PoA was the N270. This negative deflection, peaking around 270 ms, has previously been associated with conflicts in audiovisual integration. For instance, Wang et al. (2002b) reported a modulation of this component in a task where the gender of a visually presented face mismatched the gender of a voice pronouncing a vowel: the amplitude of the N270 increased in response to the audiovisual incongruity between face and voice gender \cite{Wang_2002}. Another study reported that the N270 was elicited in the presence of an audiovisual incongruity independently of its relevance for solving the task. The authors concluded that "this component reflects the activity of a conflict detection process of automatic nature, rather resistant to top-down influences from voluntary attention networks" \cite{ORTEGA_2008}. More recently, Chennu et al. (2016) reported a negative deflection similar to the mismatch negativity in response to omitted sounds (i.e., the omission effect), indicating the presence of top-down attentional processes that strengthen the brain's predictions of future events \cite{Chennu_2016}. In the current study, we interpret the N270 as a marker of audiovisual inconsistency, in the sense that participants perceived a mouth saliently articulating a syllable but never heard the corresponding speech sound. Consistent with this interpretation, it has been reported that, "during the processing of silently played lip movements, the visual cortex tracks the missing acoustic speech information when played forward as compared to backward" \cite{Hauswald_2018}. Interestingly, the most significant differences observed in this study were between bilabial and velar syllables. These syllables have very different PoAs: the former are produced with a clearly visible movement of the lips, whereas the latter are produced by a nearly imperceptible movement of the back of the tongue.
In that sense, the greater amplitude of the N270 observed for bilabial syllables compared to other syllables may be attributable to their different degrees of visual salience. The absence of differences in N270 amplitude between Experiments 1 and 2 suggests that restraining the mobility of the upper articulatory system (i.e., lips and tongue) does not impair the detection of the crossmodal conflict induced by the omission of the auditory counterpart.
    The second component modulated by PoA was the N400. This negativity is traditionally associated with the processing of semantic incongruity. Interestingly, a recent study reported that when visual speech (i.e., silent articulations) was incongruent with preceding auditory words, a significantly larger N400 was elicited compared to congruent conditions, suggesting the detection of an auditory-articulatory mismatch \cite{Kaganovich_2016}. In our study, however, the stimuli were displayed visually and silently. Thus, rather than an auditory-articulatory mismatch, the larger N400 amplitude could be interpreted as a response to the conflict caused by the missing auditory counterpart of syllable articulation. Supporting this interpretation, no N270 or N400 components were elicited in response to backward syllables, probably because these are not pronounceable and therefore lack auditory and motoric counterparts. An alternative interpretation can be formulated on the basis of studies suggesting that, rather than indexing semantic incongruity, the N400 reflects errors in speech prediction. A recent study showed that its amplitude increases in response to sentences containing unexpected target nouns compared to expected nouns. Importantly, this expectation-violation effect, as indexed by N400 amplitude, was not observable when the speech production system was unavailable (i.e., when the articulators were suppressed), suggesting that the availability of the orofacial articulators is necessary for lexical prediction during reading \cite{Martin_2018}. In line with these results, we observed a significant difference in N400 amplitude for bilabial syllables between Experiment 1 and Experiment 2, the latter of which blocked the speech effectors, mostly the lips. Bilabial articulatory movements are more visible and salient; as a consequence, the subsequent auditory cue is more predictable, and when it is not perceived the effect of expectation violation is greater.
In contrast, when the speech articulators were restrained, this effect was not observed. In that sense, our results support the proposal of Martin et al. (2018) that the speech effectors play a critical role in generating speech predictions. Moreover, because we used syllables rather than words, the results of the current study demonstrate that speech predictions are generated as early as the pre-lexical level.
    We hypothesize that the mechanism underlying the ability to make speech predictions on the basis of articulatory movements is automatic imitation. Several authors have proposed that listeners covertly imitate the speaker's orofacial gestures during face-to-face interactions, allowing them to construct and update a forward model and make predictions about the upcoming speech \cite{Pickering_2013,Gambi_2013,Brass_2005}. Although the absence of the prediction-error effect in Experiment 2 strongly suggests that orofacial movements play a key role in speech perception, the data analyses performed in this study do not directly investigate the involvement of the motor system. Further studies analyzing the time-frequency domain or the electromyographic responses of the orofacial muscles could be more conclusive in that respect.
 

Acknowledgements

This research was funded by a PhD fellowship from the Consejo Nacional de Investigación Científica y Tecnológica (CONICYT) of the Chilean government.