Visual speech speeds up the neural processing of auditory speech

Virginie van Wassenhove*†‡, Ken W. Grant§, and David Poeppel*†‡¶

*Neuroscience and Cognitive Science Program and Departments of †Biology and ¶Linguistics, University of Maryland, College Park, MD 20742; and §Auditory-Visual Speech Laboratory, Walter Reed Army Medical Center, Washington, DC 20307

Communicated by Morris Halle, Massachusetts Institute of Technology, Cambridge, MA, December 6, 2004 (received for review September 8, 2004)

Synchronous presentation of stimuli to the auditory and visual systems can modify the formation of a percept in either modality. For example, perception of auditory speech is improved when the speaker's facial articulatory movements are visible. Neural convergence onto multisensory sites exhibiting supra-additivity has been proposed as the principal mechanism for integration. Recent findings, however, have suggested that putative sensory-specific cortices are responsive to inputs presented through a different modality. Consequently, when and where audiovisual representations emerge remain unsettled. In combined psychophysical and electroencephalography experiments we show that visual speech speeds up the cortical processing of auditory signals early (within 100 ms of signal onset). The auditory–visual interaction is reflected as an articulator-specific temporal facilitation (as well as a nonspecific amplitude reduction). The latency facilitation systematically depends on the degree to which the visual signal predicts possible auditory targets. The observed auditory–visual data support the view that there exist abstract internal representations that constrain the analysis of subsequent speech inputs. This is evidence for the existence of an "analysis-by-synthesis" mechanism in auditory–visual speech perception.

EEG | multisensory | predictive coding

Studies of auditory–visual (AV) speech highlight several critical issues in multisensory perception, including the key question of how the brain combines signals from segregated processing streams into a single perceptual representation. In the McGurk effect (1), an audio [pa] dubbed onto a facial display articulating [ka] elicits the "fused" percept [ta], whereas an audio [ka] dubbed onto a visual [pa] elicits various "combinations" such as "pka" or "kpa" but never a fused percept. These results illustrate the effect of input modality on the perceptual AV speech outcome and suggest that multisensory percept formation is systematically based on the informational content of the inputs. In classic speech theories, however, visual speech has seldom been accounted for as a natural source of speech input. Ultimately, when in the processing stream (i.e., at which representational stage) sensory-specific information fuses to yield unified percepts is fundamental for any theoretical, computational, and neuroscientific account of speech perception.

Recent investigations of AV speech are based on hemodynamic studies that cannot speak directly to timing issues (2, 3). Electroencephalographic (EEG) and magnetoencephalographic (4–7) studies testing AV speech integration have typically used oddball or mismatch negativity paradigms; thus the earliest AV speech interactions have been reported for the 150- to 250-ms mismatch response. Whether systematic AV speech interactions can be documented earlier is controversial, although nonspeech effects can be observed early (8).

AV Speech as a Multisensory Problem

Several properties of speech are relevant to the present study.
(i) Because AV speech is ecologically valid for humans (9, 10), one might predict the involvement of specialized neural computations capable of handling the spectrotemporal complexity of AV speech (compared with, say, arbitrary tone–flash pairings, for which no natural functional relevance can be assumed). (ii) Natural AV speech is characterized by particular dynamics, such as (a) the temporal precedence of visual speech (the movement of the facial articulators typically precedes the onset of the acoustic signal by tens to a few hundred milliseconds; Fig. 1) and (b) a tolerance to desynchronization of the acoustic and visual signals of ≈250 ms (11), a time constant characteristic of syllables across languages (12) that relates closely to a proposed temporal integration constant underlying perceptual unit formation (13, 14). (iii) For speech processing, abstract representations have been postulated. Specifically, linguistic theories dealing with the constituents of the speech signal and their relation to stored representations build on the central notion of the distinctive feature. These abstract building blocks have precise relations to the (intended) motor commands (articulatory gestures) involved in speech production (15, 16) as well as acoustic interpretations (17). (iv) Visual speech provides direct but impoverished evidence for particular articulatory targets; in contrast, the auditory utterance alone usually permits complete perceptual categorization (say, on the phone). For instance, although an audio-alone /pa/ leads to a clear percept /pa/, its visual-alone counterpart (i.e., seeing a mouth articulating [pa]) is limited to the recognition of a visual place-of-articulation class, the "viseme" category of bilabials, which comprises the possible articulations [p], [b], and [m].

Neurophysiological Basis of Multisensory Integration

Convergent neural pathways onto multisensory neurons (18) have been argued to provide the substrate for multisensory binding (19). A typical signature of multisensory neurons is an enhanced response (supra-additivity) to the presentation of co-occurring events. Consistent with the concept that multisensory neurons mediate the integration of unisensory information into a multisensory representation, functional MRI studies of AV speech show that auditory and polysensory cortices, specifically the superior temporal sulcus and superior temporal gyrus, show enhanced activation when compared with unimodal speech (20, 21). The involvement of polysensory areas has suggested a possible computational route for AV speech processing: unimodal signals are integrated in multisensory cortical sites (say, the superior temporal sulcus), which feed back onto primary sensory fields (22, 23). The feedback hypothesis predicts the enhanced activation of auditory cortices (24). This explanation has appealing properties, but there
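To make the prediction idea in property (iv) and in the abstract more concrete: recognizing the bilabial viseme restricts the upcoming auditory target to a small candidate set, whereas a less visible articulation leaves many candidates open. The sketch below is a toy illustration of that constraint, not the authors' model; the "nonlabial" candidate set, the inventory size, and the prediction-strength measure are assumptions introduced here for illustration.

```python
# Toy sketch of viseme-based prediction (illustrative assumptions, not the paper's model).
# Idea: the viseme seen on the face narrows the set of phonemes the acoustic signal could
# turn out to be; a smaller residual set means a stronger visual prediction, which, on the
# paper's account, should go with a larger latency facilitation of the auditory response.

# Bilabial membership ([p], [b], [m]) is stated in the text; the "nonlabial" set and the
# inventory size of 24 consonants are placeholders chosen for illustration.
VISEME_CANDIDATES = {
    "bilabial": {"p", "b", "m"},
    "nonlabial": {"t", "d", "n", "s", "z", "l", "k", "g"},
}
CONSONANT_INVENTORY_SIZE = 24  # assumed size of the full consonant inventory


def prediction_strength(viseme: str) -> float:
    """Fraction of the consonant inventory ruled out by the viseme (0 = no constraint)."""
    remaining = len(VISEME_CANDIDATES[viseme])
    return 1.0 - remaining / CONSONANT_INVENTORY_SIZE


if __name__ == "__main__":
    for viseme, candidates in VISEME_CANDIDATES.items():
        print(f"{viseme:9s} candidates={sorted(candidates)} "
              f"prediction strength={prediction_strength(viseme):.2f}")
```

Run as-is, the bilabial viseme yields the higher prediction strength, mirroring the graded, articulator-specific latency facilitation the abstract reports.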
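The supra-additivity signature discussed in this section is commonly operationalized as the audiovisual response exceeding the sum of the unisensory responses (AV > A + V). The following sketch applies that criterion to simulated response amplitudes; the numbers are invented for illustration and are not data from the paper, whose own EEG results in fact include an amplitude reduction for AV speech.

```python
import numpy as np

# Supra-additivity check (AV > A + V) on simulated trial-averaged response amplitudes.
# All values are fabricated for illustration only.
rng = np.random.default_rng(0)
n_trials = 100

resp_a = rng.normal(loc=1.0, scale=0.3, size=n_trials)   # audio-alone amplitude
resp_v = rng.normal(loc=0.4, scale=0.3, size=n_trials)   # visual-alone amplitude
resp_av = rng.normal(loc=1.8, scale=0.3, size=n_trials)  # audiovisual amplitude

additive_prediction = resp_a.mean() + resp_v.mean()
observed_av = resp_av.mean()

print(f"A + V prediction: {additive_prediction:.2f}")
print(f"observed AV:      {observed_av:.2f}")
print("supra-additive" if observed_av > additive_prediction else "not supra-additive")
```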

