
The Motor Theory Of Speech Perception

Introduction

The Motor Theory of Speech Perception is an account of how we understand speech. Mole has raised some objections to it. I shall support the Motor Theory by providing responses to those objections.

Our speech perception functions very well even in conditions where the signal is of poor quality, and these abilities are markedly better than our perception of non-speech sounds. For example, you can fairly easily pick out words even against a background of intense, and louder, traffic noise. This makes it seem that there is a special nature to speech perception as compared to perception of non-speech sounds.

The Motor Theory of Speech Perception (Liberman and Mattingly 1985) seeks to explain this special nature of speech perception. The idea is that the mechanical and neural elements for production of speech are also used for perception of speech.  

On this view, speech perception is the offline running of the systems that, when online, actually produce speech. According to the Motor Theory, motor activations also occur when perception of speech takes place. Motor activations are micro-movements of the mouth and tongue muscles, or preparations to make such movements.

You might make subliminal movements of the type you would make to produce an ‘S’ sound. You are thereby well placed to understand that someone else whom you see making such movements is likely to be producing an ‘S’ sound. This is how we understand one another’s speech so well. And so it is key to the Motor Theory of Speech Perception that speech perception is special.

Analogies Between Simulation Theory and The Motor Theory of Speech Perception

In some ways, the position of the Motor Theory in explaining speech perception is analogous to the position of Simulation Theory (see Short, 2015). Simulation Theory seeks to explain how we are often able to predict and explain the behaviour of other people (so-called Theory of Mind).

In both cases, the account seeks to generate a maximally powerful explanation of the phenomenon using the minimum of additional “moving parts”.  The Motor Theory notes that we already have complicated machinery to allow us to produce speech. The theory says we also use that complicated machinery to understand speech.

The Simulation Theory account of Theory of Mind notes that we already have an immensely complex piece of machinery – a mind. It postulates that we may also use that mind to simulate others and thus understand them.  I see value in these parsimonious and economical simulation approaches in both areas.

The Mole Objections to the Motor Theory of Speech Perception

Mole (Ch. 10, 2010) challenges the Motor Theory.  He agrees that speech perception is special. But he does not agree that it is special in such a way as to support the Motor Theory.  In this article, I will offer responses on behalf of the Motor Theory to Mole’s (2010) challenge in five ways.

Speech Perception Is Special

Mole (2010) claims that speech perception is not special. If that is true, then the Motor Theory cannot succeed, because it proceeds from that assumption. I will first deny Mole’s (2010) claim that other kinds of perception also involve mapping from multiple percepts to the same meaning, so that this feature is not unique to speech perception. Taking an example from speech, we always understand the name “Sherlock” to refer to that detective, even though different speakers say it in a myriad of different ways.

This phenomenon is known as invariance. Mole (2010) claims that there is nothing special about speech perception here: other types of perception (such as colour perception) also involve mapping from multiple external sources of perceptual data to the same single percept. I will show that the example from visual perception invoked by Mole (2010) is not of the correct type, and so does not dismiss the need for the special explanation of speech perception provided by the Motor Theory.

Special Invariance in Speech Perception

Mole (2010) makes another challenge to the idea, which underpins the Motor Theory, that there is a special invariance in speech perception. This special invariance is the way that we always understand “Sherlock” to refer to the detective, whichever accent the speaker has and whatever the background noise level is (provided of course that we can actually hear the name). Mole (2010) claims that invariances in speech perception are not special, as similar invariances also occur in face recognition. He seeks to make out his face recognition point by discussing how computers perform face recognition; I will show that he does not succeed here.

McGurk Data Provide the Wrong Cross-Talk

The famous McGurk experiment describes “cross-talk” effects. Visual and aural stimuli interact with each other and change how we perceive them.  For example, subjects seeing a video of someone saying “ga” but hearing a recording of someone saying “ba” report that they heard “da.”  Since the Motor Theory postulates that speech perception is special, such cross-talk effects will support the Motor Theory if they are in fact special to speech perception.  Mole (2010) uses cross-modal data from two experiments with the aim of showing that such cross-talk also exists in non-speech perception.  I will suggest that the experiments Mole (2010) cites do not provide evidence for the sort of cross-talk phenomenon that Mole (2010) needs to support his position.

People Who Can’t Speak Can Understand Speech

My account refutes Mole’s (2010) claim that Motor Theory cannot account for how persons who cannot speak can nevertheless understand speech. I will outline how that could occur.

Other Problems for Mole

Finally, I will briefly consider a range of additional data that support the Motor Theory and therefore challenge the position espoused by Mole (2010). The Motor Theory explains all three of the following puzzles: firstly, cerebellar involvement in dyslexia; secondly, observed links between speech production and perception in infants; and thirdly, why neural stimulation of speech production areas enhances speech perception.

Challenges To Mole (2010)

Mole’s (2010) Counterexample From Visual Perception Is Disanalogous To Speech Perception

A phoneme is a single unit of speech, approximately the aural equivalent of a letter. Any single phoneme is understood by the listener, despite the fact that many different sound patterns are associated with it. It is clearly a very useful ability of people to be able to ignore details of pitch, intensity and accent, and to focus purely on the phonemes which convey meaning. This invariance is a feature of speech perception but not of sound perception generally. That situation motivated the proposal of the Motor Theory.

Invariances

It is important to be clear on where there is invariance and where there is lack of invariance in perception.  There is invariance in the item which the perceiver perceives (for example, Sherlock). This is even though there is a lack of invariance in the perceptual data that allows the perceiver to have the perception.  

So we can see that it is Sherlock’s face (an invariance in what is understood) even though the face may be seen from different angles (a lack of invariance in perceptual input).  Similarly, we may hear that Sherlock’s name is spoken (an invariance in what we understand) even though the name may be spoken in different accents (a lack of invariance in perceptual input).   Lack of invariance is of course the same as variance.

For supporters of the Motor Theory, this invariance in what the listener reports that they have heard is evidence that the perceptual object in speech perception is a single gesture: in fact, the one phoneme that the speaker intended to pronounce. We report this single object despite the fact that speakers pronounce the phoneme in a wide variety of accents. The accents can vary a great deal; there is still invariance in what the listener understands.

Mole (2010) denies that this invariance is evidence for the special nature of speech.  Mole (p.217, 2010) writes: “[e]ven if speech were processed in an entirely non-special way, one would not expect there to be an invariant relationship between […] properties of speech sounds […] and phonemes heard for we do not […] expect perceptual categories to map onto simple features of stimuli in a one-to-one fashion.”

Mappings

Mole’s (2010) argument is as follows.  He allows that there is not a one-to-one mapping between stimulus and perceived phoneme in speech perception.  I will also concede this.  Mole (2010) then denies that this means that speech perception is special. His grounds are that there is not in general a one-to-one mapping between stimulus and percept in perception (other than in speech). 

He produces a putative example from vision by noting the existence of ‘metamers’. A metamer is one of two colours of slightly different wavelengths that are nevertheless perceived as the same colour. (Colour is defined here by wavelength rather than by phenomenology.)

Mole (2010) has indeed produced a further example of a situation where there is not a one-to-one mapping between stimulus and percept.  However, this lack of one-to-one mapping is not exactly what causes the special nature of speech perception under the Motor Theory. Rather the relevant phenomenon is ‘co-articulation.’ That is, the way in which we are generally articulating more than one phoneme at a time.

Coarticulation

As Liberman and Mattingly write (1985, p. 4), “coarticulation means that the changing shape of the vocal tract, and hence the resulting signal, is influenced by several gestures at the same time” so the “relation between gesture and signal […] is systematic in a way that is peculiar to speech”.  So while it is indeed the case that multiple stimuli are presented which result in a single percept, it is the temporal overlap between those stimuli that is the key factor. It is not the mere fact of their multiplicity.  In other words, the Motor Theory argument relies on the fact that a speaker is pronouncing more than one phoneme at a time during overlap periods.

Disanalogy of Metamer Example

This means that Mole’s (2010) metamer example is disanalogous: it deals only with the multiplicity of the stimuli in the mapping and not with their temporal overlap. This is the case because there cannot in fact be a temporal overlap between two colour stimuli. We can see this using a thought experiment. Let us imagine a lighting rig that is capable of projecting any number of arbitrary colours, and of projecting more than one colour at the same time.

In that case, we could not say that the perception of a colour projected at a particular time was changed by the other colours projected with it.  That situation would simply be the projection of a different colour.  So a projection of red light with green light does not produce a modified red, it produces yellow light.  It is not possible to have a “modified red,” because such a thing is not red any more.  The rig would not be projecting a different sort of red; it would be projecting a different colour that was no longer red.
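The additive-mixing point can be illustrated with a toy calculation. This is only a sketch under simplifying assumptions of my own: the RGB triples, the `mix` function and the clipping rule are illustrative conventions, not anything drawn from Mole (2010) or from colour science.

```python
def mix(light_a, light_b):
    """Superpose two lights by adding their RGB channels (clipped at 255).

    A toy model of additive mixing; real colour appearance is more complex.
    """
    return tuple(min(a + b, 255) for a, b in zip(light_a, light_b))

red = (255, 0, 0)
green = (0, 255, 0)

# Projecting red together with green yields yellow, a different colour,
# not any kind of "modified red".
print(mix(red, green))  # (255, 255, 0)
```

The output is yellow's RGB triple, with no trace of a category "red-but-modified", which is the essentialist point made above.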

Using Hearing as an Example

I will illustrate this further with an example from a different sensory modality: hearing. The position I am taking about red (more exactly, about a precise shade of red) is essentialist. On essentialist accounts, some properties of an item can change, and such change will result in a modified version of that item. Other properties, the essential ones, cannot change without the original item losing its identity.

For example, some properties of an opera are essential to it being an opera.  By definition, it is symphonic music with singing.  A symphony requires only the musical instruments.  Some properties of an opera can change and this will result in a modified opera.  One could replace the glass harmonica scored for the Mad Scene in Lucia di Lammermoor with flute.  One would then have a performance of a modified version of Lucia. It would be a modified opera and would still be an opera.

What one could not do is change an opera into a symphony, strictly speaking.  There could be a performance of the first act of Lucia as normal. One would be watching a performance of an opera.  If in the second act the musicians came out and played without the singers, one would not have converted an opera into a symphony.  One would have ceased to perform an opera and begun to perform a symphony. Albeit, it would be one musically identical to the non-vocal parts of Lucia.

An Imaginary Lighting Rig

Returning to the lighting rig, we cannot say here that yellow is a modified red. If we did, we would abandon any meaning for separate colour terms altogether: every colour would be a modified version of every other colour. This impossible lighting rig is what Mole (2010) needs to cite to have a genuine example: one involving the projection of multiple stimuli at the same time resulting in activation of the same perceptual category.

In sum, a metamer is an example where there is no one-to-one mapping between stimulus and perceptual category. Also, the different stimuli are not simultaneous.  This is the case because we cannot be looking at both colours involved in a metamer at the same time.  

A co-articulation by contrast is an example of where there is no one-to-one mapping between stimulus and perceptual category, but where the different stimuli are indeed simultaneous. As it is that very simultaneity that is the key to the special nature of the systematic relation between gesture and signal under the Motor Theory, Mole (2010) does not have an example here that demonstrates that speech perception is not special.

Face Recognition Does Not Show A Similar Sort of Invariance Of Perception As Speech Recognition

Mole (2010) claims that face recognition is another example of invariance. For example, we can recognise that we are looking at Sherlock’s face from various angles and under different lighting conditions. This challenges the idea that invariance in speech perception is evidence for the special nature of speech perception.  His claim is that the invariance lies in the way we can always report that we are looking at Sherlock’s face. That occurs despite variance in input visual data. This he claims is similar to the invariance in the way that we can always report we have heard Sherlock’s name despite variance in input aural data.  If that is true, then Mole (2010) has succeeded in showing that speech perception is not special as the Motor Theory claims.

Mole (2010) allows that we use invariances in face recognition. He denies though this is explicable by examination of retinal data.  He writes: “[t]he invariances which one exploits in face recognition are at such a high level of description that if one were trying to work out how it was done given a moment-by-moment mathematical description of the retinal array, it might well appear impossible” (Nudds and O’Callaghan 2010, p. 216).  What this means is that it would be difficult to get from the retinal array (displaying a great deal of lack of invariance) to the features we use in recognising Sherlock. Those features would include for example our idea of the shape of his nose (which is quite invariant).

A Response to The Mole Objection

However, we can question this as follows.  Since the only thing that computers can do in terms of accepting data is to read in a mathematical array, Mole’s (2010) claim is in fact equivalent to the claim that we cannot see how computers can perform face recognition.  That claim is false.  To be very fair to Mole (2010), his precise claim is that the task might appear impossible. But I shall now show that since it is widely understood to be possible, it should not appear impossible either.

Computers Can “Recognize” Faces

Fraser et al. (2003) describe an algorithm that performs the face recognition task better than the best algorithm in a ‘reference suite’ of such algorithms. 

Their computer sees a gallery of pictures of faces and a target face. Its instructions are to sort the gallery such that the target face is near the top.  The authors report that their algorithm is highly successful at performing this task.  Fraser et al. write (2003, p. 836): “[w]e tested our techniques by applying them to a face recognition task and found that they reduce the error rate by more than 20% (from an error rate of 26.7% to an error rate of 20.6%)”.  So the computer recognized the target face around 80% of the time.
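As a quick check on those figures (a sketch only, using just the two error rates quoted from Fraser et al. above), the drop from 26.7% to 20.6% is indeed a relative reduction of more than 20%:

```python
# Error rates quoted by Fraser et al. (2003, p. 836).
baseline_error = 26.7  # best reference-suite algorithm (%)
new_error = 20.6       # their algorithm (%)

# "Reduce the error rate by more than 20%" reads as a relative reduction.
relative_reduction = (baseline_error - new_error) / baseline_error
print(f"{relative_reduction:.1%}")  # 22.8%

# And the success rate is roughly 100 - 20.6 = 79.4%, i.e. "around 80%".
success_rate = 100 - new_error
```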

So we see firstly that the computer can recognize a face.  [It is not an objection here to claim that strictly speaking, computers cannot ‘recognise’ anything.  All that we require here is that computers can be programmed so as to distinguish faces from one another merely by processing visual input.  It is this task which Mole (2010) claims appears impossible.]  Then we turn to the claim that how the computer does this is incomprehensible.  The entire paper, which discusses exactly that topic at length, refutes this. We can take it that such understanding is widely to hand in computational circles.

Efficiency

It may be true in one sense that we could not efficiently perform the same feat as the computer. We could not physically take the mathematical data representing the retinal array and explicitly manipulate it in a sequence of complex ways in order to perform the face recognition task.  In another sense, we could, of course. It is what we do every time we actually recognize a face.  The mechanics of our eyes and the functioning of our perceptual processing system have the effect of performing those same mathematical manipulations.  We know this because we do in fact perform face recognition using only the retinal array as input data.

What Mole Has Provided

Mole (2010) has indeed provided an example of invariance (i.e., in face recognition). However, the example does not undermine the need for a special explanation of the speech perception invariances, because the face perception example is in fact easily explicable. Mole (2010) has therefore not provided a further example of an unexplained invariance, and he has not thereby questioned the specialness of speech perception. Speech perception continues to exhibit a unique invariance which continues to appear in need of a unique explanation.

Experimental Data Do Not Show Cross-Modal Fusion 

Cello Experiment

Mole (2010) argues that an experiment on judgments made as to whether a cello was bowed or plucked shows the same illusory optical/acoustic combinations as the McGurk effect. 

The McGurk effect (McGurk and MacDonald 1976) involves subjects hearing a /ba/ stimulus and seeing a /ga/ stimulus.  The subjects report that they have perceived a /da/ stimulus. This is not one of the stimuli; it is a fusion or averaging of the two stimuli.  So an optical stimulus and an acoustical stimulus have combined to produce an illusory result which is neither of them.

If Mole’s (2010) claim that the cello experiment shows McGurk-like effects is true, this would show that these illusory effects are not special to speech. This would challenge the claim that there is anything special about speech that the Motor Theory can explain.  

Mole (p. 221, 2010) writes: “judgments of whether a cello sounds like it is being plucked or bowed are subject to McGurk-like interference from visual stimuli”.  However, the data Mole (2010) cites do not show the same type of illusory combination. So Mole (2010) is unable to discharge the specialness of speech perception as he intends.

Gestures

The Motor Theory postulates that the gesture intended by the speaker, not the acoustical signal produced, is the object of perception. The theory explains this by also postulating a psychological gesture recognition module, which makes use of the speech production capacities in performing speech perception tasks. Thus the McGurk effect constitutes strong evidence for the Motor Theory, which explains it by saying that the module considers both optical and acoustical inputs in deciding what gesture the speaker intends.

This strong evidence would be weakened if Mole (2010) could show that McGurk-like effects occur other than in speech perception. The proponents of the Motor Theory would then be committed to the existence of multiple modules, and their original motivation, the observed specialness of speech, would be put in question.

McGurk Data

The paper Mole (2010) cites, Saldaña and Rosenblum (1993), describes an experimental attempt to find non-speech cross-modal interference effects. They used a cello as the source of acoustic and optical stimuli.  Remarkably, Saldaña and Rosenblum (1993) state in their abstract that their work suggests “the nonspeech visual influence was not a true McGurk effect.” This is in direct contradiction of Mole’s (2010) stated reason for citing them.


There are two ways to make a cello produce sound: plucking or bowing.  The experimenters proceed by presenting subjects with discrepant stimuli. For example, they presented an optical stimulus of a bow accompanied by an acoustical stimulus of a pluck.  Saldaña and Rosenblum (1993) found that the reported percepts are adjusted slightly by a discrepant stimulus. They move in the direction of that stimulus.

However, to see a McGurk effect, we need the subjects to report that the gesture they perceive is a fusion of a pluck and a bow.  Naturally enough, this did not occur, and indeed it is unclear what exactly such a fusion might be.  Therefore, Mole (2010) has not here produced evidence that there are McGurk effects outside the domain of speech perception.

Mole’s Response to the Data

Mole’s (2010) response is to dismiss this as a merely quantitative difference between the effects observed by the two experiments.  Mole (p. 221, 2010) writes:  “[t]he McGurk effect does reveal an aspect of speech that is in need of a special explanation because the McGurk effect is of a much greater magnitude than analogous cross-modal context effects for non-speech sounds”.  As we saw, Mole (2010) is wrong to claim there is only a quantitative difference between the McGurk effect observed in speech perception and the cross-modal effects observed in the cello experiment. Only in the former do we see fusion effects.  That is most certainly a major qualitative difference.

Mole’s (2010) claim that the cello results are only quantitatively different to the results seen in the McGurk effect experiment produces further severe difficulties when we consider in detail the experimental results obtained.  The cello experimenters describe a true McGurk effect as being one where there is a complete shift to a different entity. The syllable is clearly heard. It is entirely different to the one in the acoustic stimulus.  Saldaña and Rosenblum (1993, p. 409) describe these McGurk data as meaning: “continuum endpoints can be visually influenced to sound like their opposite endpoints”.

Cello Data

The cello data were not able to make a pluck sound exactly like a bow. In fact, the discrepant optical stimuli were only able to slightly shift the responses in their direction, by less than a standard deviation, and in some cases not at all.  This is not the McGurk effect at all and so Mole (2010) cannot say it is only quantitatively different.  Indeed, Saldaña and Rosenblum (1993, p. 410) specifically note that: “[t]his would seem quite different from the speech McGurk effect”.

In sum, the cross-modal fusion effect that Mole (2010) needs is physically impossible in the cello case. The data do not even represent a non-speech analog of the McGurk effect. That is confirmed by the authors.  Once again, speech perception remains special and the special Motor Theory is needed to explain it.

Sound Localization Experiment

The other experiment relied on by Mole (2010) is Lewald and Guski (2003), which considered the ventriloquism effect. However, the result that Mole (2010) needs to support his theory is an effect that is a good analogy to the McGurk effect in a non-speech domain. As I will show below, the data from the Sound Localisation Experiment also fail to bear out his claim that there are McGurk-like effects outside the domain of speech perception.

The Sound Localisation  Experiment uses tones and lights as its acoustic and optical stimuli.  It investigates the ventriloquism effect quantitatively in both the spatial and temporal domains.  The idea is that separate optical and acoustic events are perceived as a unified single event with optical and acoustical effects.  This will only occur if the spatial or temporal separation of the component events is below certain thresholds.

Integration Windows

Lewald and Guski (2003, p. 469) propose a “spatio-temporal window for audio-visual integration.” Within this window, separate events are perceived as unified.  They suggest maximum values of 3° for angular or spatial separation and 100 ms for temporal separation. 

Thus a scenario in which a light flash occurs less than 3° away from the source of a tone burst will produce a unified percept of a single optical/acoustical event, and so will a scenario in which a light flash occurs within 100 ms of a tone burst.  Since the two stimuli in fact occurred at slightly different times or locations, this effect entails that at least one of the stimuli is perceived to have occurred at a different time or location than it actually did.
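The proposed window can be sketched as a simple predicate. This is only an illustrative sketch: the function name is mine, and treating the spatial and temporal criteria as jointly required is my simplifying assumption; only the two threshold values come from Lewald and Guski (2003).

```python
# Thresholds suggested by Lewald and Guski (2003, p. 469).
MAX_ANGLE_DEG = 3.0   # maximum angular separation of flash and tone
MAX_DELAY_MS = 100.0  # maximum temporal separation of flash and tone

def perceived_as_unified(angle_deg: float, delay_ms: float) -> bool:
    """Sketch: a flash and a tone falling inside the spatio-temporal
    window are perceived as a single audio-visual event."""
    return angle_deg <= MAX_ANGLE_DEG and delay_ms <= MAX_DELAY_MS

print(perceived_as_unified(1.0, 50.0))   # True: inside the window
print(perceived_as_unified(1.0, 250.0))  # False: too large a delay
```

Note what the predicate returns: a judgment that one event occurred, not any new hybrid percept, which is the contrast with the McGurk effect drawn below.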

Recap of McGurk Effect

To recap, in the McGurk effect, discrepant optical and acoustic stimuli result in a percept that is different to either of the two stimuli and is a fusion of them.  We may allow to Mole (2010) that Lewald and Guski (2003) do indeed report subjects perceive a single event comprising a light flash and a tone burst.  However, that is insufficient to constitute an analogy to the McGurk effect. 

Subjects do not report that their percept is some fusion of a light flash and a tone burst – as with the cello experiment, it is unclear what such a fusion could be – they merely report that an event has resulted in these two observable effects.  [We may note that Lewald and Guski (2003) do not take themselves to be searching for non-speech analogs of the McGurk effect; the term does not appear in their paper or in the titles of any of their 88 citations, which throws doubt on the claim that they are working in this field at all.]

Fused Events?

Indeed, subjects were not asked whether they perceived some fused event.  They were asked whether the sound and the light had a common cause, and whether the events were co-located or synchronous.  As Lewald and Guski write (p. 470, 2003): “[i]n Experiment 1, participants were instructed to judge the likelihood that sound and light had a common cause. In Experiment 2, participants had to judge the likelihood that sound and light sources were in the same position. In Experiment 3, participants judged the synchrony of sound and light pulses”. 

A ‘common cause’ might have been some particular event, but it is not itself the sound or the light, and those were the only items perceived. Therefore the instructions do not even admit the possibility of perception of a fused event.

What Lewald and Guski Measured

It is puzzling that Mole (p. 221, 2010) cites Lewald and Guski (2003) to support his claim that perceived flash count is influenced by perceived tone count.  We see this when Mole writes (p. 221, 2010):  “[t]he number of flashes that a subject seems to see can be influenced by the number of concurrent tones that he hears (Lewald and Guski 2003)”.  As we have just seen, however, Lewald and Guski’s participants judged common cause, co-location and synchrony; they were not asked to count flashes or tones at all.

Moreover, neither the Sound Localisation Experiment nor the cello experiment support Mole’s (p. 221, 2010) summation that “[i]t is not special to speech that sound and vision can interact to produce hybrid perceptions influenced by both modalities” in the way he needs.  Unlike with the McGurk effect, there are no hybrid perceptions in either case. “Hybrid” means ‘a perception of an event which is neither of the stimulus events’.

There are cross-modal effects between non-speech sound stimuli and optical stimuli. But that is inadequate to support Mole’s (2010) claim that speech is not special.  We still need the special explanatory power of the Motor Theory.

Mute Perceivers Are Not A Problem

One of Mole’s (2010) challenges is that the Motor Theory cannot explain how some people can have the capacity to perceive speech that they lack the capacity to produce.  Mole writes (p. 226, 2010) that “[a]ny move that links our ability to perceive speech to our ability to speak is an unappealing move, since it ought to be possible to hear speech without being able to speak oneself”.  There is an equivocation here though on the meaning of ‘capacity to produce’.  Mole (2010) is reading that term so that the claim is that someone who is unable to use their mouth to produce speech lacks the capacity to perceive speech.  Since such mute people can indeed as he claims understand speech, he takes his claim to be made out.

However, in the article cited by Mole (2010), it is clear that this is not what ‘capacity to produce’ means.  In the study described there, by Fadiga et al. (2002), the neuronal activation related to tongue muscles is not sufficient to generate movement.  This activation is a result of the micro-mimicry that takes place when people are perceiving speech.  Fadiga et al. (2002) call this mimicry “motor facilitation.”

Motor Facilitation

Fadiga et al. (p. 400, 2002) write: “The observed motor facilitation is under-threshold for overt movement generation, as assessed by high sensitivity electromyography showing that during the task the participants’ tongue muscles were absolutely relaxed”.   Thus the question is whether the subject has the capacity to produce such a sub-threshold activation, and not the capacity to produce speech via a super-threshold activation.   Naturally, since all the subjects had normal speech, they could produce both a sub-threshold and a super-threshold activation, with the latter resulting in speech.

However, someone could be able to activate their tongue muscles below the threshold to generate overt movement but not be able to activate those muscles above the threshold.  That would mean that they lacked ‘capacity to produce’ in Mole’s (2010) sense, but retained it in Fadiga et al.’s (2002) sense.  This would be a good categorization of the mute people who can understand speech they cannot utter. 

An Empirical Test

Those people would retain the ability to produce the neural activity that Fadiga et al. (2002) observe, which does not result in tongue muscle movement.  This is a testable empirical claim, and my account commits to it.  It is possible that such people cannot produce even the sub-threshold neural signals; if that turns out to be correct, it would be a problem for the Motor Theory and for the defence I have offered of it here.

Similarly, we can resolve Mole’s (2010) puzzle about how one can understand regional accents that one cannot mimic; i.e. I can understand people who speak with an accent that is different to mine.  The capacity to understand a particular accent could result from our ability to generate the necessary sub-threshold activations, but not the super-threshold ones.  If we go on to acquire that regional accent, our super-threshold muscle activation capacities would be of the required form.  This again is an empirical prediction which makes my account subject to falsification by data.

Implications in Developmental Psychology

This hypothesis could have interesting implications in the field of developmental psychology.  Mole (p. 216, 2010) outlines how infants can perceive all speech sound category distinctions. They eventually lose the ability to discriminate the ones that do not represent a phoneme distinction in their language.  

It may be, then, that all infants are born with the neural capacity to learn to generate super-threshold activations for all regional accents, but eventually retain that capacity only at the sub-threshold level.  This would explain why they can later understand a wide range of regional accents while losing the super-threshold capacity for those accents they cannot mimic.

A further implication of the Motor Theory is that a listener’s vocal tract can function as a model of itself, just as it can function as a model of a speaker’s vocal tract.  The sub-threshold activation then functions as a model of the super-threshold activation, so perceptual capacities involve the former modelling the latter, exactly as the Motor Theory predicts.

Such an approach does not commit the Motor Theory to the claim that the modelling/perception neurons controlling the sub-threshold activations are the same as the production neurons controlling speech production.  So the account is not susceptible to falsification on that precise point.

Further Brief Challenges To Mole (2010)

The Motor Theory Explains Cerebellar Involvement In Dyslexia

In challenging the Motor Theory, Mole (2010) challenges the idea that we use speech production capacities in speech recognition.  Any data showing links between speech production capacities and speech recognition capacities will therefore be a problem for him.

Ivry and Justus (2001) refer to a target article showing that 80% of dyslexic subjects have cerebellar impairments.  The cerebellum is a motor area, and dyslexia is clearly a language disorder, so we have evidence of a link between language and motor areas.  That is a result the Motor Theory naturally accommodates, since it links speech production and speech recognition.

It is not open to Mole (2010) to respond that the link holds only between motor control areas and writing control areas.  Although writing skills are the primary area of deficit for dyslexic subjects, the authors also found that impairments in reading ability were strongly associated with the cerebellar impairments.  The Motor Theory explains this, since it predicts that motor deficits will result in speech recognition deficits.  Mole (2010) needs an explanation of these data that does not rely on the Motor Theory.

The Motor Theory Explains Links Between Speech Production And Perception In Infants

Mole (2010) does not address some important results supplied by Liberman and Mattingly (1985, p. 18) linking perception and production of speech.  These data show that infants prefer to look at a face producing the vowel they are hearing rather than at the same face with the mouth shaped to produce a different vowel.

That effect does not occur if the vowel sounds are replaced with non-speech tones matched for amplitude and duration with the spoken vowels.  This means the infants are able to match the acoustic signal to the optical one.  In a separate study, the same extended looking by infants occurred when a disyllable was the test speech sound.  These data are inexplicable without postulating a link between speech production and speech perception abilities, because differentiating between mouth shapes is a production-linked task, albeit one mediated by perception, while differentiating between speech percepts is a perceptual task.

The Motor Theory Explains Why Neural Stimulation Of Speech Production Areas Enhances Speech Perception

D’Ausilio et al. (2009) applied Transcranial Magnetic Stimulation (“TMS”) to areas of the brain responsible for motor control of the articulators: the physical elements, such as the tongue and lips, that produce speech.  After the TMS, they tested the subjects’ abilities to perceive speech sounds.  Stimulation of speech production areas improved the subjects’ ability to perceive speech.  The authors suggest that TMS primes the relevant neural areas so that they are more liable to be activated subsequently.

Even more remarkably, the experimenters found finer-grained effects: stimulation of the exact area involved in producing a sound enhanced perceptual abilities in relation to that sound.  D’Ausilio et al. (2009, p. 383) report: “the perception of a given speech sound was facilitated by magnetically stimulating the motor representation controlling the articulator producing that sound, just before the auditory presentation”.  This constitutes powerful evidence for the Motor Theory’s claim that the neural areas responsible for speech production are also involved in speech perception.

Conclusion

Special situations require special explanations.  The Motor Theory of Speech Perception is a special explanation of speech perception which, as the failure of Mole’s objections shows, continues to be needed.

One might say that such “specialness” leaves the Motor Theory in a vulnerable and isolated position, since it seeks to explain speech perception in a way that is very different to how we understand other forms of perception.  Here I would revert to my opening remarks about the similarities between the Motor Theory and Simulation Theory.  Whilst the Motor Theory is indeed a special way to explain speech perception, it is at the same time parsimonious and explanatorily powerful because, like Simulation Theory, it requires no machinery that we do not already know we possess.  This underlies the continued attractiveness of the Motor Theory as a convincing account of how people perceive speech so successfully.

See Also:

What Is “Theory Of Mind?”

#Proust: An Argument For #SimulationTheory

The Psychology of Successful Trading

Sherlock Holmes as Enemy of Confirmation Bias

References 

D’Ausilio, A et al. 2009  The Motor Somatotopy of Speech Perception.  Current Biology 19: pp. 381–385.  DOI: 10.1016/j.cub.2009.01.017

Fadiga, L et al. 2002  Speech Listening Specifically Modulates the Excitability of Tongue Muscles: a TMS study.  European Journal of Neuroscience, 15: pp. 399–402.  DOI: 10.1046/j.0953-816x.2001.01874.x

Fraser, A M et al. 2003  Classification modulo invariance, with application to face recognition.  Journal of Computational and Graphical Statistics, 12 (4): pp. 829–852.  DOI: 10.1198/1061860032634

Ivry, R B and T C Justus 2001  A neural instantiation of the motor theory of speech perception.  Trends in Neurosciences, 24 (9): pp. 513–515.  DOI: 10.1016/S0166-2236(00)01897-X

Lewald, J and R Guski 2003  Cross-modal perceptual integration of spatially and temporally disparate auditory and visual stimuli.  Brain Research. Cognitive Brain Research (Amsterdam), 16: pp. 468–478.  DOI: 10.1016/S0926-6410(03)00074-0

Liberman, A and I G Mattingly 1985  The Motor Theory of Speech Perception Revised.  Cognition, 21: pp. 1–36.  DOI: 10.1016/0010-0277(85)90021-6

McGurk, H and J MacDonald 1976  Hearing lips and seeing voices.  Nature, 264, (5588): pp. 746–748.  DOI: 10.1038/264746a0

Mole, C 2010 The motor theory of speech perception in Sounds and Perception: New Philosophical Essays.  Oxford: Oxford University Press.  DOI: 10.1093/acprof:oso/9780199282968.001.0001

Saldaña, H M and L D Rosenblum 1993  Visual influences on auditory pluck and bow judgments.  Perception & Psychophysics, 54 (3): pp. 406–416.  DOI: 10.3758/BF03205276

Short, T L 2015  Simulation Theory: a Psychological and Philosophical Consideration.  Abingdon: Routledge.  URL: https://www.routledge.com/Simulation-Theory-A-psychological-and-philosophical-consideration/Short/p/book/9781138294349


Simulation Theory

Introduction

Theory of Mind (ToM) is the label for the abilities we have to predict and explain the behaviour of others, by ascribing mental states such as belief and desire to them, or otherwise. There are two major competing theories of ToM: the Theory Theory (TT) and Simulation Theory (ST). TT holds that we understand others by having a theory of them or their behaviour. ST holds that we understand others by putting ourselves in their place. There are also different types of ST. ST(Transformation) holds that I simulate you by becoming you. ST(Replication) holds that I simulate you by becoming like you. Below I briefly address three objections to ST.

ST(Transformation) is Incomprehensible

ST(Transformation) has been questioned. Stich and Nichols provide three possible interpretations of what Gordon’s position might mean, all of which they find unsatisfactory. They note that Gordon has characterised ST(Transformation) as the view that we explain and predict behaviour by “imaginative identification”, that is, by using our imagination “to identify with others” (Stich and Nichols 1995, p. 91).

“Imaginative Identification”

They quickly dismiss the first interpretation, that we experience conscious imagery when we simulate, on the grounds of phenomenological implausibility. The second interpretation concerns the scope of the explanation: is it intended to cover all cases of application of ToM, or merely some? Stich and Nichols think that if the claim is weakened to ‘some’, then ST becomes “patently true [but] not very exciting, and […] not incompatible with TT” (Stich and Nichols 1995, p. 92).

However, since there seem to be ambiguous cases involving both TT and ST, a serious defence of either theory should claim that it explains many important cases of application of ToM, not all of them. So Gordon’s line should escape Stich and Nichols’s particular charge here.

Stich and Nichols conclude, though, that ST(Transformation) involves “imaginative identification with the other” and that this is a label for “a special sort of mental act or process which […] need not be accompanied by conscious imagery” (Stich and Nichols 1995, p. 92). They then ask what this means, bringing the charge that they find it incomprehensible.

“If I Were You” In Simulation Theory

Familiar questions arise immediately here. We might ask what people mean when they employ the popular locution ‘If I were you…’ when giving advice. The conundrum is that the person giving advice presumably means “if I were in your position with my outlook and abilities, I would do X”. However, those abilities and that outlook might preclude being in the situation being advised upon.

It does not seem plausible that the locution means “If I were you, in your position with your abilities and outlook, I would do X”. That is because (a) the person receiving the advice presumably already has access to that type of suggestion, and (b) the advisor will not necessarily have it. Daniel phrases this objection neatly when he asks “how much of myself am I to project into the other person’s shoes” (Daniel 1993, p. 39). The answer, of course, is ‘the right amount’.

What Makes Simulation Hard?

I will use the term S to refer to the subject doing the simulating. O is the target of simulation. S wishes to understand or predict the behaviour of O. In Simulation Theory, S does this by simulating O.

S will not be successful in simulating O if S ascribes to O abilities and experiences that are remote from O’s, and this is true even if the ascribed profile matches S’s own more closely. Naturally this presents some difficulties for simulation: Ss will find it difficult to simulate Os who are dramatically more or less intelligent than themselves.

Simulating People Much Smarter Or Dumber Than Ourselves Is Hard

Stich and Nichols may legitimately ask which line Simulation Theory takes on the conundrum. Re-examining the argument above produces the opposite conclusion: S does not want to use S’s own abilities and outlook to predict what O will do, because to the extent that O has different abilities and outlooks, S’s prediction will be wrong.

A chess grandmaster does not expect a novice player to use the same defence that he saw used against a particular attack in his last world championship appearance. The grandmaster may indeed struggle to reduce his abilities to the correct level. As a practical matter, this will not be a problem. The grandmaster will simply use his vastly superior playing skills to compensate for his lack of ability to predict what strange tactics the novice will employ. He will still exploit weaknesses easily.

In the other direction, the novice player would do well to predict a grandmaster-level defence against his attack. However, this information will not be available. So it seems as though there are difficulties in becoming the O when the O has significantly different levels of relevant ability.

These difficulties seem less marked in cases of information asymmetry, because information asymmetries are ubiquitous in everyday life: they occur both between S and O and between the same S at different times. Step changes in ability in a single S are much less frequent, or indeed never seen, outside perhaps of some unusual pathologies.

Only Grandmasters Can Simulate Grandmasters

This challenge seems equally strong on both the replication and the transformation views. If S lacks the ability to become a chess grandmaster, then S also lacks the ability to become like one, in terms of ability at least. S has, however, no difficulty simulating information asymmetries between S and anyone else, since these are generally not related to ability differences.


However, we need to remember what the challenge is, exactly: it demands to know what is meant by becoming someone else. I have sketched out above what this might mean. Stich and Nichols can then say that, on the above outline, Simulation Theory provides a picture on which ToM will fail to produce accurate predictions whenever S lacks some of the relevant abilities or disabilities of O, while S will perhaps be more successful when the differences between S and O are ones of information asymmetry. Fine: there are systematic errors in ToM. These will need explaining; I will do this in later work.

Simulation Theory(Replication) Involves Impossible Ascriptions

One logical objection brought against Goldman by Olson and Astington is fairly easy for Goldman to deal with. The objection is that Goldman

“argues that the ascription of beliefs to others is done by simulating the other’s state on the basis of one’s own. But […] the only definitive evidence for ascribing belief occurs in the case of ascribing false belief. Yet one’s own beliefs are never introspectively available as false beliefs, so how could false beliefs ever be ascribed to others? That is, how could one see in another what was never experienced in one’s self?”

(Olson and Astington 1993, p. 65).

No One Experiences False Belief

What do Olson and Astington mean by the surprising claim that no one ever experiences their own false belief? They mean that, very quickly on discovering conclusive evidence for the falsity of a belief, we change that belief so that it is no longer false. More precisely, we eliminate the previous belief, since it has been falsified, and replace it with its negation, which is a new true belief. So it is true that we never have current experience of a belief that is false now. That, of course, is not what Simulation Theory needs.

The above line does not address some situations of cognitive bias. For example, some people continue to believe that Brexit is a good idea, despite overwhelming evidence to the contrary. That is a failure of adequate processing of evidence: such voters continue to hold the false belief that Brexit is a good idea.

We have no experience of our own false beliefs only if we have no experience of our beliefs changing, for to experience a belief changing requires only the ability to use memory, with some non-zero accuracy, to compare our current belief states with our previous ones, and that comparison is an experience of one of our beliefs as false. Introspectionist ST(Replication) thus needs such a memory capacity, but it is not committed to the claim that the capacity always functions correctly.

Simulation Theory Cannot Account for Some Developmental Data

Stich and Nichols claim some developmental data can be explained by TT but not by Simulation Theory. In developing a response to this objection, we may also learn more about the differences between TT and ST. The data in question derive from a variant of the false belief tests. The experimenters ask children what another child, sitting in front of them, believes about the contents of a closed box. The other child may have either looked in the box or been told what is in it.

The first child will be good at answering correctly that the other child knows what is in the box when the other child has looked in the box. But younger children are bad at answering correctly when the other child has been told what is in the box. Older children (five and up) are good at both tasks: they know that if you see what is in the box you know what is in it, and they also know that you know if you are told what is in it.

Folk Psychology And Simulation Theory

https://plato.stanford.edu/entries/folkpsych-theory/

Stich and Nichols claim that these data are consistent with TT but not with ST. They write that “as children get older, they master more and more of the principles of folk psychology” (Stich and Nichols 1993, p. 262). However, they say, while it is clear that even the younger children “form beliefs as the result of perception, verbally provided information, and inference” (Stich and Nichols 1993, p. 262), they do not have the latter two routes when assessing the beliefs of others.

Thus they are not using their own minds to simulate others, and so ST is false, according to Stich and Nichols. But Stich and Nichols cannot have this conclusion. The simulation theorist can claim that these data show that younger children are unable to use all of the capacities available to them for forming their own beliefs when simulating others: their ToM is to that extent immature. Since Stich and Nichols allow that three-year-olds have immature ToM, these data do not weigh one way or the other in the TT vs ST debate.

Maturation And Simulation Theory

We might suppose, on this picture, that ST abilities develop as the child matures because more of the routes to knowledge that the child uses become available to the simulation as maturation proceeds. Perhaps that just is the development in question.

There is a particular time course for the development of these capabilities in the case of the child’s own beliefs, and there is no reason to presume that the abilities to form knowledge from perception, testimony and inference all arrive simultaneously. So one would expect the same as the child’s abilities to simulate develop. This is exactly what we find: empirical studies confirm that different ToM component abilities develop at different times.

As Farrant et al. confirm, “[c]hildren typically pass the diverse desires task first, followed by the diverse beliefs, knowledge access, contents false belief, and real–apparent emotion tasks in that order” (Farrant et al. 2006, p. 1845). ST is not committed to anything by these data, but if it assumes that maturation means the child can bring more of its own abilities to bear when simulating others, ST will to that extent find empirical support.

See Also:

#Proust: An Argument For #SimulationTheory

What Is “Theory Of Mind?”