Examining the effects of audiovisual associations on motion perception through task-based fMRI
Hulusi Kafaligonul
Article No: 1   Article Type :  Research
Objective: Previous studies showed that associative learning can lead to drastic changes in perceptual experience and unexpected levels of sensory plasticity in the adult brain. However, how associative learning shapes perception, and the underlying neural mechanisms, remain poorly understood. In the current study, by taking advantage of the well-studied visual motion-processing hierarchy, the roles of different brain areas in audiovisual association-induced changes in motion perception are investigated.

Method: Using a previously developed audiovisual associative paradigm, behavioral and Blood Oxygen Level Dependent (BOLD) data were collected from adult human participants (n=13) before and after the association phase. Behavioral data were collected through reports on visual motion direction. Functional magnetic resonance imaging (fMRI) was based on block design and the functional data were analyzed according to a general linear model.

Results: Audiovisual associations, acquired within a short time and without any feedback, significantly affected the perception of motion direction. This effect was much more salient when the physical direction of visual motion was ambiguous. Moreover, the fMRI findings indicated that BOLD activity across different cortical regions changed after the association phase.

Conclusion: Taken together, these findings indicate that low-level sensory, multisensory and high-level cognitive areas play a role in the effects of audiovisual associations on motion perception. In general, this suggests that our prior experiences acquired through associations may affect perceptual processing at different hierarchical levels and over different cortical areas.
Keywords : Auditory perception, associative learning, fMRI, motion perception, visual perception
Dusunen Adam : The Journal of Psychiatry and Neurological Sciences : 2018;31:125-134
Full Text:


Visual perception is shaped both by processes that spread from low-level to high-level regions, beginning when stimuli trigger the receptors, and by the experiences we acquire by learning from the external world (1). Until recently, most studies focused on the neuronal mechanisms triggered by stimuli, whereas little attention was given to the contribution of previous experience to visual perception and to the neuronal mechanisms underlying this contribution. Recent studies have shown that learned associations between stimuli significantly change and shape our visual perception. Many of the studies conducted from this perspective concern visual motion perception.

Associative learning is the formation of a meaningful link between two stimuli that occur together or close together in time (2-4). For example, if two visual stimuli are always seen together, over time they become associated, and presenting one of them brings the other to mind and evokes its mental image. Such learning has been shown to affect visual motion perception significantly. When stationary arrows or colors that carry no motion information are repeatedly presented together with specific motion directions, these stimuli become associated with those directions. When the same stimuli are subsequently presented while the direction of motion is highly ambiguous and motion information is limited, they bias the sensation of motion and the perceived direction toward the learned association (5,6). Although such learning effects were initially thought to act on high-level decision-making mechanisms and to affect only high-level cortical areas, recent studies have shown that neuronal activity and direction tuning in low-level visual motion areas change consistently with the learning-based perceptual changes (1).

While these studies clearly demonstrate that the sensory plasticity generated by associative learning also encompasses low-level regions of the cortex, they focused only on information obtained from the visual modality. Because learning in the natural environment involves more than one modality (e.g., visual and auditory), unisensory approaches may be insufficient for engaging natural learning mechanisms in the brain, and multisensory paradigms may be more effective for studying sensory plasticity (7). Starting from this premise, interest in multisensory associative learning has increased, and the resulting studies support this general approach, showing that multisensory associations have even greater and more dramatic effects on visual perception. The most striking findings concern motion perception. Static auditory tones that carry no physical motion information can be associated with specific directions of motion in a short learning session (3-10 minutes) without any feedback. It was subsequently shown that these static tones significantly affected the perceived direction of motion and motion sensitivity, and even evoked motion perception for objects that remained at the same spatial location (8,9). The effects that the tones acquired through associative learning lasted for several days. These studies are important because they show that not only the physical auditory stimuli, but also the meaning that the auditory stimuli acquire through associative learning, play an important role in shaping visual motion perception.

Studies using visual motion stimuli thought to selectively engage motion mechanisms at different levels suggest that these effects also involve low-level cortical areas; however, the basic mechanisms of these effects are not well understood and their neuronal correlates are unknown. Using behavioral and functional magnetic resonance imaging (fMRI) techniques, the present study focuses on how multisensory effects shape final perception and on identifying the cortical areas involved in these effects.


Thirteen volunteers (9 female and 4 male, age range: 21-27 years) participated in our study. All of the participants were naive to the purpose of the study and the hypotheses to be tested. The participants had normal hearing and normal or corrected-to-normal vision. None of them had any history of neurological disease or disorder. At the beginning of the study, informed consent forms were obtained from the participants, who confirmed that they were taking part in the study of their own will. Experimental procedures, data collection, and confidentiality measures followed international standards (Declaration of Helsinki, 1964), and approval was obtained from the Ethics Committee of Ankara University’s Medical Faculty.

Behavioral (Psychophysical) Method and


Behavioral measurements and grading were carried out at the Psychophysics Laboratory of the National Magnetic Resonance Research Center (UMRAM). The setup and equipment in the laboratory were optimized for carrying out a large number of visual and behavioral experiments. Visual stimuli were presented on a 21” LCD monitor (NEC MultiSync 2190UXp, 1600x1200 screen resolution, 60Hz refresh rate) and auditory stimuli through headphones (Sennheiser HD 518). Visual and auditory stimuli were calibrated using a photometer (SpectroCAL) and a sound level meter (SL-4010 Lutron). Presentation and timing of the stimuli, arrangement of the experimental conditions into the stimulus sequence of each session, and recording of behavioral data were done with MATLAB (The MathWorks, Natick, MA) and Psychtoolbox 3.0 (10,11).

The test protocol consisted of five separate steps (Figure 1A). The behavioral part comprised three phases: a pre-association test phase, an association phase, and a post-association test phase. In the association phase, the participants watched stimuli consisting of dots (size 0.2°, 75.80cd/m2) that all moved in the same direction (100% upwards or downwards), accompanied by tones of different frequencies. The moving dots were shown through a circular aperture 5° in diameter. The center of this aperture was 5° to the left, along the horizontal axis, of the red fixation point in the middle of the screen, and a dot density of 3.5 dots/deg2 was used. Half of the participants heard a low-frequency tone (500Hz, 83dB SPL) when watching upward motion and a high-frequency tone (2000Hz, 83dB SPL) with downward motion. For the other half of the participants, the opposite combination was used. For the purpose of the test, the auditory stimulus accompanying upward motion was named Tone A and the stimulus accompanying downward motion Tone B (Figure 1B). Each presentation lasted for 1 second, and each association session consisted of 200 presentations. Including the intervals between presentations, each association phase session lasted between 6 and 12 minutes. Each participant completed between 2 and 4 association phase sessions. During the sessions, the participants fixated the red fixation point in the center of the screen, passively watched the moving dots, and listened to the tones. They had no task to perform and received no feedback. However, they were asked to pay attention to the visual as well as the auditory stimuli during the sessions. Several studies, including some carried out by our own research group, have shown that even short sessions of this kind, with no task given to the participants, can establish audiovisual associations and significantly influence subsequent perception (12,13).
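The counterbalanced tone-direction pairing of one association session can be sketched as follows. This is a minimal illustration rather than the study's MATLAB/Psychtoolbox code: the function name `association_trials`, the use of participant index parity for counterbalancing, and the equal, shuffled split between upward and downward presentations are assumptions; only the tone frequencies, the 1-second presentations, and the 200 presentations per session come from the text.

```python
import random

def association_trials(participant_index, n_presentations=200, seed=0):
    """Return (direction, tone_hz) pairs for one association session."""
    # From the text: half of the participants hear the 500 Hz tone with
    # upward motion, the other half hear the opposite pairing.
    # Assumption: the two halves are assigned by index parity.
    if participant_index % 2 == 0:
        tone_hz = {"up": 500, "down": 2000}
    else:
        tone_hz = {"up": 2000, "down": 500}
    # Assumption: equal numbers of upward and downward presentations,
    # presented in shuffled order.
    directions = ["up", "down"] * (n_presentations // 2)
    random.Random(seed).shuffle(directions)
    return [(d, tone_hz[d]) for d in directions]

trials = association_trials(participant_index=0)
# Stimulus time only: 200 presentations of 1 second each,
# excluding the intervals between presentations.
stimulus_seconds = len(trials) * 1.0
```

In this sketch a direction is always paired with the same tone within a participant, which is the property the association phase relies on.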

The pre- and post-association test phases (experimental sessions) were identical. In these experimental sessions, participants observed moving stimuli consisting of random dots and listened to tones, and after each presentation they reported, by pressing a key on the keyboard, whether the motion had been upward or downward (two-alternative forced choice). In addition, participants were asked to pay attention to the moving stimuli as well as to the auditory stimuli. For the consistency (coherence) of the random-dot motion, 6 different test conditions (5, 15, 30, 45, 60, 90%) were defined (Figure 1C). The random direction of the moving dots was determined according to a white noise algorithm (14).
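How a direction consistency value controls a random-dot stimulus can be illustrated with a short sketch. It is a simplified scheme, not necessarily the white noise algorithm of reference (14): on each frame every dot moves in the signal direction with probability equal to the consistency and in a random direction otherwise. The step size per frame is hypothetical, and the stated value of 3.5 per deg2 is read here as a dot density; the 5° aperture is taken from the text.

```python
import math
import random

APERTURE_RADIUS_DEG = 2.5  # aperture 5 deg in diameter (from the text)
DOT_DENSITY = 3.5          # assumed to be dots per square degree

def n_dots():
    """Number of dots implied by the density and the aperture area."""
    return round(DOT_DENSITY * math.pi * APERTURE_RADIUS_DEG ** 2)

def random_position(rng):
    """Uniformly distributed position inside the circular aperture."""
    r = APERTURE_RADIUS_DEG * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (r * math.cos(theta), r * math.sin(theta))

def step_dots(dots, consistency, direction_deg, step_deg=0.1, rng=random):
    """Advance all dots by one frame (step_deg is a hypothetical value)."""
    moved = []
    for x, y in dots:
        if rng.random() < consistency:
            angle = math.radians(direction_deg)      # signal dot
        else:
            angle = rng.uniform(0.0, 2.0 * math.pi)  # noise dot
        x += step_deg * math.cos(angle)
        y += step_deg * math.sin(angle)
        if math.hypot(x, y) > APERTURE_RADIUS_DEG:
            x, y = random_position(rng)  # replot dots that leave the aperture
        moved.append((x, y))
    return moved

rng = random.Random(1)
dots = [random_position(rng) for _ in range(n_dots())]
# One frame of upward motion (90 deg) at the 5% condition.
dots = step_dots(dots, consistency=0.05, direction_deg=90.0, rng=rng)
```

At 5% only a few dots carry the signal direction on any frame, which is why the physical direction is highly ambiguous in that condition.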

Neuroimaging Technique and Procedures

Magnetic resonance imaging (MRI) was used for neuroimaging. Two of the steps of the test protocol consisted of MR imaging (Figure 1A). The pre- and post-association imaging phases and experimental sessions were identical. After the anatomical scan, we proceeded directly to functional imaging (fMRI). During these sessions, participants were asked to fixate the red dot in the center of the screen, which they viewed via a mirror device, to attend to the stimuli appearing on the left side, and to listen carefully to the auditory tones. For the visual stimuli, a consistency of 5% was used. Tone A and tone B were presented for one second each, repeatedly over 12 seconds, simultaneously with the visual stimuli. Because we used a task-based fMRI block design, the moving visual stimuli were then replaced by stationary stimuli without auditory tones for the remaining 12 seconds of the block. Each block thus lasted 24 seconds, and there were blocks/sessions consisting only of auditory and visual stimuli. Apart from these differences, the other stimulus parameters and test procedures were identical to those of the behavioral methods.
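The block timing described above can be written out as an event list. This is a sketch under stated assumptions: the number of blocks is not given in the text, so `n_blocks` is hypothetical, and the tones are labeled generically because the order of Tone A and Tone B within a block is not specified.

```python
def block_events(n_blocks, on_s=12, off_s=12, tone_s=1):
    """Return (onset_s, duration_s, label) tuples for a run of blocks."""
    events = []
    for b in range(n_blocks):
        start = b * (on_s + off_s)  # each block lasts 24 seconds
        events.append((start, on_s, "motion_5pct"))
        # One-second tones repeated throughout the 12 seconds of motion.
        for k in range(on_s // tone_s):
            events.append((start + k * tone_s, tone_s, "tone"))
        # Stationary dots without tones for the remaining 12 seconds.
        events.append((start + on_s, off_s, "static"))
    return events

events = block_events(n_blocks=3)  # n_blocks is a hypothetical value
```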

Image Collection Device and Apparatus: A 3-Tesla MR scanner (Siemens Intera Achieva 3T) was used to record high-resolution anatomical images and Blood Oxygen Level Dependent (BOLD) signals. The entire brain volume was captured with a 12-channel whole-body (birdcage) radio frequency (RF) head coil. To minimize head movements, a vacuum cushion and head positioning cushions at the edge of the head coil were used. Visual stimuli were presented on an MR-compatible 32” LCD screen (TELEMED System, 1366x768 screen resolution, 60Hz refresh rate) and viewed through a mirror placed in front of the participants’ eyes by means of a specially developed device (Aref Medical) mounted on the head coil. Auditory stimuli were presented through MR-compatible headphones (Troyka Medical). In addition, the intensity of the auditory stimuli was increased in order to offset the possible effects of the noise from the MRI machine.

Acquisition of Anatomical Images: Deviating from the UMRAM standard sequences, we developed structural scan sequences optimizing the distinction between white matter and gray matter. At the beginning of the anatomical imaging, for all participants three vertical slice scans were made to determine local windows. The parameters used in the three-dimensional, T1-weighted, high-resolution, single turbo FLASH sessions were as follows: voxel size=1x1x1mm, repetition time (TR)=7.982ms, echo time (TE)=3.68ms, field of view (FOV)=256x256mm, matrix size=256x256x176, number of slices=176.

Functional MR Imaging: Functional MR imaging (fMRI) measures the BOLD signal related to the activity of a local population of neurons. While the BOLD signal measured by fMRI does not show the activity of a single neuron, it reflects the net result of local activity, and changes in BOLD activity are highly correlated with perceptual and cognitive performance. While the participants performed their task, BOLD values were obtained with a T2*-weighted standard EPI sequence (28 slices acquired in ascending interleaved order, TR=2000ms, TE=35ms, slice thickness=3mm, FOV=240x131.5mm, matrix size=80x78x28, voxel size=3x3x3mm).

MR Data Analysis: All image data were saved as DICOM (.ima) files and later converted to NIFTI (.nii) format for subsequent analysis with FSL. Conversion was carried out using dcm2nii from the MRIcron package. For preprocessing and statistical analysis of all images, FMRIB (FSL) software tools were used (15-17). Preprocessing consisted of the following steps: brain extraction with BET (FSL), motion correction with MCFLIRT (FSL), spatial smoothing with a Gaussian filter, slice-timing correction, and temporal filtering. Subsequently, the low-resolution functional images of each participant were registered, linearly or non-linearly, to the high-resolution anatomical images or to Montreal Neurological Institute (MNI) space (18-20).
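For orientation, the first preprocessing steps can be expressed as FSL command lines. The sketch below only assembles and prints the commands and does not run them. All file names are hypothetical, `example_func` stands for a single 3D functional volume assumed to have been extracted beforehand, and the 6-parameter rigid-body registration is an assumption, since the text does not state which options were used. Smoothing, slice-timing correction, and temporal filtering are typically configured within FEAT and are omitted here.

```python
def preprocessing_commands(struct="struct", func="func", ref="example_func"):
    """Build FSL command lines for brain extraction, motion correction,
    and linear functional-to-structural registration (file names are
    hypothetical placeholders)."""
    return [
        # Brain extraction of the anatomical image with BET.
        ["bet", struct, f"{struct}_brain"],
        # Motion correction of the functional time series with MCFLIRT.
        ["mcflirt", "-in", func, "-out", f"{func}_mc"],
        # Linear registration of a functional volume to the anatomy with
        # FLIRT; 6 degrees of freedom is an assumed setting.
        ["flirt", "-in", ref, "-ref", f"{struct}_brain",
         "-omat", "func2struct.mat", "-dof", "6"],
    ]

commands = preprocessing_commands()
for cmd in commands:
    print(" ".join(cmd))
```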

To identify the areas in which BOLD activity changed with the different stimuli and test conditions, modelling and statistical analyses based on the General Linear Model (GLM) were carried out. Individual data processing, calculation of group averages, and comparison of pre- and post-learning data were carried out using the FEAT (FSL) software and data processing package (21).
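The logic of a block-design GLM for a single voxel can be sketched in a few lines. This is an illustration and not the FEAT implementation: the double-gamma response function with shape parameters 6 and 16 and an undershoot ratio of 1/6 is a commonly used canonical form chosen here as an assumption, the number of blocks is hypothetical, and the voxel time series is synthetic and noise-free. The 12-second on/off blocks and TR of 2 seconds follow the text.

```python
import math

def gamma_pdf(t, shape):
    """Gamma density with unit scale, evaluated at time t (seconds)."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

def hrf(tr_s=2.0, length_s=32.0):
    """Assumed canonical double-gamma response sampled at the TR."""
    ts = [i * tr_s for i in range(int(length_s / tr_s))]
    h = [gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0 for t in ts]
    total = sum(h)
    return [v / total for v in h]

def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the signal length."""
    out = [0.0] * len(signal)
    for i in range(len(signal)):
        for j, k in enumerate(kernel):
            if i - j >= 0:
                out[i] += signal[i - j] * k
    return out

def fit_glm(y, x):
    """Ordinary least squares for y = intercept + beta * x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta

# 12 s of motion and tones, then 12 s of static dots, at TR = 2 s:
# 6 volumes on, 6 volumes off. Eight blocks is a hypothetical run length.
boxcar = ([1.0] * 6 + [0.0] * 6) * 8
regressor = convolve(boxcar, hrf())

# Synthetic voxel: baseline of 100 plus a response of amplitude 2.5.
y = [100.0 + 2.5 * r for r in regressor]
intercept, beta = fit_glm(y, regressor)
```

The estimated `beta` is the response amplitude relative to the static baseline; comparing such estimates before and after the association phase is the kind of contrast the group analysis evaluates.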


Behavioral Results

Figure 2 shows the average proportion of trials in which participants reported upward motion in the pre- and post-association sessions. In general, these proportions change with the physical direction of motion and its consistency. In other words, when the consistency of direction is high and the motion is downward, the proportion approaches zero, whereas when the consistency is high and the motion is upward, it approaches 100%. What is interesting from the perspective of our research is that these proportions differ before and after the association phase. We found that the static tones associated with upward and downward motion during the association phase had a significant effect on subsequent direction perception. In particular, for ambiguous motion with a consistency of direction below 30%, tones associated with the upward direction caused a bias towards upward perception and tones associated with the downward direction a bias towards downward perception. Two-way ANOVAs on the values in this ambiguous range found a significant main effect of tone (F[1,12]=7.59, p<0.05), and the interaction between tone and association phase was also significant (F[1,12]=7.35, p<0.05). Subsequent post-hoc analyses found no tone effect before the association phase (F[1,12]=0.02, p=0.886), whereas the tones did affect the direction-related behavioral responses after the association phase (F[1,12]=16.34, p<0.01). After the association phase, for motion with a consistency of direction below 20%, the difference between the two tone conditions was always significant (corrected t test, p<0.05).
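The quantity plotted in Figure 2 can be computed from trial records as sketched below. The record layout (tone label, signed consistency with positive values for upward motion, and response) is a hypothetical convention, and the trials listed are toy values for illustration, not data from the study.

```python
from collections import defaultdict

def proportion_up(trials):
    """Proportion of 'up' responses per (tone, signed consistency)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [n_up, n_total]
    for tone, signed_consistency, response in trials:
        key = (tone, signed_consistency)
        counts[key][0] += 1 if response == "up" else 0
        counts[key][1] += 1
    return {key: n_up / n_total for key, (n_up, n_total) in counts.items()}

# Toy records: (tone, signed consistency in %, response).
toy_trials = [
    ("A", 5, "up"), ("A", 5, "up"), ("A", 5, "down"),
    ("B", 5, "down"), ("B", 5, "up"), ("B", 5, "down"), ("B", 5, "down"),
]
rates = proportion_up(toy_trials)
```

A tone effect of the kind reported above would appear as a higher proportion for Tone A than for Tone B at the same physical consistency.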

Changes in BOLD Activities

For the fMRI recordings, we used a direction consistency of 5%, at which the association phase had the greatest impact on the perceived direction of motion. While inside the MR machine, the participants fixated a red spot in the middle of the screen (via the mirror device mounted on the head coil) and for 12 seconds observed moving dots in the left section of the screen with a direction consistency of 5% together with the tones; for the next 12 seconds, they saw only static dots. Each block, before and after the association phase, was administered in this way, with the physical stimuli being identical. Figure 3 shows the significant changes in BOLD activity evoked by the 5% motion and tones relative to the static stimulus period, which served as the baseline [p<0.05, cluster-corrected; for the detailed statistical method, see (21)]. In the group means before the association phase, moving stimuli with a consistency of 5% together with auditory tones produced activation in a relatively small region of the right visual area. In addition, these stimuli caused greater changes in BOLD activation in the auditory cortex and in high-level multisensory (audiovisual) areas.

Looking at the mean group activations after the association phase, the changes in the right visual area appear more robust and cover a larger spatial extent. In addition, increases are seen in the primary and secondary auditory cortex. Furthermore, the voxel clusters in the high-level multisensory perceptual and cognitive areas are also more robust and larger (cluster sizes before learning: 10852, 967, 412, and 409 voxels; after learning: 19996, 2782, 1271, and 426 voxels).
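As a rough summary of these figures, the total number of suprathreshold voxels before and after learning can be compared. The sketch sums the reported counts only; it does not pair individual clusters across sessions, because the text does not state which cluster corresponds to which region.

```python
# Reported cluster sizes in voxels (from the text).
before = [10852, 967, 412, 409]
after = [19996, 2782, 1271, 426]

total_before = sum(before)
total_after = sum(after)
# Ratio of total activated voxels after versus before learning.
growth = total_after / total_before

print(total_before, total_after, round(growth, 2))
```

Under this summary, the total activated volume is nearly twice as large after the association phase.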

Group analyses comparing each participant's data before and after the association phase showed that these changes were significant in high-level areas of the left hemisphere (Broca’s area). In addition, analyses and tests carried out in regions of interest (ROI) defined in MNI space showed that the changes were also significant in low-level areas. For example, activity in the low-level visual areas of the right hemisphere (V1-V3) and in area V5+/MT+ (Figure 3: sections indicated by a green arrow) increased significantly (p<0.05) after the association phase compared with before it. Similarly, low-level auditory areas (primary auditory area A1) and multisensory areas (inferior parietal lobe IPL, Broca’s area) showed significant changes in both hemispheres (Figure 3: sections indicated by a blue arrow).


The question of how high-level processes such as associative learning shape perception, and in which cortical areas (low-level or high-level) they cause sensory plasticity, is one of the most active topics in systems neuroscience. The visual motion hierarchy provides an excellent research framework. The motion areas along the dorsal pathway (such as primary visual area V1, visual area V3, middle temporal area MT, medial superior temporal area MST, lateral intraparietal cortex LIP, intraparietal sulcus IPS, and inferior parietal lobe IPL) have been studied extensively and functionally characterized (22,23). Taking advantage of this, associative learning experiments have been carried out on area MT, which is of crucial importance to motion perception and was not thought to be affected by high-level cognitive processes (5). In these studies, researchers trained macaques to associate static directional arrows with the direction of moving objects. In addition, they characterized individual neurons in visual area MT before and after the associations were formed. They found that after associative learning, MT neurons responded to static arrows as if these were actually moving, showing a significant degree of plasticity. In more recent studies on LIP, cells in that region displayed a similar property, being sensitive not only to associations with motion but also to other associations (24). Importantly, these findings demonstrate that modifications due to associative learning are not limited to high-level regions of the cortex, but may also cause changes in low-level sensory areas. However, these studies focused on learning mechanisms in a single modality (vision). In daily life, many of the stimuli we receive involve more than one modality, that is, they are multisensory. Therefore, a unisensory approach to engaging learning mechanisms is insufficient.
For this reason, interest in multisensory associative learning studies has grown recently. These studies support the general approach, showing that multisensory associations have a dramatic effect on perception. Many of the studies carried out in this context concern motion perception. It was observed that static auditory tones carrying no motion information, once associated with particular directions of motion through associative learning, have a significant impact on visual motion perception and motion sensitivity (8,9). Studies using visual motion stimuli thought to selectively engage motion mechanisms at different levels emphasize that these effects may also include low-level cortical areas (12,13).

The behavioral and BOLD data obtained in our study support previous research findings. First of all, as in earlier studies, the behavioral data show that audiovisual associations acquired in short learning phases without any feedback are able to modify the perceived direction of motion significantly, particularly when the visual information is ambiguous. In addition, the fMRI data suggest that these effects are based not on a single area but on several different regions. For example, our detailed regional analyses found that activity in the low-level visual (motion regions) and auditory areas increased after audiovisual associative learning. Likewise, changes occurred in the high-level multisensory regions and left frontal areas. Based on these data and on information from the literature, the areas and processes involved in audiovisual associations are briefly summarized in Figure 4.

Figure 4 proposes that the effects of associative learning on low-level sensory areas, and the resulting changes, may be of two different kinds. Earlier studies indicated direct anatomical links between low-level visual and auditory areas, which allow low-level visual and auditory interactions (25-27). Associative learning can change these low-level feedforward interactions and the strength of the links. A recent study examined how audiovisual associations affected neural networks and their structure at rest, rather than stimulus-induced activity (28). The results of that study showed that the link between the visual and auditory networks at rest was strengthened further after the audiovisual association phase, supporting, at least in part, the thesis that interactions between low-level sensory areas can be strengthened. On the other hand, whether low-level visual areas can be affected by high-level non-sensory cognitive areas through feedback links has recently been hotly debated (29-31). The meaning and effect that the auditory tones acquire may trigger higher-level associations and non-sensory cognitive processes (such as memory) and strengthen the feedback links from those regions.

Considering the data in their entirety, this study has shown that low-level sensory as well as high-level multisensory and cognitive areas play a role in the effects of audiovisual associations on motion perception. Generally speaking, these findings suggest that experiences acquired by association can affect perceptual processes at various hierarchical levels and in different cortical areas. The imaging in this study was task-based (stimulus-induced) and used a block design. While we know the regions affected by audiovisual association, there is no detailed information about how the links and interactions between these areas are modified. Event-related task-based designs and dynamic causal modeling may provide information about how these links change after the audiovisual association phase and about the direction of the interactions. Future studies based on such designs and analyses will provide a more complete and detailed picture.

Acknowledgements: I want to thank Fazilet Zeynep Yildirim for her support during data collection.

Informed Consent: Written consent was obtained from the participants.

Peer-review: Externally peer-reviewed.

Conflict of Interest: Authors declared no conflict of interest.

Financial Disclosure: This study was supported by the Scientific and Technological Research Council of Turkey (TUBITAK 112C010).


1. Albright TD. On the perception of probable things: neural substrates of associative memory, imagery, and perception. Neuron 2012; 74:227-245.

2. Hebb DO. The Organization of Behavior: A Neuropsychological Theory. New York: Wiley, 1949.

3. James W. Principles of Psychology. New York: Henry Holt, 1890.

4. Konorski J. Integrative Activity of the Brain: An Interdisciplinary Approach. Chicago: University of Chicago Press, 1967.

5. Schlack A, Albright TD. Remembering visual motion: neural correlates of associative plasticity and motion recall in cortical area MT. Neuron 2007; 53:881-890.

6. Schlack A, Vivian V, Albright TD. Altering motion perception by motion-colour pairing. Perception 2007; 36(Suppl.1):52.

7. Shams L, Wozny DR, Kim R, Seitz A. Influences of multisensory experience on subsequent unisensory processing. Front Psychol 2011; 2:264.

8. Hidaka S, Teramoto W, Kobayashi M, Sugita Y. Sound-contingent visual motion aftereffect. BMC Neuroscience 2011; 12:44.

9. Teramoto W, Hidaka S, Sugita Y. Sounds move a static visual object. PLoS One 2010; 5:e12255.

10. Brainard DH. The psychophysics toolbox. Spat Vis 1997; 10:433-436.

11. Pelli DG. The video toolbox software for visual psychophysics: transforming numbers into movies. Spat Vis 1997; 10:437-442.

12. Kafaligonul H, Oluk C. Altering perception of low-level visual motion by audiovisual associations. Perception 2014; 43(Suppl.):36.

13. Kafaligonul H, Oluk C. Audiovisual associations alter the perception of low-level visual motion. Front Integr Neurosci 2015; 9:26.

14. Britten KH, Shadlen MN, Newsome WT, Movshon JA. The analysis of visual motion: a comparison of neuronal and psychophysical performance. J Neurosci 1992; 12:4745-4765.

15. Jenkinson M, Beckmann CF, Behrens TE, Woolrich MW, Smith SM. FSL. Neuroimage 2012; 62:782-790.

16. Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H, Bannister PR, De Luca M, Drobnjak I, Flitney DE, Niazy RK, Saunders J, Vickers J, Zhang Y, De Stefano N, Brady JM, Matthews PM. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 2004; 23(Suppl.1):208-219.

17. Woolrich MW, Jbabdi S, Patenaude B, Chappell M, Makni S, Behrens T, Beckmann C, Jenkinson M, Smith SM. Bayesian analysis of neuroimaging data in FSL. Neuroimage 2009; 45(Suppl.1):173-186.

18. Smith SM. Fast robust automated brain extraction. Hum Brain Mapp 2002; 17:143-155.

19. Jenkinson M, Bannister P, Brady M, Smith S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage 2002; 17:825-841.

20. Jenkinson M, Smith S. A global optimisation method for robust affine registration of brain images. Med Image Anal 2001; 5:143-156.

21. Worsley KJ. Statistical analysis of activation images. In: Jezzard P, Matthews PM, Smith SM (editors). Functional Magnetic Resonance Imaging: An Introduction to Methods. Oxford: Oxford University Press, 2001, 251-270.

22. Claeys KG, Lindsey DT, De Schutter E, Orban GA. A higher order motion region in human inferior parietal lobule: evidence from fMRI. Neuron 2003; 40:631-642.

23. Ho CS, Giaschi DE. Low- and high-level first-order random-dot kinematograms: evidence from fMRI. Vision Res 2009; 49:1814-1824.

24. Fitzgerald JK, Freedman DJ, Assad JA. Generalized associative representations in parietal cortex. Nat Neurosci 2011; 14:1075-1079.

25. Cappe C, Barone P. Heteromodal connections supporting multisensory integration at low levels of cortical processing in the monkey. Eur J Neurosci 2005; 22:2886-2902.

26. Clavagnier S, Falchier A, Kennedy H. Long-distance feedback projections to area V1: implications for multisensory integration, spatial awareness, and visual consciousness. Cogn Affect Behav Neurosci 2004; 4:117-126.

27. Falchier A, Clavagnier S, Barone P, Kennedy H. Anatomical evidence of multimodal integration in primate striate cortex. J Neurosci 2002; 22:5749-5759.

28. Yildirim FZ. Changes in FMRI resting state networks due to audiovisual association induced effects on visual motion perception. Master’s Thesis, Bilkent University, Ankara, 2016.

29. Petro LS, Vizioli L, Muckli L. Contributions of cortical feedback to sensory processing in primary visual cortex. Front Psychol 2014; 5:1223.

30. Petro LS, Paton AT, Muckli L. Contextual modulation of primary visual cortex by auditory signals. Philos Trans R Soc Lond B Biol Sci 2017; 372:20160104.

31. Petro LS, Muckli L. The brain’s predictive prowess revealed in primary visual cortex. Proc Natl Acad Sci U S A 2016; 113:1124-1125.

Creative Commons License

Dusunen Adam: The Journal of Psychiatry and Neurological Sciences is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.