Music is a uniquely human activity, yet much remains unknown about how the brain perceives it and what its computational basis is. A new study indicates that not only does the human auditory cortex respond selectively to music compared with speech, but that this selectivity is mediated by neuronal subpopulations tuned to different types of music, including a subset tuned specifically to song.
Earlier neuroimaging studies, using functional magnetic resonance imaging (fMRI), indicated that music is represented differently from other types of sound in the human non-primary auditory cortex: voxels have been observed there that respond distinctly and specifically to music.
The current study, published in the journal Current Biology, focused on the neural representation of music and natural sounds using electrocorticography (ECoG), in which electrical activity is recorded intracranially from electrodes placed directly on the cortical surface. The advantage of this approach is its enhanced spatiotemporal resolution compared with non-invasive methods.
What did the study show?
The researchers presented a set of 165 natural sounds and applied an algorithm that decomposed the resulting neural responses into components. In addition, they exploited a large dataset of fMRI responses to the same sounds from 30 subjects who had undergone almost 90 scans, each lasting two hours. This filled in the gaps in ECoG coverage.
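The decomposition step can be illustrated with a toy sketch. The study's actual algorithm is more sophisticated, but plain non-negative matrix factorization via multiplicative updates, used here purely as a stand-in with invented electrode counts and data, conveys the idea of splitting a response matrix into a small number of components:

```python
import numpy as np

# Hypothetical toy data: responses of 20 electrodes to 165 sounds,
# generated from 3 hidden components (all values invented).
rng = np.random.default_rng(0)
n_electrodes, n_sounds, n_components = 20, 165, 3
true_weights = rng.random((n_electrodes, n_components))
true_profiles = rng.random((n_components, n_sounds))
responses = true_weights @ true_profiles

def nmf(X, k, n_iter=500, eps=1e-9):
    """Factor X ~ W @ H with W, H >= 0 via multiplicative updates."""
    rng = np.random.default_rng(1)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(responses, n_components)
reconstruction_error = np.linalg.norm(responses - W @ H) / np.linalg.norm(responses)
print(f"relative reconstruction error: {reconstruction_error:.4f}")
```

In this framing, each row of H is a component's response profile across the 165 sounds, and each column of W gives that component's weight in every electrode, which is what allows a component to be mapped anatomically.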
The analysis revealed multiple components, including tonotopic frequency selectivity, spatially organized onset responses, and selective responses to speech, music, and vocalizations. By correlating the fMRI and ECoG maps, they found greater reliability for the fMRI maps, owing to their broader coverage and larger number of subjects. Overall, however, the two sets of maps matched closely, indicating that this cross-correlation is a useful way to combine the precision of ECoG with the spatial coverage of fMRI.
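Comparing an ECoG-derived component map against its fMRI counterpart amounts to a spatial correlation. A minimal sketch with fabricated maps (the site count and noise level here are invented, not taken from the study):

```python
import numpy as np

# Toy check of agreement between two spatial maps of the same component:
# one "ECoG" map and one "fMRI" map sampled at the same cortical sites.
rng = np.random.default_rng(0)
n_sites = 50
ecog_map = rng.random(n_sites)
fmri_map = ecog_map + 0.1 * rng.standard_normal(n_sites)  # noisy copy

# Pearson correlation between the two maps.
r = np.corrcoef(ecog_map, fmri_map)[0, 1]
print(f"ECoG-fMRI map correlation: r = {r:.2f}")
```

A high correlation across sites is what licenses treating the fMRI map as a wide-coverage proxy for the same underlying component.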
They found two components, termed C1 and C15, that responded almost exclusively to speech, whether native or foreign, indicating that their selectivity was acoustic rather than linguistic. These components pick up features of speech with particular frequency spectra, such as low-frequency phonemes versus high-frequency fricatives.
The C10 component showed a marked response both to instrumental music and to music with singing. The limited electrode coverage of regions selective for music but not speech may have constrained the model's ability to fully distinguish the two.
A new finding was a highly specific component, termed C11, for music with singing, indicating that the human brain perceives song using a particular subset of neurons. Every stimulus containing sung music evoked a strong response, while other sounds, even instrumental music or speech, did not. This indicates that the selectivity was not simply the summation of selectivity for speech and selectivity for music, as would be expected from the model parameters.
Further analysis showed components that responded in a binary fashion to speech, music, and song, further supporting the presence of a nonlinear response to song. The song-selective component showed no response to speech or voice sounds, demonstrating that its selectivity differs from selectivity for speech or voice.
Furthermore, when tested against synthetic sounds matched for modulation, C11 responded only to natural song, not to natural speech, natural instrumental music, or modulation-matched song. This shows that frequency and modulation statistics alone cannot explain how the brain selectively responds to speech, music, and song.
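The logic of the nonlinearity claim can be sketched as follows: if song selectivity were merely the sum of speech and music selectivity, a linear model fit on speech- and music-selective responses should predict the song response well. With toy responses (all labels and values invented for illustration), the linear fit fails:

```python
import numpy as np

# Toy stimulus set: 20 speech, 20 instrumental-music, 20 song sounds.
rng = np.random.default_rng(0)
n_sounds = 60
is_speech = np.zeros(n_sounds)
is_speech[:20] = 1
is_music = np.zeros(n_sounds)
is_music[20:40] = 1
is_song = np.zeros(n_sounds)
is_song[40:] = 1

# Speech-selective and music-selective responses (music also fires to song).
speech_resp = is_speech + 0.05 * rng.random(n_sounds)
music_resp = (is_music + is_song) + 0.05 * rng.random(n_sounds)
# A song-selective response fires only for song.
song_resp = is_song + 0.05 * rng.random(n_sounds)

# Best linear prediction of the song response from the other two responses.
X = np.column_stack([speech_resp, music_resp, np.ones(n_sounds)])
beta, *_ = np.linalg.lstsq(X, song_resp, rcond=None)
pred = X @ beta
r2 = 1 - np.sum((song_resp - pred) ** 2) / np.sum((song_resp - song_resp.mean()) ** 2)
print(f"linear-model R^2 for song response: {r2:.2f}")
```

Because the music-selective response cannot tell instrumental music and song apart, no weighted sum of the two responses reproduces the song-only pattern, leaving the variance largely unexplained.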
What are the implications?
The findings of this study indicate that several distinct neuronal subpopulations respond selectively to different types of musical sound, and that one of them responds exclusively to sung music. An innovative decomposition algorithm supported inferences about ECoG response components, and coupling it with fMRI increased the spatial coverage and reliability of each component.
"Our findings provide the first evidence for a neural population specifically involved in the perception of song."
Singing is a form of sound production that differs from speech in its melodic intonation and rhythmicity, and from instrumental music in its voice-specific structure and vocal resonance. The nonlinear integration of these differentiating features is a distinctive capability of the neuronal subset that responds strongly to song, probably non-primary neurons to which primary auditory cortical neurons project.
Further research could show how and why these neurons are situated between those selective for speech and those selective for music, perhaps with the help of deep neural networks trained to recognize speech and music. These neurons may well be linked to parts of the brain responsible for memory and emotion, which would explain why songs can induce strong feelings and evoke old memories.
The song-selective neurons could also interact with motor and premotor regions, which also respond to singing and other music. It is possible that these regions exert feedback on each other.
Moreover, such selectivity may arise from experience, especially because it involves reward circuits that can reshape the interconnected pathways of the auditory cortex over the long term. Such experience need not involve formal musical training but could simply reflect a lifetime of listening to music and song. Many unanswered questions remain, however, as to how and why these neurons arose.
It is well known that we remember words set to music better than instrumental music alone, perhaps because of the greater salience of song. This may allow for more specific representations in high-level sensory regions.
These highly music-selective neurons occupy relatively small areas of the brain, indicating that high spatial resolution is essential when using electrodes to detect them. The same may hold for voice and speech selectivity, since each of the selective components identified in this study failed to respond to stimuli that drove the other components.
The researchers summed up:
"Component modeling provides a way to (1) infer prominent response patterns, (2) suggest novel hypotheses, and (3) disentangle spatially overlapping responses. Our results illustrate each of these benefits. We uncovered a novel form of music selectivity (song selectivity) that we did not a priori expect. And the song-selective component showed clearer selectivity for singing than that present in individual electrodes."
Further research may home in on how this selectivity is acquired, whether through melody, rhythm, or note-level intonation structure, and on how best to describe and identify it computationally. This will lead to a better understanding of the neural encoding of music.