Lengthened consonants mark the beginning of words

- EN - DE

A new study shows that word-initial consonants are systematically lengthened across a diverse sample of languages

An example of word-initial lengthening in Mojeño Trinitario, an Arawakan languag
An example of word-initial lengthening in Mojeño Trinitario, an Arawakan language spoken in the Amazon region of Bolivia. The word-initial /n/ (100ms) is significantly longer than the /n/ in the word-medial (50ms) and utterance-initial (50ms) positions. © Frederic Blum et al., Nature Human Behaviour (2024)

Speech consists of a continuous stream of acoustic signals, yet humans can segment words from each other with astonishing precision and speed. To find out how this is possible, a team of linguists has analysed durations of consonants at different positions in words and utterances across a diverse sample of languages. They have found that word-initial consonants are, on average, around 13 milliseconds longer than their non-initial counterparts. The diversity of languages for which this effect was found suggests that this might be a species-wide pattern - and one of several key factors for speech perception to distinguish the beginning of words within the stream of speech.

Distinguishing between words is one of the most difficult tasks in decoding spoken language. Yet humans do it effortlessly - even when languages do not seem to clearly mark where one word ends and the next begins. The acoustic cues that aid this process are poorly understood and understudied for the vast majority of the world's languages. Now, for the first time, comparative linguists have observed a pattern of acoustic effects that may serve as a distinct marker across diverse languages: the systematic lengthening of consonants at the beginning of words.

Researchers from the Max Planck Institute for Evolutionary Anthropology, the CNRS Laboratoire Structure et Dynamique des Langues, the Humboldt Universität zu Berlin and the Leibniz-Centre General Linguistics used data from the novel DoReCo corpus , because it combines two features: Firstly, it covers an unprecedented amount of linguistic and cultural diversity of human speech, containing samples from 51 populations from all’inhabited continents. Secondly, it provides precise timing information for each one of the more than one million speech sounds in the corpus. "The world-wide coverage of DoReCo is crucial for uncovering species-wide patterns in human speech given the immense cross-linguistic diversity of languages," says senior author Frank Seifart, researcher at CNRS in Paris and HU Berlin and co-editor of DoReCo.

Word-initial consonant lengthening - a potential universal?

"At the outset, we expected to find evidence contradicting the hypothesis that word-initial lengthening is a universal linguistic trait. We were quite surprised when we saw the results of our analysis," says first author Frederic Blum, a doctoral candidate at the Max Planck Institute for Evolutionary Anthropology, who initiated and led the study. "The results suggest that this phenomenon is indeed common to most of the world's languages." Strong evidence of lengthening was found in 43 of the 51 languages in the sample. The results were inconclusive for the remaining eight languages.

The authors conclude that lengthening may be one of several factors that help listeners identify word boundaries and thus segment speech into distinct words - along with other factors, such as articulatory strengthening, which has not been comparatively studied in detail so far. In the current study, some languages additionally showed evidence of a shortening effect following pauses at the beginning utterance. This is consistent with the authors' conclusion as there is no need for additional cues for word boundaries in the presence of pauses.

This study advances our understanding of acoustic processes common to all spoken languages. By focusing on non-WEIRD (Western, European, Industrialized, Rich, and Democratic) languages, the researchers hope to broaden our knowledge of cognitive processes related to speech that transcend individual populations.