If you travel to a different part of the world, the richness of a foreign language may be the first thing that strikes you. A new study from researchers at the University of Lyon suggests there may be fewer differences between tongues than you might have thought.
“Languages vary a lot in terms of the information that they pack into a syllable and also in the rate that they are spoken at. But the interesting thing is that the two kind of balance each other, so that more information dense languages are spoken slower, and those that are less informationally heavy are spoken faster. This means that there is a steady information rate that is very similar among languages,” says study co-author Dan Dediu, a researcher at Lyon’s Laboratoire Dynamique du Langage.
The battle for a universal constant
In trying to find a “universal” constant for language, Dediu’s team faced quite a battle. There are over 7,000 different languages, and very few characteristics connect all of them. This extends even to basic measures of how information is encoded in words. For example, the number of syllables per word varies greatly between languages, meaning that the Shannon information rate (see grey box) varies as well. However, Dediu and his team had the insight to take into account not just the words, but the rate at which they are spoken.
Dediu and colleagues used recordings taken from 170 native adult speakers of 17 different languages across Europe and Asia. Each speaker was tasked with reading a set of 15 chunks of text, and the recordings together amounted to roughly 240,000 syllables.
Claude Shannon, a researcher at Bell Labs, made a huge contribution to information technology when he formulated his theory of information in a seminal paper in the 1940s. The gist of Shannon’s work was that information could be expressed as discrete binary values, which he called bits. This meant that the noise produced by long-distance communication could be silenced by rounding the distortion up or down to 1 or 0. Applying this theory to language, Shannon showed that different languages have their own level of redundancy. English is sometimes said to have a 50% level of redundancy, meaning half the letters in a given sentence could be removed whilst preserving meaning.
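To make the idea of measuring information in bits concrete, here is a minimal Python sketch of Shannon's entropy formula applied to a sequence of symbols. The function name `entropy_bits` and the toy inputs are assumptions made up for illustration; this simple frequency-based estimate is not the method used in the Lyon study.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Average information per symbol, in bits, from observed frequencies.

    Hypothetical helper for illustration only: it ignores context between
    symbols, so it is a rough estimate rather than the study's measure.
    """
    counts = Counter(symbols)
    total = len(symbols)
    # Shannon's formula: H = -sum(p * log2(p)) over each distinct symbol
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Two equally likely syllables carry one bit each
two_syllables = entropy_bits(["ba", "na", "na", "ba"])

# Four equally likely syllables carry two bits each
four_syllables = entropy_bits(["ba", "na", "ko", "ti"])

print(two_syllables, four_syllables)
```

The more distinct and evenly used the symbols are, the more bits each one carries, which is roughly why a language with a larger syllable inventory can pack more information into every syllable.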
How many syllables in a second?
The researchers chose the syllable as their single unit of information. The choice of unit turns out to be quite a controversial subject in linguistic informatics, and the syllable was adopted over two other options:
- Phonemes – units of sound which help us separate out individual words – were excluded as Dediu’s team realized they could be easily omitted in speech
- Words – these were seen as being too language specific for easy comparison
Armed with a data set and a metric, the scientists examined their results. They revealed some interesting differences between our world’s languages:
- The number of distinct syllables in English is nearly 7000, but just a few hundred in Japanese
- Speech rate varied from 4.3 to 9.1 syllables per second
- Vowel harmony (a fascinating linguistic innovation that requires suffixes to be “in harmony” with the word they attach to) was present in four of the languages
In short, the languages sounded pretty darn different.
Despite this, Dediu's team noted that the information rate, which takes into account both the speech rate and the information density of the written text, was roughly consistent across all the languages recorded: information-rich languages were spoken more slowly, whilst information-light languages were spoken faster.
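The trade-off can be seen with a little arithmetic. The sketch below assumes the information rate is simply the speech rate multiplied by the information carried per syllable; the function name and both sets of numbers are made up for illustration and are not taken from the study.

```python
def information_rate(syllables_per_second, bits_per_syllable):
    """Information rate in bits per second: speech rate times information density."""
    return syllables_per_second * bits_per_syllable

# Hypothetical values, not measurements from the study:
# a dense language spoken slowly, and a light language spoken quickly
dense_slow = information_rate(syllables_per_second=5.0, bits_per_syllable=8.0)
light_fast = information_rate(syllables_per_second=8.0, bits_per_syllable=5.0)

print(dense_slow, light_fast)  # 40.0 40.0
```

Both imaginary languages deliver the same number of bits per second, even though one packs more into each syllable and the other gets through more syllables.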
Language as a gingerbread reindeer: the two B/W versions use different resolutions and number of gray levels but encode the same info, just as languages trade off different strategies but are equally efficient. Credit: Dan Dediu, Université Lumière Lyon 2
The researchers were able to settle on a number – 39.15 bits/s – as an average information rate over the 17 languages. There were some interesting variations: for example, female speakers had a lower speech rate and a lower information rate.
The team showed that differences between the written texts had little effect on the information rate, suggesting that the results could be generalized beyond the text-based study conducted here. The speech rate and syllable number were significantly more variable than the information rate, cementing the latter as a valid cross-lingual connector.
What does this mean for our brain?
The authors suggest that the findings mean the information rate has to stabilize around a tight mean, as a rate that is too high would impede the brain’s ability to process data and articulate speech clearly. On the other hand, a rate that is too low would require the brain to retain far too many words before meaning could be extracted.
This highlights the dual role which language has to play, which Dediu sums up: “There are two sides to the coin when it comes to language – one is the cultural and the other biological, and when one changes - say a language becomes more informationally dense - the other reacts - its speakers start speaking it slower.”