News Release

Study: Complexity holds steady as writing systems evolve

Peer-Reviewed Publication

Santa Fe Institute

Explainer: What Determines the Complexity of Writing Systems?

video: How do writing systems change over time -- and what forces drive that evolution?

Santa Fe Institute Fellow Helena Miton and Olivier Morin at the Max Planck Institute for the Science of Human History recently used computer-aided methods to test the conclusions of previous research into the complexity of scripts and characters.

Prior work suggests two pressures that may push writing to get simpler over time: 1) More complex characters require more effort to distinguish from each other (making them more difficult to copy without error); and 2) Figures with more strokes require more movement to create. In short, simple letters are not only easier to read, but easier to make -- up to some lower bound at which characters are too simple to tell apart. Also, cultural transmission research shows that in lab experiments, complex drawings simplify as they are copied. Confusingly, previous studies found both a universal average of three strokes per character AND an increase in complexity as scripts' character inventories grow.

In their new paper in the journal Cognition, Miton and Morin offer several hypotheses to test with their analysis:

1 - Scripts with more characters will have more complex symbols.
2 - Most variance in character complexity between scripts is caused by the script (rather than the type of script).
3 - Parentless, newly-invented (or idiosyncratic) scripts are more complex than ancient scripts.
4 - Parent scripts have more complex characters than their offspring.
5 - Contrary to earlier psychology experiments that suggest the left-hand side of characters is always simpler, the authors suspect that characters are more complex on the side that comes first when read and written (i.e., on the right if the language is read right to left).

The data set they analyzed comprises nearly 48,000 characters from 133 scripts. Miton and Morin measured two kinds of complexity: perimetric complexity, the ratio of a given character's "coastline" to its area, or how twisty it is; and algorithmic complexity, or how much code is required to store a compressed image file of each character.

The variables they tracked included script sources, size, family, type, direction, and whether or not they were "idiosyncratic" (made up from scratch by known persons less than 200 years ago). The authors predicted idiosyncratic scripts would be less subject to evolutionary pressures and historical constraints. But are they? Likewise, branching events in written language would seem to provide the perfect occasion to "improve" a script by simplifying it. But do they?

Here are their findings:

  • The more characters in a script, the more complex the characters. BUT removing scripts of over 200 characters, mostly East Asian scripts that use logograms instead of letters, voids this result. It turns out that the TYPE of script -- the linguistic units encoded by its characters -- matters more than the script itself.
  • Characters from idiosyncratic scripts are generally no more complex than characters from any other. (But this might be because they quickly simplify, and the study used "late data points.")
  • There is no tendency for scripts to be simpler than their ancestors, suggesting a minimal viable complexity for written language.
  • The first half of a character as it is read or written -- left-first or right-first -- is in fact more complex.

Through the lens of cultural evolution, their results support a growing body of work on the ways in which letter shapes fit subtle cognitive and perceptual biases with roots in the visual and motor constraints on reading and writing. Or, in the words of Albert Einstein, "Everything should be made as simple as possible, but no simpler."

Credit: Michael Garfield/Santa Fe Institute

A new paper in the journal Cognition examines the visual complexity of written language and how that complexity has evolved.

Using computational techniques to analyze more than 47,000 different characters from 133 living and extinct scripts, co-authors Helena Miton of the Santa Fe Institute and Olivier Morin of the Max Planck Institute for the Science of Human History addressed several questions around why and how the characters of different writing systems vary in how complex they appear.

"When we started this project, we wanted to test whether you find a general simplification of characters over time," Miton says. "Do scripts simplify their characters as they spend more time exposed to evolutionary pressures from the humans who are learning them and using them?"

We interact with most types of writing through our visual system, so the characters and scripts that make up the hundreds of writing systems humans have used through history are limited to, and optimized for, the way our brains process visual information. Part of that optimization, write the authors, is the graphic complexity of the characters in a script.

Morin illustrates this in a Twitter thread, offering an image of two characters, one apparently more complex, with more detail and contours, than the other. He writes, "Why care about this? Because your brain does. Simpler letters are easier and faster to process." He goes on, "Any small improvements in processing speed can accumulate into big-time gains for readers. Letters are under pressure to simplify, but also have to carry information."

A highly cited study from 2005 suggests that writing systems tend to settle on a common solution to these pressures: using about three strokes per character. In this new paper, Miton and Morin push back against that finding, and others, by studying a larger and broader set of scripts and incorporating new methods that account for cultural evolution and lineages in writing.

Miton and Morin used two measures of graphic complexity to compare characters and scripts from the massive dataset drawn from geographic locations around the world. The first measure, "perimetric" complexity, is a ratio of a character's squared perimeter to its inked surface. The other measure, "algorithmic," is the number of bytes needed to store a compressed image of a character.
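
To make the two measures concrete, here is a minimal Python sketch run on toy bitmaps. It is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the perimetric measure assumes one common convention (squared perimeter divided by inked area times 4*pi), and zlib stands in for whatever image compression the study actually used.

```python
import math
import zlib


def parse_glyph(rows):
    """Turn rows of '#' (ink) and '.' (blank) into a grid of 0/1 ints."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def ink_area(grid):
    """Number of inked pixels."""
    return sum(sum(row) for row in grid)


def perimeter(grid):
    """Count pixel edges where ink meets blank space or the image border."""
    height, width = len(grid), len(grid[0])
    edges = 0
    for y in range(height):
        for x in range(width):
            if not grid[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not grid[ny][nx]:
                    edges += 1
    return edges


def perimetric_complexity(grid):
    """Assumed convention: perimeter squared over (4 * pi * inked area)."""
    return perimeter(grid) ** 2 / (4 * math.pi * ink_area(grid))


def algorithmic_complexity(grid):
    """Bytes in a zlib-compressed copy of the bitmap (a stand-in measure)."""
    raw = bytes(cell for row in grid for cell in row)
    return len(zlib.compress(raw, 9))


# Two toy "characters": a plain square and a comb with three teeth.
block = parse_glyph([
    "......",
    ".####.",
    ".####.",
    ".####.",
    ".####.",
    "......",
])
comb = parse_glyph([
    "#.#.#.",
    "#.#.#.",
    "#.#.#.",
    "######",
    "......",
    "......",
])

for name, glyph in (("block", block), ("comb", comb)):
    print(name, round(perimetric_complexity(glyph), 2), algorithmic_complexity(glyph))
```

The comb has roughly the same amount of ink as the square but a much longer outline, so its perimetric score is higher. On bitmaps this small the compressed sizes are dominated by fixed overhead, so the second number is only meaningful on realistically sized character images.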

Among their results, they found that large scripts -- those with more than 200 characters -- had, on average, more complex characters than scripts with a smaller number of characters. Relatedly, the study suggests that the main driver of characters' complexity was which linguistic units (e.g., phoneme, syllable, or entire word) the characters encode.

They were surprised to find little evidence for evolutionary change in complexity: scripts that were invented in the past 200 years used characters of similar complexity to those that have been around for longer. In forthcoming work led by Piers Kelly, Miton and Morin investigate whether written characters follow an optimization process that happens more quickly than was captured in the current study's dataset.

###

