AUSTIN, Texas -- A classicist, biologist and computer scientist all walk into a room -- what comes next isn't the punchline but a new method to analyze relationships among ancient Latin and Greek texts, developed in part by researchers from The University of Texas at Austin.
Their work, referred to as quantitative criticism, is highlighted in a study published in the Proceedings of the National Academy of Sciences. The paper identifies subtle literary patterns in order to map relationships between texts and more broadly to trace the cultural evolution of literature.
"As scholars of the humanities well know, literature is a system within which texts bear a multitude of relationships to one another. Understanding what is distinctive about one text entails knowing how it fits within that system," said Pramit Chaudhuri, associate professor in the Department of Classics at UT Austin. "Our work seeks to harness the power of quantification and computation to describe those relationships at macro and micro levels not easily achieved by conventional reading alone."
In the study, the researchers create literary profiles based on stylometric features, such as word usage, punctuation and sentence structure, and use techniques from machine learning to understand these complex datasets. Taking a computational approach enables the discovery of small but important characteristics that distinguish one work from another -- a process that could require years using manual counting methods.
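As a rough illustration of what a stylometric profile can look like, the minimal Python sketch below turns a passage into a handful of numeric features and measures how far apart two profiles are. Everything in it is an assumption made for illustration: the three Latin function words, the comma-based punctuation measure, the function names `stylometric_profile` and `profile_distance`, and the use of plain Euclidean distance in place of the machine learning techniques the researchers describe. It is not the study's feature set or method.

```python
import re
from math import sqrt

# Hypothetical feature set chosen for illustration; not taken from the study.
FUNCTION_WORDS = ("et", "in", "non")

def stylometric_profile(text):
    """Return a small dictionary of style features for one text."""
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters (Unicode-aware), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    n_words = len(words)
    if n_words == 0 or not sentences:
        raise ValueError("text must contain at least one word")
    profile = {
        "mean_sentence_length": n_words / len(sentences),  # sentence structure
        "commas_per_word": text.count(",") / n_words,       # punctuation
    }
    for w in FUNCTION_WORDS:
        profile["freq_" + w] = words.count(w) / n_words      # word usage
    return profile

def profile_distance(a, b):
    """Euclidean distance between two profiles that share the same keys."""
    return sqrt(sum((a[k] - b[k]) ** 2 for k in a))
```

Calling `stylometric_profile` on two passages and passing the results to `profile_distance` gives a single number that grows as the passages diverge on these features. In practice the features would need to be put on comparable scales first, and a real analysis would use far more features and far longer texts.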
"One aspect of the technical novelty of our work lies in the unusual types of literary features studied," Chaudhuri said. "Much computational text analysis focuses on words, but there are many other important hallmarks of style, such as sound, rhythm and syntax."
Another component of their work builds on Matthew Jockers' literary "macroanalysis," which uses machine learning to identify stylistic signatures of particular genres within a large body of English literature. Implementing related approaches, Chaudhuri and his colleagues have begun to trace the evolution of Latin prose style, providing new, quantitative evidence for the sweeping impact of writers such as Caesar and Livy on the subsequent development of Roman prose literature.
"There is a growing appreciation that culture evolves and that language can be studied as a cultural artifact, but there has been less research focused specifically on the cultural evolution of literature," said the study's lead author Joseph Dexter, a Ph.D. candidate in systems biology at Harvard University. "Working in the area of classics offers two advantages: the literary tradition is a long and influential one well served by digital resources, and classical scholarship maintains a strong interest in close linguistic study of literature."
Unusually for a publication in a science journal, the paper contains several examples of the types of more speculative literary reading enabled by the quantitative methods introduced. The authors discuss the poetic use of rhyming sounds for emphasis and of particular vocabulary to evoke mood, among other literary features.
"Computation has long been employed for attribution and dating of literary works, problems that are unambiguous in scope and invite binary or numerical answers," Dexter said. "The recent explosion of interest in the digital humanities, however, has led to the key insight that similar computational methods can be repurposed to address questions of literary significance and style, which are often more ambiguous and open ended. For our group, this humanist work of criticism is just as important as quantitative methods and data."
The paper is the work of the Quantitative Criticism Lab, co-directed by Chaudhuri and Dexter in collaboration with researchers from several other institutions. It is funded in part by a 2016 National Endowment for the Humanities grant and the Andrew W. Mellon Foundation New Directions Fellowship, awarded in 2016 to Chaudhuri to further his education in statistics and biology. Chaudhuri was one of 12 scholars selected for the award, which provides humanities researchers the opportunity to train outside of their own area of special interest with a larger goal of bridging the humanities and social sciences.
Dexter and Chaudhuri are co-corresponding authors on the paper, and the co-first authors are Dexter, Theodore Katz, Nilesh Tripuraneni and Tathagata Dasgupta. Other authors at UT Austin are Adriana Casarez, a graduate student in the School of Information, and Ayelet Haimson Lushkov, associate professor of classics. Ajay Kannan, James A. Brofos, Jorge A. Bonilla Lopez, Lea Schroeder and Maxim Rabinovich are also authors on the paper.