Eight international research funders from four countries today jointly announced the 14 winners of the second Digging Into Data Challenge, a competition to promote innovative humanities and social science research using large-scale data analysis.
Winning teams representing Canada, the Netherlands, the United Kingdom and the United States will receive a total of about $4.8 million in grants to investigate how data processing, analysis and transmission techniques can be applied to "big data" to change the nature of humanities and social sciences research.
Each team represents collaborations among scholars, scientists and librarians from leading universities worldwide.
Four international funders sponsored the first round of the Digging Into Data Challenge in 2009. That round led to breakthrough projects that received coverage in the New York Times, Nature, the Globe and Mail and Times Higher Education.
"We're excited to continue our involvement in the Digging Into Data Challenge as it has proven an excellent opportunity to leverage our resources through partnering with a number of other agencies, both in the U.S. and abroad," said Elizabeth Tran, an associate program officer for Social, Behavioral and Economic Sciences at the National Science Foundation in Arlington, Va., one of three federal agencies supporting the challenge.
"Digging Into Data has helped reduce some of the barriers to international research by making collaboration among the scholars as seamless as possible through a single review process and joint decision-making," she said.
First-round projects included digging into a body of 53,000 18th-century letters to analyze the degree to which the effects of the Enlightenment could be observed in the letters of people with various occupations; creating tools to enable rapid and flexible access to, and linguistic analysis of, more than 9,000 hours of spoken audio files from leading British and American spoken word corpora; and integrating a vast collection of textual, geographical and numerical data to allow the visual presentation of American railroads and their impact on society over time, among others.
Projects in the current round cover a wide variety of topics, for example: using information retrieval techniques to investigate changes in Western music; using high-resolution medical imaging to study Egyptian mummies; using data-mining technology to shed light on the impact of economic opportunity and spatial mobility on social structure; and using natural language processing to analyze large bodies of textual materials to study human rights abuses.
Along with the National Science Foundation, the sponsoring research funders include the Arts & Humanities Research Council, United Kingdom; the Economic & Social Research Council, United Kingdom; the Institute of Museum and Library Services, Washington, D.C.; the Joint Information Systems Committee, United Kingdom; the National Endowment for the Humanities, Washington, D.C.; the Netherlands Organization for Scientific Research; and the Social Sciences and Humanities Research Council, Canada.
NSF's contribution of $550,000 supports American researchers from four of the 14 teams. Detailed descriptions of the 14 winning projects can be found below.
Additional information about the competition can be found on the Digging Into Data webpage.
DIGGING INTO DATA CHALLENGE - ROUND TWO (2011) WINNERS
Cascades, Islands, or Streams? Time, Topic, and Scholarly Activities in Humanities and Social Science Research
(Principal Investigators: Cassidy R. Sugimoto, Ying Ding, Staša Milojević, Indiana University, Bloomington, NSF; Mike Thelwall, University of Wolverhampton, AHRC/ESRC/JISC; Vincent Larivière, Université de Montréal, SSHRC.)
This project will examine topic lifecycles across heterogeneous corpora, including not only scholarly and scientific literature, but also social networks, blogs and other materials. While the growth of large-scale datasets has enabled examination within scientific datasets, there is little research that looks across datasets. The team will analyze the importance of various scholarly activities for creating, sustaining and propelling new knowledge; compare and triangulate the results of topic analysis methods; and develop transparent and accessible tools. This work should identify which scholarly activities are indicative of emerging areas and identify datasets that should no longer be marginalized, but built into understandings and measurements of scholarship.
(Principal Investigators: Robert C. Stacey, University of Washington, IMLS; Arno Knobbe, Leiden University, NWO; Sarah Rees Jones, University of York, AHRC/ESRC/JISC; Michael Gervers, University of Toronto, SSHRC. Additional participating institutions: University of Brighton, Columbia University.)
This project will develop new ways of exploring the full text content of digital historical records. The project will demonstrate its approach using medieval charters which survive in abundance from the 12th to the 16th centuries and are one of the richest sources for studying the lives of people in the past.
Digging Into Connected Repositories (DiggiCORE)
(Principal Investigators: Andreas Juffinger, The European Library Office, NWO; Zdenek Zdrahal, The Open University, AHRC/ESRC/JISC.)
This project will analyze a vast set of Open Access research publications using Natural Language Processing and social network analysis methods to identify patterns in the behavior of research communities, to recognize trends in research disciplines, to learn new insights about the citation behaviors of researchers and to discover features that distinguish papers with high impact. This will enable the development of better methods for exploratory search and browsing in digital collections or new ways of evaluating research or the researcher's impact.
Digging by Debating
(Principal Investigators: Colin Allen and Katy Börner, Indiana University, Bloomington, NEH; Andrew Ravenscroft, University of East London, Chris Reed, University of Dundee, and David Bourget, University of London, AHRC/ESRC/JISC.)
A project to develop and implement a multi-scale workbench, called "InterDebates", with the goal of digging into data provided by hundreds of thousands, eventually millions, of digitized books, bibliographic databases of journal articles and comprehensive reference works written by experts. The team's hypotheses are: that detailed and identifiable arguments drive many aspects of research in the sciences and the humanities; that argumentative structures can be extracted from large datasets using a mixture of automated and social computing techniques; and, that the availability of such analyses will enable innovative interdisciplinary research, and may also play a role in supporting better-informed critical debates among students and the general public.
Digging Into Human Rights Violations: Anaphora Resolution and Emergent Witnesses
(Principal Investigators: Ben Miller, Georgia State University, NSF; Lu Xiao, University of Western Ontario, SSHRC. Additional participating institutions: University of North Florida.)
This project will develop an automated reader for large text archives of human rights abuses that will reconstruct stories from fragments scattered across a collection, and an interface for navigating those stories. By improving on anaphora resolution techniques in Natural Language Processing for the connection of pronouns to specific nouns, this system will help researchers and courts reveal witnesses and patterns contained in their own collections.
Digging Into Metadata: Enhancing Social Science and Humanities Research
(Principal Investigators: Mick Khoo, Drexel University, IMLS; Diana Massam, University of Manchester, AHRC/ESRC/JISC. Additional participating institutions: University of Glamorgan.)
The project will automatically generate new forms of metadata tags from existing metadata records and associated resources that will support discovery across multiple repositories. The project will utilize four repositories that vary in size, domain, metadata creation method and workflow, and quality. PERTAINS, a tool developed by one of the partner schools, will be used to analyze the metadata records in each repository and then to generate Dewey Decimal Classification-based tags. Clustering algorithms will be used to generate an index of similarity and match between resources in different repositories. After conducting a search, the user will retrieve a list of resources from the different collections that have been tagged in similar ways. Visualization techniques will be used to display the results in ways that enhance the research process.
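As a rough illustration of the matching step described above, the sketch below scores resources from two repositories by how much their classification tags overlap. It is not the project's method and does not reproduce the PERTAINS tool: the Jaccard measure, the sample records, the threshold, and the names `jaccard` and `match_resources` are all assumptions made for illustration.

```python
from itertools import product


def jaccard(a, b):
    """Return the Jaccard similarity of two tag sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_resources(repo_a, repo_b, threshold=0.5):
    """Pair resources across two repositories whose tag overlap meets the threshold."""
    matches = []
    for (id_a, tags_a), (id_b, tags_b) in product(repo_a.items(), repo_b.items()):
        score = jaccard(tags_a, tags_b)
        if score >= threshold:
            matches.append((id_a, id_b, score))
    # Highest-scoring pairs first
    return sorted(matches, key=lambda m: m[2], reverse=True)


# Hypothetical records: resource id -> set of Dewey-style class numbers
repo_a = {"a1": {"330", "304.6"}, "a2": {"780"}}
repo_b = {"b1": {"330", "304.6", "900"}, "b2": {"780"}}

matches = match_resources(repo_a, repo_b)
```

A set-overlap score is only one of many possible similarity indexes; it is used here because it needs no training data and works directly on tag sets.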
Electronic Locator of Vertical Interval Successions (ELVIS): The First Large Data-Driven Research Project on Musical Style
(Principal Investigators: Michael Scott Cuthbert, Massachusetts Institute of Technology, NEH; Frauke Jürgensen, University of Aberdeen, AHRC/ESRC/JISC; Julie E. Cumming, McGill University, SSHRC. Additional participating institutions: Yale University.)
A project to study changes in Western musical style from 1300 to 1900, using the digitized collections of several large music repositories. The team notes that in order to understand style change in Western polyphonic music we need to be able to describe acceptable vertical sonorities (chords) and melodic motions in each period, and how they change over time. The project aims to do this for European polyphony from 1300 to 1900, using advanced music information retrieval techniques to study highly contrasting kinds of music that are nevertheless unified by common concepts of tonality, consonance vs. dissonance, and voice leading.
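To make the idea of vertical sonorities and their successions concrete, here is a minimal sketch that measures the interval between two aligned voices and counts how often one interval follows another. It is not the ELVIS implementation: encoding notes as MIDI pitch numbers, the sample fragment, and the function names are assumptions for illustration only.

```python
from collections import Counter


def vertical_intervals(upper, lower):
    """Interval in semitones between two aligned voices, note by note."""
    return [u - l for u, l in zip(upper, lower)]


def interval_successions(intervals):
    """Count each pair of consecutive vertical intervals."""
    return Counter(zip(intervals, intervals[1:]))


# Hypothetical two-voice fragment as MIDI pitch numbers (60 = middle C)
upper = [67, 69, 71, 72]
lower = [60, 60, 62, 60]

intervals = vertical_intervals(upper, lower)
successions = interval_successions(intervals)
```

Tallies like these, gathered per period across a large repertoire, are the kind of evidence that could show which sonorities and progressions were common at a given time.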
An Epidemiology of Information: Data Mining the 1918 Influenza Pandemic
(Principal Investigators: Edward T. Ewing, Bernice L. Hausman, Bruce Pencek, and Narendran Ramakrishnan, Virginia Polytechnic Institute & State University, NEH; Gunther Eysenbach, University of Toronto, SSHRC.)
This project seeks to combine the power of data mining techniques with the interpretive analytics of the humanities and social sciences to understand how newspapers shaped public opinion and represented authoritative knowledge during this deadly pandemic. This project makes use of the more than 100 newspaper titles for 1918 available from Chronicling America at the United States Library of Congress and the Peel's Prairie Provinces collection at the University of Alberta Library. The application of algorithmic techniques enables the domain expert to systematically explore a broad repository of data and identify qualitative features of the pandemic at the small scale as well as the genealogy of information flow at the large scale. This research can provide methods for understanding the spread of information and the flow of disease in other societies facing the threat of pandemics.
Imagery Lenses for Visualizing Text Corpora
(Principal Investigators: Katharine Coles, University of Utah, NEH; Min Chen, University of Oxford, AHRC/ESRC/JISC.)
A project to explore new visualization techniques for use in large-scale linguistic and literary corpora, using the collections of the British National Corpus and various smaller archives of poetry. The team will investigate whether advanced visualization techniques can provide an interface that enables humanities researchers to use their domain knowledge dynamically, while using the computational capability of computers. In particular, can data visualization help users make new observations and generate new hypotheses? The aim of this project is to answer this methodological research question and to create a set of new visualization tools for future scholarly research.
IMPACT Radiological Mummy Database
(Principal Investigators: Randall Thompson, Saint Luke's Mid America Heart Institute, NEH; Andrew Nelson, University of Western Ontario, SSHRC. Additional participating institutions: Al Azhar Medical School, Cairo, Quinnipiac University, Canadian Museum of Civilization, University of Southern California, University of California, San Diego, Mount Sinai School of Medicine, South Coast Radiological Medical Group, Newport Diagnostic Center, University of California, Irvine, Wisconsin Heart Hospital.)
This project is designed to provide mummy and medical researchers with a large-scale comparative database of medical imaging of mummified human remains. This departure from a case-study model for mummy studies will drive the field towards a large-scale comparative and epidemiological paradigm. The Canadian team will be investigating the evisceration and excerebration components of the Egyptian mummification tradition, and the US teams will apply the database to a greatly expanded study of atherosclerosis in ancient Egyptian mummies, as part of the IMPACT Ancient Health Research Group, and to the refinement of a novel system of diagnosis by consensus for mummified remains.
Integrated Social History Environment for Research (ISHER) - Digging Into Social Unrest
(Principal Investigators: Dan Roth, University of Illinois, Urbana-Champaign, NSF; Antal van den Bosch, Tilburg University, NWO; Sophia Ananiadou, The University of Manchester, AHRC/ESRC/JISC. Additional participating institutions: International Institute of Social History.)
This project will develop an integrated environment using sophisticated text mining tools to facilitate knowledge discovery in social history research. It will provide social historians and social scientists with the means to detect and associate events, trends, people, organizations and other entities of specific interest to social historians.
Integrating Data Mining and Data Management Technologies for Scholarly Inquiry
(Principal Investigators: Ray R. Larson, University of California, Berkeley and Richard Marciano, University of North Carolina at Chapel Hill, IMLS; Paul B. Watry, University of Liverpool, AHRC/ESRC/JISC. Additional participating institutions: Internet Archive, JSTOR.)
This project will integrate large-scale collections, including JSTOR and the books collections of the Internet Archive, stored and managed in a distributed preservation environment. It will also incorporate text mining and Natural Language Processing software capable of generating dynamic links to related resources discussing the same persons, places, and events. The 17-month project will go beyond basic analysis by providing a prototype system that offers expert system support to scholars in their work.
Mining Microdata: Economic Opportunity and Spatial Mobility in Britain, Canada and the United States, 1850-1911
(Principal Investigators: Evan Roberts, University of Minnesota, NSF; Kevin Schürer, University of Leicester, AHRC/ESRC/JISC; Kris E. Inwood, University of Guelph, SSHRC. Additional participating institutions: University of Alberta, Université de Montréal, University of Essex.)
This project will make use of novel data-mining technology to exploit one of the largest population databases in the world, a vast collection of harmonized 19th and early 20th century census microdata from Britain, Canada, and the United States originally digitized for genealogical research. The goal is to shed light on the impact of economic opportunity and spatial mobility on social structure in Europe and North America.
(Principal Investigators: Ewan Klein, University of Edinburgh, AHRC/ESRC/JISC; Colin M. Coates, York University, SSHRC. Additional participating institutions: University of St Andrews.)
This project will examine the economic and environmental consequences of commodity trading during the nineteenth century. The project team will be using information extraction techniques to study large corpora of digitized documents from the nineteenth century. This innovative digital resource will allow historians to discover novel patterns and to explore new hypotheses, both through structured query and through a variety of visualization tools.