Since the early 1800s, the printed reports known as "Hansard" - after their publisher - have provided near-verbatim reports of proceedings in Parliament. More than two centuries' worth of debates have been made available in digital form. Now, a project at the University of Huddersfield will develop a new interface enabling researchers to extract just the information they need from millions of words online.
Political parties, pressure groups, journalists, think tanks and historians will be among the many groups poised to benefit from the accessible and user-friendly website developed by the year-long project, which has been awarded funding of £80,510 by the Arts and Humanities Research Council (AH/R007136/1).
The goal of the research is to make Hansard more accessible to users who might not have the specialised knowledge of the complex workings of Parliament - or the computational linguistic skills - that they need if they are to make the best use of online search findings. This will be done by providing user-friendly search options, as well as by offering various visualisations of the search findings.
"Hansard at Huddersfield" is led by the linguistics expert Professor Lesley Jeffries and is a follow-up to her role in an earlier project dubbed SAMUELS (Semantic Annotation and Mark Up for Enhancing Lexical Searches). The earlier project used the celebrated Historical Thesaurus of English, developed at the University of Glasgow in collaboration with Oxford Dictionaries, to "tag" Hansard reports between 1803 and 2005 with historical linguistic information to help researchers trace topics and linguistic trends across time.
For example, it is now possible to conduct searches of the SAMUELS version of Hansard that distinguish between "Labour" the party and "labour" as a more general term in the field of industrial relations, or between "green" as a general adjective and "green" as the adjective characterising environmental concerns. However, at the moment the user of this database needs to be skilled in computational linguistics to access this information. The new project, therefore, will create a user-friendly interface for other groups of researchers and professionals.
The technical lead is the University of Huddersfield's Dr Alexander von Lünen - a historian who also has a computer science degree. Also taking part is Professor Marc Alexander, of the University of Glasgow, who was lead researcher for the SAMUELS project.
Newly appointed is research programmer Dr Hugo Sanjurjo González, a recent doctoral graduate from the University of León, Spain. The administrative assistant is Fransina De Jager, who earned a Chancellor's Prize for her MA degree in linguistics at the University of Huddersfield and has now embarked on a PhD.
A key part of the project is collaboration with the prospective end-users, and a roster of local authorities, think tanks and lobbying organisations has already stated its desire to be involved. As the project progresses, more groups will be invited to take part in a series of end-user consultation meetings in London and Huddersfield.
Initially, the project will cover the scanned Hansard database from 1803 to 2005, but the aim is also to bring it right up to date by adding the more recent data.
Professor Jeffries said: "I am excited by the prospect of making Hansard even more accessible to the end-users and hope that the project will in some modest way enhance public engagement with democracy."