An increasing amount of information is available via the Internet. But can all of it be found? Search engines are already effective at finding documents, yet far less effective at finding entities, such as persons. In his thesis, Krisztian Balog introduces two new models that make finding the right person quicker and more accurate.
Balog specifically focuses on searching and finding people within companies and organisations. In the business world in particular, an effective search system can be very useful. For example, it could enable a manager to quickly find out who had previously worked on a certain project without having to plough through a pile of paperwork.
Such a search system is not only useful within companies but can also ensure a better exchange of information between companies and the press or between companies and employment agencies. For example, an HRM department can use the search system to find out more about job applicants.
Finding and profiling
The PhD thesis focuses on two methods of information access: compiling a list of experts for a given subject (finding) and compiling a list of subjects for a given expert (profiling).
The difficulty in searching for people is that a person, unlike a text, is not a collection of words. When you search for a text you submit a number of words and then find texts that contain them. Such a search query is relatively uncomplicated. A person cannot be found in the same manner. However, a person does leave a digital trail, because his or her name can be found in texts. Balog's program automatically links the information in these texts to the person. Balog developed a method that uses these digital traces to compile a list of subjects for a person. The program then selects the person who best satisfies the criteria of the search query.
Balog combines so-called generative language models with learning algorithms. The language models expose patterns in language use with respect to persons and subjects. The learning algorithms recognise people and organisations in texts. Balog's methods have been extensively tested, for example on the intranets of large organisations with people at different locations, such as W3C and CSIRO. The method has also been tested on the intranet of a Dutch university.
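The idea of linking texts to the people named in them, and then ranking either people for a subject or subjects for a person, can be illustrated with a minimal Python sketch. It is not the thesis's actual models: the toy corpus, the names `alice` and `bob`, the smoothing weight `LAMBDA`, and the functions `find_experts` and `profile` are all assumptions made for illustration, and the scoring is a generic smoothed query-likelihood language model.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each entry is (text, people mentioned in it).
DOCS = [
    ("language models for information retrieval", ["alice"]),
    ("retrieval evaluation and language models", ["alice", "bob"]),
    ("budget planning for the annual meeting", ["bob"]),
]

LAMBDA = 0.5  # assumed smoothing weight between document and collection


def tokenize(text):
    return text.lower().split()


def build_collection_model(docs):
    """Relative frequency of each term over the whole collection."""
    counts = Counter()
    for text, _ in docs:
        counts.update(tokenize(text))
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def query_likelihood(query, text, collection):
    """Probability that the document's smoothed language model generates the query."""
    terms = tokenize(text)
    counts = Counter(terms)
    prob = 1.0
    for term in tokenize(query):
        p_doc = counts[term] / len(terms)
        p_coll = collection.get(term, 0.0)
        prob *= (1 - LAMBDA) * p_doc + LAMBDA * p_coll
    return prob


def find_experts(query, docs):
    """Finding: rank people by the summed score of the documents that mention them."""
    collection = build_collection_model(docs)
    scores = defaultdict(float)
    for text, people in docs:
        score = query_likelihood(query, text, collection)
        for person in people:
            scores[person] += score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def profile(person, topics, docs):
    """Profiling: rank candidate subjects for one person using only their documents."""
    collection = build_collection_model(docs)
    own_texts = [text for text, people in docs if person in people]
    scored = [
        (topic, sum(query_likelihood(topic, text, collection) for text in own_texts))
        for topic in topics
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(find_experts("language models", DOCS))
    print(profile("alice", ["language models", "budget planning"], DOCS))
```

In this sketch every document that mentions a person counts equally towards that person. A real system would presumably weight the strength of the association between a person and a document.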
For the time being the method is used only within organisations, but the same technology could also be applied to finding people on the Internet. The model can process many different types of search queries and is, therefore, highly flexible. A journalist could even use the system to determine how high environmental issues are on the agenda of a political party.
Krisztian Balog was a PhD student in the research group of Maarten de Rijke. De Rijke received a Pioneer subsidy from NWO in 2001 and used this to set up the project 'Computing with Meaning'. Between 1989 and 2002 more than 100 highly experienced researchers who had the potential to become full professors received a Pioneer subsidy. This is comparable to the current Vici subsidy from NWO. The research group hit the headlines earlier with MoodViews, a program for tracking and analysing moods of bloggers on the Internet (see press release 4 April 2006, http://www.