Researchers have designed a web-based platform which uses artificial neural networks to answer standard crossword clues better than existing commercial products specifically designed for the task. The system, which is freely available online, could help machines understand language more effectively.
In tests against commercial crossword-solving software, the system, designed by researchers from the UK, US and Canada, was more accurate at answering clues that were single words (e.g. 'culpability' - guilt), a short combination of words (e.g. 'devil devotee' - Satanist), or a longer sentence or phrase (e.g. 'French poet and key figure in the development of Symbolism' - Baudelaire). The system can also be used as a 'reverse dictionary' in which the user describes a concept and the system returns possible words to describe that concept.
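To make the 'reverse dictionary' idea concrete, here is a minimal sketch in Python. It is not the researchers' neural model: it simply ranks the entries of a tiny, made-up dictionary by word overlap (cosine similarity over bags of words) with the user's description. The names `DEFINITIONS`, `STOPWORDS`, `bag_of_words` and `reverse_lookup` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy dictionary; the real system was trained on hundreds of
# thousands of definitions, not looked up against a handful like this.
DEFINITIONS = {
    "guilt": "culpability or responsibility for a crime or wrongdoing",
    "satanist": "a devotee or worshipper of the devil",
    "poet": "a person who writes poems or verse",
    "dictionary": "a book that lists words and explains their meanings",
}

# Small, assumed stopword list so that function words do not drive matches.
STOPWORDS = {"a", "an", "the", "of", "or", "for", "and", "that", "who", "their", "to", "in"}


def bag_of_words(text):
    """Lowercase the text, strip simple punctuation, and count content words."""
    tokens = [t.strip(".,'\"").lower() for t in text.split()]
    return Counter(t for t in tokens if t and t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def reverse_lookup(description, top_n=3):
    """Return up to top_n headwords whose definitions overlap with the description."""
    query = bag_of_words(description)
    scored = [(cosine(query, bag_of_words(d)), w) for w, d in DEFINITIONS.items()]
    scored.sort(reverse=True)
    return [w for score, w in scored if score > 0][:top_n]


print(reverse_lookup("devil devotee"))
print(reverse_lookup("culpability"))
```

A matcher like this only works when the description shares words with a stored definition; a paraphrase with no overlapping words returns nothing. Closing that gap is the motivation for learning representations from definitions rather than matching them literally.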
The researchers used the definitions contained in six dictionaries, plus Wikipedia, to 'train' the system so that it could understand words, phrases and sentences - using the definitions as a bridge between words and sentences. Their results, published in the journal Transactions of the Association for Computational Linguistics, suggest that a similar approach may lead to improved output from more general language understanding and dialogue systems, and from information retrieval engines. All of the code and data behind the application have been made freely available for future research.
"Over the past few years, there's been a mini-revolution in machine learning," said Felix Hill of the University of Cambridge's Computer Laboratory, one of the paper's authors. "We're seeing a lot more usage of deep learning, which is especially useful for language perception and speech recognition."
Deep learning refers to an approach in which artificial neural networks with little or no prior 'knowledge' are trained to recreate human abilities using massive amounts of data. For this particular application, the researchers used dictionaries - training the model on hundreds of thousands of definitions of English words, plus Wikipedia.
"Dictionaries contain just about enough examples to make deep learning viable, but we noticed that the models get better and better the more examples you give them," said Hill. "Our experiments show that definitions contain a valuable signal for helping models to interpret and represent the meaning of phrases and sentences."
Working with Anna Korhonen from Cambridge's Department of Theoretical and Applied Linguistics, and researchers from the Université de Montréal and New York University, Hill used the model as a way of bridging the gap between machines that understand the meanings of individual words and machines that can understand the meanings of phrases and sentences.
"Despite recent progress in AI, problems involving language understanding are particularly difficult, and our work suggests many possible applications of deep neural networks to language technology," said Hill. "One of the biggest challenges in training computers to understand language is recreating the many rich and diverse information sources available to humans when they learn to speak and read."
However, there is still a long way to go. For instance, when Hill's system receives a query, the machine has no idea about the user's intention or the wider context of why the question is being asked. Humans, on the other hand, can use their background knowledge and signals like body language to figure out the intent behind the query.
Hill describes recent progress in learning-based AI systems in terms of behaviourism and cognitivism: two movements in psychology that affect how one views learning and education. Behaviourism, as the name implies, looks at behaviour without looking at what the brain and neurons are doing, while cognitivism looks at the mental processes that underlie behaviour. Deep learning systems like the one built by Hill and his colleagues reflect a cognitivist approach, but for a system to have something approaching human intelligence, it would have to have a little of both.
"Our system can't go too far beyond the dictionary data on which it was trained, but the ways in which it can are interesting, and make it a surprisingly robust question and answer system - and quite good at solving crossword puzzles," said Hill. While it was not built with the purpose of solving crossword puzzles, the researchers found that it actually performed better than commercially available products that are specifically engineered for the task.
Existing commercial crossword-answering applications function in a similar way to a Google search, with one system able to reference over 1100 dictionaries. While this approach has advantages if you want to look up a definition verbatim, it works less well when you input a question or query that the model has never seen in training. It also makes such applications incredibly 'heavy' in terms of the amount of memory they require. "Traditional approaches are like lugging many heavy dictionaries around with you, whereas our neural system is incredibly light," said Hill.
According to the researchers, the results show the effectiveness of definition-based training for developing models that understand phrases and sentences. They are currently looking at ways of enhancing their system, specifically by combining it with more behaviourist-style models of language learning and linguistic interaction.