Tsukuba, Japan - These days, we search online for all sorts of things: videos to watch, gifts for family, or even a plumber to fix a leaky pipe. But what should the computer do when our query could mean different things or doesn't include enough information? Search result diversification, where the search engine returns not just the best matches but also a diverse set of answers, is widely seen as the solution, and a researcher at the University of Tsukuba has developed a new, probabilistic approach that outperforms state-of-the-art methods.
In many current algorithms, search results are diversified in a greedy manner, namely by selecting documents one by one. Unfortunately, this strategy is slow and only works if the initially selected documents are already quite close to the optimal answer. Other approaches use a "score-and-sort" strategy based on machine learning, but the criteria used for optimization during training can be very different from the criteria used for evaluation during testing.
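To make the greedy strategy concrete, here is a minimal Python sketch of one-by-one selection in the style of maximal marginal relevance. It is an illustration only, not the code of any system discussed here: the function name, the `trade_off` weight, and the toy relevance and similarity values are all assumptions.

```python
def greedy_diversify(relevance, similarity, k, trade_off=0.5):
    """Select k documents one by one (hypothetical MMR-style sketch).

    relevance[d]     -- assumed relevance score of document d
    similarity[d][s] -- assumed similarity between documents d and s
    trade_off        -- hand-tuned weight between relevance and novelty
    """
    selected = []
    remaining = set(range(len(relevance)))
    while remaining and len(selected) < k:
        def marginal(d):
            # Penalize a candidate by its closest match among picked documents.
            redundancy = max((similarity[d][s] for s in selected), default=0.0)
            return trade_off * relevance[d] - (1 - trade_off) * redundancy

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: documents 0 and 1 are near-duplicates, document 2 is different.
relevance = [0.9, 0.85, 0.3]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
picked = greedy_diversify(relevance, similarity, k=2)
```

On this toy data, sorting by relevance alone would return the two near-duplicates, while the greedy loop skips the duplicate in favor of the less relevant but novel document. Each pick rescans every remaining candidate, which is where the cost of this strategy comes from, and the result depends on the hand-set `trade_off` value.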
"Another problem is that the optimization objectives in these models include parameters and must be well adjusted," explains Professor Hai-Tao Yu, author of the study. "If they are not set properly, this can impact search performance."
To address these issues, Professor Yu takes a probabilistic approach: instead of assigning each candidate document a deterministic score, the method represents each document's relevance with a probabilistic score that follows a specific distribution. It also considers the interactions among the results and formulates the evaluation metric itself as the optimization objective.
"We can then directly optimize the evaluation metric that is used for testing," Professor Yu says. "In our approach, the metric used in training is the same one that is used for testing." Moreover, no manually tuned parameters are required when formulating the optimization objective, and the method works well even when the input data are noisy.
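The idea of scoring documents with distributions and optimizing the evaluation metric itself can be sketched roughly as follows. This is not the paper's formulation, which the article does not spell out. As assumptions for illustration, each score is drawn from a Gaussian, the metric is a simple alpha-DCG over hypothetical subtopic labels, and the expected metric is estimated by sampling.

```python
import math
import random


def alpha_dcg(ranking, subtopics, alpha=0.5):
    """Diversity-aware gain: repeated subtopics earn geometrically less."""
    seen = {}
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        gain = 0.0
        for topic in subtopics[doc]:
            gain += (1 - alpha) ** seen.get(topic, 0)
            seen[topic] = seen.get(topic, 0) + 1
        total += gain / math.log2(rank + 1)
    return total


def expected_metric(means, stds, subtopics, n_samples=2000, seed=0):
    """Monte Carlo estimate of the metric when scores are random variables.

    means[d], stds[d] -- assumed Gaussian parameters for document d's score
    """
    rng = random.Random(seed)
    docs = range(len(means))
    total = 0.0
    for _ in range(n_samples):
        # Draw one score per document, rank by the draw, evaluate the ranking.
        sampled = [rng.gauss(means[d], stds[d]) for d in docs]
        ranking = sorted(docs, key=lambda d: sampled[d], reverse=True)
        total += alpha_dcg(ranking, subtopics)
    return total / n_samples


# Toy data: documents 0 and 1 cover subtopic "a", document 2 covers "b".
subtopics = [{"a"}, {"a"}, {"b"}]
diverse_first = expected_metric([3.0, 1.0, 2.0], [0.5, 0.5, 0.5], subtopics)
redundant_first = expected_metric([3.0, 2.0, 1.0], [0.5, 0.5, 0.5], subtopics)
```

In this sketch the quantity being estimated is the same metric one would report at test time, so a training procedure that adjusts `means` and `stds` to raise it would be optimizing what it is evaluated with. On the toy data, the score distributions that tend to place the document on subtopic "b" above the redundant one yield the higher expected value.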
In future, this probabilistic approach could also be used for other tasks like document summarization and paraphrasing. For now, as we enter the era of big data, searching for information is more important than ever, and an algorithm such as this one that can provide us with relevant and diverse answers will substantially improve our quality of life.
The article, "Optimize What You Evaluate With: Search Result Diversification Based on Metric Optimization," was published in the Proceedings of The Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22) at DOI:10.1609/aaai.v36i9.21282
Associate Professor YU Hai-Tao
Faculty of Pure and Applied Sciences, University of Tsukuba