Artificial intelligence (AI) has become an integral part of our lives. It has given rise to smart assistants that take on tasks that would otherwise take humans a great deal of time and effort – in medicine, business and industry, for example. To do this, smart assistants require vast amounts of data. ‘Knowledge graphs’ are one of the preferred mechanisms for representing data here, because they can be understood by both humans and machines and ensure that information is processed logically. They are considered key for a number of popular technologies such as Internet search engines and personal digital assistants. However, existing machine learning approaches for knowledge graphs still have some shortcomings, in particular with respect to scalability, consistency and completeness. A further problem is that they do not meet the human need for comprehensibility. Researchers at Paderborn University are now working on a large-scale research project to develop explainable machine learning for large-scale knowledge graphs. The National Center for Scientific Research ‘Demokritos’ in Greece, the European Union Satellite Centre (SatCen) in Spain, the University of Amsterdam in the Netherlands as well as the companies DATEV and webLyzard technology are also involved in the ENEXA* project. The research is being funded for a period of three years to the tune of around €4 million as part of the EU’s Horizon Europe programme.
Explainability of artificial intelligence
“Current machine learning-based explanation approaches are often based on a one-off process in which the AI does not take into account whether the human receiving the explanation has really understood what is being explained,” says Professor Axel-Cyrille Ngonga Ngomo, who heads up the Data Science working group at Paderborn University. In other words, there is no conversation between sender and recipient. Ngonga adds: “The problem can be overcome through the co-construction of explanations, whereby the addressees – i.e. the humans – are more involved in the AI-driven explanation process, with explanations not only produced for them, but with them.”
Human-centred: Machine learning for large-scale applications
The concept of co-construction has not yet been used for knowledge graphs. The researchers have therefore set themselves the goal of developing explainable machine learning approaches for particularly large knowledge graphs, with the focus on the rapid computation of models and human-centred explanations. Ngonga speaks of pioneering work: “To achieve this goal, ENEXA will devise novel hybrid machine learning approaches that are able to exploit multiple representations of knowledge graphs in a concurrent fashion. The solutions developed will meet real-world runtime requirements and make explainable machine learning accessible for large-scale applications such as Internet search engines, accounting, brand marketing, and the predictive analysis of satellite imagery. By using hybrid machine learning for large knowledge graphs and for explaining these, ENEXA will be leading the way in the implementation of explanatory models from sociology and psychology in machine learning.” This is important because people often have to make decisions without always being clear on the facts, which can then have far-reaching consequences.
Benefits for industry
The use of AI for knowledge graphs also has benefits for industry. According to the researchers, however, there are certain problems here: “Knowledge extraction and storage frameworks capable of translating industrial data into large knowledge graphs and storing the results in a distributed manner are still scarce. Developing scalable AI algorithms capable of computing predictions for large, inconsistent, or incomplete data sets in a reasonable amount of time also still poses a challenge. Another challenge is providing techniques for producing understandable explanations from machine-generated results, to ensure that the computed models are reliable,” says Ngonga. The team is therefore working on developing algorithms that meet these key requirements.
The path to the goal: Three use cases
“The main goal of ENEXA is to devise explainable machine learning approaches for knowledge graphs that significantly outperform the state of the art in terms of runtime, the amount of data to be processed (scalability), data inconsistency (robustness) and explanation quality,” sums up Ngonga. Three use cases have been chosen to apply and validate these approaches as part of the project. The first is in cooperation with the company DATEV, which processes more than 60 million digital receipts per month from around 960,000 German SMEs and public institutions. These accounting-relevant receipts have to be classified and interpreted correctly to generate valid and legally compliant posting records. Quality and traceability play a crucial role in this process, both to minimize errors and thus costs and to ensure compliance with legal requirements. The quality of the automation results depends heavily on the underlying data and how it is prepared for machine learning. In the course of the project, DATEV and the scientists will use knowledge graphs to research new approaches for more efficient and thus resource-saving processes.
The second use case is being carried out in conjunction with the European Union Satellite Centre (SatCen), an EU agency located in Spain. SatCen provides geospatial intelligence products and services resulting from the exploitation of Earth observation assets and collateral data. One such source is the data from the Sentinel satellites. These Earth observation satellites are part of the European Union’s Copernicus programme and produce huge volumes of data, which can be combined with geospatial knowledge graphs to extract relevant information efficiently. The ENEXA team is looking at developing new knowledge graph techniques to improve the management and analysis of such data.
The third use case involves improving brand communication strategies in co-operation with webLyzard technology. The company uses knowledge graphs as background knowledge for associating affective knowledge with consumer brands and predicting future developments in order to derive data-driven strategies. webLyzard technology processes up to 100 million documents a day. The goal is to obtain meaningful classification results that help companies distribute press releases or place online ads at the time intervals that maximize the reach of their content among specific target groups. Current approaches are not able to handle this volume of data.
Researchers from a variety of disciplines are working together to ensure the success of the ENEXA project, including computational linguists, psychologists, computer scientists and software developers. This collaborative approach aims to provide new answers to social, economic and entrepreneurial challenges in connection with artificial intelligence. At its core, it is about human participation in socio-technical systems. The team is expecting to see initial results as soon as 2023.
Further information can be found here: http://enexa.eu.
*ENEXA stands for ‘Efficient Explainable Learning on Knowledge Graphs’.