Article Highlight | 17-Apr-2024

The life cycle of knowledge in big language models: A survey

Beijing Zhongke Journal Publishing Co. Ltd.

Knowledge is the key to high-level intelligence. How a model obtains, stores, understands, and applies knowledge has long been a critical research topic in machine intelligence. Recent years have witnessed the rapid development of pre-trained language models (PLMs). Through self-supervised pre-training on large-scale unlabeled corpora, PLMs show stronger generalization and transfer abilities across different tasks, datasets, and settings than previous methods, and have therefore achieved remarkable success in natural language processing.

The success of pre-trained language models has attracted great attention to the nature of the knowledge they entail. Numerous studies have focused on how knowledge can be acquired, maintained, and used by pre-trained language models, and many novel research directions have been explored along these lines. For example, knowledge infusion is devoted to injecting explicit structured knowledge into PLMs; knowledge probing aims to evaluate the type and amount of knowledge stored in PLMs' parameters; and knowledge editing is dedicated to modifying incorrect or undesirable knowledge acquired by PLMs.

Despite the large number of related studies, current work primarily focuses on one specific stage of the knowledge process in PLMs, and thus lacks a unified perspective on how knowledge circulates through the model learning, tuning, and application phases. The absence of such comprehensive studies makes it hard to understand the connections between different knowledge-based tasks, to discover the correlations between different periods of the knowledge life cycle in PLMs, to identify the missing links and tasks for investigating knowledge in PLMs, or to explore the shortcomings and limitations of existing studies. For example, while numerous studies attempt to assess the knowledge in language models that are already pre-trained, few studies investigate why PLMs can learn from plain text without any supervision about knowledge, or how PLMs represent and store that knowledge. Likewise, many researchers have tried to explicitly inject various kinds of structured knowledge into PLMs, but few propose to help PLMs better acquire specific kinds of knowledge from plain text by exploiting the underlying knowledge acquisition mechanisms. As a result, related research may be overly concentrated in a few directions while failing to comprehensively understand, maintain, and control knowledge in PLMs, which limits both improvement and further application.

In this review, published in Machine Intelligence Research by the team of Prof. Han Xianpei, researchers systematically review knowledge-related studies of pre-trained language models from a knowledge engineering perspective. Inspired by research in cognitive science and knowledge engineering, they regard pre-trained language models as knowledge-based systems and investigate the life cycle of knowledge: how it circulates as it is acquired, maintained, and used in pre-trained models. Specifically, the researchers divide the life cycle of knowledge in pre-trained language models into five critical periods: knowledge acquisition, knowledge representation, knowledge probing, knowledge editing, and knowledge application.

For each of these periods, the researchers sort out existing studies, summarize the main challenges and limitations, and discuss future directions. This unified perspective makes it possible to understand and exploit the close connections between different periods instead of treating them as independent tasks. For instance, understanding the knowledge representation mechanism of PLMs helps researchers design better knowledge acquisition objectives and knowledge editing strategies, while reliable knowledge probing methods could help researchers identify suitable applications for PLMs and gain insight into their limitations, thereby facilitating improvement. Through this survey, the researchers aim to comprehensively summarize the progress, challenges, and limitations of current studies, help readers understand the whole field from a novel perspective, and shed light on how to better regulate, represent, and apply the knowledge in language models.

The contributions of this review are as follows: 1) It revisits pre-trained language models as knowledge-based systems and divides the life cycle of knowledge in PLMs into five critical periods. 2) For each period, it reviews existing studies and summarizes the main challenges and shortcomings of each direction. 3) Based on this review, it discusses the limitations of current research and sheds light on potential future directions.

Section 2 presents the overall structure of this survey, describes the taxonomy in detail, and discusses the topics in each critical period.

During the knowledge acquisition period, pre-trained language models learn knowledge from different knowledge sources. In Section 3, researchers categorize and describe knowledge acquisition strategies according to knowledge sources: learning from text data and learning from structured data, and then discuss the future directions.
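
As a purely illustrative sketch of learning from text data, the toy script below mimics the masked-token prediction objective used by BERT-style PLMs with simple context counting. The corpus, contexts, and `predict` helper are hypothetical stand-ins for illustration, not the survey's method; a real PLM learns such associations in its parameters rather than in an explicit table.

```python
from collections import Counter, defaultdict

# Toy stand-in for self-supervised knowledge acquisition: mask each
# token in turn and learn which word fills each masked context.
corpus = [
    "paris is the capital of france",
    "berlin is the capital of germany",
    "rome is the capital of italy",
]

# "Training": count which word fills each masked context.
context_counts = defaultdict(Counter)
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        ctx = tuple(words[:i] + ["[MASK]"] + words[i + 1:])
        context_counts[ctx][w] += 1

def predict(masked_sentence):
    """Return the most likely filler for the [MASK] slot."""
    ctx = tuple(masked_sentence.split())
    filler, _ = context_counts[ctx].most_common(1)[0]
    return filler

print(predict("[MASK] is the capital of france"))  # -> paris
```

No fact labels were provided, yet the "model" has absorbed factual associations purely from raw text, which is the intuition behind self-supervised knowledge acquisition.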

Knowledge representation studies investigate how pre-trained language models encode, transform, and store the acquired knowledge. In PLMs, knowledge is encoded into dense vector representations and held in distributed parameters, but how each kind of knowledge is encoded, transformed, and stored in those parameters is still unclear and needs further investigation. Current approaches to analyzing knowledge representation in PLMs can be classified into four categories: gradient-based, causal-inspired, attention-based, and layer-wise methods. Section 4 reviews the studies in these four categories and proposes future directions for knowledge representation in PLMs.

Knowledge probing aims to assess how well pre-trained language models entail different kinds of knowledge. A comprehensive and accurate assessment of PLMs' knowledge can help researchers identify and understand language models' capabilities and deficiencies, allow a fair comparison between LMs with different architectures and pre-training tasks, guide the improvement of a specific model, and select suitable models for different real-world scenarios. In Section 5, researchers first introduce existing benchmarks for knowledge probing, then present representative prompt-based and feature-based probing methods, analyze their corresponding limitations, and discuss future directions.
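
A minimal, hypothetical sketch of how prompt-based probing is typically framed: each fact is converted into a cloze-style prompt, and the model's filler is compared against the gold answer. The `model_fill_mask` stub, the fact list, and the templates below are illustrative assumptions; a real probe would query an actual PLM's fill-mask head instead.

```python
# LAMA-style prompt-based probing sketch: facts become cloze prompts,
# and probe accuracy is the fraction of prompts the model completes
# with the gold object.
FACTS = [
    ("Paris", "capital-of", "France"),
    ("Dante", "born-in", "Florence"),
]

TEMPLATES = {
    "capital-of": "{subj} is the capital of [MASK].",
    "born-in": "{subj} was born in [MASK].",
}

def model_fill_mask(prompt):
    # Hypothetical stand-in for a PLM's top fill-mask prediction.
    lookup = {
        "Paris is the capital of [MASK].": "France",
        "Dante was born in [MASK].": "Florence",
    }
    return lookup.get(prompt, "[UNK]")

def probe_accuracy(facts):
    hits = 0
    for subj, rel, obj in facts:
        prompt = TEMPLATES[rel].format(subj=subj)
        hits += (model_fill_mask(prompt) == obj)
    return hits / len(facts)

print(probe_accuracy(FACTS))  # -> 1.0
```

One known limitation this sketch makes visible: the measured "knowledge" depends heavily on the chosen templates, which is a central criticism of prompt-based probing.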

Knowledge editing aims to modify incorrect knowledge or delete undesirable information in PLMs. Because PLMs inevitably learn mistakes and knowledge changes over time, reliable and effective knowledge editing approaches are essential for the sustainable application of PLMs. In Section 6, researchers divide current knowledge editing strategies into four categories: constrained fine-tuning, memory-based, meta-learning-inspired, and location-based methods. They summarize the comparisons between these approaches in the table in this section, and then describe and discuss each method in turn.
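
To give a flavor of the constrained fine-tuning idea, the toy example below fits a corrected target while projecting the parameter back into a small trust region around its original value, so that other knowledge is disturbed as little as possible. The 1-D "model", learning rate, and radius are illustrative assumptions, not the survey's algorithm.

```python
# Toy constrained fine-tuning: gradient descent on the edit loss,
# with the parameter clipped to stay within `delta` of its
# pre-edit value (the constraint that protects other knowledge).
def edit(theta, grad_fn, lr=0.1, delta=0.5, steps=100):
    theta0 = theta
    for _ in range(steps):
        theta -= lr * grad_fn(theta)
        # Project back into the trust region around the original
        # parameters -- the "constrained" part of the method.
        theta = max(theta0 - delta, min(theta0 + delta, theta))
    return theta

# Hypothetical 1-D model whose output is theta; the corrected
# fact demands output 2.0, so the edit loss is (theta - 2.0)^2.
target = 2.0
grad = lambda t: 2 * (t - target)
new_theta = edit(theta=0.0, grad_fn=grad)
print(new_theta)  # -> 0.5 (stops at the trust-region boundary)
```

Unconstrained fine-tuning would drive the parameter all the way to 2.0; the projection halts it at the boundary, trading edit fidelity for locality, which is exactly the tension these methods manage.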

Knowledge application studies how to effectively distill and leverage the knowledge in PLMs for other applications. Specifically, researchers divide knowledge applications into two categories: language models as knowledge bases and language models for downstream tasks. Section 7 describes both in detail and proposes future directions.

In this survey, researchers conduct a comprehensive review of the life cycle of knowledge in pre-trained language models, covering knowledge acquisition, knowledge representation, knowledge probing, knowledge editing, and knowledge application. They systematically review related studies for each period, discuss the advantages and limitations of different methods, summarize the main challenges, and present future directions. The team of Prof. Le Sun believes this survey will benefit researchers in many areas, such as language models, knowledge graphs, and knowledge bases.

See the article:

The Life Cycle of Knowledge in Big Language Models: A Survey

http://doi.org/10.1007/s11633-023-1416-x

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.