image: Model framework of the LA-TextCNN-BiLSTM
Credit: Bocheng Li
Electronic medical records (EMRs) enable healthcare institutions to digitally document patients’ clinical conditions, treatment processes, and diagnostic outcomes, supporting paperless clinical workflows. However, the large volume of unstructured clinical data has introduced new challenges for disease classification and coding. The International Classification of Diseases (ICD), developed by the World Health Organization (WHO), provides a standardized framework for categorizing diseases based on etiology, pathology, clinical presentation, and anatomical location, with ICD-11 as the latest version. Automated ICD classification and coding of EMRs can substantially reduce the workload of medical coding departments and serves as a critical foundation for the effective use of EMRs in clinical practice and medical research.
A team of researchers from the Medical Record Department of Peking Union Medical College Hospital & WHO Family of International Classification Collaborating Center in China recently developed a novel deep learning model, LA-TextCNN-BiLSTM, that significantly improves the accuracy of automatic disease classification using the latest ICD-11, according to a study published in Informatics and Health. The model, evaluated on real-world EMR data, achieved an accuracy of 83.86%, demonstrating robust performance in multi-label classification.
Traditional manual coding is time-consuming and error-prone. To automate it, the research team leveraged MC-BERT, a language model pretrained on Chinese biomedical text, to better capture clinical semantics from EMRs. The researchers also integrated a label attention mechanism that uses semantic information from the ICD-11 codes themselves to guide the model toward diagnostically relevant text, reducing noise from redundant clinical descriptions.
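To make the idea of label attention concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the study's exact formulation is not described in this release, and the function names, the toy vectors, and the use of each label embedding as its own scoring weight are all assumptions made for illustration. Each label embedding (standing in for the encoded description of an ICD-11 entry) scores every token representation, the scores are normalized with a softmax, and the weighted sum gives a label-specific summary of the record.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def label_attention(token_states, label_embeddings):
    """Compute one attention distribution and one summary vector per label.

    token_states: list of token vectors (hypothetical encoder outputs).
    label_embeddings: list of label vectors (hypothetical encodings of
    ICD-11 entry descriptions), same dimension as the token vectors.
    """
    dim = len(token_states[0])
    weights, label_docs = [], []
    for label_vec in label_embeddings:
        # Score every token against this label, then normalize over tokens.
        alpha = softmax([dot(label_vec, h) for h in token_states])
        # Label-specific record summary: attention-weighted sum of tokens.
        doc = [sum(a * h[j] for a, h in zip(alpha, token_states))
               for j in range(dim)]
        weights.append(alpha)
        label_docs.append(doc)
    return weights, label_docs


def label_probabilities(label_docs, label_embeddings):
    """Independent sigmoid score per label (multi-label setting).

    Assumption: the label embedding doubles as the scoring weight; a
    trained model would normally learn separate output weights.
    """
    return [1.0 / (1.0 + math.exp(-dot(doc, emb)))
            for doc, emb in zip(label_docs, label_embeddings)]


# Toy example: three tokens, two labels, two-dimensional vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
labels = [[2.0, 0.0], [0.0, 2.0]]
weights, label_docs = label_attention(tokens, labels)
probs = label_probabilities(label_docs, labels)
```

In this toy input, the first label puts most of its attention on the first token and the second label on the second token, which is the behavior the mechanism is meant to encourage: each code attends to the part of the record most relevant to it. Because every label gets its own sigmoid score, several codes can be assigned to the same record.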
“In the International Classification of Diseases (ICD) system, classification codes are more than symbolic representations; each code carries specific taxonomic significance and clinical meaning. Compared with earlier versions, ICD-11 provides substantially more detailed clinical descriptions of diagnostic entries. Building upon this advancement, our work utilized the semantic information of ICD-11 entries through a label attention mechanism to further improve model performance,” says corresponding author Li Naishi.
“This advancement may pave the way for streamlining hospital workflows, enhancing data usability in research, and supporting intelligent healthcare systems—particularly in Chinese-speaking medical environments where language complexity poses unique NLP challenges,” adds Li.
###
Contact the author: Naishi Li, Medical Record Department of Peking Union Medical College Hospital. lns@medmail.com.cn
The publisher KeAi was established by Elsevier and China Science Publishing & Media Ltd to unfold quality research globally. In 2013, our focus shifted to open access publishing. We now proudly publish more than 200 world-class, open access, English language journals, spanning all scientific disciplines. Many of these are titles we publish in partnership with prestigious societies and academic institutions, such as the National Natural Science Foundation of China (NSFC).
Journal
Informatics and Health
Method of Research
Computational simulation/modeling
Subject of Research
Not applicable
Article Title
LA-TextCNN-BiLSTM: A classification model for ICD-11
COI Statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.