"Can't get to sleep all night", "a little giddy" and other complaints posted on social networks can now be translated into formal medical terms, such as insomnia or vertigo. The task of matching symptoms described by patients to specific medical terms is called medical concept normalization.
To perform this matching, the programmers used a particular type of network, the recurrent neural network, together with semantic vector representations of words. Apart from the two universities mentioned above, contributions were made by the Kurchatov Institute, First Moscow Medical University, and the Saint Petersburg Branch of the Steklov Mathematical Institute. Over the next few years, the researchers plan to adapt the technology to Russian. The research was supported by the Russian Science Foundation.
To enable accurate matching, medical texts were loaded into the software and a special vocabulary was created. The software used this data to assign a vector to each word.
Valentin Malykh, Research Associate at the MIPT Neural Networks and Deep Learning Lab, commented: "We used user comments from the web. Our network is recurrent, so it is capable of memorizing. Of course, not in the literal sense of the word, because the network is not a thinking system, but it has a specific mechanism for memorizing texts. We feed texts into it, and it then compares them with the International Classification of Diseases (ICD). The outputs are word vectors: words and terms that often occur in similar contexts are assigned similar coordinates. In this way, the neural network 'compares' user texts with official medical terms."
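The idea that words used in similar contexts end up with similar coordinates can be illustrated with a minimal count-based sketch. It builds a context vector for each word from a tiny invented corpus and compares the vectors by cosine similarity. This is a stand-in resting on assumptions: the article does not say how the researchers trained their vectors, learned embeddings (such as word2vec) are far more compact than raw co-occurrence counts, and the corpus, window size and function names below are hypothetical.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy corpus of user-style complaints, invented for illustration.
CORPUS = [
    "felt queasy after the pill",
    "felt nauseous after the pill",
    "queasy all morning",
    "nauseous all morning",
    "could not sleep all night",
]

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vecs = cooccurrence_vectors(CORPUS)
print(cosine(vecs["queasy"], vecs["nauseous"]))  # close to 1.0
print(cosine(vecs["queasy"], vecs["sleep"]))     # about 0.22
```

Here "queasy" and "nauseous" appear with exactly the same neighbours, so their vectors point in the same direction, while "queasy" and "sleep" share only the neighbour "all".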
For example, if the network receives a text containing the word "queasy", it will map the complaint to the symptom "nausea". However, if someone writes "belly butterflies", it may simply discard the phrase as non-symptomatic, because there is nothing similar in the ICD.
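The mapping step in this example can be sketched as a nearest-neighbour search: average the vectors of the words in a complaint, compare the result with a vector for each medical concept, and reject the complaint if even the best match falls below a similarity threshold. Everything here is hypothetical: the three-dimensional vectors are hand-picked, the concept list is a tiny stand-in for the ICD, and the threshold rule is one common way to discard non-symptomatic phrases, not necessarily the one the authors used.

```python
import math

# Hypothetical 3-dimensional word vectors; real systems learn hundreds of
# dimensions from large corpora. Values are invented for illustration.
WORD_VECTORS = {
    "queasy":      [0.9, 0.1, 0.0],
    "giddy":       [0.1, 0.9, 0.1],
    "sleepless":   [0.0, 0.1, 0.95],
    "belly":       [0.3, -0.6, 0.2],
    "butterflies": [-0.7, 0.1, -0.6],
}

# Hypothetical concept vectors for a few ICD-style terms.
CONCEPTS = {
    "nausea":   [1.0, 0.1, 0.0],
    "vertigo":  [0.1, 1.0, 0.0],
    "insomnia": [0.0, 0.1, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def embed(phrase):
    """Average the vectors of known words; return None if no word is known."""
    known = [WORD_VECTORS[w] for w in phrase.lower().split() if w in WORD_VECTORS]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]

def map_complaint(phrase, threshold=0.7):
    """Return the closest concept, or None if nothing is similar enough."""
    vec = embed(phrase)
    if vec is None:
        return None
    best = max(CONCEPTS, key=lambda c: cosine(vec, CONCEPTS[c]))
    return best if cosine(vec, CONCEPTS[best]) >= threshold else None

for text in ["queasy", "a little giddy", "belly butterflies"]:
    print(text, "->", map_complaint(text))
# queasy -> nausea
# a little giddy -> vertigo
# belly butterflies -> None
```

With these toy values, unknown words such as "a" and "little" are ignored, and the averaged vector for "belly butterflies" points away from every concept, so it is rejected rather than forced onto the nearest term.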
The task is harder than simply looking up everyday words in a vocabulary: many user messages do not resemble medical terms at all.
"This research is important because of the growing demand for text data analysis. In our project, we use text analysis methods and machine learning to extract useful information from the available data," said Elena Tutubalina, Senior Research Associate at the KFU Medical Informatics Lab.
Andrey Filchenkov, Researcher at the Computer Technology Department, ITMO University, added that communication is one of the pressing problems in medicine and healthcare, and that this research could help to remedy it to some extent.
"Algorithmically speaking, this task is more like translating between two different, albeit very similar, languages. The solution lies within natural language processing. In recent years, the most successful solutions for most speech and text processing tasks have been based on deep neural networks, which help uncover complex regularities in data. In particular, recurrent neural networks work well with sequential data because they can find links between elements while taking the context into account," he said.
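The context-tracking property of recurrent networks described above can be shown with a single-layer Elman cell, which updates a hidden state h_t = tanh(W_x x_t + W_h h_{t-1} + b) after each word, so the final state depends on the order of the words. This is a simplified sketch: the article does not specify the exact architecture (gated variants such as LSTM or GRU are common in practice), and the vocabulary and weights below are invented rather than learned.

```python
import math

# Hypothetical one-hot vocabulary and fixed toy weights; a trained network
# would learn W_X, W_H and B from data.
VOCAB = {"belly": 0, "hurts": 1, "not": 2}
W_X = [[0.5, -0.3, 0.8],   # hidden_size x vocab_size
       [0.2, 0.7, -0.5]]
W_H = [[0.4, 0.1],         # hidden_size x hidden_size
       [-0.2, 0.6]]
B = [0.0, 0.1]

def rnn_step(h_prev, token):
    """One Elman RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    x = [0.0] * len(VOCAB)
    x[VOCAB[token]] = 1.0  # one-hot encoding of the current word
    h = []
    for i in range(len(B)):
        s = B[i]
        s += sum(W_X[i][j] * x[j] for j in range(len(x)))
        s += sum(W_H[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(s))
    return h

def encode(tokens):
    """Run the cell over a token sequence and return the final hidden state."""
    h = [0.0] * len(B)
    for tok in tokens:
        h = rnn_step(h, tok)
    return h

print(encode(["belly", "hurts"]))
print(encode(["hurts", "belly"]))
```

The two sequences contain the same words, yet their final states differ, because each step mixes the new word with what the network has already seen.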
According to the authors, continued development and fine-tuning of intelligent analysis of patient complaints on social networks could improve our understanding of how medications affect the human body. In addition, information about repeated prescriptions will be analysed, which should deepen our understanding of how medications act in combination with other factors, such as simultaneous intake of other medications, diet, and lifestyle.
Journal of Biomedical Informatics