Medical Informatics
Research Article
A system for extracting symptom mentions from texts by means of neural networks
Yuri Petrovich Serdyuk (1), Natalia Aleksandrovna Vlasova (2), Seda Rubenovna Momot (3)
Ailamazyan Program Systems Institute of RAS, Ves'kovo, Russia
(2) nathalie.vlassova@gmail.com
Abstract. This paper presents a system for extracting symptom mentions from medical texts written in natural language (Russian). The system finds symptom mentions in a text, normalizes them to a standard form, and assigns each found symptom to a group of similar symptoms. A separate neural network is used for each processing stage. We extract symptoms for three disease areas: allergic diseases, pulmonological diseases, and coronavirus infection (COVID-19). We present and describe an annotated corpus of sentences used to train the neural networks that extract symptom mentions. The sentences were annotated using a simple XML-like language. We propose an extended BIO markup format for the sentences fed directly to the input of the neural network. We evaluate the quality of symptom extraction under strict and flexible testing. We describe possible approaches to the normalization and identification of symptom mentions, together with their implementation. We compare our results with those of similar studies and thereby show the place of our system among clinical decision support systems. (In Russian).
Keywords: natural language processing, neural networks, information extraction, symptom mentions, annotated corpus, BERT models, COVID-19
MSC-2020 68T07; 68T50
For citation: Yuri P. Serdyuk, Natalia A. Vlasova, Seda R. Momot. A system for extracting symptom mentions from texts by means of neural networks. Program Systems: Theory and Applications, 2023, 14:1, pp. 95–123. (In Russ.). https://psta.psiras.ru/2023/1_95-123.
Full text of article (PDF): https://psta.psiras.ru/read/psta2023_1_95-123.pdf.
The article was submitted 26.12.2022; approved after reviewing 29.01.2023; accepted for publication 29.01.2023; published online 17.02.2023.