Volume 14 (2023), Issue 1 (56), Paper No. 4 (425)

Medical Informatics

Research Article

A system for extracting symptom mentions from texts by means of neural networks

Yuri Petrovich Serdyuk, Natalia Aleksandrovna Vlasova (corresponding author), Seda Rubenovna Momot

Ailamazyan Program Systems Institute of RAS, Ves'kovo, Russia
Corresponding author: Natalia Aleksandrovna Vlasova, nathalie.vlassova@gmail.com

Abstract. This paper presents a system for extracting symptom mentions from medical texts written in natural language (Russian). The system finds symptom mentions in a text, reduces them to a standard form, and assigns each found symptom to a group of similar symptoms. Each processing stage uses a separate neural network. We extract symptoms from three disease areas: allergic diseases, pulmonological diseases, and coronavirus infection (COVID-19). We present and describe an annotated corpus of sentences used to train the neural networks that extract symptom mentions. The sentences were annotated with a simple XML-like language. We propose an extended BIO markup format for the sentences that are fed directly to the neural network. We evaluate symptom extraction quality under both strict and flexible testing. We describe possible approaches to normalizing and identifying symptom mentions, together with their implementation. We compare our results with those reported in similar studies and thereby show the place of our system among clinical decision support systems. (In Russian).

Keywords: natural language processing, neural networks, information extraction, symptom mentions, annotated corpus, BERT models, COVID-19

MSC-2020 (2020 Mathematics Subject Classification): 68T07; 68T50
MSC-2020 68-XX: Computer science
MSC-2020 68Txx: Artificial intelligence
MSC-2020 68T07: Artificial neural networks and deep learning
MSC-2020 68T50: Natural language processing

For citation: Yuri P. Serdyuk, Natalia A. Vlasova, Seda R. Momot. A system for extracting symptom mentions from texts by means of neural networks. Program Systems: Theory and Applications, 2023, 14:1, pp. 95–123. (In Russ.). https://psta.psiras.ru/2023/1_95-123.

Full text of article (PDF): https://psta.psiras.ru/read/psta2023_1_95-123.pdf.

The article was submitted 26.12.2022; approved after reviewing 29.01.2023; accepted for publication 29.01.2023; published online 17.02.2023.

© Serdyuk Y. P., Vlasova N. A., Momot S. R., 2023
Editorial address: Ailamazyan Program Systems Institute of the Russian Academy of Sciences, Peter the First Street 4«a», Veskovo village, Pereslavl area, Yaroslavl region, 152021 Russia; Phone: +7(4852) 695-228; Website: http://psta.psiras.ru
© Ailamazyan Program Systems Institute of the Russian Academy of Sciences (site design), 2010–2024. The text of the CC-BY-4.0 license