Volume 14 (2023) . Issue 1 (56) . Paper No. 1 (422)

Artificial intelligence and machine learning

Research Article

Tabular information recognition using convolutional neural networks

Igor Victorovich Vinokurov (corresponding author)

Financial University under the Government of the Russian Federation, Moscow, Russia
Igor Victorovich Vinokurov, corresponding author: igvvinokurov@fa.ru

Abstract. The paper shows why detecting tables and recognizing their contents matters for processing scanned documents. It describes how a data set was assembled for training, validating and testing the YOLOv5s deep learning neural network (DNN) to detect simple tables, and demonstrates the effectiveness of this DNN on scanned documents. A convolutional neural network (CNN) was built with the Keras Functional API to recognize the main elements of tabular information: numbers, basic punctuation marks and Cyrillic letters. Results of a study of this CNN's performance are presented. The paper also describes how table detection and recognition on scanned documents were implemented in an information system (IS) developed to update database records for the Unified State Register of Real Estate system. (Linked article texts in Russian and in English).
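The abstract mentions a data set for training, validation and testing, and a CNN whose classes are numbers, basic punctuation marks and Cyrillic letters. The following is only a minimal sketch of how such a class vocabulary and a three-way split might be set up. The exact symbol set, the 80/10/10 proportions, the seed, and the names `CLASSES`, `CLASS_TO_INDEX` and `split_dataset` are assumptions made for illustration and are not taken from the article.

```python
import random

# Hypothetical class vocabulary. The abstract names digits, basic punctuation
# marks and Cyrillic letters; the exact symbols chosen here are an assumption.
DIGITS = [str(d) for d in range(10)]
PUNCTUATION = list(".,:;-()")
# Russian uppercase letters: 32 contiguous code points from "А" to "Я", plus "Ё".
CYRILLIC = [chr(c) for c in range(ord("А"), ord("Я") + 1)] + ["Ё"]

CLASSES = DIGITS + PUNCTUATION + CYRILLIC
# Map each symbol to the integer label a classifier would be trained on.
CLASS_TO_INDEX = {symbol: index for index, symbol in enumerate(CLASSES)}


def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle samples and split them into train, validation and test lists.

    The fractions are illustrative defaults, not the article's values.
    Whatever remains after the train and validation parts becomes the test part.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test share")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

With 100 samples and the default fractions, `split_dataset(range(100))` returns three disjoint lists of 80, 10 and 10 items. Fixing the seed keeps the test part stable across experiments, which matters when comparing models.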

Keywords: Convolutional Neural Networks, Deep Learning Neural Networks, CNN, DNN, YOLOv5s, Keras, Python

MSC-2020 (2020 Mathematics Subject Classification): 68T20; 68T07, 68T45
MSC-2020 68-XX: Computer science
MSC-2020 68Txx: Artificial intelligence
MSC-2020 68T20: Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
MSC-2020 68T07: Artificial neural networks and deep learning
MSC-2020 68T45: Machine vision and scene understanding

For citation: Igor V. Vinokurov. Tabular information recognition using convolutional neural networks. Program Systems: Theory and Applications, 2023, 14:1, pp. 3–30. (In Russ., in Engl.). https://psta.psiras.ru/2023/1_3-30.

Full text of article (PDF): https://psta.psiras.ru/read/psta2023_1_3-30.pdf.

The article was submitted 23.11.2022; approved after reviewing 28.11.2022; accepted for publication 12.12.2022; published online 13.02.2023.

© Vinokurov I. V., 2023
Editorial address: Ailamazyan Program Systems Institute of the Russian Academy of Sciences, Peter the First Street 4«a», Veskovo village, Pereslavl area, Yaroslavl region, 152021 Russia; Phone: +7(4852) 695-228; Website: http://psta.psiras.ru
© Ailamazyan Program Systems Institute of the Russian Academy of Sciences (site design), 2010–2024. The text of the CC-BY-4.0 license.