Program Systems: Theory and Applications
ISSN 2079-3316. Bilingual online scientific journal of the Ailamazyan Program Systems Institute of the Russian Academy of Sciences
Volume 16 (2025), Issue 4 (67), Paper No. 10 (457)

Hardware and software for distributed and supercomputer systems

Research Article

Using multilevel data sources to prepare training sets for cyberattack detection

Dmitry Dmitrievich Kononov¹, Sergey Vladislavovich Isaev²

¹,² Institute of Computational Modelling of the Siberian Branch of the Russian Academy of Sciences, Krasnoyarsk, Russia
¹ Dmitry Dmitrievich Kononov (corresponding author): ddk@icm.krasn.ru

Abstract. Network traffic analysis is an integral part of ensuring security in information and telecommunication systems. Machine learning enables modern approaches to achieve higher detection rates for cyber threats.

A new approach to generating training datasets is proposed. It introduces a new aggregation unit, the “session”, and uses signature analysis and multilevel data sources, including heterogeneous ones. A list of dataset requirements is formulated: preserving the first packets of each connection, preserving hidden areas of packets, and providing extended information about traffic sources (country and autonomous system number, ASN). This additional information makes it possible to detect attacks of the “hidden communication channel” (covert channel) type. Based on the proposed approach, a software package has been developed that builds training datasets from multilevel sources at the L7, L4 and L3 levels of the OSI model. Unlike existing works, the approach uses real network activity data collected over long time intervals. The training sets obtained in this way can be used to create more effective machine-learning-based methods of intrusion detection and prevention. (In Russian).
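As a rough illustration of how a session-level training record could combine these requirements, the Python sketch below groups packets into bidirectional sessions, keeps the raw bytes of each session's first packets (so padding and unused header fields are not discarded), attaches the initiator's country and ASN from a lookup table, and labels sessions by naive signature matching. Everything here is an assumption made for illustration: the names `Packet`, `Session`, `build_sessions` and `label_sessions`, the 60-second idle timeout, the five retained packets and the dictionary-based `geo` table are hypothetical, and the abstract does not state how the authors' software package actually defines a session.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical packet record; the abstract does not define a capture format.
@dataclass
class Packet:
    ts: float       # capture timestamp in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str      # e.g. "tcp" or "udp"
    raw: bytes      # whole captured frame, kept unparsed so padding/unused fields survive

FlowKey = Tuple[str, int, str, int, str]

def flow_key(p: Packet) -> FlowKey:
    """Direction-independent 5-tuple: both directions map to one session."""
    a, b = (p.src_ip, p.src_port), (p.dst_ip, p.dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return (lo[0], lo[1], hi[0], hi[1], p.proto)

@dataclass
class Session:
    key: FlowKey
    start: float
    end: float
    packet_count: int = 0
    byte_count: int = 0
    first_packets: List[bytes] = field(default_factory=list)
    src_country: Optional[str] = None   # country of the initiating host
    src_asn: Optional[int] = None       # autonomous system number of the initiator

def build_sessions(packets: Iterable[Packet],
                   geo: Dict[str, Tuple[str, int]],
                   first_n: int = 5,
                   idle_timeout: float = 60.0) -> List[Session]:
    """Aggregate packets into sessions.

    geo is a hypothetical ip -> (country, asn) table standing in for a
    GeoIP/ASN database; first_n and idle_timeout are illustrative defaults.
    """
    active: Dict[FlowKey, Session] = {}
    finished: List[Session] = []
    for p in sorted(packets, key=lambda x: x.ts):
        k = flow_key(p)
        s = active.get(k)
        if s is not None and p.ts - s.end > idle_timeout:
            finished.append(s)          # idle gap too long: close and start anew
            s = None
        if s is None:
            # The first packet seen is treated as coming from the initiator.
            country, asn = geo.get(p.src_ip, (None, None))
            s = Session(key=k, start=p.ts, end=p.ts,
                        src_country=country, src_asn=asn)
            active[k] = s
        s.end = p.ts
        s.packet_count += 1
        s.byte_count += len(p.raw)
        if len(s.first_packets) < first_n:   # keep only the opening packets
            s.first_packets.append(p.raw)
    finished.extend(active.values())
    return sorted(finished, key=lambda s: s.start)

def label_sessions(sessions: List[Session], signatures: List[bytes]) -> List[bool]:
    """Toy signature analysis: True if any signature occurs in a retained packet."""
    return [any(sig in raw for raw in s.first_packets for sig in signatures)
            for s in sessions]
```

For example, a request and its reply on one TCP connection, followed by a third packet on the same 5-tuple after a 100-second pause, yield two sessions, because the gap exceeds the assumed idle timeout. A real pipeline would replace the `geo` dictionary with a GeoIP/ASN database and the substring matcher with a signature-based IDS such as Snort or Suricata.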

Keywords: Internet, network security, cyber threats, network traffic analysis, datasets, machine learning

MSC-2020 (2020 Mathematics Subject Classification): 68M25; 68-11, 62N86
68-XX: Computer science
68Mxx: Computer system organization
68M25: Computer security
68-11: Research data for problems pertaining to computer science
62-XX: Statistics
62Nxx: Survival analysis and censored data
62N86: Fuzziness, and survival analysis and censored data

For citation: Dmitry D. Kononov, Sergey V. Isaev. Using multilevel data sources to prepare training sets for cyberattack detection. Program Systems: Theory and Applications, 2025, 16:4, pp. 267–285. (In Russ.). https://psta.psiras.ru/2025/4_267-285.

Full text of article (PDF): https://psta.psiras.ru/read/psta2025_4_267-285.pdf.

The article was submitted 10.07.2025; approved after reviewing 16.07.2025; accepted for publication 03.10.2025; published online 27.11.2025.

© Kononov D. D., Isaev S. V., 2025
Editorial address: Ailamazyan Program Systems Institute of the Russian Academy of Sciences, Peter the First Street 4«a», Veskovo village, Pereslavl area, Yaroslavl region, 152021 Russia; Website: http://psta.psiras.ru; Phone: +7(4852) 695-228; License: CC-BY-4.0