Hardware and software for distributed and supercomputer systems
Research Article
Using multilevel data sources to prepare training sets for cyberattack detection
Dmitry Dmitrievich Kononov1, Sergey Vladislavovich Isaev2
1,2 Institute of Computational Modelling of the Siberian Branch of the Russian Academy of Sciences, Krasnoyarsk, Russia
Abstract. Network traffic analysis is an integral part of ensuring security in information and telecommunication systems. Machine learning allows modern approaches to achieve higher detection rates for cyber threats.
A new approach to generating training datasets is proposed. It introduces a new aggregation unit, the “session”, and uses signature analysis together with multilevel data sources, including heterogeneous ones. A list of requirements for the datasets is formulated: preserving the first packets of each connection, preserving hidden areas of packets, and including extended information about traffic sources (country, autonomous system number (ASN)). This additional information makes it possible to detect attacks of the “hidden communication channel” type. Using the proposed approach, a software package has been developed for creating training datasets from multilevel sources at the L7, L4, and L3 levels of the OSI model. In contrast to existing works, real network activity data covering long time intervals are used. The resulting training sets can be used to create more effective machine learning methods for intrusion detection and prevention. (In Russian).
Keywords: Internet, network security, cyber threats, network traffic analysis, datasets, machine learning
MSC-2020
68M25; 68-11, 62N86
For citation: Dmitry D. Kononov, Sergey V. Isaev. Using multilevel data sources to prepare training sets for cyberattack detection. Program Systems: Theory and Applications, 2025, 16:4, pp. 267–285. (In Russ.). https://psta.psiras.ru/2025/4_267-285.
Full text of article (PDF): https://psta.psiras.ru/read/psta2025_4_267-285.pdf.
The article was submitted 10.07.2025; approved after reviewing 16.07.2025; accepted for publication 03.10.2025; published online 27.11.2025.