Building robust malware detection through conditional Generative Adversarial Network-based data augmentation

Elshan Baghirov

Program Systems: Theory and Applications

ISSN 2079-3316

Bilingual online scientific journal of the Ailamazyan Program Systems Institute of the Russian Academy of Sciences


Volume 15 (2024). Issue 4 (63). Paper No. 6 (453)

Hardware and software for distributed and supercomputer systems

Research Article

DOI: 10.25209/2079-3316-2024-15-4-97-110

	Institute of Information Technology, Baku, Azerbaijan
	elsenbagirov1995@gmail.com

Abstract. Malware detection is essential in cybersecurity, yet its accuracy is often compromised by class imbalance and limited labeled data. This study leverages conditional Generative Adversarial Networks (cGANs) to generate synthetic malware samples, addressing these challenges by augmenting the minority class.
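For reference, the textbook conditional GAN objective (Mirza and Osindero, 2014) conditions both the generator G and the discriminator D on the class label y. The abstract does not give the exact loss or architecture used in this study, so this is the generic formulation, not necessarily the author's:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x \mid y)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z \mid y) \mid y)\bigr)\bigr]
```

In practice y is commonly supplied to both networks by concatenating a one-hot encoding of the label with each network's input.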

The cGAN model generates realistic malware samples conditioned on class labels, balancing the dataset while leaving the benign class unchanged. The approach is applied to the CICMalDroid2020 dataset, and the augmented data are used to train a LightGBM model, which improves detection accuracy, particularly for underrepresented malware classes.
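The balancing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the helper names (`synthetic_quota`, `one_hot`, `conditioned_inputs`), the choice to top every malware class up to the size of the largest malware class, and the toy labels are all hypothetical. The sketch only prepares the conditioned generator inputs (Gaussian noise concatenated with a one-hot label); the trained generator that would map them to feature vectors, and the LightGBM training step, are not shown.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence


def synthetic_quota(labels: Sequence[str], benign: str = "Benign",
                    target: Optional[int] = None) -> Dict[str, int]:
    """Synthetic samples to request per malware class; benign is never augmented.

    ASSUMPTION: by default each malware class is topped up to the size of the
    largest malware class; pass `target` to choose a different size.
    """
    counts = Counter(labels)
    malware = {c: n for c, n in counts.items() if c != benign}
    if not malware:
        return {}
    goal = target if target is not None else max(malware.values())
    return {c: goal - n for c, n in malware.items() if n < goal}


def one_hot(label: str, classes: Sequence[str]) -> List[float]:
    """Encode a class label as a one-hot vector over a fixed class order."""
    return [1.0 if c == label else 0.0 for c in classes]


def conditioned_inputs(label: str, classes: Sequence[str], n: int,
                       noise_dim: int, rng: random.Random) -> List[List[float]]:
    """Build n generator inputs [z ; y]: Gaussian noise z followed by one-hot y.

    A trained conditional generator G(z | y) would map each row to a synthetic
    feature vector of class `label` (the generator itself is not shown here).
    """
    y = one_hot(label, classes)
    return [[rng.gauss(0.0, 1.0) for _ in range(noise_dim)] + y for _ in range(n)]


# Toy, made-up labels (NOT the CICMalDroid2020 class counts).
labels = ["Benign"] * 10 + ["Adware"] * 3 + ["Banking"] * 6 + ["SMS"] * 2
classes = sorted(set(labels))
rng = random.Random(0)
quota = synthetic_quota(labels)
print(quota)  # {'Adware': 3, 'SMS': 4}
batches = {c: conditioned_inputs(c, classes, k, noise_dim=8, rng=rng)
           for c, k in quota.items()}
```

Topping up to the largest malware class rather than to the benign class is only one possible target; the abstract states just that the benign class is left unchanged, so `target` is exposed as a parameter.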

The results demonstrate the efficacy of cGANs as a robust data augmentation tool, enhancing the performance and reliability of machine learning-based malware detection systems.

Keywords: malware detection, Generative Adversarial Networks, machine learning, cybersecurity, data augmentation

MSC-2020: 97P30; 97P20, 97R40

MSC-2020 97-XX: Mathematics education
MSC-2020 97Pxx: Computer science (educational aspects)
MSC-2020 97P30: Systems, databases (educational aspects)
MSC-2020 97P20: Theoretical computer science (educational aspects)

For citation: Elshan Baghirov. Building robust malware detection through conditional Generative Adversarial Network-based data augmentation. Program Systems: Theory and Applications, 2024, 15:4, pp. 97–110. https://psta.psiras.ru/2024/4_97-110.

Full text of article (PDF): https://psta.psiras.ru/read/psta2024_4_97-110.pdf.

The article was submitted 05.12.2024; approved after reviewing 07.12.2024; accepted for publication 07.12.2024; published online 10.12.2024.

2024

Editorial address: Ailamazyan Program Systems Institute of the Russian Academy of Sciences, Peter the First Street 4«a», Veskovo village, Pereslavl area, Yaroslavl region, 152021 Russia; Website: http://psta.psiras.ru

Phone: +7(4852) 695-228; License: CC-BY-4.0