Manuscript received July 25, 2025; revised August 30, 2025; accepted September 5, 2025; published September 26, 2025.
Abstract—Malware remains a persistent and growing threat in the digital space, making the development of accurate and efficient detection techniques essential. This study proposes a hybrid deep learning model that combines Convolutional Neural Networks (CNNs), Autoencoders (AEs), and Vision Transformers (ViTs) for malware classification from Portable Executable (PE) header metadata, evaluated on the ClaMP_Integrated-5184 dataset. In the proposed architecture, the CNN component extracts local spatial features, the Autoencoder compresses and denoises the feature space, and the Vision Transformer captures global dependencies for robust classification. The model achieved a classification accuracy of 98% and an F1-score of 98%, outperforming benchmark and state-of-the-art models. These findings highlight the effectiveness of hybrid deep learning architectures for static malware detection and demonstrate a promising approach to enhancing real-time malware detection systems.
Keywords—CNN, Autoencoders, Vision Transformer, malware detection, PE headers, static malware classification
Cite: Nanji Emmanuella Lakan, Oluwaseyi Ezekiel Olorunshola, Fatimah Adamu-Fika, and Samaila Musa Abdullahi, "Malware Classification Using a Hybrid CNN-Autoencoder and Transformer Encoded Approach," International Journal of Machine Learning, vol. 15, no. 4, pp. 71–77, 2025.
Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
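The pipeline described in the abstract, a CNN for local spatial features, an autoencoder bottleneck for compression and denoising, and a transformer encoder for global dependencies, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the input dimensionality (64 PE-header features reshaped to an 8×8 grid), the layer widths, the latent dimension, the number of transformer layers, and the two-class output are all illustrative assumptions.

# Minimal sketch of a hybrid CNN + autoencoder + transformer-encoder classifier
# over PE-header metadata. All sizes are illustrative assumptions, not the
# paper's reported configuration.
import torch
import torch.nn as nn

class HybridCNNAEViT(nn.Module):
    def __init__(self, n_features: int = 64, n_classes: int = 2,
                 latent_dim: int = 32, d_model: int = 64):
        super().__init__()
        self.side = int(n_features ** 0.5)            # 8x8 grid from 64 header features (assumed)
        # CNN: extracts local spatial features from the feature "image"
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Autoencoder: compresses and (via a reconstruction objective) denoises the CNN features
        self.encoder = nn.Linear(32 * self.side * self.side, latent_dim)
        self.decoder = nn.Linear(latent_dim, 32 * self.side * self.side)
        # Transformer encoder: models global dependencies over latent "tokens"
        self.to_tokens = nn.Linear(1, d_model)        # each latent value becomes one token
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x: torch.Tensor):
        # x: (batch, n_features) scaled PE-header metadata
        img = x.view(-1, 1, self.side, self.side)     # treat the feature vector as a 1-channel image
        feats = self.cnn(img).flatten(1)              # local spatial features
        latent = torch.relu(self.encoder(feats))      # compressed representation
        recon = self.decoder(latent)                  # reconstruction for an auxiliary AE loss
        tokens = self.to_tokens(latent.unsqueeze(-1)) # (batch, latent_dim, d_model)
        ctx = self.transformer(tokens).mean(dim=1)    # pooled global representation
        return self.classifier(ctx), recon, feats

if __name__ == "__main__":
    model = HybridCNNAEViT()
    dummy = torch.rand(4, 64)                         # 4 samples, 64 assumed header features
    logits, recon, feats = model(dummy)
    print(logits.shape)                               # torch.Size([4, 2])

In a training loop of this kind, the classification loss on the logits would typically be combined with a reconstruction loss between recon and feats; the weighting of the two terms is another assumption not specified in the abstract.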