About the Dataset

Maloid-DS is a large, well-organized Android malware dataset intended for researchers working in Android malware forensics. It provides malware in its original APK format: approximately 47,971 malware APK samples representing 345 distinct malware families, plus 7,488 benign APK samples. The samples were collected from a range of sources, including official datasets and code repositories, and span families prevalent in 2010 through those discovered in 2024. To make the collection easier to analyze, we classified the 345 malware families into seven categories, as illustrated in Figure 1.

The dataset also includes a set of images derived from these APKs. The original Maloid-DS dataset contains approximately 47,971 malware APK samples representing 345 distinct malware families, grouped into seven categories: Adware, Backdoor, Banking, Ransomware, Riskware, SMSMalware, and Spyware. It also contains 7,488 benign APK samples.

To create the image-based dataset from our Maloid-DS dataset, we converted each collected APK directly into a 1D vector of 8-bit values, with no preprocessing during conversion. These vectors were then transformed into 2D images in both color and grayscale, yielding two versions of the image-based dataset: one in color and one in grayscale.
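The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the fixed image width of 256 pixels, the zero-padding of the final row, the mapping of three consecutive bytes to one RGB pixel, and the PGM/PPM output formats are all assumptions made here for the example, as are the function names.

```python
import math
from pathlib import Path


def pad_to_rows(data: bytes, width: int, channels: int) -> tuple:
    """Zero-pad the raw bytes so they fill a whole number of image rows.

    Returns (padded_bytes, height). Padding with zeros is an assumption.
    """
    row_bytes = width * channels
    height = max(1, math.ceil(len(data) / row_bytes))
    padded = data.ljust(height * row_bytes, b"\x00")
    return padded, height


def to_pgm(data: bytes, width: int = 256) -> bytes:
    """Grayscale image: one byte per pixel, encoded as binary PGM (P5)."""
    pixels, height = pad_to_rows(data, width, channels=1)
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + pixels


def to_ppm(data: bytes, width: int = 256) -> bytes:
    """Color image: three consecutive bytes per pixel (assumed R, G, B),
    encoded as binary PPM (P6)."""
    pixels, height = pad_to_rows(data, width, channels=3)
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + pixels


def convert_apk(apk_path: str, out_dir: str, width: int = 256) -> None:
    """Read an APK as raw bytes (no preprocessing) and write both images."""
    raw = Path(apk_path).read_bytes()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(apk_path).stem
    (out / f"{stem}.pgm").write_bytes(to_pgm(raw, width))
    (out / f"{stem}.ppm").write_bytes(to_ppm(raw, width))
```

Calling convert_apk("sample.apk", "images") with a hypothetical file name would write sample.pgm and sample.ppm into the images directory. Netpbm formats are used here only because they need no third-party libraries; the published image dataset may use a different format and image geometry.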

Figure 1: A high-level view of the 7 categories with the total number of family samples in each.

How to Cite

If you use this dataset, please cite the following paper:

I. Almomani, T. Almashat and W. El-Shafai, "Maloid-DS: Labeled Dataset for Android Malware Forensics," in IEEE Access, doi: 10.1109/ACCESS.2024.3400211. 

BibTeX

@ARTICLE{10529242,
  author={Almomani, Iman and Almashat, Tala and El-Shafai, Walid},
  journal={IEEE Access}, 
  title={Maloid-DS: Labeled Dataset for Android Malware Forensics}, 
  year={2024},
  volume={},
  number={},
  pages={1-1},
  keywords={Malware;Operating systems;Training;Computer viruses;Computer science;COVID-19;Artificial intelligence;Androids;Forensics;Labeling;Deep learning;Detection algorithms;Classification algorithms;Computer security;Android OS;Malware forensics;Labeled datasets;Deep learning;Malware analysis;Detection and classification;Cybersecurity applications},
  doi={10.1109/ACCESS.2024.3400211}
}

Acknowledgement

SEL would like to acknowledge the support of Prince Sultan University.

Dataset Access

Please send an application email to sel@psu.edu.sa that states the following:

  • The name of your research institution
  • The name of the person requesting access

Make sure to send your application from your university (or research institution) email account.
