A machine learning malware analysis framework for Android apps.

Overview

๐Ÿ•ต๏ธ A machine learning malware analysis framework for Android apps. โ˜ข๏ธ


DroidDetective is a Python tool for analysing Android applications (APKs) for potential malware related behaviour and configurations. When provided with a path to an application (APK file) Droid Detective will make a prediction (using it's ML model) of if the application is malicious. Features and qualities of Droid Detective include:

  • Analysing which of ~330 permissions are specified in the application's AndroidManifest.xml file. ๐Ÿ™…
  • Analysing the number of standard and proprietary permissions in use in the application's AndroidManifest.xml file. ๐Ÿงฎ
  • Using a RandomForest machine learning classifier, trained off the above data, from ~14 malware families and ~100 Google Play Store applications. ๐Ÿ’ป

๐Ÿค– Getting Started

Installation

All DroidDetective dependencies can be installed manually or via the requirements file, with

pip install -r REQUIREMENTS.txt

DroidDetective has been tested on both Windows 10 and Ubuntu 18.0 LTS.

Usage

DroidDetective can be run by providing the Python file with an APK as a command line parameter, such as:

python DroidDetective.py myAndroidApp.apk

If an apk_malware.model file is not present, then the tooling will first train the model and will require a training set of APKs in both a folder at the root of the project called malware and another called normal. Once run successfully a result will be printed onto the CLI on if the model has identified the APK to be malicious or benign. An example of this output can be seen below:

>> Analysed file 'com.android.camera2.apk', identified as not malware.

An additional parameter can be provided to DroidDetective.py as a Json file to save the results to. If this Json file already exists the results of this run will be appended to the Json file.

python DroidDetective.py myAndroidApp.apk output.json

An example of this output Json is as follows:

{
    "com.android.camera2": false,
}

โš—๏ธ Data Science | The ML Model

DroidDetective is a Python tool for analyzing Android applications (APKs) for potential malware related behaviour. This works by training a Random Forest classifier on information derived from both known malware APKs and standard APKs available on the Android app store. This tooling comes pre-trained, however, the model can be re-trained on a new dataset at any time. โš™๏ธ

This model currently uses permissions from an APKs AndroidManifest.xml file as a feature set. This works by creating a dictionary of each standard Android permission and setting the feature to 1 if the permission is present in the APK. Similarly, a feature is added for the amount of permissions in use in the manifest and for the amount of unidentified permissions found in the manifest.

The pre-trained model was trained off approximately 14 malware families (each with one or more APK files), located from ashisdb's repository, and approximately 100 normal applications located from the Google Play Store.

The below denotes the statistics for this ML model:

Accuracy: 0.9310344827586207
Recall: 0.9166666666666666
Precision: 0.9166666666666666
F-Measure: 0.9166666666666666

The top 10 highest weighted features (i.e. Android permissions) used by this model, for identifying malware, can be seen below:

"android.permission.SYSTEM_ALERT_WINDOW": 0.019091367939223395,
"android.permission.ACCESS_NETWORK_STATE": 0.021001765263234648,
"android.permission.ACCESS_WIFI_STATE": 0.02198962579120518,
"android.permission.RECEIVE_BOOT_COMPLETED": 0.026398914436102188,
"android.permission.GET_TASKS": 0.03595458598076517,
"android.permission.WAKE_LOCK": 0.03908212881520419,
"android.permission.WRITE_SMS": 0.057041576632290585,
"android.permission.INTERNET": 0.08816028225034145,
"android.permission.WRITE_EXTERNAL_STORAGE": 0.09835914154294739,
"other_permission": 0.10189463965313218,
"num_of_permissions": 0.12392224814084198

๐Ÿ“œ License

GNU General Public License v3.0

Owner
James Stevenson
Iโ€™m a Software Engineer and Security Researcher, with a background of over five years in the computer security industry.
James Stevenson
My implementation of transformers related papers for computer vision in pytorch

vision_transformers This is my personnal repo to implement new transofrmers based and other computer vision DL models I am currenlty working without a

samsja 1 Nov 10, 2021
Ivy is a templated deep learning framework which maximizes the portability of deep learning codebases.

Ivy is a templated deep learning framework which maximizes the portability of deep learning codebases. Ivy wraps the functional APIs of existing frameworks. Framework-agnostic functions, libraries an

Ivy 8.2k Jan 02, 2023
Key information extraction from invoice document with Graph Convolution Network

Key Information Extraction from Scanned Invoices Key information extraction from invoice document with Graph Convolution Network Related blog post fro

Phan Hoang 39 Dec 16, 2022
Improving Factual Consistency of Abstractive Text Summarization

Improving Factual Consistency of Abstractive Text Summarization We provide the code for the papers: "Entity-level Factual Consistency of Abstractive T

61 Nov 27, 2022
DCSAU-Net: A Deeper and More Compact Split-Attention U-Net for Medical Image Segmentation

DCSAU-Net: A Deeper and More Compact Split-Attention U-Net for Medical Image Segmentation By Qing Xu, Wenting Duan and Na He Requirements pytorch==1.1

Qing Xu 20 Dec 09, 2022
implement of SwiftNet:Real-time Video Object Segmentation

SwiftNet The official PyTorch implementation of SwiftNet:Real-time Video Object Segmentation, which has been accepted by CVPR2021. Requirements Python

haochen wang 64 Dec 14, 2022
[CVPR 2021] Unsupervised Degradation Representation Learning for Blind Super-Resolution

DASR Pytorch implementation of "Unsupervised Degradation Representation Learning for Blind Super-Resolution", CVPR 2021 [arXiv] Overview Requirements

Longguang Wang 318 Dec 24, 2022
Code, Data and Demo for Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting

InversePrompting Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting Code: The code is provided in the "chinese_ip"

THUDM 101 Dec 16, 2022
a morph transfer UGATIT for image translation.

Morph-UGATIT a morph transfer UGATIT for image translation. Introduction ไธญๆ–‡ๆŠ€ๆœฏๆ–‡ๆกฃ This is Pytorch implementation of UGATIT, paper "U-GAT-IT: Unsupervise

55 Nov 14, 2022
Uncertain natural language inference

Uncertain Natural Language Inference This repository hosts the code for the following paper: Tongfei Chen*, Zhengping Jiang*, Adam Poliak, Keisuke Sak

Tongfei Chen 14 Sep 01, 2022
A simple tutoral for error correction task, based on Pytorch

gramcorrector A simple tutoral for error correction task, based on Pytorch Grammatical Error Detection (sentence-level) a binary sequence-based classi

peiyuan_gong 8 Dec 03, 2022
Deep Learning for humans

Keras: Deep Learning for Python Under Construction In the near future, this repository will be used once again for developing the Keras codebase. For

Keras 57k Jan 09, 2023
Code accompanying "Dynamic Neural Relational Inference" from CVPR 2020

Code accompanying "Dynamic Neural Relational Inference" This codebase accompanies the paper "Dynamic Neural Relational Inference" from CVPR 2020. This

Colin Graber 48 Dec 23, 2022
Code base for NeurIPS 2021 publication titled Kernel Functional Optimisation (KFO)

KernelFunctionalOptimisation Code base for NeurIPS 2021 publication titled Kernel Functional Optimisation (KFO) We have conducted all our experiments

2 Jun 29, 2022
[ACL 20] Probing Linguistic Features of Sentence-level Representations in Neural Relation Extraction

REval Table of Contents Introduction Overview Requirements Installation Probing Usage Citation License ๐ŸŽ“ Introduction REval is a simple framework for

13 Jan 06, 2023
Convolutional Neural Network for 3D meshes in PyTorch

MeshCNN in PyTorch SIGGRAPH 2019 [Paper] [Project Page] MeshCNN is a general-purpose deep neural network for 3D triangular meshes, which can be used f

Rana Hanocka 1.4k Jan 04, 2023
Code of paper: "DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks"

DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks Abstract: Adversarial training has been proven to

ๅ€ชไป•ๆ–‡ (Shiwen Ni) 58 Nov 10, 2022
Subdivision-based Mesh Convolutional Networks

Subdivision-based Mesh Convolutional Networks The official implementation of SubdivNet in our paper, Subdivion-based Mesh Convolutional Networks Requi

Zheng-Ning Liu 181 Dec 28, 2022
Lite-HRNet: A Lightweight High-Resolution Network

LiteHRNet Benchmark ๐Ÿ”ฅ ๐Ÿ”ฅ Based on MMsegmentation ๐Ÿ”ฅ ๐Ÿ”ฅ Cityscapes FCN resize concat config mIoU last mAcc last eval last mIoU best mAcc best eval bes

16 Dec 12, 2022
Real-time face detection and emotion/gender classification using fer2013/imdb datasets with a keras CNN model and openCV.

Real-time face detection and emotion/gender classification using fer2013/imdb datasets with a keras CNN model and openCV.

Octavio Arriaga 5.3k Dec 30, 2022