Various capabilities for static malware analysis.

Overview

Malchive

The malchive serves as a compendium for a variety of capabilities mainly pertaining to malware analysis, such as scripts supporting day to day binary analysis and decoder modules for various components of malicious code.

The goals behind the 'malchive' are to:

  • Allow teams to centralize efforts made in this realm and enforce communication and continuity
  • Have a shared corpus of tools for people to build on
  • Enforce clean coding practices
  • Allow others to interface with project members to develop their own capabilities
  • Promote a positive feedback loop between Threat Intel and Reverse Engineering staff
  • Make static file analysis more accessible
  • Serve as a vehicle to communicate the unique opportunity space identified via deep dive analysis

Documentation

At its core, malchive is a bunch of standalone scripts organized in a manner that the authors hope promotes the project's goals.

To view the documentation associated with this project, checkout the wiki page!

Scripts within the malchive are split up into the following core categories:

  • Utilities - These scripts may be run standalone to assist with static binary analysis or as modules supporting a broader program. Utilities always have a standalone component.
  • Helpers - These modules primarily serve to assist components in one or more of the other categories. They generally do not have a stand-alone component and instead serve the intents of those that do.
  • Binary Decoders - The purpose of scripts in this category is to retrieve, decrypt, and return embedded data (typically inside malware).
  • Active Discovery - Standalone scripts designed to emulate a small portion of a malware family's protocol for the purposes of discovering active controllers.

Installation

The malchive is a packaged distribution that is easily installed and will automatically create console stand-alone scripts.

Steps

You will need to install some dependencies for some of the required Python modules to function correctly.

  • First do a source install of YARA and make sure you compile using --dotnet
  • Next source install the YARA Python package.
  • Ensure you have sqlite3-dev installed
    • Debian: libsqlite3-dev
    • Red Hat: sqlite-devel / pip install pysqlite3

You can then clone the malchive repo and install...

  • pip install . when in the parent directory.
  • To remove, just pip uninstall malchive

Scripts

Console scripts stemming from utilities are appended with the prefix malutil, decoders are appended with maldec, and active discovery scripts are appended with maldisc. This allows for easily identifiable malchive scripts via tab autocompletion.

; running superstrings from cmd line
malutil-superstrings 1.exe -ss
0x9535 (stack) lstrlenA
0x9592 (stack) GetFileSize
0x95dd (stack) WriteFile
0x963e (stack) CreateFileA
0x96b0 (stack) SetFilePointer
0x9707 (stack) GetSystemDirectoryA

; running a decoder from cmd line
maldec-pivy test.exe_
{
    "MD5": "2973ee05b13a575c06d23891ab83e067",
    "Config": {
        "PersistActiveSetupName": "StubPath",
        "DefaultBrowserKey": "SOFTWARE\\Classes\\http\\shell\\open\\command",
        "PersistActiveSetupKeyPart": "Software\\Microsoft\\Active Setup\\Installed Components\\",
        "ServerId": "TEST - WIN_XP",
        "Callbacks": [
            {
                "callback": "192.168.1.104",
                "protocol": "Direct",
                "port": 3333
            },
            {
                "callback": "192.168.1.111",
                "protocol": "Direct",
                "port": 4444
            }
        ],
        "ProxyCfgPresent": false,
        "Password": "test$321$",
        "Mutex": ")#V0qA.I4",
        "CopyAsADS": true,
        "Melt": true,
        "InjectPersist": true,
        "Inject": true
    }
}

; cmd line use with other common utilities
echo -ne 'eJw9kLFuwzAMRIEC7ZylrVGgRSFZiUbBZmwqsMUP0VfcnuQn+rMde7KLTBIPj0ce34tHyMUJjrnw
p3apz1kicjoJrDRlQihwOXmpL4RmSR5qhEU9MqvgWo8XqGMLJd+sKNQPK0dIGjK+e5WANIT6NeOs
k2mI5NmYAmcrkbn4oLPK5gZX+hVlRoKloMV20uQknv2EPunHKQtcig1cpHY4Jodie5pRViV+rp1t
629J6Dyu4hwLR97LINqY5rYILm1hhlvinoyJZavOKTrwBHTwpZ9yPSzidUiPt8PUTkZ0FBfayWLp
a71e8U8YDrbtu0aWDj+/eBOu+jRkYabX+3hPu9LZ5fb41T+7fmRf' | base64 -d | zlib-flate -uncompress | malutil-xor - [KEY]

Interfacing

Utilities, decoders, and discovery scripts in this collection are designed to support single ad-hoc analysis as well as inclusion into other frameworks. After installation, the malchive should be part of your Python path. At this point accessing any of the scripts is straight forward.

Here are a few examples:

; accessing decoder modules
import sys
from malchive.decoders import testdecoder

p = testdecoder.GetConfig(open(sys.argv[1], 'rb').read())
print('password', p.rc4_key)
for c in p.callbacks:
    print('c2 address', c)

; accessing utilities
from malchive.utilities import xor
ret = xor.GenericXor(buff=b'testing', key=[0x51], count=0xff)
print(ret.run_crypt())

; accessing helpers
from malchive.helpers import winfunc
key = winfunc.CryptDeriveKey(b'testdatatestdata')

To understand more about a given module, see the associated wiki entry.

Contributing

Contributing to the malchive is easy, just ensure the following requirements are met:

  • When writing utilities, decoders, or discovery scripts, consider using the available templates or review existing code if you're not sure how to get started.
  • Make sure modification or contributions pass pre-commit tests.
  • Ensure the contribution is placed in one of the component folders.
  • Updated the setup file if needed with an entry.
  • Python3 is a must.

Legal

©2021 The MITRE Corporation. ALL RIGHTS RESERVED.

Approved for Public Release; Distribution Unlimited. Public Release Case Number 21-0153

Owner
MITRE Cybersecurity
MITRE Cybersecurity
This repo is to provide a list of literature regarding Deep Learning on Graphs for NLP

This repo is to provide a list of literature regarding Deep Learning on Graphs for NLP

Graph4AI 230 Nov 22, 2022
Creating a Feed of MISP Events from ThreatFox (by abuse.ch)

ThreatFox2Misp Creating a Feed of MISP Events from ThreatFox (by abuse.ch) What will it do? This will fetch IOCs from ThreatFox by Abuse.ch, convert t

17 Nov 22, 2022
Uses Google's gTTS module to easily create robo text readin' on command.

Tool to convert text to speech, creating files for later use. TTRS uses Google's gTTS module to easily create robo text readin' on command.

0 Jun 20, 2021
TLA - Twitter Linguistic Analysis

TLA - Twitter Linguistic Analysis Tool for linguistic analysis of communities TLA is built using PyTorch, Transformers and several other State-of-the-

Tushar Sarkar 47 Aug 14, 2022
Phrase-Based & Neural Unsupervised Machine Translation

Unsupervised Machine Translation This repository contains the original implementation of the unsupervised PBSMT and NMT models presented in Phrase-Bas

Facebook Research 1.5k Dec 28, 2022
Chinese NewsTitle Generation Project by GPT2.带有超级详细注释的中文GPT2新闻标题生成项目。

GPT2-NewsTitle 带有超详细注释的GPT2新闻标题生成项目 UpDate 01.02.2021 从网上收集数据,将清华新闻数据、搜狗新闻数据等新闻数据集,以及开源的一些摘要数据进行整理清洗,构建一个较完善的中文摘要数据集。 数据集清洗时,仅进行了简单地规则清洗。

logCong 785 Dec 29, 2022
NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT

NeuralQA: A Usable Library for (Extractive) Question Answering on Large Datasets with BERT Still in alpha, lots of changes anticipated. View demo on n

Victor Dibia 220 Dec 11, 2022
DataCLUE: 国内首个以数据为中心的AI测评(含模型分析报告)

DataCLUE 以数据为中心的AI测评(DataCLUE) DataCLUE: A Chinese Data-centric Language Evaluation Benchmark 内容导引 章节 描述 简介 介绍以数据为中心的AI测评(DataCLUE)的背景 任务描述 任务描述 实验结果

CLUE benchmark 135 Dec 22, 2022
This is the 25 + 1 year anniversary version of the 1995 Rachford-Rice contest

Rachford-Rice Contest This is the 25 + 1 year anniversary version of the 1995 Rachford-Rice contest. Can you solve the Rachford-Rice problem for all t

13 Sep 20, 2022
(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"

Towards Abstractive Grounded Summarization of Podcast Transcripts We provide the source code for the paper "Towards Abstractive Grounded Summarization

10 Jul 01, 2022
Text classification on IMDB dataset using Keras and Bi-LSTM network

Text classification on IMDB dataset using Keras and Bi-LSTM Text classification on IMDB dataset using Keras and Bi-LSTM network. Usage python3 main.py

Hamza Rashid 2 Sep 27, 2022
A minimal Conformer ASR implementation adapted from ESPnet.

Conformer ASR A minimal Conformer ASR implementation adapted from ESPnet. Introduction I want to use the pre-trained English ASR model provided by ESP

Niu Zhe 3 Jan 24, 2022
Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Alexander Veysov 3.2k Dec 31, 2022
DomainWordsDict, Chinese words dict that contains more than 68 domains, which can be used as text classification、knowledge enhance task

DomainWordsDict, Chinese words dict that contains more than 68 domains, which can be used as text classification、knowledge enhance task。涵盖68个领域、共计916万词的专业词典知识库,可用于文本分类、知识增强、领域词汇库扩充等自然语言处理应用。

liuhuanyong 357 Dec 24, 2022
Exploring dimension-reduced embeddings

sleepwalk Exploring dimension-reduced embeddings This is the code repository. See here for the Sleepwalk web page. License and disclaimer This program

S. Anders's research group at ZMBH 91 Nov 29, 2022
A Python package implementing a new model for text classification with visualization tools for Explainable AI :octocat:

A Python package implementing a new model for text classification with visualization tools for Explainable AI 🍣 Online live demos: http://tworld.io/s

Sergio Burdisso 285 Jan 02, 2023
Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP

Pretrain and Fine-tune a T5 model with Flax on GCP This tutorial details how pretrain and fine-tune a FlaxT5 model from HuggingFace using a TPU VM ava

Gabriele Sarti 41 Nov 18, 2022
Deduplication is the task to combine different representations of the same real world entity.

Deduplication is the task to combine different representations of the same real world entity. This package implements deduplication using active learning. Active learning allows for rapid training wi

63 Nov 17, 2022
A Structured Self-attentive Sentence Embedding

Structured Self-attentive sentence embeddings Implementation for the paper A Structured Self-Attentive Sentence Embedding, which was published in ICLR

Kaushal Shetty 488 Nov 28, 2022
Knowledge Management for Humans using Machine Learning & Tags

HyperTag helps humans intuitively express how they think about their files using tags and machine learning. Represent how you think using tags. Find what you look for using semantic search for your t

Ravn Tech, Inc. 166 Jan 07, 2023