QSynthesis is a Python3 API to perform I/O based program synthesis of bitvector expressions.

Overview

Qsynthesis

QSynthesis is a Python3 API to perform I/O based program synthesis of bitvector expressions. It aims at facilitating code deobfuscation. The algorithm is greybox approach combining both a blackbox I/O based synthesis and a whitebox AST search to synthesize sub-expressions (if the root node cannot be synthesized).

This algorithm as originaly been described at the BAR academic workshop:

The code has been release as part of the following Black Hat talk:

Disclaimer: This framework is experimental, and shall only be used for experimentation purposes. It mainly aims at stimulating research in this area.

Documentation

The installation, examples, and API documentation is available on the dedicated documentation: Documentation

Functionalities

The core synthesis is based on Triton symbolic engine on which is built the whole framework. It provides the following functionalities:

  • synthesis of bitvector expressions
  • ability to check through SMT the semantic equivalence of synthesized expressions
  • ability to synthesize constants (if the expression encode a constant)
  • ability to improve oracles (pre-computed tables) overtime through a learning mechanism
  • ability to reassemble synthesized expression back to assembly
  • ability to serve oracles through a REST API to facilitate the synthesis usage
  • an IDA plugin providing an integration of the synthesis

Quick start

Installation

In order to work Triton first has to be installed: install documentation. Triton does not automatically install itself in a virtualenv, copy it in your venv or use --system-site-packages when configuring your venv.

Then:

$ git clone [email protected]:synthesis/qsynthesis.git
$ cd qsynthesis
$ pip3 install '.[all]'

The [all] will installed all dependencies (see the documentation for a light install).

Table generation

The synthesis algorithm requires generating oracle tables derived from a grammar (a set of variables and operators). Qsynthesis installation provides the utility qsynthesis-table-manager enabling manipulating tables. The following command generate a table with 3 variables of 64 bits, 5 operators using a vector of 16 inputs. We limit the generation to 5 million entries.

$ qsynthesis-table-manager generate -bs 64 --var-num 3 --input-num 16 --random-level 5 --ops AND,NEG,MUL,XOR,NOT --watchdog 80 --limit 5000000 my_oracle_table
Generate Table
Watchdog value: 80.0
Depth 2 (size:3) (Time:0m0.23120s)
Depth 3 (size:21) (Time:0m0.23198s)
Depth 4 (size:574) (Time:0m0.26068s)
Depth 5 (size:400858) (Time:0m21.23231s)
Threshold reached, generation interrupted
Stop required
Depth 5 (size:5000002) (Time:4m52.56009s) [RAM:9.52Gb]

Note: The generation process is RAM consuming the --watchdog enables setting a percentage of the RAM above which the generation is interrupted.

Synthesizing a bitvector expression

We then can try simplifying a seemingly obfuscated expression with:

from qsynthesis import SimpleSymExec, TopDownSynthesizer, InputOutputOracleLevelDB

blob = b'UH\x89\xe5H\x89}\xf8H\x89u\xf0H\x89U\xe8H\x89M\xe0L\x89E\xd8H\x8bE' \
       b'\xe0H\xf7\xd0H\x0bE\xf8H\x89\xc2H\x8bE\xe0H\x01\xd0H\x8dH\x01H\x8b' \
       b'E\xf8H+E\xe8H\x8bU\xe8H\xf7\xd2H\x0bU\xf8H\x01\xd2H)\xd0H\x83\xe8' \
       b'\x02H!\xc1H\x8bE\xe0H\xf7\xd0H\x0bE\xf8H\x89\xc2H\x8bE\xe0H\x01\xd0' \
       b'H\x8dp\x01H\x8bE\xf8H+E\xe8H\x8bU\xe8H\xf7\xd2H\x0bU\xf8H\x01\xd2' \
       b'H)\xd0H\x83\xe8\x02H\t\xf0H)\xc1H\x89\xc8H\x83\xe8\x01]\xc3'

# Perform symbolic execution of the instructions
symexec = SimpleSymExec("x86_64")
symexec.initialize_register('rip', 0x40B160)  # arbitrary address
symexec.initialize_register('rsp', 0x800000)  # arbitrary stack
symexec.execute_blob(blob, 0x40B160)
rax = symexec.get_register_ast("rax")  # retrieve rax register expressions

# Load lookup tables
ltm = InputOutputOracleLevelDB.load("my_oracle_table")

# Perform Synthesis of the expression
synthesizer = TopDownSynthesizer(ltm)
synt_rax, simp = synthesizer.synthesize(rax)

print(f"expression: {rax.pp_str}")
print(f"synthesized expression: {synt_rax.pp_str} [{simp}]")

Limitations

  • synthesis accuracy limited by pre-computed tables exhaustivness
  • table generation limited by RAM consumption
  • reassembly cannot involve memory variable, destination is necessarily a register and architecture depends on llvmlite (thus mostly x86_64)
  • the code references trace-based synthesis which is disabled (as the underlying framework is not yet open-source)

Authors

  • Robin David (@RobinDavid), Quarkslab

Contributors

Huge thanks to contributors to this research:

  • Luigi Coniglio
  • Jonathan Salwan
You might also like...
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

A stack-based systems language that supports structures, functions, expressions, and user-defined operator behaviour

A stack-based systems language that supports structures, functions, expressions, and user-defined operator behaviour. Currently compiles to URCL with plans to add additional formats in the future.

a BTC mining program based on python3

BTC-Miner a BTC mining program based on python3 Our project refers to the nightminer project by ricmoo, which is written in Python2 (https://github.co

texlive expressions for documents

tex2nix Generate Texlive environment containing all dependencies for your document rather than downloading gigabytes of texlive packages. Installation

Custom Python linting through AST expressions
Custom Python linting through AST expressions

bellybutton bellybutton is a customizable, easy-to-configure linting engine for Python. What is this good for? Tools like pylint and flake8 provide, o

Graphical Python debugger which lets you easily view the values of all evaluated expressions
Graphical Python debugger which lets you easily view the values of all evaluated expressions

birdseye birdseye is a Python debugger which records the values of expressions in a function call and lets you easily view them after the function exi

A collection of common regular expressions bundled with an easy to use interface.

CommonRegex Find all times, dates, links, phone numbers, emails, ip addresses, prices, hex colors, and credit card numbers in a string. We did the har

Turning SymPy expressions into PyTorch modules.

sympytorch A micro-library as a convenience for turning SymPy expressions into PyTorch Modules. All SymPy floats become trainable parameters. All SymP

Turning SymPy expressions into JAX functions

sympy2jax Turn SymPy expressions into parametrized, differentiable, vectorizable, JAX functions. All SymPy floats become trainable input parameters. S

The goal of this library is to generate more helpful exception messages for numpy/pytorch matrix algebra expressions.
The goal of this library is to generate more helpful exception messages for numpy/pytorch matrix algebra expressions.

Tensor Sensor See article Clarifying exceptions and visualizing tensor operations in deep learning code. One of the biggest challenges when writing co

A collection of common regular expressions bundled with an easy to use interface.

CommonRegex Find all times, dates, links, phone numbers, emails, ip addresses, prices, hex colors, and credit card numbers in a string. We did the har

Enable ++x and --x expressions in Python

By default, Python supports neither pre-increments (like ++x) nor post-increments (like x++). However, the first ones are syntactically correct since Python parses them as two subsequent +x operations, where + is the unary plus operator (same with --x and the unary minus). They both have no effect, since in practice -(-x) == +(+x) == x.

Karen is a Discord Bot that will check for a list of forbidden words/expressions, removing the message that contains them and replying with another message.

Karen is a Discord Bot that will check for a list of forbidden words/expressions, removing the message that contains them and replying with another message. Everything is highly customizable.

MacroTools provides a library of tools for working with Julia code and expressions.

MacroTools.jl MacroTools provides a library of tools for working with Julia code and expressions. This includes a powerful template-matching system an

A library for pattern matching on symbolic expressions in Python.
A library for pattern matching on symbolic expressions in Python.

MatchPy is a library for pattern matching on symbolic expressions in Python. Work in progress Installation MatchPy is available via PyPI, and

poetry2nix turns Poetry projects into Nix derivations without the need to actually write Nix expressions

poetry2nix poetry2nix turns Poetry projects into Nix derivations without the need to actually write Nix expressions. It does so by parsing pyproject.t

Comments
  • unrolling issue with PlaceHolderSynthesizer

    unrolling issue with PlaceHolderSynthesizer

    The PlaceHolderSynthesizer seem's not to properly replace placeholder variable during the synthesis process, on some use-cases. The resulting synthesized expression is thus completely wrong. I need to dig deeper.

    opened by RobinDavid 7
Owner
Quarkslab
Quarkslab
Python library for reading and writing tabular data via streams.

tabulator-py A library for reading and writing tabular data (csv/xls/json/etc). [Important Notice] We have released Frictionless Framework. This frame

Frictionless Data 231 Dec 09, 2022
Import Python modules from any file system path

pathimp Import Python modules from any file system path. Installation pip3 install pathimp Usage import pathimp

Danijar Hafner 2 Nov 29, 2021
QSynthesis is a Python3 API to perform I/O based program synthesis of bitvector expressions.

QSynthesis is a Python3 API to perform I/O based program synthesis of bitvector expressions. It aims at facilitating code deobfuscation. The algorithm is greybox approach combining both a blackbox I/

Quarkslab 103 Dec 30, 2022
csv2ir is a script to convert ir .csv files to .ir files for the flipper.

csv2ir csv2ir is a script to convert ir .csv files to .ir files for the flipper. For a repo of .ir files, please see https://github.com/logickworkshop

Alex 38 Dec 31, 2022
Swiss army knife for Apple's .tbd file manipulation

Description Inspired by tbdswizzler, this simple python tool for manipulating Apple's .tbd format. Installation python3 -m pip install --user -U pytbd

10 Aug 31, 2022
RMfuse provides access to your reMarkable Cloud files in the form of a FUSE filesystem

RMfuse provides access to your reMarkable Cloud files in the form of a FUSE filesystem. These files are exposed either in their original format, or as PDF files that contain your annotations. This le

Robert Schroll 82 Nov 24, 2022
Small Python script to generate a calendar (.ics) file from SIMASTER courses schedule.

simaster.ics Small Python script to generate a calendar (.ics) file from SIMASTER courses schedule. Usage Getting the events.json file from SIMASTER O

Faiz Jazadi 8 Nov 02, 2022
Dragon Age: Origins toolset to extract/build .erf files, patch language-specific .dlg files, and view the contents of files in the ERF or GFF format

DAOTools This is a set of tools for Dragon Age: Origins modding. It can patch the text lines of .dlg files, extract and build an .erf file, and view t

8 Dec 06, 2022
dotsend is a web application which helps you to upload your large files and share file via link

dotsend is a web application which helps you to upload your large files and share file via link

Devocoe 0 Dec 03, 2022
A simple Python code that takes input from a csv file and makes it into a vcf file.

Contacts-Maker A simple Python code that takes input from a csv file and makes it into a vcf file. Imagine a college or a large community where each y

1 Feb 13, 2022
Python package to read and display segregated file names present in a directory based on type of the file

tpyfilestructure Python package to read and display segregated file names present in a directory based on type of the file. Installation You can insta

Tharun Kumar T 2 Nov 28, 2021
Instant Fuzzy File Search for Alfred

List all the files inside a folder using fd, and instantly fuzzy-search through all of them using fzf, all from inside Alfred with a single keyword: fzf.

Mr. Pennyworth 37 Nov 30, 2022
A python script to pull the transactions of an Algorand wallet and put them into a CSV file.

AlgoCSV A python script to pull the transactions of an Algorand wallet and put them into a CSV file. Dependancies: Requests Main features: Groups: Com

21 Jun 25, 2022
Pure Python tools for reading and writing all TIFF IFDs, sub-IFDs, and tags.

Tiff Tools Pure Python tools for reading and writing all TIFF IFDs, sub-IFDs, and tags. Developed by Kitware, Inc. with funding from The National Canc

Digital Slide Archive 32 Dec 14, 2022
Media file renamer and organizion tool

mnamer mnamer (media renamer) is an intelligent and highly configurable media organization utility. It parses media filenames for metadata, searches t

Jessy Williams 533 Dec 29, 2022
Add Ranges and page numbers to IIIF Manifest from a CSV.

Add Ranges and page numbers to IIIF Manifest from CSV specific to a workflow of the Bibliotheca Hertziana.

Raffaele Viglianti 3 Apr 28, 2022
File-manager - A basic file manager, written in Python

File Manager A basic file manager, written in Python. Installation Install Pytho

Samuel Ko 1 Feb 05, 2022
Python codes for the server and client end that facilitates file transfers. (Using AWS EC2 instance as the server)

Server-and-Client-File-Transfer Python codes for the server and client end that facilitates file transfers. I will be using an AWS EC2 instance as the

Amal Farhad Shaji 2 Oct 13, 2021
CSV-Handler written in Python3

CSVHandler This code allows you to work intelligently with CSV files. A file in CSV syntax is converted into several lists, which are combined in a to

Max Tischberger 1 Jan 13, 2022
Organizer is a python program that organizes your downloads folder

Organizer Organizer is a python program that organizes your downloads folder, it can run as a service and so will start along with the system, and the

Gustavo 2 Oct 18, 2021