FairFuzz: AFL extension targeting rare branches

FairFuzz

An AFL extension to increase code coverage by targeting rare branches. FairFuzz has a particular advantage on programs with highly nested structure (packet analyzers, xmllint, programs compiled with laf-intel, etc.). AFL is written and maintained by Michal Zalewski [email protected]; the FairFuzz extension is by Caroline Lemieux [email protected].

Our paper on FairFuzz was published in ASE 2018.

Summary

This is a modified version of AFL which changes (1) the way AFL selects inputs for mutation and (2) the way AFL mutates inputs, in order to target rare "branches" in the program under test. AFL tracks the coverage an input achieves by counting the number of times the input hits each basic block transition (see AFL's technical details for more detail). A basic block transition can be loosely associated with a branch in the control flow graph, so we refer to these transitions as branches.

On many benchmarks, this modification achieves faster branch coverage than AFL or AFLFast. The advantage seems particularly strong on programs structured with many nested conditional statements. The graphs below show the number of basic block transitions covered over 24 hours on 9 different benchmarks. Each configuration was run 20 times with a single fuzzer instance: the dark middle line is the average coverage achieved, and the bands represent 95% confidence intervals.

[Figure: basic block transitions covered over 24-hour runs on the 9 benchmarks]

Evaluation was conducted on AFL 2.40b. All runs were done with no AFL dictionary: the emphasis on rare branches appears to yield better automated discovery of special sequences. On the tcpdump and xmllint benchmarks, FairFuzz discovered rare byte sequences over its 20 runs which the other techniques did not. For each technique, the number of its 20 runs finding portions of the sequence <!ATTLIST after 24 hours was:

subsequence   AFL   FidgetyAFL   AFLFast.new   FairFuzz
<!A             7           15            18         17
<!AT            1            2             3         11
<!ATT           1            0             0          1

More details are in the paper.

Technical Summary

What is a rare branch?

We consider a branch to be rare if it is hit by fewer inputs than the other branches explored so far. We maintain a dynamic threshold for rarity: if a branch is hit by fewer inputs than (or exactly as many as) this threshold, it is rare. Precisely, the threshold is the lowest power of two bounding the number of inputs hitting the least-hit branch. For example, if the branch hit by the fewest inputs is hit by 19 inputs, the threshold will be 32, so any branch hit by ≤32 inputs will be rare. Once the rarest branch is hit by 33 inputs, the threshold will go up to 64.

To keep track of rarity information, after running each input we increment the hit count of every branch that input hits.
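As a rough illustration (a minimal sketch with hypothetical names; the real bookkeeping lives in FairFuzz's modified afl-fuzz.c and works over AFL's shared-memory trace bitmap), the hit-count and threshold logic might look like:

```c
#include <stdint.h>

#define MAP_SIZE (1 << 16)             /* size of AFL's branch bitmap */

static uint32_t hit_counts[MAP_SIZE];  /* # inputs that have hit each branch */

/* After running an input, bump the count of every branch its trace hit. */
static void update_hit_counts(const uint8_t *trace_bits) {
  for (uint32_t i = 0; i < MAP_SIZE; i++)
    if (trace_bits[i]) hit_counts[i]++;
}

/* Rarity threshold: the lowest power of two >= the minimum nonzero hit
   count. E.g. if the least-hit branch has 19 hits, the threshold is 32. */
static uint32_t rarity_threshold(void) {
  uint32_t min_hits = UINT32_MAX;
  for (uint32_t i = 0; i < MAP_SIZE; i++)
    if (hit_counts[i] && hit_counts[i] < min_hits) min_hits = hit_counts[i];
  if (min_hits == UINT32_MAX) return 1;   /* no branches seen yet */
  uint32_t t = 1;
  while (t < min_hits) t <<= 1;
  return t;
}

/* A branch is rare if it has been hit and its count is at or below the
   threshold. */
static int is_rare(uint32_t branch) {
  return hit_counts[branch] && hit_counts[branch] <= rarity_threshold();
}
```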

How are inputs hitting rare branches selected?

When going through the queue to select inputs to mutate, FairFuzz selects inputs only if -- at the time of selection -- they hit a rare branch. The rare branch an input hits is selected as the target branch. If an input hits multiple rare branches, FairFuzz selects the rarest one (the one hit by the fewest inputs) as the target branch. If an input hits no rare branches, it is skipped (see the -q option below).
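Continuing the sketch above (again with hypothetical names), target-branch selection for one queue entry might look like:

```c
/* Returns the rarest rare branch hit by this input's trace, or -1 if the
   input hits no rare branch (the input is then skipped; see -q below). */
static int32_t pick_target_branch(const uint8_t *trace_bits) {
  uint32_t thresh = rarity_threshold();
  int32_t  target = -1;
  uint32_t best   = UINT32_MAX;
  for (uint32_t i = 0; i < MAP_SIZE; i++) {
    if (!trace_bits[i]) continue;          /* input does not hit this branch */
    if (hit_counts[i] > thresh) continue;  /* branch is not rare */
    if (hit_counts[i] < best) {            /* rarest seen so far wins */
      best   = hit_counts[i];
      target = (int32_t)i;
    }
  }
  return target;
}
```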

How are mutations targeted to a branch?

Once an input is selected from the queue, along with its designated target branch, FairFuzz computes a branch mask for the target branch. For every position in the input, the branch mask designates whether

  1. the byte at that position can be overwritten (overwritable),
  2. the byte at that position can be deleted (deleteable), or
  3. a byte can be inserted at that position (insertable),

while still hitting the target branch. FairFuzz computes an approximation to this branch mask dynamically, in a way similar to AFL's effector map. FairFuzz goes through the input, and at each position flips the byte, deletes the byte, or inserts a byte, then runs the mutated input to check whether it hits the target branch. If the mutated input hits the target branch, the position is marked as overwritable, deleteable, or insertable, respectively.
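A simplified sketch of this computation follows; runs_hit_branch is a hypothetical helper standing in for running the target program and checking its trace, and the real implementation is more careful about performance:

```c
#include <stdint.h>
#include <string.h>

enum { OVERWRITABLE = 1, DELETEABLE = 2, INSERTABLE = 4 };

/* Hypothetical helper: runs the program on buf and reports whether the
   resulting trace still covers target_branch. */
extern int runs_hit_branch(const uint8_t *buf, uint32_t len,
                           uint32_t target_branch);

static void compute_branch_mask(const uint8_t *in, uint32_t len,
                                uint32_t target_branch, uint8_t *mask) {
  uint8_t tmp[len + 1];                  /* room for the insertion check */

  for (uint32_t i = 0; i < len; i++) {
    mask[i] = 0;

    /* Overwrite check: flip the byte at position i. */
    memcpy(tmp, in, len);
    tmp[i] ^= 0xFF;
    if (runs_hit_branch(tmp, len, target_branch)) mask[i] |= OVERWRITABLE;

    /* Delete check: drop the byte at position i. */
    memcpy(tmp, in, i);
    memcpy(tmp + i, in + i + 1, len - i - 1);
    if (runs_hit_branch(tmp, len - 1, target_branch)) mask[i] |= DELETEABLE;

    /* Insert check: insert a byte at position i. */
    memcpy(tmp, in, i);
    tmp[i] = 0xFF;
    memcpy(tmp + i + 1, in + i, len - i);
    if (runs_hit_branch(tmp, len + 1, target_branch)) mask[i] |= INSERTABLE;
  }
}
```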

The branch mask is then used to influence mutations as follows:

  1. In the deterministic stages [1], a mutation is performed at a location only if the branch mask designates that position as overwritable (or insertable, for dictionary element inserts). For multi-byte modifications, the modification is performed only if all affected bytes are overwritable. When the effector map is in use, the mask is applied in conjunction with it.
  2. In the havoc stage, mutation positions are selected at random from the modifiable positions in the branch mask. For example, if bytes 5-10 and 13-20 of a 20-byte input can be overwritten while still hitting the target branch, FairFuzz picks the byte to mutate uniformly at random from [5,6,7,8,9,10,13,...,20] (see the sketch after the footnotes below). After a havoc mutation that deletes (resp. adds) a sequence of bytes in the input, FairFuzz deletes the sequence (resp. adds an "all modifiable" sequence) at the corresponding location in the branch mask [2]. The branch mask becomes approximate after this point.

[1] The mask is not used in the bit-flipping stage, since this would interfere with AFL's automatic detection of dictionary elements.

[2] The mask is not used to determine where to splice inputs in the splicing stage: during splicing, the first part of the branch mask is kept, but the spliced-in half of the input is marked as all modifiable.
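To make the havoc-stage restriction concrete, here is a sketch (continuing the mask representation above; names are hypothetical) of picking a random mutation position limited to mask-allowed bytes:

```c
#include <stdint.h>
#include <stdlib.h>

/* Pick a uniformly random position whose mask entry allows the requested
   operation (e.g. OVERWRITABLE). Returns -1 if no position qualifies. */
static int32_t random_masked_pos(const uint8_t *mask, uint32_t len,
                                 uint8_t op) {
  uint32_t n = 0;
  for (uint32_t i = 0; i < len; i++)
    if (mask[i] & op) n++;                /* count allowed positions */
  if (!n) return -1;

  uint32_t k = (uint32_t)rand() % n;      /* index among allowed positions */
  for (uint32_t i = 0; i < len; i++)
    if ((mask[i] & op) && k-- == 0) return (int32_t)i;
  return -1;
}
```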

Usage summary

For basic AFL usage, see the README in docs/ (the one with no .md extension). There are four FairFuzz-specific options:

Running options (may be useful for functionality):

  • -r adds an additional trimming stage before mutating inputs. This trimming is more aggressive than AFL's: it trims the input down according to the target branch only, so the resulting trimmed input may follow a different path than the original but will still hit the target branch. Recommended when seed files are very large and deterministic mutations are being run.
  • -q num bootstraps the rare-branch input selection with AFL's regular input selection mechanism. If an entire cycle through the queue discovers no new branches, fall back according to num as follows:
    • -q 1 go back to regular AFL queueing + no branch mask on mutations until a new branch is found
    • -q 2 go back to regular AFL queueing + no branch mask on mutations + no deterministic mutations until a new branch is found
    • -q 3 go back to regular AFL queueing + no branch mask on mutations for a single queueing cycle

Evaluation options (mostly useful for comparing AFL versions):

  • -b disables the branch mask (sets every position in the mask as modifiable; this incurs unnecessary slowdown compared to plain AFL).
  • -s runs a "shadow" mutation run before the branch-mask enabled run. Side effects are disabled in this run. This allows for direct comparison of the effect of the branch mask on new coverage discovered/number of inputs hitting the target branch. See min-branch-fuzzing.log produced in the AFL output directory for details.
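For example, assuming a standard AFL-style command line with seeds in seeds/ and output in findings/, a run enabling the extra trimming stage and fallback queueing might look like:

  afl-fuzz -i seeds/ -o findings/ -r -q 1 ./target @@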