A PyTorch implementation of Deep SAD, a deep Semi-supervised Anomaly Detection method.

Overview

Deep SAD: A Method for Deep Semi-Supervised Anomaly Detection

This repository provides a PyTorch implementation of the Deep SAD method presented in our ICLR 2020 paper "Deep Semi-Supervised Anomaly Detection".

Citation and Contact

You can find a PDF of the Deep Semi-Supervised Anomaly Detection ICLR 2020 paper on arXiv: https://arxiv.org/abs/1906.02694.

If you find our work useful, please also cite the paper:

@InProceedings{ruff2020deep,
  title     = {Deep Semi-Supervised Anomaly Detection},
  author    = {Ruff, Lukas and Vandermeulen, Robert A. and G{\"o}rnitz, Nico and Binder, Alexander and M{\"u}ller, Emmanuel and M{\"u}ller, Klaus-Robert and Kloft, Marius},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://openreview.net/forum?id=HkgH0TEYwH}
}

If you would like to get in touch, just drop us an email at [email protected].

Abstract

Deep approaches to anomaly detection have recently shown promising results over shallow methods on large and complex datasets. Typically, anomaly detection is treated as an unsupervised learning problem. In practice, however, one may have, in addition to a large set of unlabeled samples, access to a small pool of labeled samples, e.g. a subset verified by some domain expert as being normal or anomalous. Semi-supervised approaches to anomaly detection aim to utilize such labeled samples, but most proposed methods are limited to merely including labeled normal samples. Only a few methods take advantage of labeled anomalies, and existing deep approaches are domain-specific. In this work we present Deep SAD, an end-to-end deep methodology for general semi-supervised anomaly detection. We further introduce an information-theoretic framework for deep anomaly detection based on the idea that the entropy of the latent distribution for normal data should be lower than the entropy of the anomalous distribution, which can serve as a theoretical interpretation of our method. In extensive experiments on MNIST, Fashion-MNIST, and CIFAR-10, along with other anomaly detection benchmark datasets, we demonstrate that our method is on par with or outperforms shallow, hybrid, and deep competitors, yielding appreciable performance improvements even when provided with only little labeled data.
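
For intuition, the training objective described above can be written down compactly. Below is a minimal, illustrative PyTorch version of a Deep SAD-style loss, not the repository's exact training code; net (the feature extractor), the hypersphere center c, and the weighting term eta are assumed to be given:

import torch

def deep_sad_loss(net, x, semi_targets, c, eta=1.0, eps=1e-6):
    # semi_targets: 0 = unlabeled, +1 = labeled normal, -1 = labeled anomaly
    z = net(x)                                 # map inputs to the latent space
    dist = torch.sum((z - c) ** 2, dim=1)      # squared distance to the center c
    # unlabeled and labeled-normal samples are pulled towards c,
    # labeled anomalies are pushed away via the inverse distance
    losses = torch.where(semi_targets == 0,
                         dist,
                         eta * ((dist + eps) ** semi_targets.float()))
    return losses.mean()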

The need for semi-supervised anomaly detection

[Figure: fig1, an illustration of the need for semi-supervised anomaly detection]

Installation

This code is written in Python 3.7 and requires the packages listed in requirements.txt.

Clone the repository to your machine and directory of choice:

git clone https://github.com/lukasruff/Deep-SAD-PyTorch.git

To run the code, we recommend setting up a virtual environment, e.g. using virtualenv or conda:

virtualenv

# pip install virtualenv
cd <path-to-Deep-SAD-PyTorch-directory>
virtualenv myenv
source myenv/bin/activate
pip install -r requirements.txt

conda

cd <path-to-Deep-SAD-PyTorch-directory>
conda create --name myenv
source activate myenv
while read requirement; do conda install -n myenv --yes $requirement; done < requirements.txt

Running experiments

We have implemented the MNIST, Fashion-MNIST, and CIFAR-10 datasets as well as the classic anomaly detection benchmark datasets arrhythmia, cardio, satellite, satimage-2, shuttle, and thyroid from the Outlier Detection DataSets (ODDS) repository (http://odds.cs.stonybrook.edu/) as reported in the paper.
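
The ODDS datasets are distributed as .mat files that typically contain a feature matrix X and binary labels y (1 = anomaly). A quick way to inspect a downloaded file (the local path here is hypothetical):

from scipy.io import loadmat

mat = loadmat("../data/thyroid.mat")   # hypothetical path to a downloaded ODDS file
X, y = mat["X"], mat["y"].ravel()
print(X.shape, "feature matrix;", int(y.sum()), "anomalies out of", len(y), "samples")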

The implemented network architectures are as reported in the appendix of the paper.
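
For orientation, the mnist_LeNet architecture used in the example below is a small LeNet-type convolutional encoder. The following sketch conveys its general shape; the exact filter counts and the 32-dimensional representation are assumptions here, see the network modules in src and the paper appendix for the actual definitions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MNISTEncoder(nn.Module):
    # LeNet-type encoder sketch: two conv/pool stages, then a linear map to rep_dim
    def __init__(self, rep_dim=32):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, 5, padding=2, bias=False)
        self.conv2 = nn.Conv2d(8, 4, 5, padding=2, bias=False)
        self.fc = nn.Linear(4 * 7 * 7, rep_dim, bias=False)

    def forward(self, x):
        x = F.max_pool2d(F.leaky_relu(self.conv1(x)), 2)   # 28x28 -> 14x14
        x = F.max_pool2d(F.leaky_relu(self.conv2(x)), 2)   # 14x14 -> 7x7
        return self.fc(x.flatten(1))                       # rep_dim-dimensional code

print(MNISTEncoder()(torch.zeros(2, 1, 28, 28)).shape)     # torch.Size([2, 32])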

Deep SAD

You can run Deep SAD experiments using the main.py script.

Here's an example on MNIST where digit 0 is the normal class, 1% of the training samples are labeled (known) anomalies from anomaly class 1, and the unlabeled training data is polluted with 10% unknown anomalies drawn from all anomaly classes 1-9:

cd <path-to-Deep-SAD-PyTorch-directory>

# activate virtual environment
source myenv/bin/activate  # or 'source activate myenv' for conda

# create folders for experimental output
mkdir log/DeepSAD
mkdir log/DeepSAD/mnist_test

# change to source directory
cd src

# run experiment
python main.py mnist mnist_LeNet ../log/DeepSAD/mnist_test ../data --ratio_known_outlier 0.01 --ratio_pollution 0.1 --lr 0.0001 --n_epochs 150 --lr_milestone 50 --batch_size 128 --weight_decay 0.5e-6 --pretrain True --ae_lr 0.0001 --ae_n_epochs 150 --ae_batch_size 128 --ae_weight_decay 0.5e-3 --normal_class 0 --known_outlier_class 1 --n_known_outlier_classes 1;

Have a look into main.py for all possible arguments and options.
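
The --ratio_known_outlier and --ratio_pollution flags control how the training set is composed. The following toy computation (not the repository's data-loading code; the sample count is hypothetical) illustrates what the values from the example above mean:

import numpy as np

n_train = 10000                    # hypothetical training set size
ratio_known_outlier = 0.01         # 1% labeled (known) anomalies
ratio_pollution = 0.10             # 10% of the unlabeled data are unknown anomalies

n_known_outlier = int(ratio_known_outlier * n_train)
n_unlabeled = n_train - n_known_outlier
n_unknown_anomalies = int(ratio_pollution * n_unlabeled)

# semi-supervised targets in the convention of the loss sketch above:
# 0 = unlabeled, -1 = labeled anomaly
semi_targets = np.concatenate([np.zeros(n_unlabeled, dtype=int),
                               -np.ones(n_known_outlier, dtype=int)])
print(f"{n_known_outlier} labeled anomalies; "
      f"{n_unknown_anomalies} unknown anomalies hidden among {n_unlabeled} unlabeled samples")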

Baselines

We also provide an implementation of the following baselines via the respective baseline_<method_name>.py scripts: OC-SVM (ocsvm), Isolation Forest (isoforest), Kernel Density Estimation (kde), kernel Semi-Supervised Anomaly Detection (ssad), and Semi-Supervised Deep Generative Model (SemiDGM).
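
For intuition, the shallow baselines roughly correspond to standard off-the-shelf detectors fitted on the flattened inputs. The following minimal Isolation Forest sketch (using random stand-in data, not the actual baseline_isoforest.py script) illustrates the idea:

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 784))                        # stand-in for flattened 28x28 images
X_test = np.vstack([rng.normal(size=(100, 784)),              # normal test samples
                    rng.normal(loc=4.0, size=(100, 784))])    # shifted "anomalies"
y_test = np.concatenate([np.zeros(100), np.ones(100)])        # 0 = normal, 1 = anomaly

iso = IsolationForest(n_estimators=100, random_state=0).fit(X_train)
scores = -iso.score_samples(X_test)                           # higher score = more anomalous
print("test AUC:", roc_auc_score(y_test, scores))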

For example, here's how to run SSAD on the same experimental setup as above:

cd <path-to-Deep-SAD-PyTorch-directory>

# activate virtual environment
source myenv/bin/activate  # or 'source activate myenv' for conda

# create folder for experimental output
mkdir log/ssad
mkdir log/ssad/mnist_test

# change to source directory
cd src

# run experiment
python baseline_ssad.py mnist ../log/ssad/mnist_test ../data --ratio_known_outlier 0.01 --ratio_pollution 0.1 --kernel rbf --kappa 1.0 --normal_class 0 --known_outlier_class 1 --n_known_outlier_classes 1;

The autoencoder for the hybrid approaches is obtained from Deep SAD pre-training, i.e. by running main.py with --pretrain True as in the example above. To then run a hybrid approach using one of the classic methods on top of autoencoder features, simply point to the saved autoencoder model using --load_ae ../log/DeepSAD/mnist_test/model.tar and set --hybrid True.
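
Conceptually, a hybrid approach simply fits one of the classic detectors on the latent features produced by the pre-trained autoencoder's encoder. A minimal sketch of this idea, with a hypothetical stand-in encoder and random data; loading the actual checkpoint from model.tar is repo-specific and omitted here:

import numpy as np
import torch
from sklearn.svm import OneClassSVM

@torch.no_grad()
def encode(encoder, X):
    # map raw inputs to latent features with the (pre-trained) encoder
    encoder.eval()
    return encoder(torch.as_tensor(X, dtype=torch.float32)).cpu().numpy()

# hypothetical encoder and random stand-in data (28x28 single-channel images)
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 32))
X_train = np.random.randn(1000, 1, 28, 28).astype(np.float32)
X_test = np.random.randn(200, 1, 28, 28).astype(np.float32)

Z_train, Z_test = encode(encoder, X_train), encode(encoder, X_test)
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(Z_train)
scores = -ocsvm.decision_function(Z_test)      # higher = more anomalous
print("scored", len(scores), "test samples on autoencoder features")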

For example, to run hybrid SSAD on the same experimental setup as above:

cd <path-to-Deep-SAD-PyTorch-directory>

# activate virtual environment
source myenv/bin/activate  # or 'source activate myenv' for conda

# create folder for experimental output
mkdir log/hybrid_ssad
mkdir log/hybrid_ssad/mnist_test

# change to source directory
cd src

# run experiment
python baseline_ssad.py mnist ../log/hybrid_ssad/mnist_test ../data --ratio_known_outlier 0.01 --ratio_pollution 0.1 --kernel rbf --kappa 1.0 --hybrid True --load_ae ../log/DeepSAD/mnist_test/model.tar --normal_class 0 --known_outlier_class 1 --n_known_outlier_classes 1;

License

MIT

Owner
Lukas Ruff
PhD student in the ML group at TU Berlin.