Open source implementation of "A Self-Supervised Descriptor for Image Copy Detection" (SSCD).

Overview

A Self-Supervised Descriptor for Image Copy Detection (SSCD)

This is the open-source codebase for "A Self-Supervised Descriptor for Image Copy Detection", recently accepted to CVPR 2022.

This work uses self-supervised contrastive learning with strong differential entropy regularization to create a fingerprint for image copy detection.

SSCD diagram

About this codebase

This implementation is built on Pytorch Lightning, with some components from Classy Vision.

Our original experiments were conducted in a proprietary codebase using data files (fonts and emoji) that are not licensed for redistribution. This version uses Noto fonts and Twemoji emoji, via the AugLy project. As a result, models trained in this codebase perform slightly differently than our pretrained models.

Pretrained models

We provide trained models from our original experiments to allow others to reproduce our evaluation results.

For convenience, we provide equivalent model files in a few formats:

  • Files ending in .classy.pt are weight files using Classy Vision ResNe(X)t backbones, which is how these models were trained.
  • Files ending in .torchvision.pt are weight files using Torchvision ResNet backbones. These files may be easier to integrate in Torchvision-based codebases. See model.py for how we integrate GeM pooling and L2 normalization into these models.
  • Files ending in .torchscript.pt are standalone TorchScript models that can be used in any pytorch project without any SSCD code.

We provide the following models:

name dataset trunk augmentations dimensions classy vision torchvision torchscript
sscd_disc_blur DISC ResNet50 strong blur 512 link link link
sscd_disc_advanced DISC ResNet50 advanced 512 link link link
sscd_disc_mixup DISC ResNet50 advanced + mixup 512 link link link
sscd_disc_large DISC ResNeXt101 32x4 advanced + mixup 1024 link link
sscd_imagenet_blur ImageNet ResNet50 strong blur 512 link link link
sscd_imagenet_advanced ImageNet ResNet50 advanced 512 link link link
sscd_imagenet_mixup ImageNet ResNet50 advanced + mixup 512 link link link

We recommend sscd_disc_mixup (ResNet50) as a default SSCD model, especially when comparing to other standard ResNet50 models, and sscd_disc_large (ResNeXt101) as a higher accuracy alternative using a bit more compute.

Classy Vision and Torchvision use different default cardinality settings for ResNeXt101. We do not provide a Torchvision version of the sscd_disc_large model for this reason.

Installation

If you only plan to use torchscript models for inference, no installation steps are necessary, and any environment with a recent version of pytorch installed can run our torchscript models.

For all other uses, see installation steps below.

The code is written for pytorch-lightning 1.5 (the latest version at time of writing), and may need changes for future Lightning versions.

Option 1: Install dependencies using Conda

Install and activate conda, then create a conda environment for SSCD as follows:

# Create conda environment
conda create --name sscd -c pytorch -c conda-forge \
  pytorch torchvision cudatoolkit=11.3 \
  "pytorch-lightning>=1.5,<1.6" lightning-bolts \
  faiss python-magic pandas numpy

# Activate environment
conda activate sscd

# Install Classy Vision and AugLy from PIP:
python -m pip install classy_vision augly

You may need to select a cudatoolkit version that corresponds to the system CUDA library version you have installed. See PyTorch documentation for supported combinations of pytorch, torchvision and cudatoolkit versions.

For a non-CUDA (CPU only) installation, replace cudatoolkit=... with cpuonly.

Option 2: Install dependencies using PIP

# Create environment
python3 -m virtualenv ./venv

# Activate environment
source ./venv/bin/activate

# Install dependencies in this environment
python -m pip install -r ./requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113

The --extra-index-url option selects a newer version of CUDA libraries, required for NVidia A100 GPUs. This can be omitted if A100 support is not needed.

Inference using SSCD models

This section describes how to use pretrained SSCD models for inference. To perform inference for DISC and Copydays evaluations, see Evaluation.

Preprocessing

We recommend preprocessing images for inference either resizing the small edge to 288 or resizing the image to a square tensor.

Using fixed-sized square tensors is more efficient on GPUs, to make better use of batching. Copy detection using square tensors benefits from directly resizing to the target tensor size. This skews the image, and does not preserve aspect ratio. This differs from the common practice for classification inference.

from torchvision import transforms

normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225],
)
small_288 = transforms.Compose([
    transforms.Resize(288),
    transforms.ToTensor(),
    normalize,
])
skew_320 = transforms.Compose([
    transforms.Resize([320, 320]),
    transforms.ToTensor(),
    normalize,
])

Inference using Torchscript

Torchscript files can be loaded directly in other projects without any SSCD code or dependencies.

import torch
from PIL import Image

model = torch.jit.load("/path/to/sscd_disc_mixup.torchscript.pt")
img = Image.open("/path/to/image.png").convert('RGB')
batch = small_288(img).unsqueeze(0)
embedding = model(batch)[0, :]

These Torchscript models are prepared for inference. For other uses (eg. fine-tuning), use model weight files, as described below.

Load model weight files

To load model weight files, first construct the Model object, then load the weights using the standard torch.load and load_state_dict methods.

import torch
from sscd.models.model import Model

model = Model("CV_RESNET50", 512, 3.0)
weights = torch.load("/path/to/sscd_disc_mixup.classy.pt")
model.load_state_dict(weights)
model.eval()

Once loaded, these models can be used interchangeably with Torchscript models for inference.

Model backbone strings can be found in the Backbone enum in model.py. Classy Vision models start with the prefix CV_ and Torchvision models start with TV_.

Using SSCD descriptors

SSCD models produce 512 dimension (except the "large" model, which uses 1024 dimensions) L2 normalized descriptors for each input image. The similarity of two images with descriptors a and b can be measured by descriptor cosine similarity (a.dot(b); higher is more similar), or equivalently using euclidean distance ((a-b).norm(); lower is more similar).

For the sscd_disc_mixup model, DISC image pairs with embedding cosine similarity greater than 0.75 are copies with 90% precision, for example. This corresponds to a euclidean distance less than 0.7, or squared euclidean distance less than 0.5.

Descriptor post-processing

For best results, we recommend additional descriptor processing when sample images from the target distribution are available. Centering (subtracting the mean) followed by L2 normalization, or whitening followed by L2 normalization, can improve accuracy.

Score normalization can make similarity more consistent and improve global accuracy metrics (but has no effect on ranking metrics).

Other model formats

If pretrained models in another format (eg. ONYX) would be useful for you, let us know by filing a feature request.

Reproducing evaluation results

To reproduce evaluation results, see Evaluation.

Training SSCD models

For information on how to train SSCD models, see Training.

License

The SSCD codebase uses the CC-NC 4.0 International license.

Citation

If you find our codebase useful, please consider giving a star and cite as:

@article{pizzi2022self,
  title={A Self-Supervised Descriptor for Image Copy Detection},
  author={Pizzi, Ed and Roy, Sreya Dutta and Ravindra, Sugosh Nagavara and Goyal, Priya and Douze, Matthijs},
  journal={Proc. CVPR},
  year={2022}
}
Owner
Meta Research
Meta Research
Source code for 2021 ICCV paper "In-the-Wild Single Camera 3D Reconstruction Through Moving Water Surfaces"

In-the-Wild Single Camera 3D Reconstruction Through Moving Water Surfaces This is the PyTorch implementation for 2021 ICCV paper "In-the-Wild Single C

27 Dec 06, 2022
Code accompanying our NeurIPS 2021 traffic4cast challenge

Traffic forecasting on traffic movie snippets This repo contains all code to reproduce our approach to the IARAI Traffic4cast 2021 challenge. In the c

Nina Wiedemann 2 Aug 09, 2022
A boosting-based Multiple Instance Learning (MIL) package that includes MIL-Boost and MCIL-Boost

A boosting-based Multiple Instance Learning (MIL) package that includes MIL-Boost and MCIL-Boost

Jun-Yan Zhu 27 Aug 08, 2022
Tutorial to set up TensorFlow Object Detection API on the Raspberry Pi

A tutorial showing how to set up TensorFlow's Object Detection API on the Raspberry Pi

Evan 1.1k Dec 26, 2022
Tutorials and implementations for "Self-normalizing networks"

Self-Normalizing Networks Tutorials and implementations for "Self-normalizing networks"(SNNs) as suggested by Klambauer et al. (arXiv pre-print). Vers

Institute of Bioinformatics, Johannes Kepler University Linz 1.6k Jan 07, 2023
python library for invisible image watermark (blind image watermark)

invisible-watermark invisible-watermark is a python library and command line tool for creating invisible watermark over image.(aka. blink image waterm

Shield Mountain 572 Jan 07, 2023
Python based Advanced AI Assistant

Knick is a virtual artificial intelligence project, fully developed in python. The objective of this project is to develop a virtual assistant that can handle our minor, intermediate as well as heavy

19 Nov 15, 2022
Robotic Process Automation in Windows and Linux by using Driagrams.net BPMN diagrams.

BPMN_RPA Robotic Process Automation in Windows and Linux by using BPMN diagrams. With this Framework you can draw Business Process Model Notation base

23 Dec 14, 2022
This is a pytorch implementation of the NeurIPS paper GAN Memory with No Forgetting.

GAN Memory for Lifelong learning This is a pytorch implementation of the NeurIPS paper GAN Memory with No Forgetting. Please consider citing our paper

Miaoyun Zhao 43 Dec 27, 2022
The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding.

SuperGen The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding. Requirements Before running, you

Yu Meng 38 Dec 12, 2022
SOLO and SOLOv2 for instance segmentation, ECCV 2020 & NeurIPS 2020.

SOLO: Segmenting Objects by Locations This project hosts the code for implementing the SOLO algorithms for instance segmentation. SOLO: Segmenting Obj

Xinlong Wang 1.5k Dec 31, 2022
Code for paper "Vocabulary Learning via Optimal Transport for Neural Machine Translation"

**Codebase and data are uploaded in progress. ** VOLT(-py) is a vocabulary learning codebase that allows researchers and developers to automaticaly ge

416 Jan 09, 2023
Reproduce results and replicate training fo T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)

T-Zero This repository serves primarily as codebase and instructions for training, evaluation and inference of T0. T0 is the model developed in Multit

BigScience Workshop 253 Dec 27, 2022
Depth-Aware Video Frame Interpolation (CVPR 2019)

DAIN (Depth-Aware Video Frame Interpolation) Project | Paper Wenbo Bao, Wei-Sheng Lai, Chao Ma, Xiaoyun Zhang, Zhiyong Gao, and Ming-Hsuan Yang IEEE C

Wenbo Bao 7.7k Dec 31, 2022
Official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

BALLAD This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model. Requirements Python3 Pytorch(1.7.

peng gao 42 Nov 26, 2022
StocksMA is a package to facilitate access to financial and economic data of Moroccan stocks.

Creating easier access to the Moroccan stock market data What is StocksMA ? StocksMA is a package to facilitate access to financial and economic data

Salah Eddine LABIAD 28 Jan 04, 2023
Code for "The Box Size Confidence Bias Harms Your Object Detector"

The Box Size Confidence Bias Harms Your Object Detector - Code Disclaimer: This repository is for research purposes only. It is designed to maintain r

Johannes G. 24 Dec 07, 2022
Exadel CompreFace is a free and open-source face recognition GitHub project

Exadel CompreFace is a leading free and open-source face recognition system Exadel CompreFace is a free and open-source face recognition service that

Exadel 2.6k Jan 04, 2023
Machine Learning in Asset Management (by @firmai)

Machine Learning in Asset Management If you like this type of content then visit ML Quant site below: https://www.ml-quant.com/ Part One Follow this l

Derek Snow 1.5k Jan 02, 2023
Implementation of OpenAI paper with Simple Noise Scale on Fastai V2

README Implementation of OpenAI paper "An Empirical Model of Large-Batch Training" for Fastai V2. The code is based on the batch size finder implement

13 Dec 10, 2021