A basic duplicate image detection service using perceptual image hash functions and nearest neighbor search, implemented using faiss, fastapi, and imagehash

Last update: Nov 11, 2022

Overview

Duplicate Image Detection

Getting Started

Install dependencies: pip install -r requirements.txt
Run the service: python main.py

Testing

Test with pytest

How it Works

This system uses a perceptual hashing function, similar to Apple's CSAM Detection. Instead of generating image hashes with NeuralHash, it uses a difference hash (dHash), which is simpler and less computationally intensive because it doesn't require a neural network. Since we don't have the same privacy constraints as Apple, we compare hashes directly, using nearest neighbor searches to identify duplicate images.

Difference Hash

dHash is a perceptual hashing function that produces hash values that are resilient to image scaling, as well as changes in color, brightness, and aspect ratio [1]. There are 4 main steps for creating a difference hash for an image:

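1. Convert to greyscale*
2. Resize the image to (hash_size+1, hash_size), i.e. hash_size+1 pixels wide and hash_size pixels high
3. Calculate the horizontal gradient (the difference between adjacent pixels in each row), reducing the image size to (hash_size, hash_size)
4. Assign one bit per horizontal gradient value, depending on whether brightness increases or decreases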

*We convert the image to greyscale before resizing for optimal performance
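The gradient and bit-assignment steps above can be sketched in plain Python. This is a minimal illustration, not the service's code: the function name dhash_from_grid is hypothetical, the input is assumed to be an already greyscaled and resized grid of brightness values, and the "bit is 1 when the right pixel is brighter" convention is an assumption. With Pillow, such a grid could be prepared from a file via Image.open(path).convert("L").resize((hash_size + 1, hash_size)).

```python
from typing import Sequence


def dhash_from_grid(pixels: Sequence[Sequence[int]]) -> int:
    """Compute a difference hash from an already-resized greyscale grid.

    `pixels` holds hash_size rows of hash_size + 1 brightness values each.
    (Hypothetical helper for illustration only.)
    """
    hash_size = len(pixels)
    if any(len(row) != hash_size + 1 for row in pixels):
        raise ValueError("expected hash_size rows of hash_size + 1 values")

    value = 0
    for row in pixels:
        # Horizontal gradient: compare each pixel with its right neighbour,
        # which turns hash_size + 1 columns into hash_size comparisons.
        for left, right in zip(row, row[1:]):
            # Assumed convention: bit is 1 where brightness increases.
            value = (value << 1) | (1 if right > left else 0)
    return value


# Tiny hand-made grid (hash_size = 2) standing in for a resized image.
grid = [
    [10, 20, 30],  # brightness rises twice  -> bits 1, 1
    [30, 20, 10],  # brightness falls twice  -> bits 0, 0
]
print(f"{dhash_from_grid(grid):04b}")  # prints 1100
```

Because only the direction of each brightness change is kept, uniformly brightening or darkening the image leaves the hash unchanged, which is where the resilience described above comes from.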

Nearest Neighbors

Image hashes that we want to check for duplicates against are stored in a binary index for fast and efficient nearest neighbor searches. We use Hamming distance (the number of bits that differ) as the metric for similarity between image hashes. For dHash, distances less than 10 likely indicate similar/duplicate images [1]. The 96.09% similarity figure corresponds to 10 differing bits out of 256, i.e. 1 - 10/256, which implies a 256-bit hash (hash_size = 16).
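To make the distance threshold concrete, here is a minimal sketch of the lookup. It uses a brute-force linear scan as a stand-in for the faiss binary index, so it shows the idea rather than how the service is implemented. The function names, file names, and 16-bit hash values are all hypothetical.

```python
from typing import Dict, List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def similarity(distance: int, n_bits: int) -> float:
    """Fraction of matching bits between two n_bits-long hashes."""
    return 1.0 - distance / n_bits


def find_duplicates(
    query: int, index: Dict[str, int], max_distance: int
) -> List[Tuple[str, int]]:
    """Return (name, distance) pairs with distance < max_distance, closest first.

    Linear scan over every stored hash; a real index avoids this cost.
    """
    hits = [(name, hamming_distance(query, h)) for name, h in index.items()]
    close = [hit for hit in hits if hit[1] < max_distance]
    return sorted(close, key=lambda hit: hit[1])


# Hypothetical 16-bit hashes, kept short so the bits are readable.
index = {
    "cat.jpg": 0b1111000011110000,
    "cat_resized.jpg": 0b1111000011110001,
    "dog.jpg": 0b0000111100001111,
}
print(find_duplicates(0b1111000011110000, index, max_distance=3))
# prints [('cat.jpg', 0), ('cat_resized.jpg', 1)]

print(f"{similarity(10, 256):.4f}")  # prints 0.9609
```

The linear scan costs one XOR and bit count per stored hash, which is fine for small collections. A dedicated binary index serves the same query over much larger collections.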

References

[1] https://www.hackerfactor.com/blog/?/archives/529-Kind-of-Like-That.html

Owner

Matthew Podolak
