Convert Apple NeuralHash model for CSAM Detection to ONNX.

Overview

AppleNeuralHash2ONNX

Convert Apple NeuralHash model for CSAM Detection to ONNX.

Intro

Apple NeuralHash is a perceptual hashing method for images based on neural networks. It can tolerate image resize and compression. The steps of hashing is as the following:

  1. Convert image to RGB.
  2. Resize image to 360x360.
  3. Normalize RGB values to [-1, 1] range.
  4. Perform inference on the NeuralHash model.
  5. Calculate dot product of a 96x128 matrix with the resulting vector of 128 floats.
  6. Apply binary step to the resulting 96 float vector.
  7. Convert the vector of 1.0 and 0.0 to bits, resulting in 96-bit binary data.

In this project, we convert Apple's NeuralHash model to ONNX format. A demo script for testing the model is also included.

Prerequisite

OS

Both macOS and Linux will work. In the following sections Debian is used for Linux example.

LZFSE decoder

  • macOS: Install by running brew install lzfse.
  • Linux: Build and install from lzfse source.

Python

Python 3.6 and above should work. Install the following dependencies:

pip install onnx coremltools

Conversion Guide

Step 1: Get NeuralHash model

You will need 4 files from a recent macOS or iOS build:

  • neuralhash_128x96_seed1.dat
  • NeuralHashv3b-current.espresso.net
  • NeuralHashv3b-current.espresso.shape
  • NeuralHashv3b-current.espresso.weights

Option 1: From macOS or jailbroken iOS device (Recommended)

If you have a recent version of macOS (11.4+) or jailbroken iOS (14.7+) installed, simply grab these files from /System/Library/Frameworks/Vision.framework/Resources/ (on macOS) or /System/Library/Frameworks/Vision.framework/ (on iOS).

Option 2: From iOS IPSW (click to reveal)
  1. Download any .ipsw of a recent iOS build (14.7+) from ipsw.me.
  2. Unpack the file:
cd /path/to/ipsw/file
mkdir unpacked_ipsw
cd unpacked_ipsw
unzip ../*.ipsw
  1. Locate system image:
ls -lh

What you need is the largest .dmg file, for example 018-63036-003.dmg.

  1. Mount system image. On macOS simply open the file in Finder. On Linux run the following commands:
# Build and install apfs-fuse
sudo apt install fuse libfuse3-dev bzip2 libbz2-dev cmake g++ git libattr1-dev zlib1g-dev
git clone https://github.com/sgan81/apfs-fuse.git
cd apfs-fuse
git submodule init
git submodule update
mkdir build
cd build
cmake ..
make
sudo make install
sudo ln -s /bin/fusermount /bin/fusermount3
# Mount image
mkdir rootfs
apfs-fuse 018-63036-003.dmg rootfs

Required files are under /System/Library/Frameworks/Vision.framework/ in mounted path.

Put them under the same directory:

mkdir NeuralHash
cd NeuralHash
cp /System/Library/Frameworks/Vision.framework/Resources/NeuralHashv3b-current.espresso.* .
cp /System/Library/Frameworks/Vision.framework/Resources/neuralhash_128x96_seed1.dat .

Step 2: Decode model structure and shapes

Normally compiled Core ML models store structure in model.espresso.net and shapes in model.espresso.shape, both in JSON. It's the same for NeuralHash model but compressed with LZFSE.

dd if=NeuralHashv3b-current.espresso.net bs=4 skip=7 | lzfse -decode -o model.espresso.net
dd if=NeuralHashv3b-current.espresso.shape bs=4 skip=7 | lzfse -decode -o model.espresso.shape
cp NeuralHashv3b-current.espresso.weights model.espresso.weights

Step 3: Convert model to ONNX

cd ..
git clone https://github.com/AsuharietYgvar/TNN.git
cd TNN
python3 tools/onnx2tnn/onnx-coreml/coreml2onnx.py ../NeuralHash

The resulting model is NeuralHash/model.onnx.

Usage

Inspect model

Netron is a perfect tool for this purpose.

Calculate neural hash with onnxruntime

  1. Install required libraries:
pip install onnxruntime pillow
  1. Run nnhash.py on an image:
python3 nnhash.py /path/to/model.onnx /path/to/neuralhash_128x96_seed1.dat image.jpg

Example output:

ab14febaa837b6c1484c35e6

Note: Neural hash generated here might be a few bits off from one generated on an iOS device. This is expected since different iOS devices generate slightly different hashes anyway. The reason is that neural networks are based on floating-point calculations. The accuracy is highly dependent on the hardware. For smaller networks it won't make any difference. But NeuralHash has 200+ layers, resulting in significant cumulative errors.

Device Hash
iPad Pro 10.5-inch 2b186faa6b36ffcc4c4635e1
M1 Mac 2b5c6faa6bb7bdcc4c4731a1
iOS Simulator 2b5c6faa6bb6bdcc4c4731a1
ONNX Runtime 2b5c6faa6bb6bdcc4c4735a1

Credits

  • nhcalc for uncovering NeuralHash private API.
  • TNN for compiled Core ML to ONNX script.
Owner
Asuhariet Ygvar
Asuhariet Ygvar
Official code for "On the Frequency Bias of Generative Models", NeurIPS 2021

Frequency Bias of Generative Models Generator Testbed Discriminator Testbed This repository contains official code for the paper On the Frequency Bias

35 Nov 01, 2022
FastyAPI is a Stack boilerplate optimised for heavy loads.

FastyAPI A FastAPI based Stack boilerplate for heavy loads. Explore the docs » View Demo · Report Bug · Request Feature Table of Contents About The Pr

Ali Chaayb 47 Dec 27, 2022
Multistream CNN for Robust Acoustic Modeling

Multistream Convolutional Neural Network (CNN) A multistream CNN is a novel neural network architecture for robust acoustic modeling in speech recogni

ASAPP Research 37 Sep 21, 2022
Multivariate Time Series Transformer, public version

Multivariate Time Series Transformer Framework This code corresponds to the paper: George Zerveas et al. A Transformer-based Framework for Multivariat

363 Jan 03, 2023
Code and Experiments for ACL-IJCNLP 2021 Paper Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering.

Code and Experiments for ACL-IJCNLP 2021 Paper Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering.

Sidd Karamcheti 50 Nov 16, 2022
A demonstration of using a live Tensorflow session to create an interactive face-GAN explorer.

Streamlit Demo: The Controllable GAN Face Generator This project highlights Streamlit's new hash_func feature with an app that calls on TensorFlow to

Streamlit 257 Dec 31, 2022
Official code for "Mean Shift for Self-Supervised Learning"

MSF Official code for "Mean Shift for Self-Supervised Learning" Requirements Python = 3.7.6 PyTorch = 1.4 torchvision = 0.5.0 faiss-gpu = 1.6.1 In

UMBC Vision 44 Nov 21, 2022
A PyTorch implementation of QANet.

QANet-pytorch NOTICE I'm very busy these months. I'll return to this repo in about 10 days. Introduction An implementation of QANet with PyTorch. Any

H. Z. 343 Nov 03, 2022
:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

R²SQL The PyTorch implementation of paper Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing. (AAAI 2021) Requirement

huybery 60 Dec 31, 2022
Implementation of Stochastic Image-to-Video Synthesis using cINNs.

Stochastic Image-to-Video Synthesis using cINNs Official PyTorch implementation of Stochastic Image-to-Video Synthesis using cINNs accepted to CVPR202

CompVis Heidelberg 135 Dec 28, 2022
TF2 implementation of knowledge distillation using the "function matching" hypothesis from the paper Knowledge distillation: A good teacher is patient and consistent by Beyer et al.

FunMatch-Distillation TF2 implementation of knowledge distillation using the "function matching" hypothesis from the paper Knowledge distillation: A g

Sayak Paul 67 Dec 20, 2022
Keras implementations of Generative Adversarial Networks.

This repository has gone stale as I unfortunately do not have the time to maintain it anymore. If you would like to continue the development of it as

Erik Linder-Norén 8.9k Jan 04, 2023
A little Python application to auto tag your photos with the power of machine learning.

Tag Machine A little Python application to auto tag your photos with the power of machine learning. Report a bug or request a feature Table of Content

Florian Torres 14 Dec 21, 2022
We are More than Our JOints: Predicting How 3D Bodies Move

We are More than Our JOints: Predicting How 3D Bodies Move Citation This repo contains the official implementation of our paper MOJO: @inproceedings{Z

72 Oct 20, 2022
Official Pytorch implementation of C3-GAN

Official pytorch implemenation of C3-GAN Contrastive Fine-grained Class Clustering via Generative Adversarial Networks [Paper] Authors: Yunji Kim, Jun

NAVER AI 114 Dec 02, 2022
Official Pytorch Implementation of Unsupervised Image Denoising with Frequency Domain Knowledge

Unsupervised Image Denoising with Frequency Domain Knowledge (BMVC 2021 Oral) : Official Project Page This repository provides the official PyTorch im

Donggon Jang 12 Sep 26, 2022
HMLET (Hybrid-Method-of-Linear-and-non-linEar-collaborative-filTering-method)

Methods HMLET (Hybrid-Method-of-Linear-and-non-linEar-collaborative-filTering-method) Dynamically selecting the best propagation method for each node

Yong 7 Dec 18, 2022
[ICCV 2021 Oral] NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo

NerfingMVS Project Page | Paper | Video | Data NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo Yi Wei, Shaohui

Yi Wei 369 Dec 24, 2022
CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax.

Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax ⚠️ Latest: Current repo is a complete version. But we delet

FishYuLi 341 Dec 23, 2022
A repository that finds a person who looks like you by using face recognition technology.

Find Your Twin Hello everyone, I've always wondered how casting agencies do the casting for a scene where a certain actor is young or old for a movie

Cengizhan Yurdakul 3 Jan 29, 2022