Neural Magic Eye: Learning to See and Understand the Scene Behind an Autostereogram, arXiv:2012.15692.

Overview

Neural Magic Eye

Preprint | Project Page | Colab Runtime

Official PyTorch implementation of the preprint paper "NeuralMagicEye: Learning to See and Understand the Scene Behind an Autostereogram", arXiv:2012.15692.

An autostereogram, a.k.a. magic eye image, is a single-image stereogram that can create visual illusions of 3D scenes from 2D textures. This paper studies an interesting question that whether a deep CNN can be trained to recover the depth behind an autostereogram and understand its content. The key to the autostereogram magic lies in the stereopsis - to solve such a problem, a model has to learn to discover and estimate disparity from the quasi-periodic textures. We show that deep CNNs embedded with disparity convolution, a novel convolutional layer proposed in this paper that simulates stereopsis and encodes disparity, can nicely solve such a problem after being sufficiently trained on a large 3D object dataset in a self-supervised fashion. We refer to our method as "NeuralMagicEye". Experiments show that our method can accurately recover the depth behind autostereograms with rich details and gradient smoothness. Experiments also show the completely different working mechanisms for autostereogram perception between neural networks and human eyes. We hope this research can help people with visual impairments and those who have trouble viewing autostereograms.

In this repository, we provide the complete training/inference implementation of our paper based on Pytorch and provide several demos that can be used for reproducing the results reported in our paper. With the code, you can also try on your own data by following the instructions below.

The implementation of the UNet architecture in our code is partially adapted from the project pytorch-CycleGAN-and-pix2pix.

License

See the LICENSE file for license rights and limitations (MIT).

One-min video result

IMAGE ALT TEXT HERE

Requirements

See Requirements.txt.

Setup

  1. Clone this repo:
git clone https://github.com/jiupinjia/neural-magic-eye.git 
cd neural-magic-eye
  1. Download our pretrained autostereogram decoding network from the Google Drive, and unzip them to the repo directory.
unzip checkpoints_decode_sp_u256_bn_df.zip

To reproduce our results

Decoding autostereograms

python demo_decode_image.py --in_folder ./test_images --out_folder ./decode_output --net_G unet_256 --norm_type batch --with_disparity_conv --in_size 256 --checkpoint_dir ./checkpoints_decode_sp_u256_bn_df

Decoding autostereograms (animated)

  • Stanford Bunny

python demo_decode_animated.py --in_file ./test_videos/bunny.mp4 --out_folder ./decode_output --net_G unet_256 --norm_type batch --with_disparity_conv --in_size 256 --checkpoint_dir ./checkpoints_decode_sp_u256_bn_df
  • Stanford Armadillo

python demo_decode_animated.py --in_file ./test_videos/bunny.mp4 --out_folder ./decode_output --net_G unet_256 --norm_type batch --with_disparity_conv --in_size 256 --checkpoint_dir ./checkpoints_decode_sp_u256_bn_df

Google Colab

Here we also provide a minimal working example of the inference runtime of our method. Check out this link and see your result on Colab.

To retrain your decoding/classification model

If you want to retrain our model, or want to try a different network configuration, you will first need to download our experimental dataset and then unzip it to the repo directory.

unzip datasets.zip

Note that to build the training pipeline, you will need a set of depth images and background textures, which are already there included in our pre-processed dataset (see folders ./dataset/ShapeNetCore.v2 and ./dataset/Textures for more details). The autostereograms will be generated on the fly during the training process.

In the following, we provide several examples for training our decoding/classification models with different configurations. Particularly, if you are interested in exploring different network architectures, you can check out --net_G , --norm_type , --with_disparity_conv and --with_skip_connection for more details.

To train the decoding network (on mnist dataset, unet_64 + bn, without disparity_conv)

python train_decoder.py --dataset mnist --net_G unet_64 --in_size 64 --batch_size 32 --norm_type batch --checkpoint_dir ./checkpoints_your_model_name_here --vis_dir ./val_out_your_model_name_here

To train the decoding network (on shapenet dataset, resnet18 + in + disparity_conv + fpn)

python train_decoder.py --dataset shapenet --net_G resnet18fcn --in_size 128 --batch_size 32 --norm_type instance --with_disparity_conv --with_skip_connection --checkpoint_dir ./checkpoints_your_model_name_here --vis_dir ./val_out_your_model_name_here

To train the watermark decoding model (unet256 + bn + disparity_conv)

python train_decoder.py --dataset watermarking --net_G unet_256 --in_size 256 --batch_size 16 --norm_type batch --with_disparity_conv --checkpoint_dir ./checkpoints_your_model_name_here --vis_dir ./val_out_your_model_name_here

To train the classification network (on mnist dataset, resnet18 + in + disparity_conv)

python train_classifier.py --dataset mnist --net_G resnet18 --in_size 64 --batch_size 32 --norm_type instance --with_disparity_conv --checkpoint_dir ./checkpoints_your_model_name_here --vis_dir ./val_out_your_model_name_here

To train the classification network (on shapenet dataset, resnet18 + bn + disparity_conv)

python train_classifier.py --dataset shapenet --net_G resnet18 --in_size 64 --batch_size 32 --norm_type batch --with_disparity_conv --checkpoint_dir ./checkpoints_your_model_name_here --vis_dir ./val_out_your_model_name_here

Network architectures and performance

In the following, we show the decoding/classification accuracy with different model architectures. We hope these statistics can help you if you want to build your own model.

Citation

If you use our code for your research, please cite the following paper:

@misc{zou2020neuralmagiceye,
      title={NeuralMagicEye: Learning to See and Understand the Scene Behind an Autostereogram}, 
      author={Zhengxia Zou and Tianyang Shi and Yi Yuan and Zhenwei Shi},
      year={2020},
      eprint={2012.15692},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Owner
Zhengxia Zou
Postdoc at the University of Michigan. Research interest: computer vision and applications in remote sensing, self-driving, and video games.
Zhengxia Zou
Automatic voice-synthetised summaries of latest research papers on arXiv

PaperWhisperer PaperWhisperer is a Python application that keeps you up-to-date with research papers. How? It retrieves the latest articles from arXiv

Valerio Velardo 124 Dec 20, 2022
Understanding Convolution for Semantic Segmentation

TuSimple-DUC by Panqu Wang, Pengfei Chen, Ye Yuan, Ding Liu, Zehua Huang, Xiaodi Hou, and Garrison Cottrell. Introduction This repository is for Under

TuSimple 585 Dec 31, 2022
An open source library for face detection in images. The face detection speed can reach 1000FPS.

libfacedetection This is an open source library for CNN-based face detection in images. The CNN model has been converted to static variables in C sour

Shiqi Yu 11.4k Dec 27, 2022
A self-supervised 3D representation learning framework named viewpoint bottleneck.

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck Paper Created by Liyi Luo, Beiwen Tian, Hao Zhao and Guyue Zhou from Institute for AI In

63 Aug 11, 2022
Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness

Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness Code for Paper "Imbalanced Gradients: A Subtle Cause of Overestimated Adv

Hanxun Huang 11 Nov 30, 2022
This repository contains a CBIR system that uses swin transformer to extract image's feature.

Swin-transformer based CBIR This repository contains a CBIR(content-based image retrieval) system. Here we use Swin-transformer to extract query image

JsHou 12 Nov 17, 2022
Using Hotel Data to predict High Value And Potential VIP Guests

Description Using hotel data and AI to predict high value guests and potential VIP guests. Hotel can leverage on prediction resutls to run more effect

HCG 12 Feb 14, 2022
Implementation of CVPR 2021 paper "Spatially-invariant Style-codes Controlled Makeup Transfer"

SCGAN Implementation of CVPR 2021 paper "Spatially-invariant Style-codes Controlled Makeup Transfer" Prepare The pre-trained model is avaiable at http

118 Dec 12, 2022
TorchGeo is a PyTorch domain library, similar to torchvision, that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data.

TorchGeo is a PyTorch domain library, similar to torchvision, that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data.

Microsoft 1.3k Dec 30, 2022
PyTorch implementation of the paper: Label Noise Transition Matrix Estimation for Tasks with Lower-Quality Features

Label Noise Transition Matrix Estimation for Tasks with Lower-Quality Features Estimate the noise transition matrix with f-mutual information. This co

<a href=[email protected]"> 1 Jun 05, 2022
Automatically erase objects in the video, such as logo, text, etc.

Video-Auto-Wipe Read English Introduction:Here   本人不定期的基于生成技术制作一些好玩有趣的算法模型,这次带来的作品是“视频擦除”方向的应用模型,它实现的功能是自动感知到视频中我们不想看见的部分(譬如广告、水印、字幕、图标等等)然后进行擦除。由于图标擦

seeprettyface.com 141 Dec 26, 2022
Hitters Linear Regression - Hitters Linear Regression With Python

Hitters_Linear_Regression Kullanacağımız veri seti Carnegie Mellon Üniversitesi'

AyseBuyukcelik 2 Jan 26, 2022
[ICCV2021] 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

3DVG-Transformer This repository is for the ICCV 2021 paper "3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds" Our method "3DV

22 Dec 11, 2022
A transformer-based method for Healthcare Image Captioning in Vietnamese

vieCap4H Challenge 2021: A transformer-based method for Healthcare Image Captioning in Vietnamese This repo GitHub contains our solution for vieCap4H

Doanh B C 4 May 05, 2022
Deep deconfounded recommender (Deep-Deconf) for paper "Deep causal reasoning for recommendations"

Deep Causal Reasoning for Recommender Systems The codes are associated with the following paper: Deep Causal Reasoning for Recommendations, Yaochen Zh

Yaochen Zhu 22 Oct 15, 2022
PyTorch implementation of Spiking Neural Networks trained on surrogate gradient & BPTT using snntorch.

snn-localization repo PyTorch implementation of Spiking Neural Networks trained on surrogate gradient & BPTT using snntorch. Install Dependencies Orig

Sami BARCHID 1 Jan 06, 2022
Code for the SIGIR 2022 paper "Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion"

MKGFormer Code for the SIGIR 2022 paper "Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion" Model Architecture Illu

ZJUNLP 68 Dec 28, 2022
An executor that performs image segmentation on fashion items

ClothingSegmenter U2NET fashion image/clothing segmenter based on https://github.com/levindabhi/cloth-segmentation Overview The ClothingSegmenter exec

Jina AI 5 Mar 30, 2022
A Python implementation of global optimization with gaussian processes.

Bayesian Optimization Pure Python implementation of bayesian global optimization with gaussian processes. PyPI (pip): $ pip install bayesian-optimizat

fernando 6.5k Jan 02, 2023
Tool which allow you to detect and translate text.

Text detection and recognition This repository contains tool which allow to detect region with text and translate it one by one. Description Two pretr

Damian Panek 176 Nov 28, 2022