Neural Magic Eye: Learning to See and Understand the Scene Behind an Autostereogram, arXiv:2012.15692.

Overview

Neural Magic Eye

Preprint | Project Page | Colab Runtime

Official PyTorch implementation of the preprint paper "NeuralMagicEye: Learning to See and Understand the Scene Behind an Autostereogram", arXiv:2012.15692.

An autostereogram, a.k.a. magic eye image, is a single-image stereogram that can create visual illusions of 3D scenes from 2D textures. This paper studies an interesting question that whether a deep CNN can be trained to recover the depth behind an autostereogram and understand its content. The key to the autostereogram magic lies in the stereopsis - to solve such a problem, a model has to learn to discover and estimate disparity from the quasi-periodic textures. We show that deep CNNs embedded with disparity convolution, a novel convolutional layer proposed in this paper that simulates stereopsis and encodes disparity, can nicely solve such a problem after being sufficiently trained on a large 3D object dataset in a self-supervised fashion. We refer to our method as "NeuralMagicEye". Experiments show that our method can accurately recover the depth behind autostereograms with rich details and gradient smoothness. Experiments also show the completely different working mechanisms for autostereogram perception between neural networks and human eyes. We hope this research can help people with visual impairments and those who have trouble viewing autostereograms.

In this repository, we provide the complete training/inference implementation of our paper based on Pytorch and provide several demos that can be used for reproducing the results reported in our paper. With the code, you can also try on your own data by following the instructions below.

The implementation of the UNet architecture in our code is partially adapted from the project pytorch-CycleGAN-and-pix2pix.

License

See the LICENSE file for license rights and limitations (MIT).

One-min video result

IMAGE ALT TEXT HERE

Requirements

See Requirements.txt.

Setup

  1. Clone this repo:
git clone https://github.com/jiupinjia/neural-magic-eye.git 
cd neural-magic-eye
  1. Download our pretrained autostereogram decoding network from the Google Drive, and unzip them to the repo directory.
unzip checkpoints_decode_sp_u256_bn_df.zip

To reproduce our results

Decoding autostereograms

python demo_decode_image.py --in_folder ./test_images --out_folder ./decode_output --net_G unet_256 --norm_type batch --with_disparity_conv --in_size 256 --checkpoint_dir ./checkpoints_decode_sp_u256_bn_df

Decoding autostereograms (animated)

  • Stanford Bunny

python demo_decode_animated.py --in_file ./test_videos/bunny.mp4 --out_folder ./decode_output --net_G unet_256 --norm_type batch --with_disparity_conv --in_size 256 --checkpoint_dir ./checkpoints_decode_sp_u256_bn_df
  • Stanford Armadillo

python demo_decode_animated.py --in_file ./test_videos/bunny.mp4 --out_folder ./decode_output --net_G unet_256 --norm_type batch --with_disparity_conv --in_size 256 --checkpoint_dir ./checkpoints_decode_sp_u256_bn_df

Google Colab

Here we also provide a minimal working example of the inference runtime of our method. Check out this link and see your result on Colab.

To retrain your decoding/classification model

If you want to retrain our model, or want to try a different network configuration, you will first need to download our experimental dataset and then unzip it to the repo directory.

unzip datasets.zip

Note that to build the training pipeline, you will need a set of depth images and background textures, which are already there included in our pre-processed dataset (see folders ./dataset/ShapeNetCore.v2 and ./dataset/Textures for more details). The autostereograms will be generated on the fly during the training process.

In the following, we provide several examples for training our decoding/classification models with different configurations. Particularly, if you are interested in exploring different network architectures, you can check out --net_G , --norm_type , --with_disparity_conv and --with_skip_connection for more details.

To train the decoding network (on mnist dataset, unet_64 + bn, without disparity_conv)

python train_decoder.py --dataset mnist --net_G unet_64 --in_size 64 --batch_size 32 --norm_type batch --checkpoint_dir ./checkpoints_your_model_name_here --vis_dir ./val_out_your_model_name_here

To train the decoding network (on shapenet dataset, resnet18 + in + disparity_conv + fpn)

python train_decoder.py --dataset shapenet --net_G resnet18fcn --in_size 128 --batch_size 32 --norm_type instance --with_disparity_conv --with_skip_connection --checkpoint_dir ./checkpoints_your_model_name_here --vis_dir ./val_out_your_model_name_here

To train the watermark decoding model (unet256 + bn + disparity_conv)

python train_decoder.py --dataset watermarking --net_G unet_256 --in_size 256 --batch_size 16 --norm_type batch --with_disparity_conv --checkpoint_dir ./checkpoints_your_model_name_here --vis_dir ./val_out_your_model_name_here

To train the classification network (on mnist dataset, resnet18 + in + disparity_conv)

python train_classifier.py --dataset mnist --net_G resnet18 --in_size 64 --batch_size 32 --norm_type instance --with_disparity_conv --checkpoint_dir ./checkpoints_your_model_name_here --vis_dir ./val_out_your_model_name_here

To train the classification network (on shapenet dataset, resnet18 + bn + disparity_conv)

python train_classifier.py --dataset shapenet --net_G resnet18 --in_size 64 --batch_size 32 --norm_type batch --with_disparity_conv --checkpoint_dir ./checkpoints_your_model_name_here --vis_dir ./val_out_your_model_name_here

Network architectures and performance

In the following, we show the decoding/classification accuracy with different model architectures. We hope these statistics can help you if you want to build your own model.

Citation

If you use our code for your research, please cite the following paper:

@misc{zou2020neuralmagiceye,
      title={NeuralMagicEye: Learning to See and Understand the Scene Behind an Autostereogram}, 
      author={Zhengxia Zou and Tianyang Shi and Yi Yuan and Zhenwei Shi},
      year={2020},
      eprint={2012.15692},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Owner
Zhengxia Zou
Postdoc at the University of Michigan. Research interest: computer vision and applications in remote sensing, self-driving, and video games.
Zhengxia Zou
LSTMs (Long Short Term Memory) RNN for prediction of price trends

Price Prediction with Recurrent Neural Networks LSTMs BTC-USD price prediction with deep learning algorithm. Artificial Neural Networks specifically L

5 Nov 12, 2021
SAN for Product Attributes Prediction

SAN Heterogeneous Star Graph Attention Network for Product Attributes Prediction This repository contains the official PyTorch implementation for ADVI

Xuejiao Zhao 9 Dec 12, 2022
nnFormer: Interleaved Transformer for Volumetric Segmentation

nnFormer: Interleaved Transformer for Volumetric Segmentation Code for paper "nnFormer: Interleaved Transformer for Volumetric Segmentation ". Please

jsguo 610 Dec 28, 2022
PyTorch wrappers for using your model in audacity!

audacitorch This package contains utilities for prepping PyTorch audio models for use in Audacity. More specifically, it provides abstract classes for

Hugo Flores García 130 Dec 14, 2022
LocUNet is a deep learning method to localize a UE based solely on the reported signal strengths from a set of BSs.

LocUNet LocUNet is a deep learning method to localize a UE based solely on the reported signal strengths from a set of BSs. The method utilizes accura

4 Oct 05, 2022
An implementation of the paper "A Neural Algorithm of Artistic Style"

A Neural Algorithm of Artistic Style implementation - Neural Style Transfer This is an implementation of the research paper "A Neural Algorithm of Art

Srijarko Roy 27 Sep 20, 2022
Learned Initializations for Optimizing Coordinate-Based Neural Representations

Learned Initializations for Optimizing Coordinate-Based Neural Representations Project Page | Paper Matthew Tancik*1, Ben Mildenhall*1, Terrance Wang1

Matthew Tancik 127 Jan 03, 2023
2D Human Pose estimation using transformers. Implementation in Pytorch

PE-former: Pose Estimation Transformer Vision transformer architectures perform very well for image classification tasks. Efforts to solve more challe

Panteleris Paschalis 23 Oct 17, 2022
The open-source and free to use Python package miseval was developed to establish a standardized medical image segmentation evaluation procedure

miseval: a metric library for Medical Image Segmentation EVALuation The open-source and free to use Python package miseval was developed to establish

59 Dec 10, 2022
LyaNet: A Lyapunov Framework for Training Neural ODEs

LyaNet: A Lyapunov Framework for Training Neural ODEs Provide the model type--config-name to train and test models configured as those shown in the pa

Ivan Dario Jimenez Rodriguez 21 Nov 21, 2022
Autonomous Perception: 3D Object Detection with Complex-YOLO

Autonomous Perception: 3D Object Detection with Complex-YOLO LiDAR object detect

Thomas Dunlap 2 Feb 18, 2022
LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation

LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation by Junjue Wang, Zhuo Zheng, Ailong Ma, Xiaoyan Lu, and Yanfei Zh

Payphone 8 Nov 21, 2022
Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Relational Self-Attention: What's Missing in Attention for Video Understanding This repository is the official implementation of "Relational Self-Atte

mandos 43 Dec 07, 2022
End-to-end Temporal Action Detection with Transformer. [Under review]

TadTR: End-to-end Temporal Action Detection with Transformer By Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Song Bai, Xiang Bai. This repo holds the c

Xiaolong Liu 105 Dec 25, 2022
This repository contains the code for the paper "Hierarchical Motion Understanding via Motion Programs"

Hierarchical Motion Understanding via Motion Programs (CVPR 2021) This repository contains the official implementation of: Hierarchical Motion Underst

Sumith Kulal 40 Dec 05, 2022
Kaggle | 9th place (part of) solution for the Bristol-Myers Squibb – Molecular Translation challenge

Part of the 9th place solution for the Bristol-Myers Squibb – Molecular Translation challenge translating images containing chemical structures into I

Erdene-Ochir Tuguldur 22 Nov 30, 2022
Rethinking of Pedestrian Attribute Recognition: A Reliable Evaluation under Zero-Shot Pedestrian Identity Setting

Pytorch Pedestrian Attribute Recognition: A strong PyTorch baseline of pedestrian attribute recognition and multi-label classification.

Jian 79 Dec 18, 2022
Speedy Implementation of Instance-based Learning (IBL) agents in Python

A Python library to create single or multi Instance-based Learning (IBL) agents that are built based on Instance Based Learning Theory (IBLT) 1 Instal

0 Nov 18, 2021
Multi-angle c(q)uestion answering

Macaw Introduction Macaw (Multi-angle c(q)uestion answering) is a ready-to-use model capable of general question answering, showing robustness outside

AI2 430 Jan 04, 2023
Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

torch-imle Concise and self-contained PyTorch library implementing the I-MLE gradient estimator proposed in our NeurIPS 2021 paper Implicit MLE: Backp

UCL Natural Language Processing 249 Jan 03, 2023