Are Convolutional Neural Networks or Transformers more like human vision?

This repository contains the code for fine-tuning popular Convolutional Neural Networks (CNNs) and the recently proposed Vision Transformer (ViT) on an augmented ImageNet dataset, along with the resulting fine-tuned models. It also contains the shape/texture bias tests run on the Stylized-ImageNet dataset.
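
The repository's own evaluation code is not reproduced here. As a minimal sketch, assuming the shape-bias definition popularized by Geirhos et al. for cue-conflict images, shape bias is the fraction of "shape" decisions among the trials where the model chose either the shape class or the texture class. The function name `shape_bias` and its arguments are hypothetical.

```python
from typing import Sequence


def shape_bias(
    predictions: Sequence[str],
    shape_labels: Sequence[str],
    texture_labels: Sequence[str],
) -> float:
    """Hypothetical helper: fraction of shape decisions on cue-conflict images.

    Assumes each image carries a shape class and a (different) texture class,
    as in Stylized-ImageNet-style cue-conflict stimuli.
    """
    if not (len(predictions) == len(shape_labels) == len(texture_labels)):
        raise ValueError("all inputs must have the same length")

    shape_hits = 0
    texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            continue  # assumption: skip images whose cues do not conflict
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1
        # predictions matching neither cue are ignored, per the assumed definition

    decided = shape_hits + texture_hits
    if decided == 0:
        raise ValueError("no prediction matched either the shape or the texture class")
    return shape_hits / decided


# Usage: 2 shape decisions, 1 texture decision, 1 prediction matching neither cue.
bias = shape_bias(
    ["cat", "elephant", "cat", "car"],
    ["cat", "cat", "cat", "dog"],
    ["elephant", "elephant", "dog", "bird"],
)
```

Predictions that match neither cue are excluded, so the score reflects only which cue "won" when the model committed to one of them.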

This work compares CNNs and the ViT against humans using error consistency, which goes beyond traditional metrics such as accuracy. These tests show that the recently proposed self-attention-based Transformer models make more human-like errors than traditional CNNs.
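
Error consistency asks whether two observers (for example, a human and a model) get the *same* trials right and wrong, beyond what their accuracies alone would predict. Below is a minimal sketch, assuming the Cohen's-kappa formulation used by Geirhos et al. Observed agreement c_obs is compared with the agreement c_exp = p_a·p_b + (1 − p_a)(1 − p_b) expected from two independent observers with accuracies p_a and p_b. The function name `error_consistency` is hypothetical and not taken from this repository.

```python
from typing import Sequence


def error_consistency(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Hypothetical helper: Cohen's kappa over trial-by-trial correctness.

    correct_a[i] / correct_b[i] say whether observer A / B answered trial i correctly.
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(correct_a)
    p_a = sum(correct_a) / n  # accuracy of observer A
    p_b = sum(correct_b) / n  # accuracy of observer B

    # Observed consistency: fraction of trials where both are right or both are wrong.
    c_obs = sum(a == b for a, b in zip(correct_a, correct_b)) / n
    # Consistency expected by chance for independent observers with these accuracies.
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)

    if c_exp == 1.0:
        raise ValueError("kappa is undefined when both observers are always right or always wrong")
    return (c_obs - c_exp) / (1 - c_exp)


# Usage: hypothetical per-trial correctness for a human and a model.
human = [True, True, False, False, True, False]
model = [True, True, False, True, True, False]
kappa = error_consistency(human, model)
```

A kappa near 0 means the two observers agree on errors only as often as chance predicts from their accuracies; values closer to 1 indicate more similar error patterns.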

Colab

You can run tests on the results directly in Google Colaboratory, without installing anything on your local machine. Click "Open in Colab" below:

Developer

Shikhar Tuli. For any questions, comments or suggestions, please reach me at [email protected].

Cite this work

If you use our experimental results or fine-tuned models, please cite:

@article{tuli2021cogsci,
      title={Are Convolutional Neural Networks or Transformers more like human vision?}, 
      author={Shikhar Tuli and Ishita Dasgupta and Erin Grant and Thomas L. Griffiths},
      year={2021},
      eprint={2105.07197},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

