Vision transformers (ViTs) have found only limited practical use in processing images

Related tags

Deep LearningCXV
Overview

CXV

Convolutional Xformers for Vision

PWCPWC PWC

Vision transformers (ViTs) have found only limited practical use in processing images, in spite of their state-of-the-art accuracy on certain benchmarks. The reason for their limited use include their need for larger training datasets and more computational resources compared to convolutional neural networks (CNNs), owing to the quadratic complexity of their self-attention mechanism. We propose a linear attention-convolution hybrid architecture -- Convolutional X-formers for Vision (CXV) -- to overcome these limitations. We replace the quadratic attention with linear attention mechanisms, such as Performer, Nyströmformer, and Linear Transformer, to reduce its GPU usage. Inductive prior for image data is provided by convolutional sub-layers, thereby eliminating the need for class token and positional embeddings used by the ViTs. CXV outperforms other architectures, token mixers (eg ConvMixer, FNet and MLP Mixer), transformer models (eg ViT, CCT, CvT and hybrid Xformers), and ResNets for image classification in scenarios with limited data and GPU resources.

Models:

  • CNV - Convolutional Nyströmformer for Vision
  • CPV - Convolutional Performer for Vision
  • CLTV - Convolutional Linear Transformer for Vision
Owner
Cloudwalker
Cloudwalker
Emotion Recognition from Facial Images

Reconhecimento de Emoções a partir de imagens faciais Este projeto implementa um classificador simples que utiliza técncias de deep learning e transfe

Gabriel 2 Feb 09, 2022
Official Implementation and Dataset of "PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask and Group-Level Consistency", CVPR 2021

Portrait Photo Retouching with PPR10K Paper | Supplementary Material PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask an

184 Dec 11, 2022
K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce (EMNLP Founding 2021)

Introduction K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce. Installation PyTor

Xu Song 21 Nov 16, 2022
Preparation material for Dropbox interviews

Dropbox-Onsite-Interviews A guide for the Dropbox onsite interview! The Dropbox interview question bank is very small. The bank has been in a Chinese

386 Dec 31, 2022
Read and write layered TIFF ImageSourceData and ImageResources tags

Read and write layered TIFF ImageSourceData and ImageResources tags Psdtags is a Python library to read and write the Adobe Photoshop(r) specific Imag

Christoph Gohlke 4 Feb 05, 2022
Make your master artistic punk avatar through machine learning world famous paintings.

Master-art-punk Make your master artistic punk avatar through machine learning world famous paintings. 通过机器学习世界名画制作属于你的大师级艺术朋克头像 Nowadays, NFT is beco

Philipjhc 53 Dec 27, 2022
Unofficial implementation of Point-Unet: A Context-Aware Point-Based Neural Network for Volumetric Segmentation

Point-Unet This is an unofficial implementation of the MICCAI 2021 paper Point-Unet: A Context-Aware Point-Based Neural Network for Volumetric Segment

Namt0d 9 Dec 07, 2022
Rust bindings for the C++ api of PyTorch.

tch-rs Rust bindings for the C++ api of PyTorch. The goal of the tch crate is to provide some thin wrappers around the C++ PyTorch api (a.k.a. libtorc

Laurent Mazare 2.3k Dec 30, 2022
A collection of Reinforcement Learning algorithms from Sutton and Barto's book and other research papers implemented in Python.

Reinforcement-Learning-Notebooks A collection of Reinforcement Learning algorithms from Sutton and Barto's book and other research papers implemented

Pulkit Khandelwal 1k Dec 28, 2022
Aesara is a Python library that allows one to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays.

Aesara is a Python library that allows one to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays.

Aesara 898 Jan 07, 2023
This is Official implementation for "Pose-guided Feature Disentangling for Occluded Person Re-Identification Based on Transformer" in AAAI2022

PFD:Pose-guided Feature Disentangling for Occluded Person Re-identification based on Transformer This repo is the official implementation of "Pose-gui

Tao Wang 93 Dec 18, 2022
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

Text-AutoAugment (TAA) This repository contains the code for our paper Text AutoAugment: Learning Compositional Augmentation Policy for Text Classific

LancoPKU 105 Jan 03, 2023
Neural Magic Eye: Learning to See and Understand the Scene Behind an Autostereogram, arXiv:2012.15692.

Neural Magic Eye Preprint | Project Page | Colab Runtime Official PyTorch implementation of the preprint paper "NeuralMagicEye: Learning to See and Un

Zhengxia Zou 56 Jul 15, 2022
Weak-supervised Visual Geo-localization via Attention-based Knowledge Distillation

Weak-supervised Visual Geo-localization via Attention-based Knowledge Distillation Introduction WAKD is a PyTorch implementation for our ICPR-2022 pap

2 Oct 20, 2022
Pytorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech"

GradTTS Unofficial Pytorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (arxiv) About this repo This is an unoffic

HeyangXue1997 103 Dec 23, 2022
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group 8.4k Jan 03, 2023
This repository contains python code necessary to replicated the experiments performed in our paper "Invariant Ancestry Search"

InvariantAncestrySearch This repository contains python code necessary to replicated the experiments performed in our paper "Invariant Ancestry Search

Phillip Bredahl Mogensen 0 Feb 02, 2022
PartImageNet is a large, high-quality dataset with part segmentation annotations

PartImageNet: A Large, High-Quality Dataset of Parts We will release our dataset and scripts soon after cleaning and approval. Introduction PartImageN

Ju He 77 Nov 30, 2022
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [ACL 2021]

piglet PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [ACL 2021] This repo contains code and data for PIGLeT. If you like

Rowan Zellers 51 Oct 08, 2022