Vision transformers (ViTs) have found only limited practical use in processing images

Last update: Sep 10, 2022

Overview

CXV

Convolutional Xformers for Vision

Vision transformers (ViTs) have found only limited practical use in processing images, in spite of their state-of-the-art accuracy on certain benchmarks. The reasons for their limited use include their need for larger training datasets and more computational resources than convolutional neural networks (CNNs), owing to the quadratic complexity of their self-attention mechanism. We propose a linear attention-convolution hybrid architecture, Convolutional X-formers for Vision (CXV), to overcome these limitations. We replace the quadratic attention with linear attention mechanisms, such as Performer, Nyströmformer, and Linear Transformer, to reduce GPU usage. Convolutional sub-layers provide the inductive prior for image data, thereby eliminating the need for the class token and positional embeddings used by ViTs. CXV outperforms other architectures, including token mixers (e.g. ConvMixer, FNet, and MLP Mixer), transformer models (e.g. ViT, CCT, CvT, and hybrid Xformers), and ResNets, for image classification in scenarios with limited data and GPU resources.
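To illustrate why swapping quadratic attention for linear attention saves memory, here is a minimal pure-Python sketch of the kernel trick used by the Linear Transformer (feature map elu(x) + 1). It is not taken from the CXV code base. The function names and toy inputs are hypothetical, and Performer and Nyströmformer use different approximations that are not shown here. Both functions compute the same output, but the second never builds the n x n token-by-token matrix.

```python
import math

def feature_map(x):
    # elu(x) + 1: a strictly positive feature map (Linear Transformer style)
    return x + 1.0 if x > 0 else math.exp(x)

def matmul(a, b):
    # Plain list-of-lists matrix product
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def apply_map(m):
    return [[feature_map(x) for x in row] for row in m]

def quadratic_attention(q, k, v):
    """Kernelized attention the O(n^2) way: (phi(Q) phi(K)^T) V, row-normalized."""
    fq, fk = apply_map(q), apply_map(k)
    scores = matmul(fq, transpose(fk))      # n x n matrix: the costly part
    out = matmul(scores, v)
    return [[x / sum(s) for x in o] for o, s in zip(out, scores)]

def linear_attention(q, k, v):
    """Same result in O(n): phi(Q) (phi(K)^T V), with no n x n matrix."""
    fq, fk = apply_map(q), apply_map(k)
    kv = matmul(transpose(fk), v)           # d x d_v summary of keys and values
    k_sum = [sum(col) for col in zip(*fk)]  # d-vector used for normalization
    out = matmul(fq, kv)
    norms = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return [[x / z for x in o] for o, z in zip(out, norms)]

# Toy example: n = 4 tokens, d = 2 features (values are arbitrary)
q = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8], [0.0, -0.5]]
k = [[1.0, 0.1], [-0.7, 0.4], [0.3, -1.2], [0.9, 0.6]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [0.5, 0.5]]

slow = quadratic_attention(q, k, v)
fast = linear_attention(q, k, v)
```

The two results agree up to floating-point rounding because matrix multiplication is associative. For an image split into n patches, the intermediate storage in the linear form grows with the feature dimension rather than with n squared, which is the source of the reduced GPU usage described above.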

Models:

CNV - Convolutional Nyströmformer for Vision
CPV - Convolutional Performer for Vision
CLTV - Convolutional Linear Transformer for Vision

Owner

Cloudwalker
