Vision transformers (ViTs) have found only limited practical use in processing images

Last update: Sep 10, 2022

CXV

Convolutional Xformers for Vision

Vision transformers (ViTs) have found only limited practical use in processing images, in spite of their state-of-the-art accuracy on certain benchmarks. The reasons for their limited use include their need for larger training datasets and more computational resources compared to convolutional neural networks (CNNs), owing to the quadratic complexity of their self-attention mechanism. We propose a linear attention-convolution hybrid architecture -- Convolutional X-formers for Vision (CXV) -- to overcome these limitations. We replace the quadratic attention with linear attention mechanisms, such as Performer, Nyströmformer, and Linear Transformer, to reduce GPU usage. An inductive prior for image data is provided by convolutional sub-layers, thereby eliminating the need for the class token and positional embeddings used by ViTs. CXV outperforms other architectures, including token mixers (e.g., ConvMixer, FNet, and MLP Mixer), transformer models (e.g., ViT, CCT, CvT, and hybrid Xformers), and ResNets, for image classification in scenarios with limited data and GPU resources.
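
To make the quadratic-versus-linear distinction concrete, here is a minimal sketch of kernelized linear attention. It is not the repository's code. It assumes the feature map elu(x) + 1 commonly associated with the Linear Transformer; Performer and Nyströmformer use different approximations. The function names (`linear_attention`, `quadratic_reference`) are hypothetical, and plain Python lists stand in for the GPU tensors a real model would use.

```python
import math


def elu_plus_one(x):
    """Feature map phi(x) = elu(x) + 1, which is always positive."""
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(queries, keys, values):
    """Kernelized attention whose cost grows linearly with the token count N.

    queries, keys: lists of N vectors of length d
    values:        list of N vectors of length d_v
    """
    d, d_v = len(keys[0]), len(values[0])
    phi_q = [[elu_plus_one(x) for x in q] for q in queries]
    phi_k = [[elu_plus_one(x) for x in k] for k in keys]

    # Summaries over all keys, built once:
    #   S = sum_j phi(k_j) v_j^T  (d x d_v),  z = sum_j phi(k_j)  (d)
    S = [[0.0] * d_v for _ in range(d)]
    z = [0.0] * d
    for k, v in zip(phi_k, values):
        for a in range(d):
            z[a] += k[a]
            for b in range(d_v):
                S[a][b] += k[a] * v[b]

    # Each query only reads the summaries; no N x N matrix is ever formed.
    out = []
    for q in phi_q:
        norm = sum(q[a] * z[a] for a in range(d))
        out.append([sum(q[a] * S[a][b] for a in range(d)) / norm
                    for b in range(d_v)])
    return out


def quadratic_reference(queries, keys, values):
    """Same kernel, computed with explicit N x N weights for comparison."""
    d_v = len(values[0])
    phi_q = [[elu_plus_one(x) for x in q] for q in queries]
    phi_k = [[elu_plus_one(x) for x in k] for k in keys]
    out = []
    for q in phi_q:
        weights = [sum(a * b for a, b in zip(q, k)) for k in phi_k]
        total = sum(weights)
        out.append([sum(w * v[b] for w, v in zip(weights, values)) / total
                    for b in range(d_v)])
    return out


if __name__ == "__main__":
    # Three tokens with d = 2 and d_v = 3 (arbitrary illustrative numbers).
    q = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.7]]
    k = [[0.4, 0.0], [-0.3, 0.9], [0.2, -0.6]]
    v = [[1.0, 2.0, 3.0], [0.0, -1.0, 0.5], [2.0, 0.0, 1.0]]
    print(linear_attention(q, k, v))
```

Both functions return the same result up to floating-point error. The difference is that the linear version reuses the summaries `S` and `z` across queries, so its work scales with N rather than N², which is the property the abstract relies on to reduce GPU usage.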

Models:

CNV - Convolutional Nyströmformer for Vision
CPV - Convolutional Performer for Vision
CLTV - Convolutional Linear Transformer for Vision

Owner

Cloudwalker

FANet - Real-time Semantic Segmentation with Fast Attention

Fully Connected DenseNet for Image Segmentation

Official implementation of Rich Semantics Improve Few-Shot Learning (BMVC, 2021)

'A C2C E-COMMERCE TRUST MODEL BASED ON REPUTATION' Python implementation

Artificial intelligence technology inferring issues and logically supporting facts from raw text

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

Image processing framework based on plugins, like ImageJ; it is easy to glue with scipy.ndimage, scikit-image, opencv, simpleitk, mayavi... and any libraries based on numpy

An experimentation and research platform to investigate the interaction of automated agents in abstract simulated network environments.

Code for MSc Quantitative Finance Dissertation

Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting

Adapter-BERT: Parameter-Efficient Transfer Learning for NLP.

A benchmark for the task of translation suggestion

Implementation of "SESS: Self-Ensembling Semi-Supervised 3D Object Detection" (CVPR 2020 Oral)

Greedy Gaussian Segmentation

RETRO-pytorch - Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

Generalized Data Weighting via Class-level Gradient Manipulation

This implementation contains the application of GPlearn's symbolic transformer on a commodity futures sector of the financial market.

A PyTorch implementation of QANet.

Unofficial implementation of One-Shot Free-View Neural Talking Head Synthesis

tmm_fast is a lightweight package to speed up optical planar multilayer thin-film device computation.