A PyTorch implementation of "CoAtNet: Marrying Convolution and Attention for All Data Sizes".

Last update: Jan 07, 2023

CoAtNet

Overview

This is a PyTorch implementation of CoAtNet as described in "CoAtNet: Marrying Convolution and Attention for All Data Sizes", arXiv 2021.

👉 Check out MobileViT if you are interested in other Convolution + Transformer models.

Usage

import torch
from coatnet import coatnet_0

img = torch.randn(1, 3, 224, 224)
net = coatnet_0()
out = net(img)
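In the paper's design, CoAtNet stacks five stages (S0 to S4), and each stage halves the spatial resolution of its input, so a 224×224 image leaves a 7×7 feature map at the end of S4. The pure-Python sketch here works through that arithmetic. The `stage_resolutions` helper is hypothetical and not part of the `coatnet` module, and its divisibility check is this sketch's own assumption rather than a documented requirement of the implementation.

```python
# Sketch: spatial size of each CoAtNet stage, assuming the paper's
# five-stage layout (S0 stem through S4) in which every stage halves
# the height and width of its input.
def stage_resolutions(height, width, num_stages=5):
    """Return the (height, width) of the feature map after each stage."""
    factor = 2 ** num_stages
    # Assumption of this sketch: require sizes that halve cleanly every stage.
    if height % factor or width % factor:
        raise ValueError(f"input size should be divisible by {factor}")
    sizes = []
    h, w = height, width
    for _ in range(num_stages):
        h, w = h // 2, w // 2  # each stage downsamples by a factor of 2
        sizes.append((h, w))
    return sizes


if __name__ == "__main__":
    for i, (h, w) in enumerate(stage_resolutions(224, 224)):
        print(f"S{i}: {h}x{w}")
```

For the 224×224 input used above, this prints 112x112 for S0 down to 7x7 for S4.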

Try out other block combinations mentioned in the paper:

import torch
from coatnet import CoAtNet

img = torch.randn(1, 3, 224, 224)

num_blocks = [2, 2, 3, 5, 2]            # L: number of blocks in stages S0-S4
channels = [64, 96, 192, 384, 768]      # D: output channels of stages S0-S4
block_types = ['C', 'T', 'T', 'T']      # stages S1-S4: 'C' for MBConv, 'T' for Transformer

net = CoAtNet((224, 224), 3, num_blocks, channels, block_types=block_types)
out = net(img)
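In the example above, `num_blocks` and `channels` have one entry per stage (S0 to S4), while `block_types` has one entry fewer, which suggests that S0 is a fixed convolutional stem and only S1 to S4 are configurable. The sketch here checks a custom configuration against that shape and pairs each stage with its block type before you build the network. The `describe_stages` helper and the "Conv stem" label are hypothetical illustrations, not part of the `coatnet` module.

```python
# Hypothetical helper (not part of the coatnet module): checks that a
# custom configuration matches the shape used in the README example and
# pairs each stage with its block type.
BLOCK_NAMES = {"C": "MBConv", "T": "Transformer"}


def describe_stages(num_blocks, channels, block_types):
    """Return (stage, block name, depth, channels) tuples for S0..S4."""
    if len(num_blocks) != len(channels):
        raise ValueError("num_blocks and channels need one entry per stage")
    if len(block_types) != len(num_blocks) - 1:
        raise ValueError("block_types covers every stage except the S0 stem")
    for t in block_types:
        if t not in BLOCK_NAMES:
            raise ValueError(f"unknown block type {t!r}; use 'C' or 'T'")
    # Assumption: S0 is a convolutional stem that is not configurable.
    names = ["Conv stem"] + [BLOCK_NAMES[t] for t in block_types]
    return [
        (f"S{i}", name, depth, dim)
        for i, (name, depth, dim) in enumerate(zip(names, num_blocks, channels))
    ]


if __name__ == "__main__":
    plan = describe_stages(
        [2, 2, 3, 5, 2], [64, 96, 192, 384, 768], ["C", "T", "T", "T"]
    )
    for row in plan:
        print(*row)
```

Catching a mismatched list length or a typo such as `'X'` here is cheaper than discovering it partway through model construction.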

Citation

@article{dai2021coatnet,
  title={CoAtNet: Marrying Convolution and Attention for All Data Sizes},
  author={Dai, Zihang and Liu, Hanxiao and Le, Quoc V and Tan, Mingxing},
  journal={arXiv preprint arXiv:2106.04803},
  year={2021}
}

Credits

Code adapted from MobileNetV2 and ViT.

Owner

Justin Wu

Paper https://arxiv.org/abs/2106.04803
