Pyramid Pooling Transformer for Scene Understanding

Related tags

Deep LearningP2T
Overview

Pyramid Pooling Transformer for Scene Understanding

Requirements:

  • torch 1.6+
  • torchvision 0.7.0
  • timm==0.3.2
  • Validated on torch 1.6.0, torchvision 0.7.0

Models Pretrained on ImageNet1K

Variants Input Size [email protected] [email protected] #Params (M) Pretrained Models
P2T-Tiny 224 x 224 78.1 94.1 11.1 Google Drive
P2T-Small 224 x 224 82.1 95.9 23.0 Google Drive
P2T-Base 224 x 224 83.0 96.2 36.2 Google Drive

Pretrained Models for Downstream tasks

To be updated.

Something Else

Note: we have prepared a stronger version of P2T. Since P2T is still in peer review, we will release the stronger P2T after the acceptance.

You might also like...
 Neural Scene Graphs for Dynamic Scene (CVPR 2021)
Neural Scene Graphs for Dynamic Scene (CVPR 2021)

Implementation of Neural Scene Graphs, that optimizes multiple radiance fields to represent different objects and a static scene background. Learned representations can be rendered with novel object compositions and views.

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''
A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

Automatic number plate recognition using tech:  Yolo, OCR, Scene text detection, scene text recognation, flask, torch
Automatic number plate recognition using tech: Yolo, OCR, Scene text detection, scene text recognation, flask, torch

Automatic Number Plate Recognition Automatic Number Plate Recognition (ANPR) is the process of reading the characters on the plate with various optica

Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Make-A-Scene - PyTorch Pytorch implementation (inofficial) of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/

Code for
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Learning the Best Pooling Strategy for Visual Semantic Embedding Official PyTorch implementation of the paper Learning the Best Pooling Strategy for V

Source code for paper "Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling", AAAI 2021

ATLOP Code for AAAI 2021 paper Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling. If you make use of this co

This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order Pooling.
This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order Pooling.

Locus This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order

Compact Bilinear Pooling for PyTorch

Compact Bilinear Pooling for PyTorch. This repository has a pure Python implementation of Compact Bilinear Pooling and Count Sketch for PyTorch. This

A Pytorch Implementation for Compact Bilinear Pooling.

CompactBilinearPooling-Pytorch A Pytorch Implementation for Compact Bilinear Pooling. Adapted from tensorflow_compact_bilinear_pooling Prerequisites I

Comments
  • How to load ImageNet1K pretrained weight to semantic segmentation model?

    How to load ImageNet1K pretrained weight to semantic segmentation model?

    Hello, thanks for open source!

    I use mmseg, and load weight from image classification result, it warns: WARNING - The model and loaded state dict do not match exactly missing keys in source state_dict: backbone.head.weight, backbone.head.bias unexpected key in source state_dict: cls_token, ln1.bias, ln1.weight, layers.0.ln1.bias, layers.0.ln1.weight, layers.0.ln2.bias, layers.0.ln2.weight, layers.0.ffn.layers.0.0.bias, layers.0.ffn.layers.0.0.weight, layers.0.ffn.layers.1.bias, layers.0.ffn.layers.1.weight, layers.0.attn.attn.out_proj.bias, layers.0.attn.attn.out_proj.weight, layers.0.attn.attn.in_proj_bias, layers.0.attn.attn.in_proj_weight, layers.1.ln1.bias, layers.1.ln1.weight, layers.1.ln2.bias, layers.1.ln2.weight, layers.1.ffn.layers.0.0.bias, layers.1.ffn.layers.0.0.weight, layers.1.ffn.layers.1.bias, layers.1.ffn.layers.1.weight, layers.1.attn.attn.out_proj.bias, layers.1.attn.attn.out_proj.weight ...... And the experimental results are terrible as the experiments initialize weight with random.

    So I load weight from ADE20K result, it work and warns: WARNING - The model and loaded state dict do not match exactly missing keys in source state_dict: backbone.head.weight, backbone.head.bias And the result is similar to the result you offer.

    Which weight should I load? ImageNet1K or ADE20K? Or should I modify the keys of weight in ImageNet1K to adapt the key in segmentation?

    opened by asd123pwj 8
  • Questions about your ablation studies

    Questions about your ablation studies

    Hello,

    I have some questions about your ablation studies of pyramid pooling. Could you detail about your baseline version in Table 9? First, you say that you replace P-MHSA with an MHSA with a single pooling operation, what is the detail about single pooling operation? Ex: Pooling Ratios? Second, do you compared your method with original MHSA?

    opened by pp00704831 3
  • P2T replaces PVT trunk bug

    P2T replaces PVT trunk bug

    When I replaced the PVT trunk with P2T in my code, I encountered an error :
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 512, 3, 3]], which is output 0 of AdaptiveAvgPool2DBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

    opened by liu-tianxiang 2
  • P2T on ImageNet-22K?

    P2T on ImageNet-22K?

    Hi @yuhuan-wu , thank you for share the code of this excellent work! Have you trained P2T on ImageNet-22K dataset or any further plan to do it? If so, could you please share the pretrained model on ImageNet-22k?

    Thank you.

    opened by fyaft2012 1
Owner
Yu-Huan Wu
Ph.D. student at Nankai University
Yu-Huan Wu
Code for IntraQ, PyTorch implementation of our paper under review

IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization paper Requirements Python = 3.7.10 Pytorch == 1.7

1 Nov 19, 2021
Industrial Image Anomaly Localization Based on Gaussian Clustering of Pre-trained Feature

Industrial Image Anomaly Localization Based on Gaussian Clustering of Pre-trained Feature Q. Wan, L. Gao, X. Li and L. Wen, "Industrial Image Anomaly

smiler 6 Dec 25, 2022
PyTorch experiments with the Zalando fashion-mnist dataset

zalando-pytorch PyTorch experiments with the Zalando fashion-mnist dataset Project Organization ├── LICENSE ├── Makefile - Makefile with co

Federico Baldassarre 31 Sep 25, 2021
Raster Vision is an open source Python framework for building computer vision models on satellite, aerial, and other large imagery sets

Raster Vision is an open source Python framework for building computer vision models on satellite, aerial, and other large imagery sets (including obl

Azavea 1.7k Dec 22, 2022
Object Detection using YOLO from PyImageSearch

Object Detection using YOLO from PyImageSearch By applying object detection, you’ll not only be able to determine what is in an image, but also where

Mohamed NIANG 1 Feb 09, 2022
Code for the paper "PortraitNet: Real-time portrait segmentation network for mobile device" @ CAD&Graphics2019

PortraitNet Code for the paper "PortraitNet: Real-time portrait segmentation network for mobile device". @ CAD&Graphics 2019 Introduction We propose a

265 Dec 01, 2022
Augmentation for Single-Image-Super-Resolution

SRAugmentation Augmentation for Single-Image-Super-Resolution Implimentation CutBlur Cutout CutMix Cutup CutMixup Blend RGBPermutation Identity OneOf

Yubo 6 Jun 27, 2022
This repository contains an implementation of the Permutohedral Attention Module in Pytorch

Permutohedral_attention_module This repository contains an implementation of the Permutohedral Attention Module

Samuel JOUTARD 26 Nov 27, 2022
BasicRL: easy and fundamental codes for deep reinforcement learning。It is an improvement on rainbow-is-all-you-need and OpenAI Spinning Up.

BasicRL: easy and fundamental codes for deep reinforcement learning BasicRL is an improvement on rainbow-is-all-you-need and OpenAI Spinning Up. It is

RayYoh 12 Apr 28, 2022
A python software that can help blind people find things like laptops, phones, etc the same way a guide dog guides a blind person in finding his way.

GuidEye A python software that can help blind people find things like laptops, phones, etc the same way a guide dog guides a blind person in finding h

Munal Jain 0 Aug 09, 2022
Lolviz - A simple Python data-structure visualization tool for lists of lists, lists, dictionaries; primarily for use in Jupyter notebooks / presentations

lolviz By Terence Parr. See Explained.ai for more stuff. A very nice looking javascript lolviz port with improvements by Adnan M.Sagar. A simple Pytho

Terence Parr 785 Dec 30, 2022
Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery (ICCV 2021 Oral) Run this model on Replicate Optimization: Global directions: Mapper: Check ou

3.3k Jan 05, 2023
This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

MultiModal-InfoMax This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Informa

Deep Cognition and Language Research (DeCLaRe) Lab 89 Dec 26, 2022
📚 Papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.

papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. Papermill lets you: parameterize notebooks execute notebooks This

nteract 5.1k Jan 03, 2023
Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR)

This is the official implementation of our paper Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR), which has been accepted by WSDM2022.

Yongchun Zhu 81 Dec 29, 2022
Official implementation of Self-supervised Graph Attention Networks (SuperGAT), ICLR 2021.

SuperGAT Official implementation of Self-supervised Graph Attention Networks (SuperGAT). This model is presented at How to Find Your Friendly Neighbor

Dongkwan Kim 127 Dec 28, 2022
Official implementation for the paper "Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection"

Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection PyTorch code release of the paper "Attentive Prototypes for Sour

Deepti Hegde 23 Oct 17, 2022
A Streamlit demo demonstrating the Deep Dream technique. Adapted from the TensorFlow Deep Dream tutorial.

Streamlit Demo: Deep Dream A Streamlit demo demonstrating the Deep Dream technique. Adapted from the TensorFlow Deep Dream tutorial How to run this de

Streamlit 11 Dec 12, 2022
Learning the Beauty in Songs: Neural Singing Voice Beautifier; ACL 2022 (Main conference); Official code

Learning the Beauty in Songs: Neural Singing Voice Beautifier Jinglin Liu, Chengxi Li, Yi Ren, Zhiying Zhu, Zhou Zhao Zhejiang University ACL 2022 Mai

Jinglin Liu 257 Dec 30, 2022
CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability

This is the official repository of the paper: CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability A private copy of the

Fadi Boutros 33 Dec 31, 2022