Implementation of the Point Transformer layer, in Pytorch

Overview

Point Transformer - Pytorch

Implementation of the Point Transformer self-attention layer, in Pytorch. The simple circuit above seemed to have allowed their group to outperform all previous methods in point cloud classification and segmentation.

Install

$ pip install point-transformer-pytorch

Usage

import torch
from point_transformer_pytorch import PointTransformerLayer

attn = PointTransformerLayer(
    dim = 128,
    pos_mlp_hidden_dim = 64,
    attn_mlp_hidden_mult = 4
)

x = torch.randn(1, 16, 128)
pos = torch.randn(1, 16, 3)

attn(x, pos) # (1, 16, 128)

Citations

@misc{zhao2020point,
    title={Point Transformer}, 
    author={Hengshuang Zhao and Li Jiang and Jiaya Jia and Philip Torr and Vladlen Koltun},
    year={2020},
    eprint={2012.09164},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
Comments
  • Did You Falsify Your Experimental Results???

    Did You Falsify Your Experimental Results???

    No one can reproduce the performance reported in your original paper. Please post your pre-trained model or your original code. Otherwise, we must question your academic ethics!****

    opened by TruthIsEveryThing 1
  • Issues with my wrapper code

    Issues with my wrapper code

    I wrote some wrapper code to turn this layer into a full transformer and I can't seem to figure out what is going wrong. The following works:

    import torch
    from torch import nn, einsum
    import x_transformers
    from point_transformer_pytorch import PointTransformerLayer
    
    layer = PointTransformerLayer(
        dim = 7,
        pos_mlp_hidden_dim = 64,
        attn_mlp_hidden_mult = 4,
        num_neighbors = 16          # only the 16 nearest neighbors would be attended to for each point
    )
    
    feats = torch.randn(1, 5, 7)
    pos = torch.randn(1, 5, 3)
    mask = torch.ones(1, 5).bool()
    
    y = layer(feats, pos, mask = mask)
    

    However this doesn't work

    import torch
    from torch import nn, einsum
    import x_transformers
    from point_transformer_pytorch import PointTransformerLayer
    
    class PointTransformer(nn.Module):
        def __init__(self, feats, mask, neighbors = 16, layers=5, dimension=5):
            
            super().__init__()
            
            self.feats = feats
            self.mask = mask
            self.neighbors = neighbors
            
            self.layers = []
            
            for _ in range(layers):
                self.layers.append(PointTransformerLayer(
                    dim = dimension,
                    pos_mlp_hidden_dim = 64,
                    attn_mlp_hidden_mult = 4,
                    num_neighbors = self.neighbors
                ))
    
        def forward(self, pos):
            curr_pos = pos
            for layer in self.layers:
                print(curr_pos)
                curr_pos = layer(self.feats, pos, self.mask)
                print("----")
            return curr_pos
    
    model = PointTransformer(feats, mask)
    model(pos)
    

    The error I'm getting is mat1 and mat2 shapes cannot be multiplied (5x7 and 5x15)

    opened by StellaAthena 1
  • point clouds with different number of points

    point clouds with different number of points

    Great job! I have a question about the number of the points in the point cloud. Do you have any suggestion to deal with point clouds with different point. As I know, point cloud models are always applied in Shapenet which contains point clouds with 2048 points. So what can we do if the number of the point clouds is not constant?

    opened by 1999kevin 0
  • Scalar attention or vector attention in the multi-head variant

    Scalar attention or vector attention in the multi-head variant

    It seems that the implementation of the multi-head point transformer produces scalar attention scores for each head.

    https://github.com/lucidrains/point-transformer-pytorch/blob/99bc3958138d8c9d3b882e4ac50b1a18a86160fe/point_transformer_pytorch/multihead_point_transformer_pytorch.py#L62

    opened by ZikangZhou 2
  • The layer structure and mask

    The layer structure and mask

    Hi,

    Thanks for this contribution. In the implementation of attn_mlp the first linear layer increases the dimension. Is this a standard practice because I did not find any details about this in the paper. Also paper also does not describe use of mask, is this again some standard practice for attention layers?

    Thanks!!

    opened by ayushais 1
  • Invariant to cardinality?

    Invariant to cardinality?

    Dear Authors, In your paper you wrote: "The layer is invariant to permutation and cardinality and is thus inherently suited to point cloud processing."

    I do not understand this statement, because your PointTransformerLayer https://github.com/lucidrains/point-transformer-pytorch/blob/main/point_transformer_pytorch/point_transformer_pytorch.py#L31 requires the dim parameter in initialization. So it always expects dim elements in input. What if a point cloud has dim+1 points?

    Thank you in advance.

    opened by decadenza 0
  • Cost too much memory

    Cost too much memory

    I'm not sure whether I used the point-transformer correctly: I just implemented one block for training, and the data shape of (x, pos) in each gpu are both [16, 2048, 3], later I was informed that my gpu is running out of the memory(11.77 GB total capacity)

    opened by JLU-Neal 9
Releases(0.1.5)
Owner
Phil Wang
Working with Attention. It's all we need.
Phil Wang
Official and maintained implementation of the paper "OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data" [BMVC 2021].

OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data Christoph Reich, Tim Prangemeier, Özdemir Cetin & Heinz Koeppl | Pr

Christoph Reich 23 Sep 21, 2022
Neural Ensemble Search for Performant and Calibrated Predictions

Neural Ensemble Search Introduction This repo contains the code accompanying the paper: Neural Ensemble Search for Performant and Calibrated Predictio

AutoML-Freiburg-Hannover 26 Dec 12, 2022
Rate-limit-semaphore - Semaphore implementation with rate limit restriction for async-style (any core)

Rate Limit Semaphore Rate limit semaphore for async-style (any core) There are t

Yan Kurbatov 4 Jun 21, 2022
[NeurIPS 2021] Shape from Blur: Recovering Textured 3D Shape and Motion of Fast Moving Objects

[NeurIPS 2021] Shape from Blur: Recovering Textured 3D Shape and Motion of Fast Moving Objects YouTube | arXiv Prerequisites Kaolin is available here:

Denys Rozumnyi 107 Dec 26, 2022
The Noise Contrastive Estimation for softmax output written in Pytorch

An NCE implementation in pytorch About NCE Noise Contrastive Estimation (NCE) is an approximation method that is used to work around the huge computat

Kaiyu Shi 287 Nov 25, 2022
This is the source code for the experiments related to the paper Unsupervised Audio Source Separation Using Differentiable Parametric Source Models

Unsupervised Audio Source Separation Using Differentiable Parametric Source Models This is the source code for the experiments related to the paper Un

30 Oct 19, 2022
Language Models for the legal domain in Spanish done @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

Spanish legal domain Language Model ⚖️ This repository contains the page for two main resources for the Spanish legal domain: A RoBERTa model: https:/

Plan de Tecnologías del Lenguaje - Gobierno de España 12 Nov 14, 2022
The official repo for OC-SORT: Observation-Centric SORT on video Multi-Object Tracking. OC-SORT is simple, online and robust to occlusion/non-linear motion.

OC-SORT Observation-Centric SORT (OC-SORT) is a pure motion-model-based multi-object tracker. It aims to improve tracking robustness in crowded scenes

Jinkun Cao 325 Jan 05, 2023
[EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations

MuVER This repo contains the code and pre-trained model for our EMNLP 2021 paper: MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity

24 May 30, 2022
Codes for the compilation and visualization examples to the HIF vegetation dataset

High-impedance vegetation fault dataset This repository contains the codes that compile the "Vegetation Conduction Ignition Test Report" data, which a

1 Dec 12, 2021
Automatic deep learning for image classification.

AutoDL AutoDL automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few line

wenqi 2 Oct 12, 2022
[ICCV'21] UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction

UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction Project Page | Paper | Supplementary | Video This reposit

331 Dec 28, 2022
Implementation of the master's thesis "Temporal copying and local hallucination for video inpainting".

Temporal copying and local hallucination for video inpainting This repository contains the implementation of my master's thesis "Temporal copying and

David Álvarez de la Torre 1 Dec 02, 2022
Fbone (Flask bone) is a Flask (Python microframework) starter/template/bootstrap/boilerplate application.

Fbone (Flask bone) is a Flask (Python microframework) starter/template/bootstrap/boilerplate application.

Wilson 1.7k Dec 30, 2022
Supplementary materials to "Spin-optomechanical quantum interface enabled by an ultrasmall mechanical and optical mode volume cavity" by H. Raniwala, S. Krastanov, M. Eichenfield, and D. R. Englund, 2022

Supplementary materials to "Spin-optomechanical quantum interface enabled by an ultrasmall mechanical and optical mode volume cavity" by H. Raniwala,

Stefan Krastanov 1 Jan 17, 2022
Implementation for "Conditional entropy minimization principle for learning domain invariant representation features"

Implementation for "Conditional entropy minimization principle for learning domain invariant representation features". The code is reproduced from thi

1 Nov 02, 2022
A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

CLIP4CMR A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval The original data and pre-calculate

24 Dec 26, 2022
Worktory is a python library created with the single purpose of simplifying the inventory management of network automation scripts.

Worktory is a python library created with the single purpose of simplifying the inventory management of network automation scripts.

Renato Almeida de Oliveira 18 Aug 31, 2022
PyTorch implementation of UPFlow (unsupervised optical flow learning)

UPFlow: Upsampling Pyramid for Unsupervised Optical Flow Learning By Kunming Luo, Chuan Wang, Shuaicheng Liu, Haoqiang Fan, Jue Wang, Jian Sun Megvii

kunming luo 87 Dec 20, 2022
On the model-based stochastic value gradient for continuous reinforcement learning

On the model-based stochastic value gradient for continuous reinforcement learning This repository is by Brandon Amos, Samuel Stanton, Denis Yarats, a

Facebook Research 46 Dec 15, 2022