QAHOI: Query-Based Anchors for Human-Object Interaction Detection (paper)

Last update: Dec 29, 2022

Requirements

PyTorch >= 1.5.1
torchvision >= 0.6.1

pip install -r requirements.txt
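
To confirm an environment satisfies the minimum versions listed above, a small check along these lines may help. It is a sketch, not part of the repository: the helper names (`parse_version`, `meets_minimum`, `check_installed`) are made up for illustration, and it assumes Python 3.8+ for `importlib.metadata`.

```python
import re
from importlib import metadata

# Minimum versions as listed in the requirements above.
MINIMUMS = {"torch": "1.5.1", "torchvision": "0.6.1"}


def parse_version(version: str) -> tuple:
    """Turn '1.9.0+cu111' into (1, 9, 0); local build tags after '+' are ignored."""
    core = version.split("+", 1)[0]
    numbers = [int(part) for part in re.findall(r"\d+", core)][:3]
    numbers += [0] * (3 - len(numbers))  # pad '1.5' to (1, 5, 0)
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_installed() -> dict:
    """Return {package: True/False/None}; None means the package is not installed."""
    results = {}
    for name, minimum in MINIMUMS.items():
        try:
            results[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            results[name] = None
    return results


if __name__ == "__main__":
    print(check_installed())
```

The comparison only looks at the first three numeric components, so pre-release suffixes are not distinguished from the final release.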

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# test
python test.py
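
Before running the build, it can be useful to check a few basics. The sketch below assumes, without confirmation from this README, that compiling the operators needs `nvcc` on the PATH and a CUDA-enabled PyTorch install. The function name `cuda_build_preflight` is hypothetical.

```python
import shutil


def cuda_build_preflight() -> dict:
    """Collect basic facts that usually matter before building CUDA extensions."""
    report = {
        "nvcc_on_path": shutil.which("nvcc") is not None,  # CUDA compiler found?
        "torch_importable": False,
        "cuda_available": False,
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch missing; nothing more to check
    report["torch_importable"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in cuda_build_preflight().items():
        print(f"{key}: {value}")
```

If any entry is False, the build may still work in some setups, so treat the report as a hint rather than a verdict.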

Dataset Preparation

Please follow the HICO-DET dataset preparation of GGNet.

After preparation, the data folder should look as follows:

data
├── hico_20160224_det
│   ├── images
│   │   ├── test2015
│   │   └── train2015
│   └── annotations
│       ├── anno_list.json
│       ├── corre_hico.npy
│       ├── file_name_to_obj_cat.json
│       ├── hoi_id_to_num.json
│       ├── hoi_list_new.json
│       ├── test_hico.json
│       └── trainval_hico.json
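
The layout above can be verified with a short script. This is a minimal sketch: the path list is copied from the tree, while the function name `missing_entries` and the default root `data` (relative to the current directory) are assumptions for illustration.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED = [
    "hico_20160224_det/images/test2015",
    "hico_20160224_det/images/train2015",
    "hico_20160224_det/annotations/anno_list.json",
    "hico_20160224_det/annotations/corre_hico.npy",
    "hico_20160224_det/annotations/file_name_to_obj_cat.json",
    "hico_20160224_det/annotations/hoi_id_to_num.json",
    "hico_20160224_det/annotations/hoi_list_new.json",
    "hico_20160224_det/annotations/test_hico.json",
    "hico_20160224_det/annotations/trainval_hico.json",
]


def missing_entries(data_root="data"):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Dataset layout looks complete.")
```

The check only tests for existence; it does not validate the contents of the images or annotation files.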

Evaluation

Download the model to the params folder.

We tested the model with an NVIDIA A6000 GPU, PyTorch 1.9.0, Python 3.8, and CUDA 11.2.

Model	Full (def)	Rare (def)	Non-Rare (def)	Full (ko)	Rare (ko)	Non-Rare (ko)	Download
Swin-Tiny	28.47	22.44	30.27	30.99	24.83	32.84	model
Swin-Base*+	33.58	25.86	35.88	35.34	27.24	37.76	model
Swin-Large*+	35.78	29.80	37.56	37.59	31.36	39.36	model

Evaluate a model by running the corresponding command below.

Add --eval_extra to also evaluate the spatial contribution.

mAP_default.json and mAP_ko.json will be saved in the current folder.

Swin-Tiny

python main.py --resume params/QAHOI_swin_tiny_mul3.pth --backbone swin_tiny --num_feature_levels 3 --use_nms --eval

Swin-Base*+

python main.py --resume params/QAHOI_swin_base_384_22k_mul3.pth --backbone swin_base_384 --num_feature_levels 3 --use_nms --eval

Swin-Large*+

python main.py --resume params/QAHOI_swin_large_384_22k_mul3.pth --backbone swin_large_384 --num_feature_levels 3 --use_nms --eval
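
The three evaluation commands differ only in the checkpoint and backbone name, so they can be generated from a small table. The sketch below is illustrative: `EVAL_CONFIGS` and `build_eval_command` are hypothetical names, and appending `--eval_extra` to the same command line is an assumption about how that flag is meant to be used.

```python
import shlex

# Checkpoint path and backbone name per model, copied from the commands above.
EVAL_CONFIGS = {
    "Swin-Tiny": ("params/QAHOI_swin_tiny_mul3.pth", "swin_tiny"),
    "Swin-Base*+": ("params/QAHOI_swin_base_384_22k_mul3.pth", "swin_base_384"),
    "Swin-Large*+": ("params/QAHOI_swin_large_384_22k_mul3.pth", "swin_large_384"),
}


def build_eval_command(model, eval_extra=False):
    """Return the evaluation command for a model as an argument list."""
    checkpoint, backbone = EVAL_CONFIGS[model]
    cmd = [
        "python", "main.py",
        "--resume", checkpoint,
        "--backbone", backbone,
        "--num_feature_levels", "3",
        "--use_nms",
        "--eval",
    ]
    if eval_extra:
        # Assumption: the extra evaluation is enabled by adding this flag.
        cmd.append("--eval_extra")
    return cmd


if __name__ == "__main__":
    for model in EVAL_CONFIGS:
        print(shlex.join(build_eval_command(model)))
```

The returned list can be passed directly to `subprocess.run` from the repository root if running the evaluations from a script is preferred.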

Training

Download the pre-trained swin-tiny model from Swin-Transformer to the params folder.

Train QAHOI with Swin-Tiny from scratch.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_tiny \
        --pretrained params/swin_tiny_patch4_window7_224.pth \
        --output_dir logs/swin_tiny_mul3 \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --use_nms

Train QAHOI with Swin-Base*+ from scratch.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_base_384 \
        --pretrained params/swin_base_patch4_window7_224_22k.pth \
        --output_dir logs/swin_base_384_22k_mul3 \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --use_nms

Train QAHOI with Swin-Large*+ from scratch.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_large_384 \
        --pretrained params/swin_large_patch4_window12_384_22k.pth \
        --output_dir logs/swin_large_384_22k_mul3 \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --use_nms
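
All three training commands use --epochs 150 and --lr_drop 120. This README does not define --lr_drop. The sketch below assumes it follows the step-decay convention common in DETR-style codebases, where the learning rate is multiplied by a fixed factor once the drop epoch is reached. The decay factor `gamma=0.1` and the function name `lr_multiplier` are assumptions for illustration, not values taken from this repository.

```python
def lr_multiplier(epoch: int, lr_drop: int = 120, gamma: float = 0.1) -> float:
    """Step decay: scale the base learning rate by gamma every lr_drop epochs.

    Assumed behaviour, not confirmed by the README: with 150 epochs and
    lr_drop=120, the rate is unchanged for epochs 0-119 and reduced afterwards.
    """
    return gamma ** (epoch // lr_drop)


if __name__ == "__main__":
    for epoch in (0, 119, 120, 149):
        print(epoch, lr_multiplier(epoch))
```

Under this assumption, the last 30 of the 150 epochs would run at the reduced rate.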

Citation

@article{cjw,
  title={QAHOI: Query-Based Anchors for Human-Object Interaction Detection},
  author={Junwen Chen and Keiji Yanai},
  journal={arXiv preprint arXiv:2112.08647},
  year={2021}
}
