Repo for CVPR2021 paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information"

Related tags

Deep Learningqpic
Overview

QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information

by Masato Tamura, Hiroki Ohashi, and Tomoaki Yoshinaga.

This repository contains the official implementation of the paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information", which is accepted to CVPR2021.

QPIC is implemented by extending the recently proposed object detector, DETR. QPIC leverages the query-based detection and attention mechanism in the transformer, and as a result, achieves high HOI detection performance with simple detection heads.

Example attention maps.

Preparation

Dependencies

Our implementation uses external libraries such as NumPy and PyTorch. You can resolve the dependencies with the following command.

pip install numpy
pip install -r requirements.txt

Note that this command may dump errors during installing pycocotools, but the errors can be ignored.

Dataset

HICO-DET

HICO-DET dataset can be downloaded here. After finishing downloading, unpack the tarball (hico_20160224_det.tar.gz) to the data directory.

Instead of using the original annotations files, we use the annotation files provided by the PPDM authors. The annotation files can be downloaded from here. The downloaded annotation files have to be placed as follows.

qpic
 |─ data
 │   └─ hico_20160224_det
 |       |─ annotations
 |       |   |─ trainval_hico.json
 |       |   |─ test_hico.json
 |       |   └─ corre_hico.npy
 :       :

V-COCO

First clone the repository of V-COCO from here, and then follow the instruction to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Place the files and make directories as follows.

qpic
 |─ data
 │   └─ v-coco
 |       |─ data
 |       |   |─ instances_vcoco_all_2014.json
 |       |   :
 |       |─ prior.pickle
 |       |─ images
 |       |   |─ train2014
 |       |   |   |─ COCO_train2014_000000000009.jpg
 |       |   |   :
 |       |   └─ val2014
 |       |       |─ COCO_val2014_000000000042.jpg
 |       |       :
 |       |─ annotations
 :       :

For our implementation, the annotation file have to be converted to the HOIA format. The conversion can be conducted as follows.

PYTHONPATH=data/v-coco \
        python convert_vcoco_annotations.py \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations

Note that only Python2 can be used for this conversion because vsrl_utils.py in the v-coco repository shows a error with Python3.

V-COCO annotations with the HOIA format, corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json will be generated to annotations directory.

Pre-trained parameters

Our QPIC have to be pre-trained with the COCO object detection dataset. For the HICO-DET training, this pre-training can be omitted by using the parameters of DETR. The parameters can be downloaded from here for ResNet50, and for ResNet101. For the V-COCO training, this pre-training has to be carried out because some images of the V-COCO evaluation set are contained in the training set of DETR. We excluded the images and pre-trained QPIC for the V-COCO evaluation.

After downloading or pre-training, move the pre-trained parameters to the params directory and convert the parameters with the following command (e.g. downloaded ResNet50 parameters).

python convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre.pth

Training

After the preparation, you can start the training with the following command.

For the HICO-DET training.

python main.py \
        --pretrained params/detr-r50-pre.pth \
        --output_dir logs \
        --hoi \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --set_cost_bbox 2.5 \
        --set_cost_giou 1 \
        --bbox_loss_coef 2.5 \
        --giou_loss_coef 1

For the V-COCO training.

python main.py \
        --pretrained params/detr-r50-pre.pth \
        --output_dir logs \
        --hoi \
        --dataset_file vcoco \
        --hoi_path data/v-coco \
        --num_obj_classes 80 \
        --num_verb_classes 29 \
        --backbone resnet50 \
        --set_cost_bbox 2.5 \
        --set_cost_giou 1 \
        --bbox_loss_coef 2.5 \
        --giou_loss_coef 1

If you have multiple GPUs on your machine, you can utilize them to speed up the training. The number of GPUs is specified with the --nproc_per_node option. The following command starts the training with 8 GPUs for the HICO-DET training.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained params/detr-r50-pre.pth \
        --output_dir logs \
        --hoi \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --set_cost_bbox 2.5 \
        --set_cost_giou 1 \
        --bbox_loss_coef 2.5 \
        --giou_loss_coef 1

Evaluation

The evaluation is conducted at the end of each epoch during the training. The results are written in logs/log.txt like below:

"test_mAP": 0.29061250833779456, "test_mAP rare": 0.21910348492395765, "test_mAP non-rare": 0.31197234650036926

test_mAP, test_mAP rare, and test_mAP non-rare are the results of the default full, rare, and non-rare setting, respectively.

For the official evaluation of V-COCO, a pickle file of detection results have to be generated. You can generate the file as follows.

python generate_vcoco_official.py \
        --param_path logs/checkpoint.pth
        --save_path vcoco.pickle
        --hoi_path data/v-coco

Results

HICO-DET.

Full (D) Rare (D) Non-rare (D) Full(KO) Rare (KO) Non-rare (KO)
QPIC (ResNet50) 29.07 21.85 31.23 31.68 24.14 33.93
QPIC (ResNet101) 29.90 23.92 31.69 32.38 26.06 34.27

D: Default, KO: Known object

V-COCO.

Scenario 1 Scenario 2
QPIC (ResNet50) 58.8 61.0
QPIC (ResNet101) 58.3 60.7

Citation

Please consider citing our paper if it helps your research.

@inproceedings{tamura_cvpr2021,
author = {Tamura, Masato and Ohashi, Hiroki and Yoshinaga, Tomoaki},
title = {{QPIC}: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information},
booktitle={CVPR},
year = {2021},
}
Comments
  • Reproduction the result on VCOCO dataset

    Reproduction the result on VCOCO dataset

    Hi, I can reproduce the result on HICO-DET dataset, but just get 51 on V-COCO dataset, could you please provide the log.txt? I am not sure where I am wrong. Thanks.

    opened by SherlockHolmes221 7
  • How to get the file logs/checkpoint.pth?

    How to get the file logs/checkpoint.pth?

    Thanks for your interesting work! Now I have a question about how to get the file logs/checkpoint.pth,I have not see the file from your google drive. Desire to your reply.

    opened by Lunarli 4
  • number of decoder layers impact

    number of decoder layers impact

    Hi, thanks for interesting work. Does the number of decoder layers have a significant impact on performance like in detr? And I cant find how much v100 you've used. maybe 8?

    opened by jihwanp 3
  • Why the pretrained VCOCO model has 81 object classes?

    Why the pretrained VCOCO model has 81 object classes?

    When I tried to evaluate the VCOCO model you provide, I have to set the parameter --num_obj_classes to 81. What is the reason for this setting? And should I set --num_obj_classes 81 during training?

    Thanks for your reply!

    opened by weiyunfei 3
  • How to pretrain detector for VCOCO?

    How to pretrain detector for VCOCO?

    Hi, thanks for your excellent work. I am confused about the pre-training for the V-COCO training. In your README.md, you stated the pretraining has to be carried out for the V-COCO training since some images of the V-COCO evaluation set are contained in the training set of DETR, while the given training command just used the pretrained DETR model without any more pretraining. Should I apply a pretraining on the dataset excluding the V-COCO evaluation images or just follow your training command?

    opened by weiyunfei 3
  • Different AP Results for pre-trained VCOCO models

    Different AP Results for pre-trained VCOCO models

    Hi folks,

    First of all, thank you for sharing this repository. I would like to ask a specific question about the evaluation results of provided pre-trained V-COCO models. I followed the instructions you provided (for constructing annotation files for V-COCO and obtaining pickle files) to get the results. However, comparing with the V-COCO results in the table, I got different average role ap results. For instance, I am providing the output of R-50 QPIC, scenario 1 Role AP result in the attached screenshot. I am wondering possible reasons for the issue, could you please provide assistance about this?

    qpic_resnet50_sc1_roleap

    opened by cancam 2
  • How to convert the pretrained DETR model for V-COCO style?

    How to convert the pretrained DETR model for V-COCO style?

    Hi! Sorry for bothering again. When I tried to train a V-COCO model, I found that the object classifier of the pre-trained DETR model is only 81-way (including background), while in #6 you claimed that the V-COCO model has an 82-way object classifier (including background and a missing category id). So what should I do to convert the classifier parameters from 81 classes into 82 classes?

    opened by weiyunfei 2
  • For V-COCO result reproduction, which evaluation code should I use?

    For V-COCO result reproduction, which evaluation code should I use?

    I tried to reproduce your results on V-COCO, and I encountered some problems. I found that the official evaluation code can only achieve 56.5 for Scenario 1 and 58.6 for Scenario 2 with your pre-trained model, while with the vcoco_eval.py in your project which is in PPDM style, the result is 58.35. However, the evaluation process in vcoco_eval.py seems like Scenario 2 in the official evaluation. Which code should I use to reproduce your experiments on V-COCO? Moreover, if I need to use the vcoco_eval.py, what should I do to get the evaluation results for both scenarios?

    opened by weiyunfei 2
  • V-COCO Evaluation Error

    V-COCO Evaluation Error

    Hello,

    Many thanks for your great work.

    I am trying to evaluate your pre-trained models on V-COCO.

    1. So, I first generate the official detection pickle via:
    
    python generate_vcoco_official.py \
            --param_path ./params/qpic_resnet50_vcoco.pth \
            --save_path ./logs_vcoco/vcoco.pickle \
            --hoi_path ./data/v-coco
    
    1. Later, I use the official evaluation code via the following:
    from vsrl_eval import VCOCOeval
    
    vsrl_annot_file_s='./data/v-coco/data/vcoco/vcoco_val.json'
    split_file_s='./data/v-coco/data/splits/vcoco_val.ids'
    
    coco_file_s='./data/v-coco/data/instances_vcoco_all_2014.json'
    vcocoeval = VCOCOeval(vsrl_annot_file_s, coco_file_s, split_file_s)
    
    file_name= './logs_vcoco/vcoco.pickle'
    vcocoeval._do_eval(file_name, ovr_thresh=0.5)
    

    Please note that I adapted the latter script from VSG-Net repo. I face the following error during evaluation:

    zero-size array to reduction operation minimum which has no identity

    on assert(np.amax(rec) <= 1) within _do_agent_eval() function execution.

    I wonder if this is a common error, and how it can be mitigated?

    Many thanks.

    opened by kilickaya 1
  • Distance AP

    Distance AP

    Hi I have a question about distance-wise ap. image

    Does the hoi instance, which participated in calculating ap, correspond to distance=0.1 when the distance between human & center box is between 0.1 and 0.2?

    Or I would fully appreciate it if you could provide code for calculate distance wise ap. Thanks

    opened by jihwanp 1
  • Some useless steps in focal loss

    Some useless steps in focal loss

    image Thx for your interesting work! The neg_weights is meaningful only in heatmap based problem, e.g., Gaussian based detection. Removing those steps may avoid some misunderstanding.

    opened by YueLiao 1
  • a bug of hico dataset file?

    a bug of hico dataset file?

    Firstly,thank for your nice work?but i find small bug here https://github.com/hitachi-rd-cv/qpic/blob/main/datasets/hico.py#L62 The number of HOI turples(len(img_anno['hoi_annotation'])) should not exceed the number of query(self.num_queries) but the number of boxes(len(img_anno['annotations'])) should not exceed the number of query(self.num_queries)? Is that right? Waiting for your thought,thank you very much!!

    opened by jojolee123 0
  • How to convert .onnx model from qpic.pth

    How to convert .onnx model from qpic.pth

    I am facing some issue; onnx conversion of this model. I tried

    torch.onnx.export(pth_model, dummy_input, "onnx_model.onnx", opset_version 11) dummy_input shape is [1, 3, 720, 1280] # same with this model.

    However, the result is [array([[[nan, nan, nan], [nan, nan, nan]]], dtype=float32), array([[[nan, nan], [nan, nan]]], dtype=float32), array([[[nan, nan, nan, nan], [nan, nan, nan, nan]]], dtype=float32), array([[[nan, nan, nan, nan], [nan, nan, nan, nan]]], dtype=float32)]

    It's nan party!

    please commet how to fix it.

    opened by bj-noh 0
  • Is your architecture end-to-end HOI?

    Is your architecture end-to-end HOI?

    Hello Thanks for your implementation. Is your architecture end-to-end HOI? Does it mean that it does not require any features and feature extraction? For example, I can feed my features to your network?

    opened by mansooreh1 0
  • The influence from aux_loss

    The influence from aux_loss

    Hi authors,

    thanks for your implementation which helps my research a lot. In the supplementary materials of your paper you stated that the auxiliary loss is used following DETR. What will happen if it isn't used for HOI training? I wonder if there's any corresponding experiment on QPIC.

    opened by hwfan 0
  • custom dataset implementation

    custom dataset implementation

    Hi, Thank you so much for this awesome repo. I am currently trying a custom dataset implementation using qpic. The dataset's annotation file is in coco format. The convert_vcoco_annotations.py script converts vcoco to HOIA format. But I am trying to convert a coco format annotation to HOIA by manually adding the interactions. In this process, I am unable to understand a few things, in HOIA format : what does {subject_id , category_id, object_id} mean in "hoi_annotations" dict key, what does {category_id} mean in "annotations" dict key"

    opened by dharneeshkar004 0
A Self-Supervised Contrastive Learning Framework for Aspect Detection

AspDecSSCL A Self-Supervised Contrastive Learning Framework for Aspect Detection This repository is a pytorch implementation for the following AAAI'21

Tian Shi 30 Dec 28, 2022
TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

912 Jan 08, 2023
Set of methods to ensemble boxes from different object detection models, including implementation of "Weighted boxes fusion (WBF)" method.

Set of methods to ensemble boxes from different object detection models, including implementation of "Weighted boxes fusion (WBF)" method.

1.4k Jan 05, 2023
An implementation of "Optimal Textures: Fast and Robust Texture Synthesis and Style Transfer through Optimal Transport"

Optex An implementation of Optimal Textures: Fast and Robust Texture Synthesis and Style Transfer through Optimal Transport for TU Delft CS4240. You c

Hans Brouwer 33 Jan 05, 2023
Official implementation for “Unsupervised Low-Light Image Enhancement via Histogram Equalization Prior”

Unsupervised Low-Light Image Enhancement via Histogram Equalization Prior. The code will release soon. Implementation Python3 PyTorch=1.0 NVIDIA GPU+

FengZhang 34 Dec 04, 2022
Code for Universal Semi-Supervised Semantic Segmentation models paper accepted in ICCV 2019

USSS_ICCV19 Code for Universal Semi Supervised Semantic Segmentation accepted to ICCV 2019. Full Paper available at https://arxiv.org/abs/1811.10323.

Tarun K 68 Nov 24, 2022
Improving Calibration for Long-Tailed Recognition (CVPR2021)

MiSLAS Improving Calibration for Long-Tailed Recognition Authors: Zhisheng Zhong, Jiequan Cui, Shu Liu, Jiaya Jia [arXiv] [slide] [BibTeX] Introductio

DV Lab 116 Dec 20, 2022
A large-scale video dataset for the training and evaluation of 3D human pose estimation models

ASPset-510 ASPset-510 (Australian Sports Pose Dataset) is a large-scale video dataset for the training and evaluation of 3D human pose estimation mode

Aiden Nibali 36 Oct 30, 2022
QRec: A Python Framework for quick implementation of recommender systems (TensorFlow Based)

Introduction QRec is a Python framework for recommender systems (Supported by Python 3.7.4 and Tensorflow 1.14+) in which a number of influential and

Yu 1.4k Jan 01, 2023
Implementation of Continuous Sparsification, a method for pruning and ticket search in deep networks

Continuous Sparsification Implementation of Continuous Sparsification (CS), a method based on l_0 regularization to find sparse neural networks, propo

Pedro Savarese 23 Dec 07, 2022
Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts

DataSelection-NMT Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts Quick update: The paper got accepted o

Javad Pourmostafa 6 Jan 07, 2023
This is a simple plugin for Vim that allows you to use OpenAI Codex.

🤖 Vim Codex An AI plugin that does the work for you. This is a simple plugin for Vim that will allow you to use OpenAI Codex. To use this plugin you

Tom Dörr 195 Dec 28, 2022
Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation in PyTorch

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Ima

Xuanchi Ren 86 Dec 07, 2022
PyTorch implementation for the ICLR 2020 paper "Understanding the Limitations of Variational Mutual Information Estimators"

Smoothed Mutual Information ``Lower Bound'' Estimator PyTorch implementation for the ICLR 2020 paper Understanding the Limitations of Variational Mutu

50 Nov 09, 2022
Julia package for multiway (inverse) covariance estimation.

TensorGraphicalModels TensorGraphicalModels.jl is a suite of Julia tools for estimating high-dimensional multiway (tensor-variate) covariance and inve

Wayne Wang 3 Sep 23, 2022
GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification

GalaXC GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification @InProceedings{Saini21, author = {Saini, D. and Jain,

Extreme Classification 28 Dec 05, 2022
Implementation for the IJCAI2021 work "Beyond the Spectrum: Detecting Deepfakes via Re-synthesis"

Beyond the Spectrum Implementation for the IJCAI2021 work "Beyond the Spectrum: Detecting Deepfakes via Re-synthesis" by Yang He, Ning Yu, Margret Keu

Yang He 27 Jan 07, 2023
Organseg dags - The repository contains the codebase for multi-organ segmentation with directed acyclic graphs (DAGs) in CT.

Organseg dags - The repository contains the codebase for multi-organ segmentation with directed acyclic graphs (DAGs) in CT.

yzf 1 Jun 12, 2022
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers Results results on COCO val Backbone Method Lr Schd PQ Config Download

155 Dec 20, 2022
这是一个yolox-keras的源码,可以用于训练自己的模型。

YOLOX:You Only Look Once目标检测模型在Keras当中的实现 目录 性能情况 Performance 实现的内容 Achievement 所需环境 Environment 小技巧的设置 TricksSet 文件下载 Download 训练步骤 How2train 预测步骤 Ho

Bubbliiiing 64 Nov 10, 2022