Simple Baselines for Human Pose Estimation and Tracking

Overview

Simple Baselines for Human Pose Estimation and Tracking

News

Introduction

This is an official pytorch implementation of Simple Baselines for Human Pose Estimation and Tracking. This work provides baseline methods that are surprisingly simple and effective, thus helpful for inspiring and evaluating new ideas for the field. State-of-the-art results are achieved on challenging benchmarks. On COCO keypoints valid dataset, our best single model achieves 74.3 of mAP. You can reproduce our results using this repo. All models are provided for research purpose.

Main Results

Results on MPII val

Arch Head Shoulder Elbow Wrist Hip Knee Ankle Mean [email protected]
256x256_pose_resnet_50_d256d256d256 96.351 95.329 88.989 83.176 88.420 83.960 79.594 88.532 33.911
384x384_pose_resnet_50_d256d256d256 96.658 95.754 89.790 84.614 88.523 84.666 79.287 89.066 38.046
256x256_pose_resnet_101_d256d256d256 96.862 95.873 89.518 84.376 88.437 84.486 80.703 89.131 34.020
384x384_pose_resnet_101_d256d256d256 96.965 95.907 90.268 85.780 89.597 85.935 82.098 90.003 38.860
256x256_pose_resnet_152_d256d256d256 97.033 95.941 90.046 84.976 89.164 85.311 81.271 89.620 35.025
384x384_pose_resnet_152_d256d256d256 96.794 95.618 90.080 86.225 89.700 86.862 82.853 90.200 39.433

Note:

  • Flip test is used.

Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset

Arch AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
256x192_pose_resnet_50_d256d256d256 0.704 0.886 0.783 0.671 0.772 0.763 0.929 0.834 0.721 0.824
384x288_pose_resnet_50_d256d256d256 0.722 0.893 0.789 0.681 0.797 0.776 0.932 0.838 0.728 0.846
256x192_pose_resnet_101_d256d256d256 0.714 0.893 0.793 0.681 0.781 0.771 0.934 0.840 0.730 0.832
384x288_pose_resnet_101_d256d256d256 0.736 0.896 0.803 0.699 0.811 0.791 0.936 0.851 0.745 0.858
256x192_pose_resnet_152_d256d256d256 0.720 0.893 0.798 0.687 0.789 0.778 0.934 0.846 0.736 0.839
384x288_pose_resnet_152_d256d256d256 0.743 0.896 0.811 0.705 0.816 0.797 0.937 0.858 0.751 0.863

Results on Caffe-style ResNet

Arch AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
256x192_pose_resnet_50_caffe_d256d256d256 0.704 0.914 0.782 0.677 0.744 0.735 0.921 0.805 0.704 0.783
256x192_pose_resnet_101_caffe_d256d256d256 0.720 0.915 0.803 0.693 0.764 0.753 0.928 0.821 0.720 0.802
256x192_pose_resnet_152_caffe_d256d256d256 0.728 0.925 0.804 0.702 0.766 0.760 0.931 0.828 0.729 0.806

Note:

  • Flip test is used.
  • Person detector has person AP of 56.4 on COCO val2017 dataset.
  • Difference between PyTorch-style and Caffe-style ResNet is the position of stride=2 convolution

Environment

The code is developed using python 3.6 on Ubuntu 16.04. NVIDIA GPUs are needed. The code is developed and tested using 4 NVIDIA P100 GPU cards. Other platforms or GPU cards are not fully tested.

Quick start

Installation

  1. Install pytorch >= v0.4.0 following official instruction.

  2. Disable cudnn for batch_norm:

    # PYTORCH=/path/to/pytorch
    # for pytorch v0.4.0
    sed -i "1194s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
    # for pytorch v0.4.1
    sed -i "1254s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
    

    Note that instructions like # PYTORCH=/path/to/pytorch indicate that you should pick a path where you'd like to have pytorch installed and then set an environment variable (PYTORCH in this case) accordingly.

  3. Clone this repo, and we'll call the directory that you cloned as ${POSE_ROOT}.

  4. Install dependencies:

    pip install -r requirements.txt
    
  5. Make libs:

    cd ${POSE_ROOT}/lib
    make
    
  6. Install COCOAPI:

    # COCOAPI=/path/to/clone/cocoapi
    git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
    cd $COCOAPI/PythonAPI
    # Install into global site-packages
    make install
    # Alternatively, if you do not have permissions or prefer
    # not to install the COCO API into global site-packages
    python3 setup.py install --user
    

    Note that instructions like # COCOAPI=/path/to/install/cocoapi indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (COCOAPI in this case) accordingly.

  7. Download pytorch imagenet pretrained models from pytorch model zoo and caffe-style pretrained models from GoogleDrive.

  8. Download mpii and coco pretrained models from OneDrive or GoogleDrive. Please download them under ${POSE_ROOT}/models/pytorch, and make them look like this:

    ${POSE_ROOT}
     `-- models
         `-- pytorch
             |-- imagenet
             |   |-- resnet50-19c8e357.pth
             |   |-- resnet50-caffe.pth.tar
             |   |-- resnet101-5d3b4d8f.pth
             |   |-- resnet101-caffe.pth.tar
             |   |-- resnet152-b121ed2d.pth
             |   `-- resnet152-caffe.pth.tar
             |-- pose_coco
             |   |-- pose_resnet_101_256x192.pth.tar
             |   |-- pose_resnet_101_384x288.pth.tar
             |   |-- pose_resnet_152_256x192.pth.tar
             |   |-- pose_resnet_152_384x288.pth.tar
             |   |-- pose_resnet_50_256x192.pth.tar
             |   `-- pose_resnet_50_384x288.pth.tar
             `-- pose_mpii
                 |-- pose_resnet_101_256x256.pth.tar
                 |-- pose_resnet_101_384x384.pth.tar
                 |-- pose_resnet_152_256x256.pth.tar
                 |-- pose_resnet_152_384x384.pth.tar
                 |-- pose_resnet_50_256x256.pth.tar
                 `-- pose_resnet_50_384x384.pth.tar
    
    
  9. Init output(training model output directory) and log(tensorboard log directory) directory:

    mkdir output 
    mkdir log
    

    Your directory tree should look like this:

    ${POSE_ROOT}
    ├── data
    ├── experiments
    ├── lib
    ├── log
    ├── models
    ├── output
    ├── pose_estimation
    ├── README.md
    └── requirements.txt
    

Data preparation

For MPII data, please download from MPII Human Pose Dataset. The original annotation files are in matlab format. We have converted them into json format, you also need to download them from OneDrive or GoogleDrive. Extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- mpii
    `-- |-- annot
        |   |-- gt_valid.mat
        |   |-- test.json
        |   |-- train.json
        |   |-- trainval.json
        |   `-- valid.json
        `-- images
            |-- 000001163.jpg
            |-- 000003072.jpg

For COCO data, please download from COCO download, 2017 Train/Val is needed for COCO keypoints training and validation. We also provide person detection result of COCO val2017 to reproduce our multi-person pose estimation results. Please download from OneDrive or GoogleDrive. Download and extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ... 

Valid on MPII using pretrained models

python pose_estimation/valid.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_mpii/pose_resnet_50_256x256.pth.tar

Training on MPII

python pose_estimation/train.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml

Valid on COCO val2017 using pretrained models

python pose_estimation/valid.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_coco/pose_resnet_50_256x192.pth.tar

Training on COCO train2017

python pose_estimation/train.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml

Other Implementations

Citation

If you use our code or models in your research, please cite with:

@inproceedings{xiao2018simple,
    author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
    title={Simple Baselines for Human Pose Estimation and Tracking},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2018}
}
Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
Multimodal Co-Attention Transformer (MCAT) for Survival Prediction in Gigapixel Whole Slide Images

Multimodal Co-Attention Transformer (MCAT) for Survival Prediction in Gigapixel Whole Slide Images [ICCV 2021] © Mahmood Lab - This code is made avail

Mahmood Lab @ Harvard/BWH 63 Dec 01, 2022
Freecodecamp Scientific Computing with Python Certification; Solution for Challenge 2: Time Calculator

Assignment Write a function named add_time that takes in two required parameters and one optional parameter: a start time in the 12-hour clock format

Hellen Namulinda 0 Feb 26, 2022
Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data

SEDE SEDE (Stack Exchange Data Explorer) is new dataset for Text-to-SQL tasks with more than 12,000 SQL queries and their natural language description

Rupert. 83 Nov 11, 2022
NitroFE is a Python feature engineering engine which provides a variety of modules designed to internally save past dependent values for providing continuous calculation.

NitroFE is a Python feature engineering engine which provides a variety of modules designed to internally save past dependent values for providing continuous calculation.

100 Sep 28, 2022
Convert Mission Planner (ArduCopter) Waypoint Missions to Litchi CSV Format to execute on DJI Drones

Mission Planner to Litchi Convert Mission Planner (ArduCopter) Waypoint Surveys to Litchi CSV Format to execute on DJI Drones Litchi doesn't support S

Yaros 24 Dec 09, 2022
Code for our ALiBi method for transformer language models.

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation This repository contains the code and models for our paper Tra

Ofir Press 211 Dec 31, 2022
An open framework for Federated Learning.

Welcome to Intel® Open Federated Learning Federated learning is a distributed machine learning approach that enables organizations to collaborate on m

Intel Corporation 397 Dec 27, 2022
Multi Task RL Baselines

MTRL Multi Task RL Algorithms Contents Introduction Setup Usage Documentation Contributing to MTRL Community Acknowledgements Introduction M

Facebook Research 171 Jan 09, 2023
QKeras: a quantization deep learning library for Tensorflow Keras

QKeras github.com/google/qkeras QKeras 0.8 highlights: Automatic quantization using QKeras; Stochastic behavior (including stochastic rouding) is disa

Google 437 Jan 03, 2023
PyTorch implementation of Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose

Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose Release Notes The official PyTorch implementation of Neural View S

Angtian Wang 20 Oct 09, 2022
Space Ship Simulator using python

FlyOver Basic space-ship simulator using python How to run? Just double click run.py What modules do i need? All modules that i currently using is bui

0 Oct 09, 2022
Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN)

Flickr-Faces-HQ Dataset (FFHQ) Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative

NVIDIA Research Projects 2.9k Dec 28, 2022
Official PyTorch code for WACV 2022 paper "CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows"

CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows WACV 2022 preprint:https://arxiv.org/abs/2107.1

Denis 156 Dec 28, 2022
Auto-Lama combines object detection and image inpainting to automate object removals

Auto-Lama Auto-Lama combines object detection and image inpainting to automate object removals. It is build on top of DE:TR from Facebook Research and

44 Dec 09, 2022
Detectron2 is FAIR's next-generation platform for object detection and segmentation.

Detectron2 is Facebook AI Research's next generation software system that implements state-of-the-art object detection algorithms. It is a ground-up r

Facebook Research 23.3k Jan 08, 2023
Have you ever wondered how cool it would be to have your own A.I

Have you ever wondered how cool it would be to have your own A.I. assistant Imagine how easier it would be to send emails without typing a single word, doing Wikipedia searches without opening web br

Harsh Gupta 1 Nov 09, 2021
VolumeGAN - 3D-aware Image Synthesis via Learning Structural and Textural Representations

VolumeGAN - 3D-aware Image Synthesis via Learning Structural and Textural Representations 3D-aware Image Synthesis via Learning Structural and Textura

GenForce: May Generative Force Be with You 116 Dec 26, 2022
Training BERT with Compute/Time (Academic) Budget

Training BERT with Compute/Time (Academic) Budget This repository contains scripts for pre-training and finetuning BERT-like models with limited time

Intel Labs 263 Jan 07, 2023
Learning an Adaptive Meta Model-Generator for Incrementally Updating Recommender Systems

Learning an Adaptive Meta Model-Generator for Incrementally Updating Recommender Systems This is our experimental code for RecSys 2021 paper "Learning

11 Jul 28, 2022
TensorFlow Implementation of Unsupervised Cross-Domain Image Generation

Domain Transfer Network (DTN) TensorFlow implementation of Unsupervised Cross-Domain Image Generation. Requirements Python 2.7 TensorFlow 0.12 Pickle

Yunjey Choi 864 Dec 30, 2022