Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation

Last update: Dec 24, 2022

Overview

SimplePose

Code and pre-trained models for our paper, “Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation”, accepted by AAAI-2020.

This repo also serves as Part B of our paper "Multi-Person Pose Estimation Based on Gaussian Response Heatmaps" (under review). Part A is available at this link.

Update

A faster project is to be released.

Introduction

A bottom-up approach for the problem of multi-person pose estimation.

Contents

Training
Evaluation
Demo

Project Features

Models implemented in PyTorch with automatic mixed precision (using NVIDIA Apex).
Support for training on multiple GPUs (over 90% GPU utilization on each card).
Fast data preparation and augmentation during training (about 40 samples per second in a single CPU process, and considerably more when wrapped by the DataLoader class).
Focal L2 loss.
Multi-scale supervision.
This project can also serve as a detailed worked example for beginners in PyTorch.
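
To give a feel for what a focal-style L2 loss does, here is a minimal pure-Python sketch. It is not the code from this repo: the function name `focal_l2_loss`, the threshold `thre`, the offsets `alpha` and `beta`, their default values, and the clamping step are all assumptions made for illustration. Only `gamma=2` is taken from the evaluation setting reported in this README. The idea shown is that pixels the network already predicts well get a small weight, so the squared error is dominated by hard pixels.

```python
def focal_l2_loss(pred, target, gamma=2.0, thre=0.01, alpha=0.1, beta=0.02):
    """Focal-weighted mean squared error over flat lists of heatmap values.

    Illustrative sketch only: thre, alpha, beta and the clamping are
    assumptions, not values or steps confirmed by this README.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")

    total = 0.0
    for s, s_star in zip(pred, target):
        # Estimate how "well predicted" this pixel already is.
        if s_star >= thre:
            s_d = s - alpha        # foreground: a high response is good
        else:
            s_d = 1.0 - s - beta   # background: a low response is good
        s_d = min(max(s_d, 0.0), 1.0)  # keep the weight base in [0, 1] (assumption)

        weight = (1.0 - s_d) ** gamma  # easy pixels get a small weight
        total += weight * (s - s_star) ** 2
    return total / len(pred)


# An easy pixel (0.9 vs 1.0) is down-weighted far more than a hard one (0.2 vs 1.0).
easy = focal_l2_loss([0.9], [1.0])
hard = focal_l2_loss([0.2], [1.0])
print(easy < hard)
```

With `gamma=0` every weight becomes 1 and the function reduces to the plain mean squared error, which is a convenient sanity check.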

Prepare

Install packages:

Python=3.6, PyTorch>1.0, NVIDIA Apex, and any other packages needed.
Download the COCO dataset.
Download the pre-trained models (for the default configuration, download the model snapshot taken at epoch 52, provided below).

Download Link: BaiduCloud

Alternatively, download the pre-trained model for the default configuration only, without the optimizer checkpoint, via GoogleDrive.
Change the paths in the code according to your environment.

Run a Demo

python demo_image.py

Inference Speed

The speed of our system is tested on the MS-COCO test-dev dataset.

Inference speed of our 4-stage IMHN with 512 × 512 input on one 2080TI GPU: 38.5 FPS (100% GPU-Util).
Processing speed of the keypoint assignment algorithm part that is implemented in pure Python and a single process on Intel Xeon E5-2620 CPU: 5.2 FPS (has not been well accelerated).
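
As a rough reading of the two figures above, the sketch below estimates the combined rate if the network forward pass and the keypoint assignment run strictly one after the other on each image. That sequential, no-overlap, no-other-overhead setup is an assumption made here for illustration, not a measured result, and the helper name `sequential_fps` is hypothetical.

```python
def sequential_fps(*stage_fps):
    """Combined frames per second when stages run back to back.

    Each stage takes 1 / fps seconds per frame, so the per-frame times add up.
    """
    if not stage_fps or any(f <= 0 for f in stage_fps):
        raise ValueError("each stage needs a positive FPS value")
    seconds_per_frame = sum(1.0 / f for f in stage_fps)
    return 1.0 / seconds_per_frame


# 38.5 FPS network inference followed by 5.2 FPS keypoint assignment.
print(f"{sequential_fps(38.5, 5.2):.2f} FPS")  # prints "4.58 FPS"
```

Under that assumption the pure-Python assignment step dominates the per-frame time, which is why accelerating the post-processing has the largest effect on end-to-end speed.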

Evaluation Steps

The corresponding code is in pure Python and does not use multiprocessing for now.

python evaluate.py

Results on MSCOCO 2017 test-dev subset (focal L2 loss with gamma=2):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.685
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.867
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.749
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.664
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.719
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.728
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.892
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.782
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.688
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.784

Training Steps

Before training, prepare the training data using 'SimplePose/data/coco_masks_hdf5.py'.

Multiple GPUs are recommended to speed up training, but other training options are supported as well.

Most of the code is already provided. You can train the model with one of the following scripts:
1. 'train.py': a single training process on one GPU only.
2. 'train_parallel.py': a single training process on multiple GPUs using DataParallel.
3. 'train_distributed.py' (recommended): multiple training processes on multiple GPUs using Distributed Training:

python -m torch.distributed.launch --nproc_per_node=4 train_distributed.py

Note: loss_model_parrel.py is for train.py and train_parallel.py, while loss_model.py is for train_distributed.py and train_distributed_SWA.py. The two differ in how the batch size is divided. Please refer to the code for the details of each choice.

For distributed training, the real batch_size = batch_size_in_config × GPU_Num (actually world_size). For the other scripts, the real batch_size = batch_size_in_config. The difference comes from the different mechanisms of data parallel training and distributed training: in distributed training each process loads its own batch, whereas DataParallel splits a single batch across the GPUs.
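
The batch size rule above can be written down as a small helper. This is only a sketch of the arithmetic: the function name `real_batch_size` and the example values (a configured batch size of 8 on 4 GPUs) are hypothetical and are not taken from the repo's configuration.

```python
def real_batch_size(batch_size_in_config, world_size=1, distributed=False):
    """Number of samples that contribute to one optimizer step.

    distributed=True  -> every process loads batch_size_in_config samples,
                         so the total is multiplied by world_size.
    distributed=False -> a single process loads one batch (DataParallel only
                         splits that batch across GPUs), so the total is unchanged.
    """
    if batch_size_in_config < 1 or world_size < 1:
        raise ValueError("batch size and world size must be positive")
    if distributed:
        return batch_size_in_config * world_size
    return batch_size_in_config


# Hypothetical setup: batch_size_in_config = 8 on 4 GPUs.
print(real_batch_size(8, world_size=4, distributed=True))   # prints 32
print(real_batch_size(8, world_size=4, distributed=False))  # prints 8
```

When switching between the training scripts, it can help to adjust the configured batch size (and possibly the learning rate) so that the real batch size stays comparable.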

Referred Repositories (mainly)

Recommended Repositories

Faster Version: Chun-Ming Su has rebuilt and improved the post-processing speed of this repo using C++, and the improved system can run up to 7~8 FPS using a single scale with flipping on a 2080 TI GPU. Many thanks to Chun-Ming Su.

Citation

Please kindly cite this paper in your publications if it helps your research.

@inproceedings{li2020simple,
  title={Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation.},
  author={Li, Jia and Su, Wen and Wang, Zengfu},
  booktitle={AAAI},
  pages={11354--11361},
  year={2020}
}
