PyTorch implementation of MoCo: Momentum Contrast for Unsupervised Visual Representation Learning

Last update: Jan 02, 2023

Related tags

Overview

MoCo: Momentum Contrast for Unsupervised Visual Representation Learning

This is a PyTorch implementation of the MoCo paper:

@Article{he2019moco,
  author  = {Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},
  title   = {Momentum Contrast for Unsupervised Visual Representation Learning},
  journal = {arXiv preprint arXiv:1911.05722},
  year    = {2019},
}

It also includes the implementation of the MoCo v2 paper:

@Article{chen2020mocov2,
  author  = {Xinlei Chen and Haoqi Fan and Ross Girshick and Kaiming He},
  title   = {Improved Baselines with Momentum Contrastive Learning},
  journal = {arXiv preprint arXiv:2003.04297},
  year    = {2020},
}

Preparation

Install PyTorch and ImageNet dataset following the official PyTorch ImageNet training code.

This repo aims to be minimal modifications on that code. Check the modifications by:

diff main_moco.py <(curl https://raw.githubusercontent.com/pytorch/examples/master/imagenet/main.py)
diff main_lincls.py <(curl https://raw.githubusercontent.com/pytorch/examples/master/imagenet/main.py)

Unsupervised Training

This implementation only supports multi-gpu, DistributedDataParallel training, which is faster and simpler; single-gpu or DataParallel training is not supported.

To do unsupervised pre-training of a ResNet-50 model on ImageNet in an 8-gpu machine, run:

python main_moco.py \
  -a resnet50 \
  --lr 0.03 \
  --batch-size 256 \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

This script uses all the default hyper-parameters as described in the MoCo v1 paper. To run MoCo v2, set --mlp --moco-t 0.2 --aug-plus --cos.

Note: for 4-gpu training, we recommend following the linear lr scaling recipe: --lr 0.015 --batch-size 128 with 4 gpus. We got similar results using this setting.

Linear Classification

With a pre-trained model, to train a supervised linear classifier on frozen features/weights in an 8-gpu machine, run:

python main_lincls.py \
  -a resnet50 \
  --lr 30.0 \
  --batch-size 256 \
  --pretrained [your checkpoint path]/checkpoint_0199.pth.tar \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

Linear classification results on ImageNet using this repo with 8 NVIDIA V100 GPUs :

	pre-train epochs	pre-train time	MoCo v1 top-1 acc.	MoCo v2 top-1 acc.
ResNet-50	200	53 hours	60.8±0.2	67.5±0.1

Here we run 5 trials (of pre-training and linear classification) and report mean±std: the 5 results of MoCo v1 are {60.6, 60.6, 60.7, 60.9, 61.1}, and of MoCo v2 are {67.7, 67.6, 67.4, 67.6, 67.3}.

Models

Our pre-trained ResNet-50 models can be downloaded as following:

	epochs	mlp	aug+	cos	top-1 acc.	model	md5
MoCo v1	200				60.6	download	`b251726a`
MoCo v2	200	✓	✓	✓	67.7	download	`59fd9945`
MoCo v2	800	✓	✓	✓	71.1	download	`a04e12f8`

Transferring to Object Detection

See ./detection.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

PyTorch implementation of MoCo: Momentum Contrast for Unsupervised Visual Representation Learning

Related tags

Overview

MoCo: Momentum Contrast for Unsupervised Visual Representation Learning

Preparation

Unsupervised Training

Linear Classification

Models

Transferring to Object Detection

License

See Also

Owner

Meta Research

Implementation of the paper titled "Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees"

A deep learning library that makes face recognition efficient and effective

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Semantic Segmentation.

Object tracking using YOLO and a tracker(KCF, MOSSE, CSRT) in openCV

Official repository for the ISBI 2021 paper Transformer Assisted Convolutional Neural Network for Cell Instance Segmentation

Code for Learning Manifold Patch-Based Representations of Man-Made Shapes, in ICLR 2021.

Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth [Paper]

Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Code for Multiple Instance Active Learning for Object Detection, CVPR 2021

Music Source Separation; Train & Eval & Inference piplines and pretrained models we used for 2021 ISMIR MDX Challenge.

Codebase for the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL BASALT Challenge.

Unofficial PyTorch implementation of Guided Dropout

This project aim to create multi-label classification annotation tool to boost annotation speed and make it more easier.

Deep Learning pipeline for motor-imagery classification.

This repository contains a re-implementation of the code for the CVPR 2021 paper "Omnimatte: Associating Objects and Their Effects in Video."

The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

Code of 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces

Code for generating a single image pretraining dataset

Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation in TensorFlow 2