Replication of Pix2Seq with Pretrained Model

Last update: Nov 22, 2022

Overview

Pretrained-Pix2Seq

We provide a pretrained Pix2Seq model. This version adds new data augmentation. The model is trained for 300 epochs and achieves 37.0 AP on COCO without beam search or nucleus sampling.
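Pix2Seq treats detection as a language-modeling task: each object is emitted as a short run of discrete tokens. The sketch below illustrates that idea under stated assumptions and is not code from this repository. The bin count (NUM_BINS = 2000), the vocabulary layout (coordinate bins first, class tokens offset after them) and the helper names are all hypothetical, and the per-object ordering (ymin, xmin, ymax, xmax, class) follows our reading of the Pix2Seq paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical vocabulary layout (not taken from this repository):
# tokens 0 .. NUM_BINS - 1 are coordinate bins, class tokens start at NUM_BINS.
NUM_BINS = 2000

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    normalized = min(max(value / size, 0.0), 1.0)
    return round(normalized * (num_bins - 1))


def dequantize(bin_index: int, size: float, num_bins: int = NUM_BINS) -> float:
    """Map a bin back to a pixel coordinate (the bin's representative value)."""
    return bin_index / (num_bins - 1) * size


def boxes_to_tokens(boxes: Sequence[Box], labels: Sequence[int],
                    width: float, height: float, num_bins: int = NUM_BINS) -> List[int]:
    """Flatten boxes into one token sequence, five tokens per object."""
    tokens: List[int] = []
    for (xmin, ymin, xmax, ymax), label in zip(boxes, labels):
        tokens += [
            quantize(ymin, height, num_bins),
            quantize(xmin, width, num_bins),
            quantize(ymax, height, num_bins),
            quantize(xmax, width, num_bins),
            num_bins + label,  # class tokens are offset past the coordinate bins
        ]
    return tokens


def tokens_to_boxes(tokens: Sequence[int], width: float, height: float,
                    num_bins: int = NUM_BINS) -> Tuple[List[Box], List[int]]:
    """Invert boxes_to_tokens; a trailing incomplete group of tokens is ignored."""
    boxes: List[Box] = []
    labels: List[int] = []
    for start in range(0, len(tokens) - 4, 5):
        y0, x0, y1, x1, cls = tokens[start:start + 5]
        boxes.append((dequantize(x0, width, num_bins), dequantize(y0, height, num_bins),
                      dequantize(x1, width, num_bins), dequantize(y1, height, num_bins)))
        labels.append(cls - num_bins)
    return boxes, labels
```

For a 640x480 image, boxes_to_tokens([(0, 0, 640, 480)], [1], 640, 480) yields [0, 0, 1999, 1999, 2001]. Decoding recovers any box to within half a bin of quantization error.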

Installation

Install PyTorch 1.5+ and torchvision 0.6+ (recommended: torch 1.8.1 with torchvision 0.8.0; note that the torchvision release paired with torch 1.8.1 is 0.9.1).
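A quick way to confirm that the installed versions meet these minimums is to compare the numeric release parts of the version strings. This is a minimal sketch; the helper names are illustrative, and it ignores pre-release ordering (for example, "2.0.0a0" is treated as 2.0.0).

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract the numeric release part, e.g. '1.8.1+cu111' -> (1, 8, 1)."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version: str, minimum: str) -> bool:
    """True if `version` is at least `minimum`, comparing numeric components only."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        import torch
        import torchvision
    except ImportError as exc:
        print(f"missing dependency: {exc}")
    else:
        for name, installed, minimum in [("torch", torch.__version__, "1.5"),
                                         ("torchvision", torchvision.__version__, "0.6")]:
            status = "ok" if meets_minimum(installed, minimum) else f"older than {minimum}"
            print(name, installed, status)
```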

Install pycocotools (for evaluation on COCO):

pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

That's it; you should now be able to train and evaluate detection models.
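Before training, it can help to confirm that every package above is importable from the active environment. The sketch below uses only the standard library (importlib.util.find_spec). The REQUIRED tuple is an assumption listing import names, which can differ from pip package names, and only top-level names are checked because find_spec on a dotted name imports its parent package first.

```python
import importlib.util
from typing import Iterable, List

# Import names (not pip names) of the packages this README asks for.
REQUIRED = ("torch", "torchvision", "pycocotools")


def missing_modules(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("all dependencies found" if not missing else f"missing: {', '.join(missing)}")
```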

Data preparation

Download and extract the COCO 2017 train and val images, together with their annotations, from http://cocodataset.org. We expect the following directory structure:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
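The expected layout can be checked programmatically before launching a run. This is a minimal sketch, assuming a DETR-style data loader (the project acknowledges DETR) that reads the standard COCO 2017 files annotations/instances_train2017.json and annotations/instances_val2017.json; check the repository's dataset code if it expects different names.

```python
from pathlib import Path
from typing import List

# Entries expected under the COCO root. The annotation file names are the
# standard COCO 2017 names; that this repo reads them is an assumption.
EXPECTED = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
]


def check_coco_layout(root: str) -> List[str]:
    """Return the expected entries that are missing under `root`, in EXPECTED order."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = check_coco_layout("path/to/coco")  # placeholder path from the README
    print("layout ok" if not missing else f"missing: {missing}")
```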

Training

First, link the COCO dataset into the project folder:

ln -s /path/to/coco ./coco
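Where a shell one-liner is inconvenient, the same link can be created from Python with pathlib. The helper name link_dataset is hypothetical; this sketch refuses to overwrite an existing path and does nothing when the link already points at the requested directory.

```python
from pathlib import Path


def link_dataset(coco_root: str, link_path: str = "./coco") -> Path:
    """Create `link_path` -> `coco_root`, like `ln -s /path/to/coco ./coco`.

    Re-running is safe if the link already points at the same target.
    """
    target = Path(coco_root).expanduser().resolve()
    link = Path(link_path)
    if link.is_symlink():
        if link.resolve() == target:
            return link  # already linked to the right place
        raise FileExistsError(f"{link} already points to {link.resolve()}")
    if link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target, target_is_directory=True)
    return link
```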

Training

sh train.sh --model pix2seq --output_dir /path/to/save

Evaluation

sh train.sh --model pix2seq --output_dir /path/to/save --resume /path/to/checkpoints --eval
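The AP, AP50 and AP75 columns in the results table are the standard COCO box metrics. To score a detection results file outside train.sh, the pycocotools API can compute them directly. This is a sketch under assumptions: it expects a COCO-format results JSON (a list of dicts with image_id, category_id, bbox as [x, y, width, height], and score), and it is not necessarily how this repository's --eval path is implemented.

```python
from typing import Dict, Sequence


def summarize_stats(stats: Sequence[float]) -> Dict[str, float]:
    """Pick the three table metrics out of COCOeval.stats, as percentages.

    stats[0], stats[1], stats[2] are AP@[.50:.95], AP@.50 and AP@.75
    (all object areas, up to 100 detections per image).
    """
    return {"AP": 100 * stats[0], "AP50": 100 * stats[1], "AP75": 100 * stats[2]}


def evaluate_bbox(ann_file: str, results_file: str) -> Dict[str, float]:
    """Score a COCO-format detection results JSON against ground-truth annotations."""
    # Imported here so summarize_stats stays usable without pycocotools installed.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO(ann_file)
    coco_dt = coco_gt.loadRes(results_file)
    evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()  # prints the standard 12-line COCO summary
    return summarize_stats(evaluator.stats)
```

For example, evaluate_bbox("coco/annotations/instances_val2017.json", "detections.json") prints the COCO summary and returns the three metrics; detections.json is a placeholder file name.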

COCO

Method	backbone	Epoch	Batch Size	AP	AP50	AP75	Weights
Pix2Seq	R50	300	32	37.0	53.4	39.4	weight
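To relate the table's 300 epochs at batch size 32 to optimizer steps, divide the number of training images by the global batch size. This worked example assumes the full train2017 split (118,287 images), a global (not per-GPU) batch of 32, and that the final partial batch is dropped, as in DETR's training loader; this README states none of these.

```python
import math
from typing import Tuple

COCO_TRAIN2017_IMAGES = 118_287  # images in the full train2017 split


def iterations(num_images: int, batch_size: int, epochs: int,
               drop_last: bool = True) -> Tuple[int, int]:
    """Return (iterations per epoch, total iterations) for a global batch size."""
    if drop_last:
        per_epoch = num_images // batch_size  # the last partial batch is skipped
    else:
        per_epoch = math.ceil(num_images / batch_size)
    return per_epoch, per_epoch * epochs


per_epoch, total = iterations(COCO_TRAIN2017_IMAGES, batch_size=32, epochs=300)
print(per_epoch, total)  # 3696 1108800
```

Keeping the last partial batch (drop_last=False) gives 3,697 iterations per epoch and 1,109,100 in total.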

Contributor

Qiu Han, Peng Gao, Jingqiu Zhou (beam search)

Acknowledgement

Pix2Seq, DETR

Owner

peng gao
