PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

Overview

PICK-PyTorch

***** Updated on Feb 6th, 2021: Train Ticket dataset is now available for academic research. You can download from Google Drive or OneDrive. It contains 1,530 synthetic images and 320 real images for training, and 80 real images for testing. Please refer to our paper for more details about how to sample training/testing set from EATEN and generate the corresponding annotations.*****

***** Updated on Sep 17th, 2020: A training example on the large-scale document understanding dataset, DocBank, is now available. Please refer to examples/DocBank/README.md for more details. Thanks TengQi Ye for this contribution.*****

PyTorch reimplementation of "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020). This project is different from our original implementation.

Introduction

PICK is a framework that is effective and robust in handling complex documents layout for Key Information Extraction (KIE) by combining graph learning with graph convolution operation, yielding a richer semantic representation containing the textual and visual features and global layout without ambiguity. Overall architecture shown follows.

Overall

Requirements

  • python = 3.6
  • torchvision = 0.6.1
  • tabulate = 0.8.7
  • overrides = 3.0.0
  • opencv_python = 4.3.0.36
  • numpy = 1.16.4
  • pandas = 1.0.5
  • allennlp = 1.0.0
  • torchtext = 0.6.0
  • tqdm = 4.47.0
  • torch = 1.5.1
pip install -r requirements.txt

Usage

Distributed training with config files

Modify the configurations in config.json and dist_train.sh files, then run:

bash dist_train.sh

The application will be launched via launch.py on a 4 GPU node with one process per GPU (recommend).

This is equivalent to

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c config.json -d 1,2,3,4 --local_world_size 4

and is equivalent to specify indices of available GPUs by CUDA_VISIBLE_DEVICES instead of -d args

CUDA_VISIBLE_DEVICES=1,2,3,4 python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c config.json --local_world_size 4

Similarly, it can be launched with a single process that spans all 4 GPUs (if node has 4 available GPUs) using (don't recommend):

CUDA_VISIBLE_DEVICES=1,2,3,4 python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c config.json --local_world_size 1

Using Multiple Node

You can enable multi-node multi-GPU training by setting nnodes and node_rank args of the commandline line on every node. e.g., 2 nodes 4 gpus run as follows

Node 1, ip: 192.168.0.10, then run on node 1 as follows

CUDA_VISIBLE_DEVICES=1,2,3,4 python -m torch.distributed.launch --nnodes=2 --node_rank=0 --nproc_per_node=4 \
--master_addr=192.168.0.10 --master_port=5555 \
train.py -c config.json --local_world_size 4  

Node 2, ip: 192.168.0.15, then run on node 2 as follows

CUDA_VISIBLE_DEVICES=2,4,6,7 python -m torch.distributed.launch --nnodes=2 --node_rank=1 --nproc_per_node=4 \
--master_addr=192.168.0.10 --master_port=5555 \
train.py -c config.json --local_world_size 4  

Resuming from checkpoints

You can resume from a previously saved checkpoint by:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -d 1,2,3,4 --local_world_size 4 --resume path/to/checkpoint

Debug mode on one GPU/CPU training with config files

This option of training mode can debug code without distributed way. -dist must set to false to turn off distributed mode. -d specify which one gpu will be used.

python train.py -c config.json -d 1 -dist false

Testing from checkpoints

You can test from a previously saved checkpoint by:

python test.py --checkpoint path/to/checkpoint --boxes_transcripts path/to/boxes_transcripts \
               --images_path path/to/images_path --output_folder path/to/output_folder \
               --gpu 0 --batch_size 2

Customization

Training custom datasets

You can train your own datasets following the steps outlined below.

  1. Prepare the correct format of files as provided in data folder.
    • Please see data/README.md an instruction how to prepare the data in required format for PICK.
  2. Modify train_dataset and validation_dataset args in config.json file, including files_name, images_folder, boxes_and_transcripts_folder, entities_folder, iob_tagging_type and resized_image_size.
  3. Modify Entities_list in utils/entities_list.py file according to the entity type of your dataset.
  4. Modify keys.txt in utils/keys.txt file if needed according to the vocabulary of your dataset.
  5. Modify MAX_BOXES_NUM and MAX_TRANSCRIPT_LEN in data_tuils/documents.py file if needed.

Note: The self-build datasets our paper used cannot be shared for patient privacy and proprietary issues.

Checkpoints

You can specify the name of the training session in config.json files:

"name": "PICK_Default",
"run_id": "test"

The checkpoints will be saved in save_dir/name/run_id_timestamp/checkpoint_epoch_n, with timestamp in mmdd_HHMMSS format.

A copy of config.json file will be saved in the same folder.

Note: checkpoints contain:

{
  'arch': arch,
  'epoch': epoch,
  'state_dict': self.model.state_dict(),
  'optimizer': self.optimizer.state_dict(),
  'monitor_best': self.monitor_best,
  'config': self.config
}

Tensorboard Visualization

This project supports Tensorboard visualization by using either torch.utils.tensorboard or TensorboardX.

  1. Install

    If you are using pytorch 1.1 or higher, install tensorboard by 'pip install tensorboard>=1.14.0'.

    Otherwise, you should install tensorboardx. Follow installation guide in TensorboardX.

  2. Run training

    Make sure that tensorboard option in the config file is turned on.

     "tensorboard" : true
    
  3. Open Tensorboard server

    Type tensorboard --logdir saved/log/ at the project root, then server will open at http://localhost:6006

By default, values of loss will be logged. If you need more visualizations, use add_scalar('tag', data), add_image('tag', image), etc in the trainer._train_epoch method. add_something() methods in this project are basically wrappers for those of tensorboardX.SummaryWriter and torch.utils.tensorboard.SummaryWriter modules.

Note: You don't have to specify current steps, since WriterTensorboard class defined at logger/visualization.py will track current steps.

Results on Train Ticket

example

TODOs

  • Dataset cache mechanism to speed up training loop
  • Multi-node multi-gpu setup (DistributedDataParallel)

Citations

If you find this code useful please cite our paper:

@inproceedings{Yu2020PICKPK,
  title={{PICK}: Processing Key Information Extraction from Documents using 
  Improved Graph Learning-Convolutional Networks},
  author={Wenwen Yu and Ning Lu and Xianbiao Qi and Ping Gong and Rong Xiao},
  booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
  year={2020}
}

License

This project is licensed under the MIT License. See LICENSE for more details.

Acknowledgements

This project structure takes example by PyTorch Template Project.

Owner
Wenwen Yu
Ph.D. student at Huazhong University of Science and Technology
Wenwen Yu
Interpretation of T cell states using reference single-cell atlases

Interpretation of T cell states using reference single-cell atlases ProjecTILs is a computational method to project scRNA-seq data into reference sing

Cancer Systems Immunology Lab 139 Jan 03, 2023
An official implementation of "Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation" (ICCV 2021) in PyTorch.

Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation This is an official implementation of the paper "Exploiting a Joint

CV Lab @ Yonsei University 35 Oct 26, 2022
[NeurIPS 2021] "Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems"

Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems Introduction Multi-agent control i

VITA 6 May 05, 2022
Model search is a framework that implements AutoML algorithms for model architecture search at scale

Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale. It aims to help researchers speed up their exploration process for finding the right model a

Google 3.2k Dec 31, 2022
Code, Data and Demo for Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting

InversePrompting Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting Code: The code is provided in the "chinese_ip"

THUDM 101 Dec 16, 2022
chainladder - Property and Casualty Loss Reserving in Python

chainladder (python) chainladder - Property and Casualty Loss Reserving in Python This package gets inspiration from the popular R ChainLadder package

Casualty Actuarial Society 130 Dec 07, 2022
Addition of pseudotorsion caclulation eta, theta, eta', and theta' to barnaba package

Addition to Original Barnaba Code: This is modified version of Barnaba package to calculate RNA pseudotorsion angles eta, theta, eta', and theta'. Ple

Mandar Kulkarni 1 Jan 11, 2022
An end-to-end regression problem of predicting the price of properties in Bangalore.

Bangalore-House-Price-Prediction An end-to-end regression problem of predicting the price of properties in Bangalore. Deployed in Heroku using Flask.

Shruti Balan 1 Nov 25, 2022
PyTorch implementation of Spiking Neural Networks trained on surrogate gradient & BPTT using snntorch.

snn-localization repo PyTorch implementation of Spiking Neural Networks trained on surrogate gradient & BPTT using snntorch. Install Dependencies Orig

Sami BARCHID 1 Jan 06, 2022
To provide 100 JAX exercises over different sections structured as a course or tutorials to teach and learn for beginners, intermediates as well as experts

JaxTon 💯 JAX exercises Mission 🚀 To provide 100 JAX exercises over different sections structured as a course or tutorials to teach and learn for beg

Rohan Rao 512 Jan 01, 2023
Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"

Focal Transformer This is the official implementation of our Focal Transformer -- "Focal Self-attention for Local-Global Interactions in Vision Transf

Microsoft 486 Dec 20, 2022
A BaSiC Tool for Background and Shading Correction of Optical Microscopy Images

BaSiC Matlab code accompanying A BaSiC Tool for Background and Shading Correction of Optical Microscopy Images by Tingying Peng, Kurt Thorn, Timm Schr

Marr Lab 34 Dec 18, 2022
Scales, Chords, and Cadences: Practical Music Theory for MIR Researchers

ISMIR-musicTheoryTutorial This repository has slides and Jupyter notebooks for the ISMIR 2021 tutorial Scales, Chords, and Cadences: Practical Music T

Johanna Devaney 58 Oct 11, 2022
Library for 8-bit optimizers and quantization routines.

bitsandbytes Bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers and quantization functions. Paper -- V

Facebook Research 687 Jan 04, 2023
TLoL (Python Module) - League of Legends Deep Learning AI (Research and Development)

TLoL-py - League of Legends Deep Learning Library TLoL-py is the Python component of the TLoL League of Legends deep learning library. It provides a s

7 Nov 29, 2022
A PyTorch Implementation of "Neural Arithmetic Logic Units"

Neural Arithmetic Logic Units [WIP] This is a PyTorch implementation of Neural Arithmetic Logic Units by Andrew Trask, Felix Hill, Scott Reed, Jack Ra

Kevin Zakka 181 Nov 18, 2022
TensorFlow implementation of Elastic Weight Consolidation

Elastic weight consolidation Introduction A TensorFlow implementation of elastic weight consolidation as presented in Overcoming catastrophic forgetti

James Stokes 67 Oct 11, 2022
GUPNet - Geometry Uncertainty Projection Network for Monocular 3D Object Detection

GUPNet This is the official implementation of "Geometry Uncertainty Projection Network for Monocular 3D Object Detection". citation If you find our wo

Yan Lu 103 Dec 28, 2022
Starter Code for VALUE benchmark

StarterCode for VALUE Benchmark This is the starter code for VALUE Benchmark [website], [paper]. This repository currently supports all baseline model

VALUE Benchmark 73 Dec 09, 2022
Official PyTorch implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation

U-GAT-IT — Official PyTorch Implementation : Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Imag

Hyeonwoo Kang 2.4k Jan 04, 2023