ARU-Net - Deep Learning Chinese Word Segment

Last update: Sep 12, 2022

Overview

ARU-Net: A Neural Pixel Labeler for Layout Analysis of Historical Documents

Introduction
Installation
Demo
Training

Introduction

This is the TensorFlow code corresponding to A Two-Stage Method for Text Line Detection in Historical Documents [1]. This repo contains the neural pixel labeling part described in the paper. It contains the so-called ARU-Net (among others), which is essentially an extended version of the well-known U-Net [2]. Besides the model and the basic workflow to train and test models, different data augmentation strategies are implemented to reduce the amount of training data needed. The repo's features are summarized below:

Inference Demo
- Trained and frozen TensorFlow graph included
- Easy to reuse for your own inference tests
Workflow
- Full training workflow to parametrize and train your own models
- Contains different models, data augmentation strategies, loss functions
- Training on a specific GPU, which enables training several models in parallel on a multi-GPU system
- Easy validation of trained models using either the classical or the EMA shadow weights

Please cite [1] if you find this repo useful and/or use this software for your own work.

Installation

Use Python 2.7
Any TensorFlow version > 1.0 should be ok.
Python packages: matplotlib (>=1.3.1), pillow (>=2.1.0), scipy (>=1.0.0), scikit-image (>=0.13.1), click (>=5.x)
Clone the Repo
Done
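To compare an existing environment against the minimum versions listed above, a small helper along these lines may be useful. It is only a sketch: the names `MINIMUM_VERSIONS`, `version_tuple`, `meets_minimum` and `outdated_packages` are made up for illustration and are not part of the repo, and "5.x" for click is interpreted as "5 or newer".

```python
import re

# Minimum versions copied from the requirement list above.
# Assumption: "click (>=5.x)" is read as "any 5.x release or newer".
MINIMUM_VERSIONS = {
    "matplotlib": "1.3.1",
    "pillow": "2.1.0",
    "scipy": "1.0.0",
    "scikit-image": "0.13.1",
    "click": "5",
}


def version_tuple(version):
    """Turn '0.13.1' into (0, 13, 1); parsing stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = version_tuple(installed), version_tuple(minimum)
    length = max(len(a), len(b))
    # Pad with zeros so that '5' and '5.0.0' compare as equal.
    a += (0,) * (length - len(a))
    b += (0,) * (length - len(b))
    return a >= b


def outdated_packages(installed_versions):
    """Return the sorted names of packages that are missing or too old."""
    return sorted(
        name
        for name, minimum in MINIMUM_VERSIONS.items()
        if name not in installed_versions
        or not meets_minimum(installed_versions[name], minimum)
    )
```

The `installed_versions` dictionary can be filled by hand, for example from the output of `pip freeze`.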

Demo

To run the demo, follow these steps:

Open a shell
Make sure TensorFlow is available, e.g., enter your Docker environment, activate your conda environment, ...
Navigate to the repo folder YOUR_PATH/ARU-Net/
Run:

python run_demo_inference.py

The demo will load a trained model and perform inference for five sample images of the cBAD test set [3], [4]. The network was trained to predict the position of baselines and separators for the beginning and end of each text line. After running the Python script you should see a matplotlib window. To go to the next image, just close it.

Example

The example images are sampled from the cBAD test set [3], [4]. One image along with its results is shown below.

Training

This section describes, step by step, how to train your own model.

Train data:

The following describes what the training data should look like:

Each image and its pixel ground truth have to be in the same folder
For each image X.jpg, there have to be images named X_GT0.jpg, X_GT1.jpg, X_GT2.jpg, ... (one GT image for each channel to be predicted)
Each ground truth image is binary and contains ones at positions where the corresponding class is present and zeros otherwise (see demo_images/demo_traindata for a sample)
Generate a list file containing, one per line, the absolute paths to the images (only the document images, not the GT ones)
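The list file from the last step can be produced with a few lines of Python. The following is a minimal sketch, assuming all images are .jpg files in a single folder and that ground truth files follow the X_GT0.jpg, X_GT1.jpg, ... naming from above. The function names are hypothetical and not part of the repo.

```python
import os
import re

# Assumption: ground truth files end in _GT<number>.jpg, as described above.
GT_PATTERN = re.compile(r"_GT\d+\.jpg$")


def collect_document_images(folder):
    """Return the sorted absolute paths of all .jpg files that are not GT images."""
    folder = os.path.abspath(folder)
    paths = []
    for name in sorted(os.listdir(folder)):
        if name.endswith(".jpg") and not GT_PATTERN.search(name):
            paths.append(os.path.join(folder, name))
    return paths


def write_image_list(folder, list_path):
    """Write one absolute image path per line to list_path and return the paths."""
    paths = collect_document_images(folder)
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths
```

For example, `write_image_list("demo_images/demo_traindata", "train.lst")` would write the list for the sample data, where train.lst is just an example file name.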

Val data:

The following describes what the validation data should look like:

See train data
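Since training and validation folders share the same layout, a quick consistency check before training can catch missing ground truth files in either of them. The sketch below is an illustration only; `missing_ground_truth` is a hypothetical helper, and `n_classes` stands for the number of channels to be predicted.

```python
import os


def missing_ground_truth(folder, image_names, n_classes):
    """Map each image name to the list of expected GT files that do not exist.

    For an image X.jpg and n_classes = 3, the expected files are
    X_GT0.jpg, X_GT1.jpg and X_GT2.jpg in the same folder.
    """
    missing = {}
    for name in image_names:
        stem, ext = os.path.splitext(name)
        expected = ["{}_GT{}{}".format(stem, k, ext) for k in range(n_classes)]
        absent = [
            gt for gt in expected
            if not os.path.isfile(os.path.join(folder, gt))
        ]
        if absent:
            missing[name] = absent
    return missing
```

An empty result means every listed image has one ground truth file per channel.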

Train the model:

The following describes how to train a model:

Have a look at the pix_lab/main/train_aru.py script
Parametrize it as you wish (have a look at the data_provider, cost and optimizer scripts to see all parameters)
Setting the correct paths, adapting the number of output classes, and otherwise using the default parametrization should work fine for a first training run
Run:

python -u pix_lab/main/train_aru.py &> info.log
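To run several trainings side by side on a multi-GPU machine, each process can be restricted to one device. The repo offers its own way to select a GPU; the sketch below instead uses the generic CUDA_VISIBLE_DEVICES environment variable, which is an assumption about your setup and not the repo's mechanism. The helper names and log file names are hypothetical.

```python
import os
import subprocess


def training_command_for_gpu(gpu_id, script="pix_lab/main/train_aru.py"):
    """Build the command and environment for one training run on one GPU."""
    env = dict(os.environ)
    # Only the GPU with this index is visible to the child process.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    command = ["python", "-u", script]
    return command, env


def launch_training(gpu_id, log_path):
    """Start training in the background, writing stdout and stderr to log_path."""
    command, env = training_command_for_gpu(gpu_id)
    with open(log_path, "w") as log:
        # The child keeps its own copy of the file handle after we close ours.
        return subprocess.Popen(
            command, env=env, stdout=log, stderr=subprocess.STDOUT
        )
```

Calling `launch_training(0, "info_gpu0.log")` and `launch_training(1, "info_gpu1.log")` from the repo folder would start two independent runs; each run still needs its own parametrization of the training script.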

Validate the model:

The following describes how to validate a trained model:

Train and val losses are printed in info.log
To validate the checkpoints using the classical weights as well as their EMA shadows, adapt and run:

pix_lab/main/validate_ckpt.py
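For readers unfamiliar with the term, EMA shadow weights are exponential moving averages of the trained weights, kept alongside the classical weights during training. The following plain-Python sketch shows the usual update rule on a single scalar weight. The decay value is an arbitrary example, and whether the repo uses exactly this rule and initialization is an assumption.

```python
def ema_update(shadow, value, decay):
    """One EMA step: move the shadow a fraction (1 - decay) towards the value."""
    return decay * shadow + (1.0 - decay) * value


def track_shadow(values, decay):
    """Return the shadow after each training step for a sequence of weight values.

    Assumption: the shadow is initialized with the first weight value.
    """
    shadow = values[0]
    history = [shadow]
    for value in values[1:]:
        shadow = ema_update(shadow, value, decay)
        history.append(shadow)
    return history
```

With a decay close to 1, the shadow changes slowly and smooths out step-to-step noise in the weights, which is why validating both weight sets can give different results.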

Comments

If you are interested in a related problem, this repo may help you as well. The ARU-Net can be used for any pixel labeling task. Besides baseline detection, it can easily be used for, e.g., binarization, page segmentation, ...

References

Please cite [1] if using this code.

A Two-Stage Method for Text Line Detection in Historical Documents

[1] T. Grüning, G. Leifert, T. Strauß, R. Labahn, A Two-Stage Method for Text Line Detection in Historical Documents

@article{Gruning2018,
arxivId = {1802.03345},
author = {Gr{\"{u}}ning, Tobias and Leifert, Gundram and Strau{\ss}, Tobias and Labahn, Roger},
title = {{A Two-Stage Method for Text Line Detection in Historical Documents}},
url = {http://arxiv.org/abs/1802.03345},
year = {2018}
}

U-Net: Convolutional Networks for Biomedical Image Segmentation

[2] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation

@article{Ronneberger2015,
arxivId = {1505.04597},
author = {Ronneberger, Olaf and Fischer, Philipp and Brox, Thomas},
journal = {MICCAI},
pages = {234--241},
title = {{U-Net: Convolutional Networks for Biomedical Image Segmentation}},
year = {2015}
}

READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

[3] T. Grüning, R. Labahn, M. Diem, F. Kleber, S. Fiel, READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

@article{Gruning2017,
arxivId = {1705.03311},
author = {Gr{\"{u}}ning, Tobias and Labahn, Roger and Diem, Markus and Kleber, Florian and Fiel, Stefan},
title = {{READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents}},
url = {http://arxiv.org/abs/1705.03311},
year = {2017}
}

ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)

[4] M. Diem, F. Kleber, S. Fiel, T. Grüning, B. Gatos, ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)

@misc{Diem2017,
author = {Diem, Markus and Kleber, Florian and Fiel, Stefan and Gr{\"{u}}ning, Tobias and Gatos, Basilis},
doi = {10.5281/zenodo.257972},
title = {ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)},
year = {2017}
}
