ARU-Net - Deep Learning Chinese Word Segment

Overview

ARU-Net: A Neural Pixel Labeler for Layout Analysis of Historical Documents

Contents

Introduction

This is the Tensorflow code corresponding to A Two-Stage Method for Text Line Detection in Historical Documents . This repo contains the neural pixel labeling part described in the paper. It contains the so-called ARU-Net (among others) which is basically an extended version of the well known U-Net [2]. Besides the model and the basic workflow to train and test models, different data augmentation strategies are implemented to reduce the amound of training data needed. The repo's features are summarized below:

  • Inference Demo
    • Trained and freezed tensorflow graph included
    • Easy to reuse for own inference tests
  • Workflow
    • Full training workflow to parametrize and train your own models
    • Contains different models, data augmentation strategies, loss functions
    • Training on specific GPU, this enables the training of several models on a multi GPU system in parallel
    • Easy validation for trained model either using classical or ema-shadow weights

Please cite [1] if you find this repo useful and/or use this software for own work.

Installation

  1. Use python 2.7
  2. Any version of tensorflow version > 1.0 should be ok.
  3. Python packages: matplotlib (>=1.3.1), pillow (>=2.1.0), scipy (>=1.0.0), scikit-image (>=0.13.1), click (>=5.x)
  4. Clone the Repo
  5. Done

Demo

To run the demo follow:

  1. Open a shell
  2. Make sure Tensorflow is available, e.g., go to docker environment, activate conda, ...
  3. Navigate to the repo folder YOUR_PATH/ARU-Net/
  4. Run:
python run_demo_inference.py 

The demo will load a trained model and perform inference for five sample images of the cBad test set [3], [4]. The network was trained to predict the position of baselines and separators for the begining and end of each text line. After running the python script you should see a matplot window. To go to the next image just close it.

Example

The example images are sampled from the cBad test set [3], [4]. One image along with its results are shown below.

image_1 image_2 image_3

Training

This section describes step-by-step the procedure to train your own model.

Train data:

The following describes how the training data should look like:

  • The images along with its pixel ground truth have to be in the same folder
  • For each image: X.jpg, there have to be images named X_GT0.jpg, X_GT1.jpg, X_GT2.jpg, ... (for each channel to be predicted one GT image)
  • Each ground truth image is binary and contains ones at positions where the corresponding class is present and zeros otherwise (see demo_images/demo_traindata for a sample)
  • Generate a list containing row-wise the absolute pathes to the images (just the document images not the GT ones)

Val data:

The following describes how the validation data should look like:

Train the model:

The following describes how to train a model:

  • Have a look at the pix_lab/main/train_aru.py script
  • Parametrize it like you wish (have a look at the data_provider, cost and optimizer scripts to see all parameters)
  • Setting the correct paths, adapting the number of output classes and using the default parametrization should work fine for a first training
  • Run:
python -u pix_lab/main/train_aru.py &> info.log 

Validate the model:

The following describes how to validate a trained model:

  • Train and val losses are printed in info.log
  • To validate the checkpoints using the classical weights as well as its ema-shadows, adapt and run:
pix_lab/main/validate_ckpt.py

Comments

If you are interested in a related problem, this repo could maybe help you as well. The ARU-Net can be used for each pixel labeling task, besides the baseline detection task, it can be easily used for, e.g., binarization, page segmentation, ... purposes.

References

Please cite [1] if using this code.

A Two-Stage Method for Text Line Detection in Historical Documents

[1] T. Grüning, G. Leifert, T. Strauß, R. Labahn, A Two-Stage Method for Text Line Detection in Historical Documents

@article{Gruning2018,
arxivId = {1802.03345},
author = {Gr{\"{u}}ning, Tobias and Leifert, Gundram and Strau{\ss}, Tobias and Labahn, Roger},
title = {{A Two-Stage Method for Text Line Detection in Historical Documents}},
url = {http://arxiv.org/abs/1802.03345},
year = {2018}
}

U-Net: Convolutional Networks for Biomedical Image Segmentation

[2] O. Ronneberger, P, Fischer, T, Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation

@article{Ronneberger2015,
arxivId = {1505.04597},
author = {Ronneberger, Olaf and Fischer, Philipp and Brox, Thomas},
journal = {Miccai},
pages = {234--241},
title = {{U-Net: Convolutional Networks for Biomedical Image Segmentation}},
year = {2015}
}

READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

[3] T. Grüning, R. Labahn, M. Diem, F. Kleber, S. Fiel, READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

@article{Gruning2017,
arxivId = {1705.03311},
author = {Gr{\"{u}}ning, Tobias and Labahn, Roger and Diem, Markus and Kleber, Florian and Fiel, Stefan},
title = {{READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents}},
url = {http://arxiv.org/abs/1705.03311},
year = {2017}
}

A Robust and Binarization-Free Approach for Text Line Detection in Historical Documents

[4] M. Diem, F. Kleber, S. Fiel, T. Grüning, B. Gatos, ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)

@misc{Diem2017,
author = {Diem, Markus and Kleber, Florian and Fiel, Stefan and Gr{\"{u}}ning, Tobias and Gatos, Basilis},
doi = {10.5281/zenodo.257972},
title = {ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)},
year = {2017}
}
Responsive Doc. scanner using U^2-Net, Textcleaner and Tesseract

Responsive Doc. scanner using U^2-Net, Textcleaner and Tesseract Toolset U^2-Net is used for background removal Textcleaner is used for image cleaning

3 Jul 13, 2022
Indonesian ID Card OCR using tesseract OCR

KTP OCR Indonesian ID Card OCR using tesseract OCR KTP OCR is python-flask with tesseract web application to convert Indonesian ID Card to text / JSON

Revan Muhammad Dafa 5 Dec 06, 2021
Deep learning based page layout analysis

Deep Learning Based Page Layout Analyze This is a Python implementaion of page layout analyze tool. The goal of page layout analyze is to segment page

186 Dec 29, 2022
Kornia is a open source differentiable computer vision library for PyTorch.

Open Source Differentiable Computer Vision Library

kornia 7.6k Jan 06, 2023
CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)

CUTIE TensorFlow implementation of the paper "CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor." Xiaohu

Zhao,Xiaohui 147 Dec 20, 2022
EQFace: An implementation of EQFace: A Simple Explicit Quality Network for Face Recognition

EQFace: A Simple Explicit Quality Network for Face Recognition The first face recognition network that generates explicit face quality online.

DeepCam Shenzhen 141 Dec 31, 2022
天池2021"全球人工智能技术创新大赛"【赛道一】:医学影像报告异常检测 - 第三名解决方案

天池2021"全球人工智能技术创新大赛"【赛道一】:医学影像报告异常检测 比赛链接 个人博客记录 目录结构 ├── final------------------------------------决赛方案PPT ├── preliminary_contest--------------------

19 Aug 17, 2022
Document Layout Analysis Projects

Layout_Analysis Introduction This is an implementation of RLSA and X-Y Cut with OpenCV Dependencies OpenCV 3.0+ How to use Compile with g++ : g++ -std

22 Dec 08, 2022
A real-time dolly zoom camera effect

Dolly-Zoom I've always been amazed by the gradual perspective change of dolly zoom, and I have some experience in python and OpenCV, so I decided to c

Dylan Kai Lau 52 Dec 08, 2022
Dataset and Code for ICCV 2021 paper "Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme"

Dataset and Code for RealVSR Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme Xi Yang, Wangmeng Xiang,

Xi Yang 91 Nov 22, 2022
An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

GeorgeJoe 115 Dec 12, 2022
Code for the paper "DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks" (ICCV '19)

DewarpNet This repository contains the codes for DewarpNet training. Recent Updates [May, 2020] Added evaluation images and an important note about Ma

<a href=[email protected]"> 354 Jan 01, 2023
An application of high resolution GANs to dewarp images of perturbed documents

Docuwarp This project is focused on dewarping document images through the usage of pix2pixHD, a GAN that is useful for general image to image translat

Thomas Huang 97 Dec 25, 2022
a micro OCR network with 0.07mb params.

MicroOCR a micro OCR network with 0.07mb params. Layer (type) Output Shape Param # Conv2d-1 [-1, 64, 8,

william 29 Aug 06, 2022
Web interface for browsing arXiv papers

Currently, arxivbox considers only major computer vision and machine learning conferences

Ankan Kumar Bhunia 12 Sep 11, 2022
2 telegram-bots: for image recognition and for text generation

💻 📱 Telegram_Bots 🔎 & 📖 2 telegram-bots: for image recognition and for text generation. About Image recognition bot: User sends a photo and bot de

Marina Polukoshko 1 Jan 27, 2022
Awesome anomaly detection in medical images

A curated list of awesome anomaly detection works in medical imaging, inspired by the other awesome-* initiatives.

Kang Zhou 57 Dec 19, 2022
Ocular is a state-of-the-art historical OCR system.

Ocular Ocular is a state-of-the-art historical OCR system. Its primary features are: Unsupervised learning of unknown fonts: requires only document im

228 Dec 30, 2022
Tool which allow you to detect and translate text.

Text detection and recognition This repository contains tool which allow to detect region with text and translate it one by one. Description Two pretr

Damian Panek 176 Nov 28, 2022
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:

Multi-Type-TD-TSR Check it out on Source Code of our Paper: Multi-Type-TD-TSR Extracting Tables from Document Images using a Multi-stage Pipeline for

Pascal Fischer 178 Dec 27, 2022