Table recognition inside douments using neural networks

Overview

TableTrainNet

A simple project for training and testing table recognition in documents.

This project was developed to make a neural network which recognizes tables inside documents. I needed an "intelligent" ocr for work, which could automatically recognize tables to treat them separately.

General overview

The project uses the pre-trained neural network offered by Tensorflow. In addition, a config file was used, according to the choosen pre-trained model, to train with object detections tensorflow API

The datasets was taken from:

Required libraries

Before we go on make sure you have everything installed to be able to use the project:

  • Python 3
  • Tensorflow (tested on r1.8)
  • Its object-detection API (remember to install COCO API. If you are on Windows see at the bottom of the readme)
  • Pillow
  • opencv-python
  • pandas
  • pyprind (useful for process bars)

Project pipeline

The project is made up of different parts that acts together as a pipeline.

Take confidence with costants

I have prepared two "costants" files: dataset_costants.py and inference_constants.py. The first contains all those costants that are useful to use to create dataset, the second to make inference with the frozen graph. If you just want to run the project you should modify only those two files.

Transform the images from RGB to single-channel 8-bit grayscale jpeg images

Since colors are not useful for table detection, we can convert all the images in .jpeg 8-bit single channel images. This) transformation is still under testing. Use python dataset/img_to_jpeg.py after setting dataset_costants.py:

  • DPI_EXTRACTION: output quality of the images;
  • PATH_TO_IMAGES: path/to/datase/images;
  • IMAGES_EXTENSION: extension of the extracted images. The only one tested is .jpeg.

Prepare the dataset for Tensorflow

The dataset was take from ICDAR 2017 POD Competition . It comes with a xml notation file with formulas, images and tables per image. Tensorflow instead can build its own TFRecord from csv informations, so we need to convert the xml files into a csv one. Use python dataset/generate_database_csv.py to do this conversion after setting dataset_costants.py:

  • TRAIN_CSV_NAME: name for .csv train output file;
  • TEST_CSV_NAME: name for .csv test output file;
  • TRAIN_CSV_TO_PATH: folder path for TRAIN_CSV_NAME;
  • TEST_CSV_TO_PATH: folder path for TEST_CSV_NAME;
  • ANNOTATIONS_EXTENSION: extension of annotations. In our case is .xml;
  • TRAINING_PERCENTAGE: percentage of images for training
  • TEST_PERCENTAGE: percentage of images for testing
  • TABLE_DICT: dictionary for data labels. For this project there is no reason to change it;
  • MIN_WIDTH_BOX, MIN_HEIGHT_BOX: minimum dimension to consider a box valid; Some networks don't digest well little boxes, so I put this check.

Generate TF records file

csv files and images are ready: now we need to create our TF record file to feed Tensorflow. Use python generate_tf_records.py to create the train and test.record files that we will need later. No need to configure dataset_costants.py

Train the network

Inside trained_models there are some folders. In each one there are two files, a .config and a .txt one. The first contains a tensorflow configuration, that has to be personalized:

  • fine_tune_checkpoint: path to the frozen graph from pre-trained tensorflow models networks;
  • tf_record_input_reader: path to the train.record and test.record file we created before;
  • label_map_path: path to the labels of your dataset.

The latter contains the command to launch from tensorflow/models/research/object-detection and follows this pattern:

python model_main.py \
--pipeline_config_path=path/to/your_config_file.config \
--model_dir=here/we/save/our/model" \ 
--num_train_steps=num_of_iterations \
--alsologtostderr

Other options are inside tensorflow/models/research/object-detection/model_main.py

Prepare frozen graph

When the net has finished the training, you can export a frozen graph to make inference. Tensorflow offers the utility: from tensorflow/models/research/object-detection run:

python export_inference_graph.py \ 
--input_type=image_tensor \
--pipeline_config_path=path/to/automatically/created/pipeline.config \ 
--trained_checkpoint_prefix=path/to/last/model.ckpt-xxx \
--output_directory=path/to/output/dir

Test your graph!

Now that you have your graph you can try it out: Run inference_with_net.py and set inference_costants.py:

  • PATHS_TO_TEST_IMAGE: path list to all the test images;
  • BMP_IMAGE_TEST_TO_PATH: path to which save test output files;
  • PATHS_TO_LABELS: path to .pbtxt label file;
  • MAX_NUM_BOXES: max number of boxes to be considered;
  • MIN_SCORE: minimum score of boxes to be considered;

Then it will be generated a result image for every combination of:

  • PATHS_TO_CKPTS: list path to all frozen graph you want to test;

In addition it will print a "merged" version of the boxes, in which all the best vertically overlapping boxes are merged together to gain accuracy. TEST_SCORES is a list of numbers that tells the program which scores must be merged together.

The procedure is better described in inference_with_net.py.

For every execution a .log file will be produced.

Common issues while installing Tensorflow models

TypeError: can't pickle dict_values objects

This comment will probably solve your problem.

Windows build and python3 support for COCO API dataset

This clone will provide a working source for COCO API in Windows and Python3

Owner
Giovanni Cavallin
Giovanni Cavallin
This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp

Jacobo José Guijarro Villalba 75 Oct 21, 2022
Volume Control using OpenCV

Gesture-Volume-Control Volume Control using OpenCV Here i made volume control using Python and OpenCV in which we can control the volume of our laptop

Mudit Sinha 3 Oct 10, 2021
fishington.io bot with OpenCV and NumPy

fishington.io-bot fishington.io bot with using OpenCV and NumPy bot can continue to fishing fully automatically how to use Open cmd in fishington.io-b

Bahadır Araz 77 Jan 02, 2023
A toolbox of scene text detection and recognition

FudanOCR This toolbox contains the implementations of the following papers: Scene Text Telescope: Text-Focused Scene Image Super-Resolution [Chen et a

FudanVIC Team 170 Dec 26, 2022
Bu uygulamada Python ve Opencv kullanarak bilgisayar kamerasından yüz tespiti yapıyoruz.

opencv_yuz_bulma Bu uygulamada Python ve Opencv kullanarak bilgisayar kamerasından yüz tespiti yapıyoruz. Bilgisarın kendi kamerasını kullanmak için;

Ahmet Haydar Ornek 6 Apr 16, 2022
Open Source Computer Vision Library

OpenCV: Open Source Computer Vision Library Resources Homepage: https://opencv.org Courses: https://opencv.org/courses Docs: https://docs.opencv.org/m

OpenCV 65.7k Jan 03, 2023
Forked from argman/EAST for the ICPR MTWI 2018 CHALLENGE

EAST_ICPR: EAST for ICPR MTWI 2018 CHALLENGE Introduction This is a repository forked from argman/EAST for the ICPR MTWI 2018 CHALLENGE. Origin Reposi

Haozheng Li 157 Aug 23, 2022
The CIS OCR PostCorrectionTool

The CIS OCR Post Correction Tool PoCoTo Source code for the Java-based PoCoTo client enabling fast interactive batch corrections of complete OCR error

CIS OCR Group 36 Dec 15, 2022
TextBoxes++: A Single-Shot Oriented Scene Text Detector

TextBoxes++: A Single-Shot Oriented Scene Text Detector Introduction This is an application for scene text detection (TextBoxes++) and recognition (CR

Minghui Liao 930 Jan 04, 2023
Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform sign language recognition.

Sign Language Recognition Service This is a Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform s

Martin Lønne 1 Jan 08, 2022
ARU-Net - Deep Learning Chinese Word Segment

ARU-Net: A Neural Pixel Labeler for Layout Analysis of Historical Documents Contents Introduction Installation Demo Training Introduction This is the

128 Sep 12, 2022
Official code for ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

ROCA: Robust CAD Model Alignment and Retrieval from a Single Image (CVPR 2022) Code release of our paper ROCA. Check out our video, paper, and website

123 Dec 25, 2022
Create single line SVG illustrations from your pictures

Create single line SVG illustrations from your pictures

Javier Bórquez 686 Dec 26, 2022
Select range and every time the screen changes, OCR is activated.

ASOCR(Auto Screen OCR) Select range and every time you press Space key, OCR is activated. 範囲を選ぶと、あなたがスペースキーを押すたびに、画面が変わる度にOCRが起動します。 usage1: simple OC

1 Feb 13, 2022
An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come

An Agnostic Object Detection Framework IceVision is the first agnostic computer vision framework to offer a curated collection with hundreds of high-q

airctic 790 Jan 05, 2023
RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

RepMLP RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition Released the code of RepMLP together with an example o

260 Jan 03, 2023
Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.

Deskew by Marek Mauder https://galfar.vevb.net/deskew https://github.com/galfar/deskew v1.30 2019-06-07 Overview Deskew is a command line tool for des

Marek Mauder 127 Dec 03, 2022
A python script based on opencv and paddleocr, which can automatically pick up tasks, make cookies, and receive rewards in the Destiny 2 Dawning Oven

A python script based on opencv and paddleocr, which can automatically pick up tasks, make cookies, and receive rewards in the Destiny 2 Dawning Oven

1 Dec 22, 2021
This tool will help you convert your text to handwriting xD

So your teacher asked you to upload written assignments? Hate writing assigments? This tool will help you convert your text to handwriting xD

Saurabh Daware 4.2k Jan 07, 2023
A fastai/PyTorch package for unpaired image-to-image translation.

Unpaired image-to-image translation A fastai/PyTorch package for unpaired image-to-image translation currently with CycleGAN implementation. This is a

Tanishq Abraham 120 Dec 02, 2022