Table recognition inside documents using neural networks

Overview

TableTrainNet

A simple project for training and testing table recognition in documents.

This project was developed to build a neural network that recognizes tables inside documents. I needed an "intelligent" OCR for work that could automatically recognize tables and treat them separately.

General overview

The project uses a pre-trained neural network provided by Tensorflow. A config file, matched to the chosen pre-trained model, is then used to train it with the Tensorflow Object Detection API.

The dataset was taken from the ICDAR 2017 POD Competition.

Required libraries

Before going on, make sure you have everything installed to use the project:

  • Python 3
  • Tensorflow (tested on r1.8)
  • Its Object Detection API (remember to install the COCO API; if you are on Windows, see the bottom of this readme)
  • Pillow
  • opencv-python
  • pandas
  • pyprind (useful for progress bars)

Project pipeline

The project is made up of different parts that act together as a pipeline.

Get familiar with the constants

I have prepared two "constants" files: dataset_costants.py and inference_constants.py. The first contains the constants used to create the dataset, the second those used to run inference with the frozen graph. If you just want to run the project, you should only need to modify these two files.

Transform the images from RGB to single-channel 8-bit grayscale jpeg images

Since colors are not useful for table detection, we can convert all the images to single-channel 8-bit .jpeg images. This transformation is still under testing. After setting the following constants in dataset_costants.py, run python dataset/img_to_jpeg.py:

  • DPI_EXTRACTION: output quality of the images;
  • PATH_TO_IMAGES: path/to/dataset/images;
  • IMAGES_EXTENSION: extension of the extracted images. The only one tested is .jpeg.
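
The conversion itself can be sketched with Pillow as follows. This is a minimal illustration, not the code of dataset/img_to_jpeg.py: the function name and its parameters are hypothetical, and the dpi argument only assumes that DPI_EXTRACTION is a DPI value written into the output file.

```python
from pathlib import Path

from PIL import Image


def to_grayscale_jpeg(src_path, dst_dir, dpi=300):
    """Convert one image to a single-channel 8-bit JPEG and return its path.

    Hypothetical helper: `dpi` stands in for DPI_EXTRACTION (assumption).
    """
    src_path = Path(src_path)
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst_path = dst_dir / (src_path.stem + ".jpeg")
    with Image.open(src_path) as img:
        # Mode "L" is Pillow's 8-bit, single-channel grayscale mode.
        gray = img.convert("L")
        gray.save(dst_path, format="JPEG", dpi=(dpi, dpi))
    return dst_path
```

Calling to_grayscale_jpeg("page.png", "out") would write out/page.jpeg and leave the source image untouched.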

Prepare the dataset for Tensorflow

The dataset was taken from the ICDAR 2017 POD Competition. It comes with an xml annotation file per image that marks formulas, images and tables. Tensorflow, instead, can build its own TFRecord from csv data, so we need to convert the xml files into csv files. After setting the following constants in dataset_costants.py, run python dataset/generate_database_csv.py to do the conversion:

  • TRAIN_CSV_NAME: name for .csv train output file;
  • TEST_CSV_NAME: name for .csv test output file;
  • TRAIN_CSV_TO_PATH: folder path for TRAIN_CSV_NAME;
  • TEST_CSV_TO_PATH: folder path for TEST_CSV_NAME;
  • ANNOTATIONS_EXTENSION: extension of the annotation files. In our case it is .xml;
  • TRAINING_PERCENTAGE: percentage of images for training;
  • TEST_PERCENTAGE: percentage of images for testing;
  • TABLE_DICT: dictionary for data labels. For this project there is no reason to change it;
  • MIN_WIDTH_BOX, MIN_HEIGHT_BOX: minimum dimensions for a box to be considered valid. Some networks do not handle small boxes well, so I added this check.
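
As a rough sketch of this conversion, the snippet below extracts table boxes from one annotation and applies the minimum-size check. It is not the code of dataset/generate_database_csv.py. It assumes an annotation layout with tableRegion elements that each hold a Coords element whose points attribute is "x1,y1 x2,y2 ..."; the function names, the csv column names and the threshold values are also assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed thresholds, standing in for MIN_WIDTH_BOX / MIN_HEIGHT_BOX.
MIN_WIDTH_BOX = 10
MIN_HEIGHT_BOX = 10
# Assumed csv columns (hypothetical, not taken from the project).
FIELDS = ["filename", "class", "xmin", "ymin", "xmax", "ymax"]


def table_rows(xml_text, image_name):
    """Return one row per table region that passes the minimum-size check."""
    rows = []
    root = ET.fromstring(xml_text)
    for region in root.iter("tableRegion"):
        coords = region.find("Coords")
        if coords is None:
            continue
        # "x1,y1 x2,y2 ..." -> [(x1, y1), (x2, y2), ...]
        points = [tuple(int(v) for v in pair.split(","))
                  for pair in coords.get("points", "").split()]
        if not points:
            continue
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
        if xmax - xmin < MIN_WIDTH_BOX or ymax - ymin < MIN_HEIGHT_BOX:
            continue  # box too small to be a useful training example
        rows.append({"filename": image_name, "class": "table",
                     "xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax})
    return rows


def rows_to_csv(rows):
    """Serialize rows to csv text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Only tableRegion elements are read here because the project targets tables; formula and image regions are ignored in this sketch.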

Generate TF records file

The csv files and images are ready: now we need to create the TF record files to feed Tensorflow. Run python generate_tf_records.py to create the train.record and test.record files that we will need later. There is no need to configure dataset_costants.py for this step.
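
Before the rows are serialized, they typically have to be grouped per image and the pixel coordinates scaled to [0, 1], since the Object Detection API's TFRecord examples store normalized box coordinates. The sketch below shows only that preparation step in plain Python. It does not write a TFRecord and is not taken from generate_tf_records.py; the function name, the row keys and the image_sizes mapping are assumptions.

```python
from collections import defaultdict


def group_and_normalize(rows, image_sizes):
    """Group box rows by image and scale pixel coordinates into [0, 1].

    rows: dicts with keys filename, class, xmin, ymin, xmax, ymax (assumed).
    image_sizes: {filename: (width, height)} in pixels (assumed input).
    """
    grouped = defaultdict(list)
    for row in rows:
        width, height = image_sizes[row["filename"]]
        grouped[row["filename"]].append({
            "label": row["class"],
            # x values are divided by the width, y values by the height.
            "xmin": int(row["xmin"]) / width,
            "ymin": int(row["ymin"]) / height,
            "xmax": int(row["xmax"]) / width,
            "ymax": int(row["ymax"]) / height,
        })
    return dict(grouped)
```

Each entry of the returned dictionary then holds everything one image's record needs besides the encoded image bytes.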

Train the network

Inside trained_models there are some folders. Each one contains two files, a .config and a .txt. The first contains a Tensorflow configuration that has to be customized:

  • fine_tune_checkpoint: path to the checkpoint of the pre-trained Tensorflow model;
  • tf_record_input_reader: path to the train.record and test.record files we created before;
  • label_map_path: path to the labels of your dataset.
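
The file referenced by label_map_path is a small text file with one item block per class. A minimal sketch for generating it is shown below; it assumes a {name: id} dictionary (the real shape of TABLE_DICT may differ), and the function name is hypothetical. In the Object Detection API, ids start at 1 because 0 is reserved for the background class.

```python
def render_label_map(labels):
    """Render a {name: id} mapping in the label map (.pbtxt) text format."""
    blocks = []
    for name, class_id in sorted(labels.items(), key=lambda kv: kv[1]):
        if class_id < 1:
            # Id 0 is reserved for the background class.
            raise ValueError("label ids must start at 1")
        blocks.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical single-class mapping for table detection.
    print(render_label_map({"table": 1}))
```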

The latter contains the command to launch from tensorflow/models/research/object_detection, which follows this pattern:

python model_main.py \
--pipeline_config_path=path/to/your_config_file.config \
--model_dir=here/we/save/our/model \
--num_train_steps=num_of_iterations \
--alsologtostderr

Other options are listed in tensorflow/models/research/object_detection/model_main.py

Prepare frozen graph

When the network has finished training, you can export a frozen graph for inference. Tensorflow offers a utility for this: from tensorflow/models/research/object_detection run:

python export_inference_graph.py \
--input_type=image_tensor \
--pipeline_config_path=path/to/automatically/created/pipeline.config \
--trained_checkpoint_prefix=path/to/last/model.ckpt-xxx \
--output_directory=path/to/output/dir

Test your graph!

Now that you have your graph you can try it out: set the following constants in inference_costants.py, then run inference_with_net.py:

  • PATHS_TO_TEST_IMAGE: path list to all the test images;
  • BMP_IMAGE_TEST_TO_PATH: path where the test output files are saved;
  • PATHS_TO_LABELS: path to .pbtxt label file;
  • MAX_NUM_BOXES: max number of boxes to be considered;
  • MIN_SCORE: minimum score of boxes to be considered;

A result image will then be generated for every combination of:

  • PATHS_TO_CKPTS: list of paths to all the frozen graphs you want to test.

In addition, it will output a "merged" version of the boxes, in which the best vertically overlapping boxes are merged together to gain accuracy. TEST_SCORES is a list of numbers that tells the program which scores must be merged together.

The procedure is better described in inference_with_net.py.
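
One possible reading of the merge step is sketched below. It is an illustration only, not the algorithm of inference_with_net.py. It assumes boxes in the Object Detection API's (ymin, xmin, ymax, xmax) order, treats two boxes as "vertically overlapping" when their y ranges intersect, and replaces them with their bounding union. The function name and the min_score parameter are hypothetical.

```python
def merge_vertically_overlapping(boxes, scores, min_score=0.5):
    """Merge boxes whose y ranges overlap into their bounding union.

    boxes: sequences of (ymin, xmin, ymax, xmax); scores: matching floats.
    Boxes scoring below min_score are dropped before merging.
    """
    kept = sorted(tuple(box) for box, score in zip(boxes, scores)
                  if score >= min_score)  # sorted by ymin first
    merged = []
    for ymin, xmin, ymax, xmax in kept:
        if merged and ymin <= merged[-1][2]:
            # Overlaps the previous box vertically: grow it to cover both.
            last = merged[-1]
            merged[-1] = (last[0], min(last[1], xmin),
                          max(last[2], ymax), max(last[3], xmax))
        else:
            merged.append((ymin, xmin, ymax, xmax))
    return merged
```

Sorting by ymin first means each box only needs to be compared with the most recently merged one, so a single pass over the kept boxes is enough.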

For every execution a .log file will be produced.

Common issues while installing Tensorflow models

TypeError: can't pickle dict_values objects

This comment will probably solve your problem.

Windows build and python3 support for COCO API dataset

This clone will provide a working source for the COCO API on Windows with Python 3.

Owner
Giovanni Cavallin