Table recognition inside douments using neural networks

Overview

TableTrainNet

A simple project for training and testing table recognition in documents.

This project was developed to make a neural network which recognizes tables inside documents. I needed an "intelligent" ocr for work, which could automatically recognize tables to treat them separately.

General overview

The project uses the pre-trained neural network offered by Tensorflow. In addition, a config file was used, according to the choosen pre-trained model, to train with object detections tensorflow API

The datasets was taken from:

Required libraries

Before we go on make sure you have everything installed to be able to use the project:

  • Python 3
  • Tensorflow (tested on r1.8)
  • Its object-detection API (remember to install COCO API. If you are on Windows see at the bottom of the readme)
  • Pillow
  • opencv-python
  • pandas
  • pyprind (useful for process bars)

Project pipeline

The project is made up of different parts that acts together as a pipeline.

Take confidence with costants

I have prepared two "costants" files: dataset_costants.py and inference_constants.py. The first contains all those costants that are useful to use to create dataset, the second to make inference with the frozen graph. If you just want to run the project you should modify only those two files.

Transform the images from RGB to single-channel 8-bit grayscale jpeg images

Since colors are not useful for table detection, we can convert all the images in .jpeg 8-bit single channel images. This) transformation is still under testing. Use python dataset/img_to_jpeg.py after setting dataset_costants.py:

  • DPI_EXTRACTION: output quality of the images;
  • PATH_TO_IMAGES: path/to/datase/images;
  • IMAGES_EXTENSION: extension of the extracted images. The only one tested is .jpeg.

Prepare the dataset for Tensorflow

The dataset was take from ICDAR 2017 POD Competition . It comes with a xml notation file with formulas, images and tables per image. Tensorflow instead can build its own TFRecord from csv informations, so we need to convert the xml files into a csv one. Use python dataset/generate_database_csv.py to do this conversion after setting dataset_costants.py:

  • TRAIN_CSV_NAME: name for .csv train output file;
  • TEST_CSV_NAME: name for .csv test output file;
  • TRAIN_CSV_TO_PATH: folder path for TRAIN_CSV_NAME;
  • TEST_CSV_TO_PATH: folder path for TEST_CSV_NAME;
  • ANNOTATIONS_EXTENSION: extension of annotations. In our case is .xml;
  • TRAINING_PERCENTAGE: percentage of images for training
  • TEST_PERCENTAGE: percentage of images for testing
  • TABLE_DICT: dictionary for data labels. For this project there is no reason to change it;
  • MIN_WIDTH_BOX, MIN_HEIGHT_BOX: minimum dimension to consider a box valid; Some networks don't digest well little boxes, so I put this check.

Generate TF records file

csv files and images are ready: now we need to create our TF record file to feed Tensorflow. Use python generate_tf_records.py to create the train and test.record files that we will need later. No need to configure dataset_costants.py

Train the network

Inside trained_models there are some folders. In each one there are two files, a .config and a .txt one. The first contains a tensorflow configuration, that has to be personalized:

  • fine_tune_checkpoint: path to the frozen graph from pre-trained tensorflow models networks;
  • tf_record_input_reader: path to the train.record and test.record file we created before;
  • label_map_path: path to the labels of your dataset.

The latter contains the command to launch from tensorflow/models/research/object-detection and follows this pattern:

python model_main.py \
--pipeline_config_path=path/to/your_config_file.config \
--model_dir=here/we/save/our/model" \ 
--num_train_steps=num_of_iterations \
--alsologtostderr

Other options are inside tensorflow/models/research/object-detection/model_main.py

Prepare frozen graph

When the net has finished the training, you can export a frozen graph to make inference. Tensorflow offers the utility: from tensorflow/models/research/object-detection run:

python export_inference_graph.py \ 
--input_type=image_tensor \
--pipeline_config_path=path/to/automatically/created/pipeline.config \ 
--trained_checkpoint_prefix=path/to/last/model.ckpt-xxx \
--output_directory=path/to/output/dir

Test your graph!

Now that you have your graph you can try it out: Run inference_with_net.py and set inference_costants.py:

  • PATHS_TO_TEST_IMAGE: path list to all the test images;
  • BMP_IMAGE_TEST_TO_PATH: path to which save test output files;
  • PATHS_TO_LABELS: path to .pbtxt label file;
  • MAX_NUM_BOXES: max number of boxes to be considered;
  • MIN_SCORE: minimum score of boxes to be considered;

Then it will be generated a result image for every combination of:

  • PATHS_TO_CKPTS: list path to all frozen graph you want to test;

In addition it will print a "merged" version of the boxes, in which all the best vertically overlapping boxes are merged together to gain accuracy. TEST_SCORES is a list of numbers that tells the program which scores must be merged together.

The procedure is better described in inference_with_net.py.

For every execution a .log file will be produced.

Common issues while installing Tensorflow models

TypeError: can't pickle dict_values objects

This comment will probably solve your problem.

Windows build and python3 support for COCO API dataset

This clone will provide a working source for COCO API in Windows and Python3

Owner
Giovanni Cavallin
Giovanni Cavallin
Volume Control using OpenCV

Gesture-Volume-Control Volume Control using OpenCV Here i made volume control using Python and OpenCV in which we can control the volume of our laptop

Mudit Sinha 3 Oct 10, 2021
This is used to convert a string to an Image with Handwritten Characters.

Text-to-Handwriting-using-python This is used to convert a string to an Image with Handwritten Characters. text_to_handwriting(string: str, save_to: s

Akashdeep Mahata 3 Aug 15, 2022
Generates a message from the infamous Jerma Impostor image

Generate your very own jerma sus imposter message. Modes: Default Mode: Only supports the characters " ", !, a, b, c, d, e, h, i, m, n, o, p, q, r, s,

Giorno420 1 Oct 27, 2022
YOLOv5 in DOTA with CSL_label.(Oriented Object Detection)(Rotation Detection)(Rotated BBox)

YOLOv5_DOTA_OBB YOLOv5 in DOTA_OBB dataset with CSL_label.(Oriented Object Detection) Datasets and pretrained checkpoint Datasets : DOTA Pretrained Ch

1.1k Dec 30, 2022
MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition Python 2.7 Python 3.6 MORAN is a network with rectification mechanism for

Canjie Luo 595 Dec 27, 2022
Python bindings for JIGSAW: a Delaunay-based unstructured mesh generator.

JIGSAW: An unstructured mesh generator JIGSAW is an unstructured mesh generator and tessellation library; designed to generate high-quality triangulat

Darren Engwirda 26 Dec 13, 2022
Developed an AI-based system to control the mouse cursor using Python and OpenCV with the real-time camera.

Developed an AI-based system to control the mouse cursor using Python and OpenCV with the real-time camera. Fingertip location is mapped to RGB images to control the mouse cursor.

Ravi Sharma 71 Dec 20, 2022
Simple SDF mesh generation in Python

Generate 3D meshes based on SDFs (signed distance functions) with a dirt simple Python API.

Michael Fogleman 1.1k Jan 08, 2023
Table Extraction Tool

Tree Structure - Table Extraction Fonduer has been successfully extended to perform information extraction from richly formatted data such as tables.

HazyResearch 88 Jun 02, 2022
An Implementation of the seglink alogrithm in paper Detecting Oriented Text in Natural Images by Linking Segments

Tips: A more recent scene text detection algorithm: PixelLink, has been implemented here: https://github.com/ZJULearning/pixel_link Contents: Introduc

dengdan 484 Dec 07, 2022
Repositório para registro de estudo da biblioteca opencv (Python)

OpenCV (Python) Objetivo do Repositório: Registrar avanços no estudo da biblioteca opencv. O repositório estará aberto a qualquer pessoa e há tambem u

1 Jun 14, 2022
An interactive interface for using OpenCV's GrabCut algorithm for image segmentation.

Interactive GrabCut An interactive interface for using OpenCV's GrabCut algorithm for image segmentation. Setup Install dependencies: pip install nump

Jason Y. Zhang 16 Oct 10, 2022
This repository summarized computer vision theories.

This repository summarized computer vision theories.

3 Feb 04, 2022
Text language identification using Wikipedia data

Text language identification using Wikipedia data The aim of this project is to provide high-quality language detection over all the web's languages.

Vsevolod Dyomkin 28 Jul 09, 2022
Train custom VR face tracking parameters

Pal Buddy Guy: The anipal's best friend This is a small script to improve upon the tracking capabilities of the Vive Pro Eye and facial tracker. You c

7 Dec 12, 2021
Code for the paper "Controllable Video Captioning with an Exemplar Sentence"

SMCG Code for the paper "Controllable Video Captioning with an Exemplar Sentence" Introduction We investigate a novel and challenging task, namely con

10 Dec 04, 2022
An Implementation of the FOTS: Fast Oriented Text Spotting with a Unified Network

FOTS: Fast Oriented Text Spotting with a Unified Network Introduction This is a pytorch re-implementation of FOTS: Fast Oriented Text Spotting with a

GeorgeJoe 171 Aug 04, 2022
Repository of conference publications and source code for first-/ second-authored papers published at NeurIPS, ICML, and ICLR.

Repository of conference publications and source code for first-/ second-authored papers published at NeurIPS, ICML, and ICLR.

Daniel Jarrett 26 Jun 17, 2021
Morphological edge detection or object's boundary detection using erosion and dialation in OpenCV python

Morphologycal-edge-detection-using-erosion-and-dialation the task is to detect object boundary using erosion or dialation . Here, use the kernel or st

Tamzid hasan 3 Nov 25, 2022
Lightning Fast Language Prediction 🚀

whatthelang Lightning Fast Language Prediction 🚀 Dependencies The dependencies can be installed using the requirements.txt file: $ pip install -r req

Indix 152 Oct 16, 2022