Form Segmentation

Let's explore how we can extract text from any form or scanned page.

Objectives

The goal is to find an algorithm that can extract as much information as possible from a given page (JPG format), so that we can feed it to another system (business logic, neural network, classifier, etc.). The overall process may not be perfect, but it would be great if it could find enough information to identify the type of document and the identities involved.

Parse any form or scanned page and extract any text data (printed and handwritten text), with no prior knowledge of the layout or structure of the document.
Automatic extraction process (no human interaction, so it can scale out).
Reasonably fast (or able to speed up the task with more machines or CPUs).
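
The scale-out objective can be illustrated with a minimal sketch that uses only the Python standard library. It assumes each page can be processed independently. The names `extract_text` and `process_pages` are hypothetical placeholders and not part of this project. The real per-page work (segmentation, OCR) would replace the body of `extract_text`.

```python
from concurrent.futures import ProcessPoolExecutor


def extract_text(page_path):
    """Hypothetical placeholder for the real per-page extraction step."""
    # A real implementation would load the image and run segmentation + OCR here.
    return {"page": page_path, "text": ""}


def process_pages(page_paths, worker=extract_text,
                  executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run `worker` on every page in parallel and return results in input order."""
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(worker, page_paths))
```

With the default process pool, each page is handled by a separate CPU core. On platforms that start workers with "spawn" (Windows, macOS), the call to `process_pages` should sit under an `if __name__ == "__main__":` guard. Spreading work across several machines would need a job queue instead, which this sketch does not cover.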

Challenges

There are many challenges to overcome, but the main problem is to identify which parts of the form contain text.
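
One simple heuristic for locating text is a horizontal projection profile: count the ink pixels in each row and group consecutive non-empty rows into bands. The sketch below is only an illustration of that idea, not the method used by this project. It assumes the page has already been binarized (1 for ink, 0 for background) and deskewed. The function names and the `min_ink` and `min_height` parameters are hypothetical. Ruled lines and box borders on a form also count as ink, so a real pipeline would need to filter them out.

```python
def find_text_bands(binary, min_ink=1, min_height=1):
    """Return (top, bottom) row ranges, bottom exclusive, that contain ink.

    `binary` is a list of rows; each pixel is 1 for ink and 0 for background.
    """
    bands = []
    start = None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                        # a new band begins on this row
        elif not has_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))     # the band ended on the previous row
            start = None
    # Close a band that runs to the bottom edge of the page.
    if start is not None and len(binary) - start >= min_height:
        bands.append((start, len(binary)))
    return bands


def band_to_box(binary, top, bottom):
    """Return (left, top, right, bottom) around the ink inside one band."""
    cols = [x for row in binary[top:bottom] for x, v in enumerate(row) if v]
    return (min(cols), top, max(cols) + 1, bottom)


page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
bands = find_text_bands(page)                     # [(1, 3), (5, 6)]
boxes = [band_to_box(page, t, b) for t, b in bands]
```

Each box could then be cropped and passed to an OCR or handwriting recognizer. With NumPy or OpenCV arrays the per-row counts would normally be computed with a vectorized sum rather than a Python loop.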

Some other challenges:

Black border removal
ICR (Intelligent Character Recognition): recognize and convert hand-drawn characters into text
Scanned page (detect edges and apply a perspective transform to obtain a top-down view of the document)
Noise removal (blur, Otsu thresholding, adaptive thresholding with OpenCV)
Shape detection and extraction
OCR (not a real issue, since we can use Tesseract 4, which works well for printed text)
Handwriting recognition
Minimize errors
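
To make the Otsu thresholding step mentioned above more concrete, here is a minimal pure-Python sketch of the idea: pick the gray level that maximizes the between-class variance of the two resulting pixel groups. It assumes 8-bit grayscale values given as a flat sequence. The function names `otsu_threshold` and `binarize` are hypothetical. In practice OpenCV provides this through the `cv2.THRESH_OTSU` flag of `cv2.threshold`, which is far faster than the loop below.

```python
def otsu_threshold(pixels):
    """Return a threshold t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, pixels > t the bright class.
    When several thresholds tie, the lowest one is returned.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class
    sum0 = 0    # sum of intensities in the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break               # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, t):
    """Map dark pixels (<= t) to 1 (ink) and bright pixels to 0 (background)."""
    return [1 if p <= t else 0 for p in pixels]
```

A single global threshold like this works best on evenly lit scans. Pages with shadows or uneven lighting are the case where adaptive thresholding is usually preferred.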

Owner

Philip Doxakis
