ocroseg - This is a deep learning model for page layout analysis / segmentation.

Overview

ocroseg

This is a deep learning model for page layout analysis / segmentation.

There are many different ways in which you can train and run it, but by default, it will simply return the text lines in a page image.

Segmentation

Segmentation is carried out using the ocroseg.Segmenter class. This needs a model that you can download or train yourself.

%%bash
model=lowskew-000000259-011440.pt
test -f $model || wget --quiet -nd https://storage.googleapis.com/tmb-models/$model
%pylab inline
rc("image", cmap="gray", interpolation="bicubic")
figsize(10, 10)
Populating the interactive namespace from numpy and matplotlib

The Segmenter object handles page segmentation using a DL model.

import ocroseg
seg = ocroseg.Segmenter("lowskew-000000259-011440.pt")
seg.model
Sequential(
  (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
  (2): ReLU()
  (3): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
  (6): ReLU()
  (7): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (8): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (9): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
  (10): ReLU()
  (11): LSTM2(
    (hlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
    (vlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
  )
  (12): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1))
  (13): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
  (14): ReLU()
  (15): LSTM2(
    (hlstm): RowwiseLSTM(
      (lstm): LSTM(32, 32, bidirectional=1)
    )
    (vlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
  )
  (16): Conv2d(64, 1, kernel_size=(1, 1), stride=(1, 1))
  (17): Sigmoid()
)

Let's segment a page with this.

image = 1.0 - imread("testdata/W1P0.png")[:2000]
print image.shape
imshow(image)
(2000, 2592)





<matplotlib.image.AxesImage at 0x7f6078b09690>

png

The extract_textlines method returns a list of text line images, bounding boxes, etc.

lines = seg.extract_textlines(image)
imshow(lines[0]['image'])
<matplotlib.image.AxesImage at 0x7f60781c05d0>

png

The segmenter accomplishes this by predicting seeds for each text line. With a bit of mathematical morphology, these seeds are then extended into a text line segmentation.

imshow(seg.lines)
<matplotlib.image.AxesImage at 0x7f60781a5510>

png

Training

The text line segmenter is trained using pairs of page images and line images stored in tar files.

%%bash
tar -ztvf testdata/framedlines.tgz | sed 6q
-rw-rw-r-- tmb/tmb      110404 2017-03-19 16:47 A001BIN.framed.png
-rw-rw-r-- tmb/tmb       10985 2017-03-16 16:15 A001BIN.lines.png
-rw-rw-r-- tmb/tmb       74671 2017-03-19 16:47 A002BIN.framed.png
-rw-rw-r-- tmb/tmb        8528 2017-03-16 16:15 A002BIN.lines.png
-rw-rw-r-- tmb/tmb      147716 2017-03-19 16:47 A003BIN.framed.png
-rw-rw-r-- tmb/tmb       12023 2017-03-16 16:15 A003BIN.lines.png


tar: write error
from dlinputs import tarrecords
sample = tarrecords.tariterator(open("testdata/framedlines.tgz")).next()
subplot(121); imshow(sample["framed.png"])
subplot(122); imshow(sample["lines.png"])
<matplotlib.image.AxesImage at 0x7f60e3d9bc10>

png

There are also some tools for data augmentation.

Generally, you can train these kinds of segmenters on any kind of image data, though they work best on properly binarized, rotation and skew-normalized page images. Note that by conventions, pages are white on black. You need to make sure that the model you load matches the kinds of pages you are trying to segment.

The actual models used are pretty complex and require LSTMs to function well, but for demonstration purposes, let's define and use a tiny layout analysis model. Look in bigmodel.py for a realistic model.

%%writefile tinymodel.py
def make_model():
    r = 3
    model = nn.Sequential(
        nn.Conv2d(1, 8, r, padding=r//2),
        nn.ReLU(),
        nn.MaxPool2d(2, 2),
        nn.Conv2d(8, 1, r, padding=r//2),
        nn.Sigmoid()
    )
    return model
Writing tinymodel.py
%%bash
./ocroseg-train -d testdata/framedlines.tgz --maxtrain 10 -M tinymodel.py --display 0
raw sample:
__key__ 'A001BIN'
__source__ 'testdata/framedlines.tgz'
lines.png float32 (3300, 2592)
png float32 (3300, 2592)

preprocessed sample:
__key__ <type 'list'> ['A002BIN']
__source__ <type 'list'> ['testdata/framedlines.tgz']
input float32 (1, 3300, 2592, 1)
mask float32 (1, 3300, 2592, 1)
output float32 (1, 3300, 2592, 1)

ntrain 0
model:
Sequential(
  (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (3): Conv2d(8, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): Sigmoid()
)

0 0 ['A006BIN'] 0.24655306 ['A006BIN'] 0.31490618 0.55315816 lr 0.03
1 1 ['A007BIN'] 0.24404158 ['A007BIN'] 0.30752876 0.54983306 lr 0.03
2 2 ['A004BIN'] 0.24024434 ['A004BIN'] 0.31007746 0.54046077 lr 0.03
3 3 ['A008BIN'] 0.23756175 ['A008BIN'] 0.30573484 0.5392694 lr 0.03
4 4 ['A00LBIN'] 0.22300518 ['A00LBIN'] 0.28594157 0.52989864 lr 0.03
5 5 ['A00MBIN'] 0.22032338 ['A00MBIN'] 0.28086954 0.52204597 lr 0.03
6 6 ['A00DBIN'] 0.22794804 ['A00DBIN'] 0.27466372 0.512208 lr 0.03
7 7 ['A009BIN'] 0.22404794 ['A009BIN'] 0.27621177 0.51116604 lr 0.03
8 8 ['A001BIN'] 0.22008553 ['A001BIN'] 0.27836022 0.5008192 lr 0.03
9 9 ['A00IBIN'] 0.21842314 ['A00IBIN'] 0.26755702 0.4992323 lr 0.03
Owner
NVIDIA Research Projects
NVIDIA Research Projects
Localization of thoracic abnormalities model based on VinBigData (top 1%)

Repository contains the code for 2nd place solution of VinBigData Chest X-ray Abnormalities Detection competition. The goal of competition was to auto

33 May 24, 2022
scene-linear test images

Scene-Referred Image Collection A collection of OpenEXR Scene-Referred images, encoded as max 2048px width, DWAA 80 compression. All exrs are encoded

Gralk Klorggson 7 Aug 25, 2022
Page to PAGE Layout Analysis Tool

P2PaLA Page to PAGE Layout Analysis (P2PaLA) is a toolkit for Document Layout Analysis based on Neural Networks. 💥 Try our new DEMO for online baseli

Lorenzo Quirós Díaz 180 Nov 24, 2022
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

hocr-tools About About the code Installation System-wide with pip System-wide from source virtualenv Available Programs hocr-check -- check the hOCR f

OCRopus 285 Dec 08, 2022
Assignment work with webcam

work with webcam : Press key 1 to use emojy on your face Press key 2 to use lip and eye on your face Press key 3 to checkered your face Press key 4 to

Hanane Kheirandish 2 May 31, 2022
Open Source Differentiable Computer Vision Library for PyTorch

Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer

kornia 7.6k Jan 04, 2023
Automatically fishes for you while you are afk :)

Dank-memer-afk-script A simple and quick way to make easy money in Dank Memer! How to use Open a discord channel which has the Dank Memer bot enabled.

Pranav Doshi 9 Nov 11, 2022
Awesome Spectral Indices in Python.

Awesome Spectral Indices in Python: Numpy | Pandas | GeoPandas | Xarray | Earth Engine | Planetary Computer | Dask GitHub: https://github.com/davemlz/

David Montero Loaiza 98 Jan 02, 2023
A fastai/PyTorch package for unpaired image-to-image translation.

Unpaired image-to-image translation A fastai/PyTorch package for unpaired image-to-image translation currently with CycleGAN implementation. This is a

Tanishq Abraham 120 Dec 02, 2022
Recognizing cropped text in natural images.

ASTER: Attentional Scene Text Recognizer with Flexible Rectification ASTER is an accurate scene text recognizer with flexible rectification mechanism.

Baoguang Shi 681 Jan 02, 2023
A curated list of resources dedicated to scene text localization and recognition

Scene Text Localization & Recognition Resources A curated list of resources dedicated to scene text localization and recognition. Any suggestions and

CarlosTao 1.6k Dec 22, 2022
TableBank: A Benchmark Dataset for Table Detection and Recognition

TableBank TableBank is a new image-based table detection and recognition dataset built with novel weak supervision from Word and Latex documents on th

844 Jan 04, 2023
nofacedb/faceprocessor is a face recognition engine for NoFaceDB program complex.

faceprocessor nofacedb/faceprocessor is a face recognition engine for NoFaceDB program complex. Tech faceprocessor uses a number of open source projec

NoFaceDB 3 Sep 06, 2021
PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector

Description This is a PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector. Only RBOX part is implemented. Using dice loss

365 Dec 20, 2022
Augmenting Anchors by the Detector Itself

Augmenting Anchors by the Detector Itself Introduction It is difficult to determine the scale and aspect ratio of anchors for anchor-based object dete

4 Nov 06, 2022
Textboxes : Image Text Detection Model : python package (tensorflow)

shinTB Abstract A python package for use Textboxes : Image Text Detection Model implemented by tensorflow, cv2 Textboxes Paper Review in Korean (My Bl

Jayne Shin (신재인) 91 Dec 15, 2022
QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021)

QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021) Yuanming Hu, Jiafeng Liu, Xuanda Yang, Mingkuan Xu, Ye Kuang, Weiwei Xu, Qiang Dai, W

Taichi Developers 119 Dec 02, 2022
This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp

Jacobo José Guijarro Villalba 75 Oct 21, 2022
TensorFlow Implementation of FOTS, Fast Oriented Text Spotting with a Unified Network.

FOTS: Fast Oriented Text Spotting with a Unified Network I am still working on this repo. updates and detailed instructions are coming soon! Table of

Masao Taketani 52 Nov 11, 2022
PyQT5 app that colorize black & white pictures using CNN(use pre-trained model which was made with OpenCV)

About PyQT5 app that colorize black & white pictures using CNN(use pre-trained model which was made with OpenCV) Colorizor Приложение для проекта Yand

1 Apr 04, 2022