This is a c++ project deploying a deep scene text reading pipeline with tensorflow. It reads text from natural scene images. It uses frozen tensorflow graphs. The detector detect scene text locations. The recognizer reads word from each detected bounding box.

Overview

DeepSceneTextReader

This is a c++ project deploying a deep scene text reading pipeline. It reads text from natural scene images.

Prerequsites

The project is written in c++ using tensorflow computational framework. It is tested using tensorflow 1.4. Newer version should be ok too, but not tested. Please install:

Please check this project on how to build project using tensorflow with cmake: https://github.com/cjweeks/tensorflow-cmake It greatly helped the progress of building this project. When building tensorflow library, please be careful since we need to use opencv. Looks like there is still problem when including tensorflow and opencv together. It will make opencv unable to read image. Check out this issue: https://github.com/tensorflow/tensorflow/issues/14267 The answer by allenlavoie solved my problem, so I paste it here:

"In the meantime, as long as you're not using any custom ops you can build libtensorflow_cc.so with bazel build --config=monolithic, which will condense everything together into one shared object (no libtensorflow_framework dependence) and seal off non-TensorFlow symbols. That shared object will have protocol buffer symbols."

Status

Currently two pretrained model is provided. One for scene text detection, and one for scene text recognition. More model will be provided. Note that the current model is not so robust. U can easily change to ur trained model. The models will be continuously updated.

build process

cd build

cmake ..

make

It will create an excutable named DetectText in bin folder.

Usage:

The excutable could be excuted in three modes: (1) Detect (2) Recognize (3) Detect and Recognize

Detect

Download the pretrained detector model and put it in model/

./DetectText --detector_graph='model/Detector_model.pb'
--image_filename='test_images/test_img1.jpg' --mode='detect' --output_filename='results/output_image.jpg'

Recognize

Download the pretrained recognizer model and put it in model/ Download the dictionary file and put it in model

./DetectText --recognizer_graph='model/Recognizer_model.pb'
--image_filename='test_images/recognize_image1.jpg' --mode='recognize'
--im_height=32 --im_width=128

Detect and Recognize

Download the pretrained detector and recognizer model and put it in model/ as described previously.

./DetectText --recognizer_graph=$recognizer_graph --detector_graph='model/Detector_model.pb'
--image_filename='model/Recognizer_model.pb' --mode='detect_and_read' --output_filename='results/output_image.jpg'

Model Description

Detector

  1. Faster RCNN Detector Model The detector is trained with modified tensorflow [object detector api]: (https://github.com/tensorflow/models/tree/master/research/object_detection) I modify it by changing the proposal scheme to regress to the 4 coordinates of the oriented bounding box rather than regular rectangular bounding box. Check out this repo for the training code. Pretrained model: FasterRCNN_detector_model.pb

  2. R2CNN will be updated. See R2CNN for details. The code is also modified with tnesorflow [object detector api]: (https://github.com/tensorflow/models/tree/master/research/object_detection) The training code will be released soon.

Recognizer

  1. CTC scene text recognizer. The recognizer model follows the famous scene text recognition CRNN model

  2. Spatial Attention OCR will be updated soon. It is based on GoogleOCR

Detect and Recognize

The whole scene text reading pipeline detects the text and rotate it horizontally and read it with recognizer. The pipeline is here:

Pretrained Models

You can play with the code with provided pretrained models.
They are not fully optimized yet, but could be used for being familiar with the code.
Check them out here: models

You will find two detection models called: (1) FasterRCNN_detector_model.pb (2) R2CNN_detector_model.pb
Two recognition models with their charset: (1) Recognizer_model.pb + charset_full.txt and (2)Recognizer_model_case_insen.pb + charset_case_insen.txt.
Full charset means English letters + digit and case insen means case insensitive English letters + digit. Let me know if u have any problens using them.

Reference and Related Projects

Contact:

Owner
Dafang He
Ph.D student at the Penn State University. Focusing on machine learning and computer vision.
Dafang He
Satoshi is a discord bot template in python using discord.py that allow you to track some live crypto prices with your own discord bot.

Satoshi ~ DiscordCryptoBot Satoshi is a simple python discord bot using discord.py that allow you to track your favorites cryptos prices with your own

Théo 2 Sep 15, 2022
This project proposes a camera vision based cursor control system, using hand moment captured from a webcam through a landmarks of hand by using Mideapipe module

This project proposes a camera vision based cursor control system, using hand moment captured from a webcam through a landmarks of hand by using Mideapipe module

Chandru 2 Feb 20, 2022
GDB python tool to pretty print and debug c++ xtensor containers

gdb_xt2np GDB python tool to pretty print, examine, and debug c++ Xtensor containers. Xtensor is a c++ library for scientific computing using multidim

Christopher Burke 4 Oct 29, 2021
OCR, Scene-Text-Understanding, Text Recognition

Scene-Text-Understanding Survey [2015-PAMI] Text Detection and Recognition in Imagery: A Survey paper [2014-Front.Comput.Sci] Scene Text Detection and

Alan Tang 354 Dec 12, 2022
The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes"

Mask TextSpotter A Pytorch implementation of Mask TextSpotter along with its extension can be find here Introduction This is the official implementati

Pengyuan Lyu 261 Nov 21, 2022
A PyTorch implementation of ECCV2018 Paper: TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes A PyTorch implement of TextSnake: A Flexible Representation for Detecting

Prince Wang 417 Dec 12, 2022
A Python wrapper for Google Tesseract

Python Tesseract Python-tesseract is an optical character recognition (OCR) tool for python. That is, it will recognize and "read" the text embedded i

Matthias A Lee 4.6k Jan 06, 2023
Text page dewarping using a "cubic sheet" model

page_dewarp Page dewarping and thresholding using a "cubic sheet" model - see full writeup at https://mzucker.github.io/2016/08/15/page-dewarping.html

Matt Zucker 1.2k Dec 29, 2022
Demo for the paper "Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation"

Streaming speaker diarization Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation by Juan Manuel Coria, Hervé

Juanma Coria 185 Jan 01, 2023
governance proposal to make fei redeemable for eth

Feil Proposal 🌲 Abstract Migrate all ETH from Fei protocol-controlled value into Yearn ETH Vault. Allow redemptions of outstanding FEI for yvETH. At

13 Mar 31, 2022
In this project we will be using the live feed coming from the webcam to create a virtual mouse with complete functionalities.

Virtual Mouse Using OpenCV In this project we will be using the live feed coming from the webcam to create a virtual mouse using hand tracking. Projec

Hassan Shahzad 8 Dec 20, 2022
An expandable and scalable OCR pipeline

Overview Nidaba is the central controller for the entire OGL OCR pipeline. It oversees and automates the process of converting raw images into citable

81 Jan 04, 2023
This is used to convert a string to an Image with Handwritten Characters.

Text-to-Handwriting-using-python This is used to convert a string to an Image with Handwritten Characters. text_to_handwriting(string: str, save_to: s

Akashdeep Mahata 3 Aug 15, 2022
原神风花节自动弹琴辅助

GenshinAutoPlayBalladsofBreeze 原神风花节自动弹琴辅助(已适配1920*1080分辨率) 本程序基于opencv图像识别技术,不存在任何封号。 因为正确率取决于你的cpu性能,10900k都不一定全对。 由于图像识别存在误差,根本无法确定出错时间。更不用说被检测到了。

晓轩 20 Oct 27, 2022
Some Boring Research About Products Recognition 、Duplicate Img Detection、Img Stitch、OCR

Products Recognition 介绍 商品识别,围绕在复杂的商场零售场景中,识别出货架图像中的商品信息。主要组成部分: 重复图像检测。【更新进度 4/10】 图像拼接。【更新进度 0/10】 目标检测。【更新进度 0/10】 商品识别。【更新进度 1/10】 OCR。【更新进度 1/10】

zhenjieWang 18 Jan 27, 2022
Recognizing cropped text in natural images.

ASTER: Attentional Scene Text Recognizer with Flexible Rectification ASTER is an accurate scene text recognizer with flexible rectification mechanism.

Baoguang Shi 681 Jan 02, 2023
Fusion 360 Add-in that creates a pair of toothed curves that can be used to split a body and create two pieces that slide and lock together.

Fusion-360-Add-In-PuzzleSpline Fusion 360 Add-in that creates a pair of toothed curves that can be used to split a body and create two pieces that sli

Michiel van Wessem 1 Nov 15, 2021
Document manipulation detection with python

image manipulation detection task: -- tianchi function image segmentation salie

JiaKui Hu 3 Aug 22, 2022
Repositório para registro de estudo da biblioteca opencv (Python)

OpenCV (Python) Objetivo do Repositório: Registrar avanços no estudo da biblioteca opencv. O repositório estará aberto a qualquer pessoa e há tambem u

1 Jun 14, 2022
Awesome Spectral Indices in Python.

Awesome Spectral Indices in Python: Numpy | Pandas | GeoPandas | Xarray | Earth Engine | Planetary Computer | Dask GitHub: https://github.com/davemlz/

David Montero Loaiza 98 Jan 02, 2023