
Scene Text Localization & Recognition Resources

A curated list of resources dedicated to scene text localization and recognition. Any suggestions and pull requests are welcome.

Papers & Code

Overview

  • [2015-PAMI] Text Detection and Recognition in Imagery: A Survey paper
  • [2014-Front.Comput.Sci] Scene Text Detection and Recognition: Recent Advances and Future Trends paper

Visual Geometry Group, University of Oxford

CUHK & SIAT

  • [2016-arXiv] Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network paper
  • [2016-AAAI] Reading Scene Text in Deep Convolutional Sequences paper
  • [2016-TIP] Text-Attentional Convolutional Neural Networks for Scene Text Detection paper
  • [2014-ECCV] Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees paper

Media and Communication Lab, HUST

  • [2016-CVPR] Robust Scene Text Recognition with Automatic Rectification paper
  • [2016-CVPR] Multi-Oriented Text Detection with Fully Convolutional Networks paper
  • [2015-CoRR] An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition paper code github

AI Lab, Stanford

  • [2012-ICPR, Wang] End-to-End Text Recognition with Convolutional Neural Networks paper code SVHN Dataset
  • [2012-PhD thesis, David Wu] End-to-End Text Recognition with Convolutional Neural Networks paper

Others

  • [2018-CVPR] FOTS: Fast Oriented Text Spotting With a Unified Network paper
  • [2018-IJCAI] IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection paper
  • [2018-AAAI] PixelLink: Detecting Scene Text via Instance Segmentation paper code
  • [2018-AAAI] SEE: Towards Semi-Supervised End-to-End Scene Text Recognition paper code
  • [2017-arXiv] Fused Text Segmentation Networks for Multi-oriented Scene Text Detection paper
  • [2017-arXiv] WeText: Scene Text Detection under Weak Supervision paper
  • [2017-ICCV] Single Shot Text Detector with Regional Attention paper
  • [2017-ICCV] WordSup: Exploiting Word Annotations for Character based Text Detection paper
  • [2017-arXiv] R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection paper
  • [2017-CVPR] EAST: An Efficient and Accurate Scene Text Detector paper code
  • [2017-arXiv] Cascaded Segmentation-Detection Networks for Word-Level Text Spotting paper
  • [2017-arXiv] Deep Direct Regression for Multi-Oriented Scene Text Detection paper
  • [2017-CVPR] Detecting oriented text in natural images by linking segments paper code
  • [2017-CVPR] Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection paper
  • [2017-arXiv] Arbitrary-Oriented Scene Text Detection via Rotation Proposals paper
  • [2017-AAAI] TextBoxes: A Fast Text Detector with a Single Deep Neural Network paper code
  • [2017-ICCV] Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework paper code
  • [2016-CVPR] Recursive Recurrent Nets with Attention Modeling for OCR in the Wild paper
  • [2016-arXiv] COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images paper
  • [2016-arXiv] DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images paper
  • [2015-ICDAR] Object Proposals for Text Extraction in the Wild paper code
  • [2014-TPAMI] Word Spotting and Recognition with Embedded Attributes paper homepage code

Datasets

  • MLT 2017 2017

    • 7200 training, 1800 validation images
    • Bounding box, text transcription, and script annotations
    • Task: text detection, script identification
  • COCO-Text (Computer Vision Group, Cornell) 2016

    • 63,686 images, 173,589 text instances, 3 fine-grained text attributes.
    • Task: text location and recognition
    • COCO-Text API
  • Synthetic Word Dataset (Oxford, VGG) 2014

    • 9 million images covering 90k English words
    • Task: text recognition, segmentation
    • download
  • IIIT 5K-Words 2012

    • 5000 images of scene text and born-digital text (2k training and 3k testing images)
    • Each image is a cropped word image of scene text with case-insensitive labels
    • Task: text recognition
    • download
  • StanfordSynth (Stanford, AI Group) 2012

    • Small single-character images of 62 characters (0-9, a-z, A-Z)
    • Task: text recognition
    • download
  • MSRA Text Detection 500 Database (MSRA-TD500) 2012

    • 500 natural images (resolutions vary from 1296 × 864 to 1920 × 1280)
    • Chinese, English, or a mixture of both
    • Task: text detection
  • Street View Text (SVT) 2010

    • 350 high-resolution images (average size 1260 × 860; 100 for training and 250 for testing)
    • Only word-level bounding boxes are provided, with case-insensitive labels
    • Task: text location
  • KAIST Scene Text Database 2010

    • 3000 images of indoor and outdoor scenes containing text
    • Korean, English (Number), and Mixed (Korean + English + Number)
    • Task: text location, segmentation and recognition
  • Chars74k 2009

    • Over 74K character images taken from natural images, plus a set of synthetically generated characters
    • Small single-character images of 62 characters (0-9, a-z, A-Z)
    • Task: text recognition
  • ICDAR Benchmark Datasets

| Dataset | Description | Competition Paper |
| --- | --- | --- |
| ICDAR 2015 | 1000 training images and 500 testing images | paper link |
| ICDAR 2013 | 229 training images and 233 testing images | paper link |
| ICDAR 2011 | 229 training images and 255 testing images | paper link |
| ICDAR 2005 | 1001 training images and 489 testing images | paper link |
| ICDAR 2003 | 181 training images and 251 testing images (word-level and character-level) | paper link |
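Several recognition datasets above share labeling conventions: StanfordSynth and Chars74k use 62 single-character classes (0-9, a-z, A-Z), while IIIT 5K-Words and SVT ship case-insensitive word labels. Below is a minimal Python sketch, not taken from any dataset's official tooling, of a 62-class character mapping and a case-insensitive word-accuracy check. The class ordering (digits, then lowercase, then uppercase) and the helper names `encode_char`, `decode_index`, and `word_accuracy` are hypothetical; check each dataset's own label files before relying on specific indices.

```python
import string

# 62 character classes (0-9, a-z, A-Z), as listed for StanfordSynth and Chars74k.
# ASSUMPTION: this ordering is illustrative only; each dataset defines its own indices.
CHARSET = string.digits + string.ascii_lowercase + string.ascii_uppercase
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(CHARSET)}


def encode_char(ch: str) -> int:
    """Map a single character to its class index (0..61). Hypothetical helper."""
    return CHAR_TO_INDEX[ch]


def decode_index(index: int) -> str:
    """Map a class index back to its character. Hypothetical helper."""
    return CHARSET[index]


def word_accuracy(predictions, ground_truths, case_sensitive=False):
    """Fraction of words predicted exactly (hypothetical helper).

    IIIT 5K-Words and SVT provide case-insensitive labels, so by default
    both strings are lowercased before they are compared.
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        return 0.0
    correct = 0
    for pred, gt in zip(predictions, ground_truths):
        if not case_sensitive:
            pred, gt = pred.lower(), gt.lower()
        correct += pred == gt  # bool counts as 0 or 1
    return correct / len(ground_truths)


if __name__ == "__main__":
    print(len(CHARSET))                                            # 62
    print(encode_char("a"), decode_index(36))                      # 10 A
    print(word_accuracy(["Hello", "W0rld"], ["HELLO", "world"]))   # 0.5
```

Lowercasing both strings before comparing matches the case-insensitive labels; pass `case_sensitive=True` only if your ground truth actually preserves case.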

Blogs

Owner: CarlosTao