This is a pytorch re-implementation of EAST: An Efficient and Accurate Scene Text Detector.

Overview

EAST: An Efficient and Accurate Scene Text Detector

Description:

This version is still being updated. Its goal is an easy-to-train model: during training it compares the current hmean with the previous best and updates best_model automatically, and it also reports evaluation results for every sample.

  • 1. train
  • 2. predict
  • 3. compress
  • 4. compute Hmean (if Hmean is higher than before, update best_weight.pkl)
  • 5. visualization (blue, green, red)
  • 6. multi-scale test (coming soon) and multi-scale visualization (with score and scales)
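
As a rough illustration of step 4, the sketch below computes hmean as the harmonic mean of precision and recall and overwrites best_weight.pkl only when the new value is higher. The function names and arguments are hypothetical and are not taken from this repository.

```python
import shutil
from pathlib import Path


def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def update_best(current: float, best: float, weight: Path, best_weight: Path) -> float:
    """Copy `weight` over `best_weight` when `current` beats `best`.

    Returns the best hmean seen so far. Hypothetical helper, not the repo's code.
    """
    if current > best:
        shutil.copyfile(weight, best_weight)
        return current
    return best
```

In a training loop, the value returned by update_best would be carried over to the next evaluation as the new `best`.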

Thanks

This version is ported from argman/EAST, from TensorFlow to PyTorch.

Check On Website

If you do not trust the results reported by this program, you can upload submit.zip to the website and check the result for every image there.

Performance

  • visualization: right -- green || wrong -- red || miss -- blue

  • recall/precision/hmean for every test image

Introduction

This is a PyTorch re-implementation of EAST: An Efficient and Accurate Scene Text Detector. The features are summarized below:

  • Only RBOX part is implemented.
  • A fast Locality-Aware NMS in C++ provided by the paper's author (g++/gcc 6.0 or later should work).
  • Evaluation: see here for the detailed results.
  • Differences from original paper
    • Use ResNet-50 rather than PVANET
    • Use dice loss (optimize IoU of segmentation) rather than balanced cross entropy
    • Use linear learning rate decay rather than staged learning rate decay
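
To make the last two differences concrete, here is a minimal sketch in plain Python rather than PyTorch tensors. It assumes flattened score maps with values in [0, 1] and a training mask that zeroes out ignored pixels; `dice_loss`, `linear_lr`, and all parameter names are illustrative, and the exact formulas used in this repository may differ.

```python
def dice_loss(y_true, y_pred, mask, eps=1e-5):
    """1 - Dice coefficient over flat score maps, ignoring pixels where mask is 0."""
    intersection = sum(t * p * m for t, p, m in zip(y_true, y_pred, mask))
    union = (sum(t * m for t, m in zip(y_true, mask))
             + sum(p * m for p, m in zip(y_pred, mask)) + eps)
    return 1.0 - 2.0 * intersection / union


def linear_lr(base_lr, step, total_steps):
    """Learning rate that falls linearly from base_lr at step 0 to 0 at total_steps."""
    step = min(max(step, 0), total_steps)  # clamp so the rate never goes negative
    return base_lr * (1.0 - step / total_steps)
```

A perfect prediction gives a loss close to 0 and a prediction with no overlap gives 1, so the loss rewards overlap directly instead of per-pixel classification accuracy.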

Thanks for the author's (@zxytim) help! Please cite his paper if you find this useful.

Contents

  1. Installation
  2. Download
  3. Prepare dataset/pretrain
  4. Test
  5. Train
  6. Examples

Installation

  1. Any PyTorch version > 0.4.0 should work.

Download

  1. A pretrained model is not provided for now. The website is being updated, so please check back.

Prepare dataset/pretrain weight

[1]. dataset (you need to prepare a dataset for training and testing). Suggestion: you can create a soft link to root_to_this_program/dataset/train/img/*.jpg

  • -- train ./dataset/train/img/img_###.jpg ./dataset/train/gt/img_###.txt (you need to rename the files)
  • -- test ./data/test/img_###.jpg (images only)
  • -- gt.zip ./result/gt.zip (the ICDAR15 gt.zip is available on the website)

** Note: you can download the dataset here
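
Before training, it can help to confirm that every training image has a matching ground-truth file. The sketch below assumes the train layout listed above (img_###.jpg under train/img and img_###.txt under train/gt); `pair_training_samples` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path


def pair_training_samples(root):
    """Return (image, gt) path pairs under root/train, skipping images without a gt file."""
    img_dir = Path(root) / "train" / "img"
    gt_dir = Path(root) / "train" / "gt"
    pairs = []
    for img in sorted(img_dir.glob("img_*.jpg")):
        gt = gt_dir / (img.stem + ".txt")  # img_12.jpg -> img_12.txt
        if gt.exists():
            pairs.append((img, gt))
    return pairs
```

Calling pair_training_samples("./dataset") and comparing the number of pairs with the number of images shows quickly whether any ground-truth file is missing or misnamed.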

[2]. pretrained

  • In config.py, set resume to True and set checkpoint to path/to/weight/file
  • I will provide pretrained weights soon
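
As a sketch, the relevant lines of config.py might look like the fragment below. Only the names `resume` and `checkpoint` come from this README; the rest of the file is not shown, and the path is a placeholder.

```python
# config.py (fragment); only `resume` and `checkpoint` are named in this README
resume = True                       # continue from a saved weight file
checkpoint = "path/to/weight/file"  # placeholder, replace with your own path
```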

[3]. check GPUs and CPUs. You can use the following command to see which GPUs are available for training:

watch -n 0.1 nvidia-smi

Suppose it shows that GPUs 2 and 3 are available. Then set gpu_ids = [0,1] and gpu = 2 in config.py, and set CUDA_VISIBLE_DEVICES=2,3 in run.sh.
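
CUDA renumbers the devices listed in CUDA_VISIBLE_DEVICES starting from 0, which is presumably why gpu_ids is [0,1] even though physical GPUs 2 and 3 are the ones in use. The sketch below illustrates that mapping; `visible_device_map` is a hypothetical helper, and it assumes the variable holds plain integer ids rather than GPU UUIDs.

```python
def visible_device_map(cuda_visible_devices):
    """Map in-process device indices to the physical GPU ids in CUDA_VISIBLE_DEVICES."""
    physical = [int(x) for x in cuda_visible_devices.split(",") if x.strip()]
    return dict(enumerate(physical))


print(visible_device_map("2,3"))  # {0: 2, 1: 3}
```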

Train

If you want to train the model, you should provide the dataset path in config.py and run

sh run.sh

** Note: you should modify run.sh to specify your GPU id

If you have more than one GPU, you can pass the GPU ids to gpu_list (for example gpu_list=0,1,2,3) in config.py

** Note: rename the ICDAR2015 gt text files from gt_img_*.txt to img_*.txt (or change the code in icdar.py), and remove some extra characters from the files. See the examples in training_samples/
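
The renaming can be scripted. The sketch below assumes the "extra characters" are a UTF-8 byte order mark at the start of each file; this README does not say so, so check your files against the examples in training_samples/ first. `normalize_gt_files` is a hypothetical helper.

```python
from pathlib import Path


def normalize_gt_files(gt_dir):
    """Rename gt_img_*.txt to img_*.txt and drop a leading UTF-8 BOM.

    Returns the number of files renamed. Assumes the files are UTF-8 text.
    """
    renamed = 0
    for src in sorted(Path(gt_dir).glob("gt_img_*.txt")):
        text = src.read_text(encoding="utf-8-sig")  # utf-8-sig strips a BOM if present
        dst = src.with_name(src.name[len("gt_"):])  # gt_img_1.txt -> img_1.txt
        dst.write_text(text, encoding="utf-8")
        src.unlink()
        renamed += 1
    return renamed
```

Run it on a copy of the ground-truth directory first, since the original gt_img_*.txt files are deleted after conversion.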

Test

By default, training and evaluation are integrated into a single process. If you want to run evaluation on its own, you will need to set that up yourself. Feel free to contact me with any questions.

Examples

Here are some test examples on icdar2015, enjoy the beautiful text boxes!

Owner
Dejia Song
Computer Vision & Machine Learning