Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight'

Last update: Jan 05, 2022

Related tags

Computer Vision SSTDNet

Overview

SSTDNet

Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight' using pytorch.

This code is work for general object detection problem. not for (oriented) text detection problem. I will probably update to handle oriented bounding box as soon as possible :)

[How to use]

you need dataset.

dataset structure is..

/train/0.jpg, /train/0.txt, /valid/0.jpg, /valid/0.txt, ....
0.txt contain position and label of objects like below

(xmin, ymin, xmax, ymax, label)

1273.0 935.0 1407.0 1017.0 v1

911.0 893.0 979.0 953.0 v1

984.0 889.0 1053.0 948.0 v1
To encode label name to integer number, you should define labels in the 'class_lable_map.xlsx"
v1 1

v2 2

....
* start from 1. not from 0. 0 will be background (in the loss.py).

need some settings for dataset reader.

- see train.py. you can find some code for reading dataset
```
  'trainset = ListDataset(root="../train", gt_extension=".txt", labelmap_path="class_label_map.xlsx", is_train=True, transform=transform, input_image_size=512, num_crops=n_crops, original_img_size=2048)'
  
```
- you should set the 'input_image_size' and 'original_img_size'. 'input_image_size' is size of (cropped) image for train. And 'original_img_size' is size of (original) image. I made this parameter to handle high resolution image. if you don't need crop function, -1 for num_crops.
Train with your dataset!
you should define some parameter like learning rate, which optimizer to use, size of batch etc.

Owner

HotaekHan

HotaekHan

GitHub Repository

Primary QPDF source code and documentation

QPDF QPDF is a command-line tool and C++ library that performs content-preserving transformations on PDF files. It supports linearization, encryption,

2.2k Jan 04, 2023

EQFace: An implementation of EQFace: A Simple Explicit Quality Network for Face Recognition

EQFace: A Simple Explicit Quality Network for Face Recognition The first face recognition network that generates explicit face quality online.

141 Dec 31, 2022

This repository contains codes on how to handle mouse event using OpenCV

Handling-Mouse-Click-Events-Using-OpenCV This repository contains codes on how t

3 Feb 15, 2022

Text to QR-CODE

QR CODE GENERATO USING PYTHON Author : RAFIK BOUDALIA. Installation Use the package manager pip to install foobar. pip install pyqrcode Usage from tki

2 Oct 13, 2021

Python bindings for JIGSAW: a Delaunay-based unstructured mesh generator.

JIGSAW: An unstructured mesh generator JIGSAW is an unstructured mesh generator and tessellation library; designed to generate high-quality triangulat

26 Dec 13, 2022

OCR engine for all the languages

Description kraken is a turn-key OCR system optimized for historical and non-Latin script material. kraken's main features are: Fully trainable layout

431 Jan 04, 2023

Automatically fishes for you while you are afk :)

Dank-memer-afk-script A simple and quick way to make easy money in Dank Memer! How to use Open a discord channel which has the Dank Memer bot enabled.

9 Nov 11, 2022

This repository lets you train neural networks models for performing end-to-end full-page handwriting recognition using the Apache MXNet deep learning frameworks on the IAM Dataset.

Handwritten Text Recognition (OCR) with MXNet Gluon These notebooks have been created by Jonathan Chung, as part of his internship as Applied Scientis

422 Jan 03, 2023

Detect textlines in document images

Textline Detection Detect textlines in document images Introduction This tool performs border, region and textline detection from document image data

70 Jun 30, 2022

Contextual speed detection for python

Speed Prediction using Optical Flow and 2D CNN About the challenge: Comma.AI Speed Challenge This challenge was developed by Comma.AI to predict the s

2 Dec 16, 2021

A simple document layout analysis using Python-OpenCV

Run the application: python main.py *Note: For first time running the application, create a folder named "output". The application is a simple documen

109 Dec 12, 2022

Aloception is a set of package for computer vision: aloscene, alodataset, alonet.

Aloception is a set of package for computer vision: aloscene, alodataset, alonet.

86 Dec 28, 2022

CellProfiler is a open-source application for biological image analysis

CellProfiler is a free open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automaticall

732 Dec 23, 2022

Pixie - A full-featured 2D graphics library for Python

Pixie - A full-featured 2D graphics library for Python Pixie is a 2D graphics library similar to Cairo and Skia. pip install pixie-python Features: Ty

65 Dec 30, 2022

Give a solution to recognize MaoYan font.

猫眼字体识别该 github repo 在于帮助xjtlu的同学们识别猫眼的扭曲字体。已经打包上传至 pypi ，可以使用 pip 直接安装。猫眼字体的识别不出来的原理与解决思路在采茶上使用方法： import MaoYanFontRecognize

4 Jun 30, 2022

IMGUR5K handwriting set. It is a handwritten in-the-wild dataset, which contains challenging real world handwritten samples from different writers.The dataset is shared as a set of image urls with annotations. This code downloads the images and verifies the hash to the image to avoid data contamination.

IMGUR5K Handwriting Dataset To run the code for downloading the urls and generate corresponding annotations : Usage: python download_imgur5k.py --data

213 Dec 26, 2022

👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike

Quick Info this library tries to solve language detection of very short words and phrases, even shorter than tweets makes use of both statistical and

532 Dec 28, 2022

text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

3.3k Dec 30, 2022

[python3.6] 运用tf实现自然场景文字检测,keras/pytorch实现ctpn+crnn+ctc实现不定长场景文字OCR识别

本文基于tensorflow、keras/pytorch实现对自然场景的文字检测及端到端的OCR中文文字识别 update20190706 为解决本项目中对数学公式预测的准确性，做了其他的改进和尝试，效果还不错，https://github.com/xiaofengShi/Image2Katex 希

2.7k Dec 25, 2022

Code for CVPR'2022 paper ✨ "Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model"

PPE ✨ Repository for our CVPR'2022 paper: Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-

34 Nov 28, 2022