Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

Overview

Scene-Text-Detection-with-SPCNET

Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network][https://arxiv.org/abs/1811.08605] with tensorflow.

参考代码

网络实现主要借鉴Keras版本的Mask-RCNN,训练数据接口参考了argman/EAST.论文作者在知乎的文章介绍SPCNet.

训练

1、训练数据准备

训练数据放在data/下,训练数据准备在data/icdar.py:

data

icdar2017

Annotaions //image_1.txt
JPEGImages //image_1.jpg
train.txt //存储训练图片的名称,例如:image_1

2、参数修改

修改./train.py中的学习率、batch、模型存储路径等参数,如果需要调整网络参数,在nets/config.py中修改。

3、执行训练

python train.py

代码运行环境:Python2.7 tensorflow-gpu1.13 单张1080Ti

测试

修改demo.py中的模型文件夹路径、测试图片路径,然后执行python demo.py

测试结果:论文中还有一些地方我也不确定,因此目前没有在公开数据集测试。值得注意的是,按照原文中的训练说明,最好在多卡上训练,请加大你的batch size.

值得注意的地方

1、global text segmentation(gts)的训练

计算gts训练时损失函数时,我采用的方法是将feature pyramid的各个level产生的gts分别与全局mask gt计算softmax loss,然后取平均作为Loss_gts。因为没找到与原文关于这一块的描述,因此可能是其他的计算方法:每个level准备不同的mask_gt、将多个level的gts预测融合计算loss等等。感兴趣的可以去问问作者或者自己试试。

2、实现Rescore 时gts的选取

计算predict box对应的pyramid level,然后选取对应的gts计算。还有一种思路是:融合P2,P3,P4,P5的gts,然后计算box rescore.

3、Bounding Box的生成

MASK RCNN中是先对输出的box进行阈值过滤以及NMS,然后将剩余的回归之后的box对应的rois送入mask branch计算mask,目的是减少计算量同时获得更准确的mask。SPCNet为了减小FP与FN,对Inference流程做了修改:先对模型输出的box与mask进行Rescore,然后经过threshold filter,再对剩下的mask求Bounding Box,然后利用Poly NMS减少重叠,输出剩下的。

在目前代码(nets/models.py utils.py)里:是先对模型输出的box与mask进行Rescore,然后经过threshold filter与NMS,再对剩下的mask求Bounding Box,然后直接输出。

A community-supported supercharged version of paperless: scan, index and archive all your physical documents

Paperless-ngx Paperless-ngx is a document management system that transforms your physical documents into a searchable online archive so you can keep,

5.2k Jan 04, 2023
Image Smoothing and Blurring Using OpenCV

Image-Smoothing-and-Blurring-Using-OpenCV This repository contains codes for performing image smoothing and blurring using OpenCV. There are different

Happy N. Monday 3 Feb 15, 2022
This repository provides train&test code, dataset, det.&rec. annotation, evaluation script, annotation tool, and ranking.

SCUT-CTW1500 Datasets We have updated annotations for both train and test set. Train: 1000 images [images][annos] Additional point annotation for each

Yuliang Liu 600 Dec 18, 2022
Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding for Zero-Example Video Retrieval.

Dual Encoding for Video Retrieval by Text Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding

81 Dec 01, 2022
Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.

Deskew by Marek Mauder https://galfar.vevb.net/deskew https://github.com/galfar/deskew v1.30 2019-06-07 Overview Deskew is a command line tool for des

Marek Mauder 127 Dec 03, 2022
3点クリックで円を指定し、極座標変換を行うサンプルプログラム

click-warpPolar 3点クリックで円を指定し、極座標変換を行うサンプルプログラムです。 Requirements OpenCV 3.4.2 or Later Usage 実行方法は以下です。 起動後、マウスで3点をクリックし円を指定してください。 python click-warpPol

KazuhitoTakahashi 17 Dec 30, 2022
A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports

A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports "with"-syntax.

Patrice Matz 0 Oct 30, 2021
The papers published in top-tier AI conferences in recent years.

AI-conference-papers The papers published in top-tier AI conferences in recent years. Paper table AAAI ICLR CVPR ICML ICCV ECCV NIPS 2019 ✔️ ✔️ ✔️ ✔️

Jinbae Park 6 Dec 09, 2022
一款基于Qt与OpenCV的仿真数字示波器

一款基于Qt与OpenCV的仿真数字示波器

郭赟 4 Nov 02, 2022
Computer vision applications project (Flask and OpenCV)

Computer Vision Applications Project This project is at it's initial phase. This is all about the implementation of different computer vision techniqu

Suryam Thapa 1 Jan 26, 2022
A dataset handling library for computer vision datasets in LOST-fromat

A dataset handling library for computer vision datasets in LOST-fromat

8 Dec 15, 2022
基于图像识别的开源RPA工具,理论上可以支持所有windows软件和网页的自动化

SimpleRPA 基于图像识别的开源RPA工具,理论上可以支持所有windows软件和网页的自动化 简介 SimpleRPA是一款python语言编写的开源RPA工具(桌面自动控制工具),用户可以通过配置yaml格式的文件,来实现桌面软件的自动化控制,简化繁杂重复的工作,比如运营人员给用户发消息,

Song Hui 7 Jun 26, 2022
Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector

CRAFT: Character-Region Awareness For Text detection Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector | Paper |

188 Dec 28, 2022
graph learning code for ogb

The final code for OGB Installation Requirements: ogb=1.3.1 torch=1.7.0 torch-geometric=1.7.0 torch-scatter=2.0.6 torch-sparse=0.6.9 Baseline models T

PierreHao 20 Nov 10, 2022
Textboxes_plusplus implementation with Tensorflow (python)

TextBoxes++-TensorFlow TextBoxes++ re-implementation using tensorflow. This project is greatly inspired by slim project And many functions are modifie

81 Dec 07, 2022
Code for the paper: Fusformer: A Transformer-based Fusion Approach for Hyperspectral Image Super-resolution

Fusformer Code for the paper: "Fusformer: A Transformer-based Fusion Approach for Hyperspectral Image Super-resolution" Plateform Python 3.8.5 + Pytor

Jin-Fan Hu (胡锦帆) 11 Dec 12, 2022
pyntcloud is a Python library for working with 3D point clouds.

pyntcloud is a Python library for working with 3D point clouds.

David de la Iglesia Castro 1.2k Jan 07, 2023
TextField: Learning A Deep Direction Field for Irregular Scene Text Detection (TIP 2019)

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection Introduction The code and trained models of: TextField: Learning A Deep

Yukang Wang 101 Dec 12, 2022
EQFace: An implementation of EQFace: A Simple Explicit Quality Network for Face Recognition

EQFace: A Simple Explicit Quality Network for Face Recognition The first face recognition network that generates explicit face quality online.

DeepCam Shenzhen 141 Dec 31, 2022
Virtual Zoom Gesture using OpenCV

Virtual_Zoom_Gesture I have created a virtual zoom gesture where we can Zoom in and Zoom out any image and even we can move that image anywhere on the

Mudit Sinha 2 Dec 26, 2021