Classical OCR DCNN reproduction based on PaddlePaddle framework.

Last update: Nov 12, 2021

Paddle-SVHN

This project reproduces the paper "Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks" with the PaddlePaddle framework and is an entry in the Baidu paper reproduction competition. The AI Studio link is provided as follows:

link

Results_Compared

SVHN Dataset

| Methods | Model Download | Batch Size | Learning Rate | Patience | Decay Step | Decay Rate | Training Speed (FPS) | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pytorch_SVHN | torch_model | 512 | 0.16 | 100 | 625 | 0.9 | ~1700 | 95.65% |
| PaddlePaddle_SVHN | paddle_model | 1024 | 0.01 | 100 | 625 | 0.9 | ~1700 | 95.65% |
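
To make the Decay Step and Decay Rate columns concrete, here is a minimal sketch of a staircase schedule. It assumes the learning rate is multiplied by the decay rate once every decay step; the README does not state which scheduler `train.py` actually uses, and the function name `decayed_learning_rate` is hypothetical.

```python
def decayed_learning_rate(initial_lr, step, decay_step=625, decay_rate=0.9):
    """Staircase decay (assumed): multiply by decay_rate once every decay_step steps."""
    return initial_lr * decay_rate ** (step // decay_step)


if __name__ == "__main__":
    # Print the learning rate at a few training steps for the PaddlePaddle row.
    for step in (0, 624, 625, 1250, 6250):
        print(step, decayed_learning_rate(0.01, step))
```

Under this assumption the rate stays constant within each block of 625 steps and drops by 10% at each block boundary.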

Introduction

The main idea of this exercise is to study the evolution of the state of the art and the main work on visual attention models. Two datasets are studied: augmented MNIST and SVHN. The former focuses on a canonical problem, handwritten digit recognition, but with clutter and translation; the latter focuses on a real-world problem, street view house number (SVHN) transcription. In this exercise, the following papers are studied to develop a good intuition for choosing a proper model to tackle each of these challenges.

For more details, please refer to this blog.

Recommended environment

Python 3.6+
paddlepaddle-gpu 2.0.2
nccl 2.0+
editdistance
visdom
h5py
protobuf
lmdb

Install

Install env

Install paddle following the official tutorial.

pip install visdom
pip install h5py
pip install protobuf
pip install lmdb

Dataset

Download SVHN Dataset format 1

Extract it into the data folder. Your folder structure should then look like this:

SVHNClassifier
    - data
        - extra
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - test
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - train
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
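
Before converting the data, it can help to confirm that the layout above is in place. The following is a small sketch using only the standard library; the helper name `missing_svhn_files` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Sub-folders expected under the data directory, as shown in the tree above.
EXPECTED_SPLITS = ("train", "test", "extra")


def missing_svhn_files(data_dir):
    """Return a list of problems found in the expected layout (empty if none)."""
    problems = []
    root = Path(data_dir)
    for split in EXPECTED_SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split_dir}")
            continue
        if not (split_dir / "digitStruct.mat").is_file():
            problems.append(f"missing annotation file: {split_dir / 'digitStruct.mat'}")
        if not any(split_dir.glob("*.png")):
            problems.append(f"no PNG images in: {split_dir}")
    return problems


if __name__ == "__main__":
    for problem in missing_svhn_files("./data"):
        print(problem)
```

An empty result means all three folders exist and each contains `digitStruct.mat` plus at least one PNG image.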

Usage

(Optional) Take a glance at original images with bounding boxes
Open `draw_bbox.ipynb` in Jupyter

Convert to LMDB format

$ python convert_to_lmdb.py --data_dir ./data

(Optional) Test for reading LMDBs

Open `read_lmdb_sample.ipynb` in Jupyter

Train

$ python train.py --data_dir ./data --logdir ./logs
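
The Patience value in the results table suggests patience-based early stopping. Here is a minimal sketch of that idea, assuming the counter resets whenever validation accuracy improves and training stops once it reaches zero. The class `PatienceTracker` is hypothetical and is not taken from `train.py`.

```python
class PatienceTracker:
    """Tracks how many more non-improving evaluations are tolerated."""

    def __init__(self, patience=100):
        self.initial_patience = patience
        self.remaining = patience
        self.best_accuracy = 0.0

    def update(self, accuracy):
        """Record one evaluation result; return True if it is a new best."""
        if accuracy > self.best_accuracy:
            self.best_accuracy = accuracy
            self.remaining = self.initial_patience  # reset on improvement
            return True
        self.remaining -= 1
        return False

    @property
    def should_stop(self):
        return self.remaining <= 0


if __name__ == "__main__":
    tracker = PatienceTracker(patience=3)
    for accuracy in (0.50, 0.60, 0.60, 0.59, 0.58):
        improved = tracker.update(accuracy)
        print(accuracy, improved, tracker.remaining)
    print("stop:", tracker.should_stop)
```

In a training loop, a new best would typically be the moment to save a checkpoint.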

Retrain if needed

$ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pth

Evaluate

$ python eval.py --data_dir ./data ./logs/model-100.pth
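
For multi-digit recognition, accuracy is usually measured per whole number: a prediction counts only if every digit matches. The sketch below assumes the Accuracy column uses this whole-sequence definition, which the README does not confirm; the function `sequence_accuracy` is hypothetical and is not the code in `eval.py`.

```python
def sequence_accuracy(predictions, targets):
    """Fraction of samples whose predicted digit sequence equals the target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(1 for p, t in zip(predictions, targets) if list(p) == list(t))
    return correct / len(targets)


if __name__ == "__main__":
    predicted = [[1, 2], [3], [4, 5, 6]]
    expected = [[1, 2], [4], [4, 5, 6]]
    print(sequence_accuracy(predicted, expected))
```

A prediction with one wrong digit, or with the wrong number of digits, counts as fully incorrect under this metric.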

Visualize

$ python -m visdom.server
$ python visualize.py --logdir ./logs

Infer

$ python infer.py --checkpoint=./logs/model-100.pth ./images/test1.png
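
In the paper's formulation, the network predicts a distribution over the number's length plus one distribution per digit position, and the transcription is the combination with the highest total log-probability. The sketch below illustrates that decoding step in plain Python. The function `decode_number` and its input layout are assumptions for illustration and do not describe what `infer.py` does internally.

```python
import math


def decode_number(length_log_probs, digit_log_probs):
    """Pick the length and digits that maximize the summed log-probability.

    length_log_probs[n] is the log-probability that the number has n digits.
    digit_log_probs[i][d] is the log-probability that position i holds digit d.
    A trailing "more than N digits" length class, if present, is ignored.
    """
    best_score, best_text = -math.inf, ""
    max_length = min(len(length_log_probs) - 1, len(digit_log_probs))
    for length in range(max_length + 1):
        score = length_log_probs[length]
        digits = []
        for position in digit_log_probs[:length]:
            # Positions are independent given the image, so take each argmax.
            digit = max(range(len(position)), key=position.__getitem__)
            score += position[digit]
            digits.append(str(digit))
        if score > best_score:
            best_score, best_text = score, "".join(digits)
    return best_text
```

Because the digit positions are scored independently, taking the best digit at each position is enough; only the length has to be searched.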

Clean

$ rm -rf ./logs
or
$ rm -rf ./logs_retrain
