A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Last update: Jan 03, 2023

Overview

SVHNClassifier-PyTorch

A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

If you're interested in C++ inference, move HERE

Results

Steps	GPU	Batch Size	Learning Rate	Patience	Decay Step	Decay Rate	Training Speed (FPS)	Accuracy
54000	GTX 1080 Ti	512	0.16	100	625	0.9	~1700	95.65%

Sample

$ python infer.py -c=./logs/model-54000.pth ./images/test-75.png
length: 2
digits: 7 5 10 10 10

$ python infer.py -c=./logs/model-54000.pth ./images/test-190.png
length: 3
digits: 1 9 0 10 10

Loss

Requirements

Python 3.6
torch 1.0
torchvision 0.2.1
visdom
```
$ pip install visdom
```

h5py

In Ubuntu:
$ sudo apt-get install libhdf5-dev
$ sudo pip install h5py

protobuf
```
$ pip install protobuf
```
lmdb
```
$ pip install lmdb
```

Setup

Clone the source code

$ git clone https://github.com/potterhsu/SVHNClassifier-PyTorch
$ cd SVHNClassifier-PyTorch

Download SVHN Dataset format 1

Extract to data folder, now your folder structure should be like below:

SVHNClassifier
    - data
        - extra
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - test
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - train
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat

Usage

(Optional) Take a glance at original images with bounding boxes
```
Open `draw_bbox.ipynb` in Jupyter
```

Convert to LMDB format

$ python convert_to_lmdb.py --data_dir ./data

(Optional) Test for reading LMDBs

Open `read_lmdb_sample.ipynb` in Jupyter

Train

$ python train.py --data_dir ./data --logdir ./logs

Retrain if you need

$ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pth

Evaluate

$ python eval.py --data_dir ./data ./logs/model-100.pth

Visualize

$ python -m visdom.server
$ python visualize.py --logdir ./logs

Infer

$ python infer.py --checkpoint=./logs/model-100.pth ./images/test1.png

Clean

$ rm -rf ./logs
or
$ rm -rf ./logs_retrain

A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Related tags

Overview

SVHNClassifier-PyTorch

Results

Sample

Loss

Requirements

Setup

Usage

Owner

Potter Hsu

PyTorch implementation of D2C: Diffuison-Decoding Models for Few-shot Conditional Generation.

Temporal-Relational CrossTransformers

PyJokes - Joking around with Python library pyjokes

Using Language Model to Bootstrap Human Activity Recognition Ambient Sensors Based in Smart Homes

Source Code for Simulations in the Publication "Can the brain use waves to solve planning problems?"

Implementation of ICCV19 Paper "Learning Two-View Correspondences and Geometry Using Order-Aware Network"

Scripts for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation and a convolutional neural network (CNN) for image classification

Boosted neural network for tabular data

D-NeRF: Neural Radiance Fields for Dynamic Scenes

Bayesian Generative Adversarial Networks in Tensorflow

Official implementation of Sparse Transformer-based Action Recognition

Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"

This repository contains tutorials for the py4DSTEM Python package

Fuzzification helps developers protect the released, binary-only software from attackers who are capable of applying state-of-the-art fuzzing techniques

Code accompanying "Dynamic Neural Relational Inference" from CVPR 2020

Mining-the-Social-Web-3rd-Edition - The official online compendium for Mining the Social Web, 3rd Edition (O'Reilly, 2018)

Moer Grounded Image Captioning by Distilling Image-Text Matching Model

Iris prediction model is used to classify iris species created julia's DecisionTree, DataFrames, JLD2, PlotlyJS and Statistics packages.

End-to-end Temporal Action Detection with Transformer. [Under review]

🏃‍♀️ A curated list about human motion capture, analysis and synthesis.