Classical OCR DCNN reproduction based on PaddlePaddle framework.

Overview

Paddle-SVHN

This project reproduces Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks with the PaddlePaddle framework, as an entry in the Baidu paper reproduction competition. The AIStudio link is provided as follows:

link

Results_Compared

SVHN Dataset

| Methods | Model Download | Batch Size | Learning Rate | Patience | Decay Step | Decay Rate | Training Speed (FPS) | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pytorch_SVHN | torch_model | 512 | 0.16 | 100 | 625 | 0.9 | ~1700 | 95.65% |
| PaddlePaddle_SVHN | paddle_model | 1024 | 0.01 | 100 | 625 | 0.9 | ~1700 | 95.65% |
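
The Decay Step and Decay Rate columns suggest a step-wise exponential learning-rate schedule. The sketch below is only an assumption about how those values combine (a staircase decay that multiplies the rate by 0.9 every 625 steps); the function name `decayed_lr` is hypothetical and is not taken from the repository.

```python
def decayed_lr(base_lr: float, step: int, decay_steps: int = 625, decay_rate: float = 0.9) -> float:
    """Staircase exponential decay: multiply base_lr by decay_rate once per decay_steps."""
    if step < 0:
        raise ValueError("step must be non-negative")
    # Integer division keeps the rate constant within each block of decay_steps.
    return base_lr * decay_rate ** (step // decay_steps)


if __name__ == "__main__":
    # Base learning rate taken from the PaddlePaddle_SVHN row of the results table.
    for step in (0, 624, 625, 1250):
        print(step, decayed_lr(0.01, step))
```

Under this assumption the rate stays at its base value for the first 625 steps and then drops by 10% at each further multiple of 625.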

Introduction

The main idea of this exercise is to study how the state of the art and the main work on visual attention models have evolved. Two datasets are studied: augmented MNIST and SVHN. The former focuses on a canonical problem, handwritten digit recognition, but with clutter and translation; the latter focuses on a real-world problem, street view house number (SVHN) transcription. In this exercise, the following papers are studied to develop a good intuition for choosing a suitable model for each of these challenges.

For more detail, please refer to this blog

Recommended environment

Python 3.6+
paddlepaddle-gpu 2.0.2
nccl 2.0+
editdistance
visdom
h5py
protobuf
lmdb

Install

Install env

Install PaddlePaddle by following the official tutorial, then install the remaining packages:

pip install editdistance
pip install visdom
pip install h5py
pip install protobuf
pip install lmdb
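
After installing, it can help to confirm that the packages from the recommended environment are visible to the interpreter. The following is a minimal sketch, not part of the repository; the import names are assumptions (`paddlepaddle-gpu` is imported as `paddle`, `protobuf` as `google.protobuf`), and `check_environment` is a hypothetical helper.

```python
import importlib.util

# Import names are assumptions: the PyPI package "paddlepaddle-gpu" is imported
# as "paddle" and "protobuf" as "google.protobuf".
REQUIRED_MODULES = ["paddle", "editdistance", "visdom", "h5py", "google.protobuf", "lmdb"]


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the module (parent packages may get imported)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package is missing.
        return False


def check_environment(modules=REQUIRED_MODULES) -> dict:
    """Map each module name to whether it is importable in this interpreter."""
    return {name: is_importable(name) for name in modules}


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:16s} {'ok' if ok else 'MISSING'}")
```

This only checks that the modules can be found; it does not verify versions or GPU/NCCL support.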

Dataset

  1. Download SVHN Dataset format 1

  2. Extract it into the data folder. Your folder structure should now look like this:

    SVHNClassifier
        - data
            - extra
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
            - test
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
            - train
                - 1.png 
                - 2.png
                - ...
                - digitStruct.mat
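
A quick way to confirm the layout above before converting is to check each split folder for its annotation file and at least one image. This is a minimal sketch using only the standard library; `missing_dataset_items` is a hypothetical helper, not a script shipped with the project.

```python
from pathlib import Path

# Split folders as shown in the directory tree above.
SPLITS = ("extra", "test", "train")


def missing_dataset_items(data_dir: str) -> list:
    """Return a list of problems found; an empty list means the layout looks right."""
    root = Path(data_dir)
    problems = []
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split_dir}")
            continue
        if not (split_dir / "digitStruct.mat").is_file():
            problems.append(f"missing annotation file: {split_dir / 'digitStruct.mat'}")
        if not any(split_dir.glob("*.png")):
            problems.append(f"no PNG images in: {split_dir}")
    return problems


if __name__ == "__main__":
    issues = missing_dataset_items("./data")
    print("layout OK" if not issues else "\n".join(issues))
```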
    

Usage

  1. (Optional) Take a glance at original images with bounding boxes

    Open `draw_bbox.ipynb` in Jupyter
    
  2. Convert to LMDB format

    $ python convert_to_lmdb.py --data_dir ./data
    
  3. (Optional) Test for reading LMDBs

    Open `read_lmdb_sample.ipynb` in Jupyter
    
  4. Train

    $ python train.py --data_dir ./data --logdir ./logs
    
  5. Retrain if needed

    $ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pth
    
  6. Evaluate

    $ python eval.py --data_dir ./data ./logs/model-100.pth
    
  7. Visualize

    $ python -m visdom.server
    $ python visualize.py --logdir ./logs
    
  8. Infer

    $ python infer.py --checkpoint=./logs/model-100.pth ./images/test1.png
    
  9. Clean

    $ rm -rf ./logs
    or
    $ rm -rf ./logs_retrain
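
Steps 2, 4 and 6 above can also be chained from a small driver script. The sketch below is an illustration rather than part of the repository: it reuses only the script names and flags shown in those steps, the helper names `pipeline_commands` and `run_pipeline` are hypothetical, and the checkpoint name `model-100.pth` is simply the one used in the examples, so it may differ from what a given training run produces.

```python
import subprocess
import sys


def pipeline_commands(data_dir="./data", logdir="./logs", checkpoint="./logs/model-100.pth"):
    """Build the convert, train and evaluate commands as listed in the usage steps."""
    py = sys.executable  # run the scripts with the current interpreter
    return [
        [py, "convert_to_lmdb.py", "--data_dir", data_dir],
        [py, "train.py", "--data_dir", data_dir, "--logdir", logdir],
        [py, "eval.py", "--data_dir", data_dir, checkpoint],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False, stopping on the first failure."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default so nothing is executed until the commands have been reviewed.
    run_pipeline(pipeline_commands(), dry_run=True)
```

Set `dry_run=False` once the printed commands match your setup.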
    
Owner
Dreams Are Messages From The Deep🪐