Optical character recognition for Japanese text, with the main focus being Japanese manga

Last update: Jan 01, 2023

Overview

Manga OCR

Optical character recognition for Japanese text, with the main focus being Japanese manga. It uses a custom end-to-end model built with Transformers' Vision Encoder Decoder framework.

Manga OCR can be used as a general-purpose OCR for printed Japanese, but its main goal is to provide high-quality text recognition that is robust against scenarios specific to manga:

both vertical and horizontal text
text with furigana
text overlaid on images
wide variety of fonts and font styles
low quality images

Unlike many OCR models, Manga OCR supports recognizing multi-line text in a single forward pass, so that text bubbles found in manga can be processed at once, without splitting them into lines.

Code for training and synthetic data generation will be released soon.

Installation

You need Python 3.6, 3.7, 3.8 or 3.9. Unfortunately, PyTorch does not support Python 3.10 yet.

If you want to run on a GPU, install PyTorch first, following its official installation instructions. Otherwise this step can be skipped.

Run in command line:
```
pip3 install manga-ocr
```

Usage

Python API

```
from manga_ocr import MangaOcr

mocr = MangaOcr()
text = mocr('/path/to/img')
```

or

```
import PIL.Image

from manga_ocr import MangaOcr

mocr = MangaOcr()
img = PIL.Image.open('/path/to/img')
text = mocr(img)
```
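
For batch use, the call shown above can be wrapped in a small helper. The following is a minimal sketch and not part of the manga-ocr package: `ocr_folder` and its `extensions` parameter are hypothetical names, and the only assumption about the OCR object is that it can be called with an image path and returns a string, as in the first example.

```python
from pathlib import Path


def ocr_folder(ocr, folder, extensions=(".png", ".jpg", ".jpeg")):
    """Run `ocr` on every image file in `folder` and return {filename: text}.

    `ocr` is any callable that accepts an image path and returns a string,
    for example a MangaOcr instance. This helper is illustrative only.
    """
    results = {}
    # Sort so that files are processed in a stable, predictable order.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            results[path.name] = ocr(str(path))
    return results
```

With the package installed, `ocr_folder(MangaOcr(), '/path/to/folder')` would reuse a single model instance for all files, which avoids reloading the model for every image.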

Running in the background

Manga OCR can run in the background and process new images as they appear.

You might use a tool like ShareX to manually capture a region of the screen and let the OCR read it from either the system clipboard or a specified directory. By default, Manga OCR writes the recognized text to the clipboard, from which it can be read by a dictionary like Yomichan. Reading images from the clipboard works only on Windows and macOS. On Linux, read from a directory instead.

Your full setup for reading manga in Japanese with a dictionary might look like this:

capture region with ShareX -> write image to clipboard -> Manga OCR -> write text to clipboard -> Yomichan


To read images from clipboard and write recognized texts to clipboard, run in command line:
```
manga_ocr
```
To read images from ShareX's screenshot folder, run in command line:
```
manga_ocr "/path/to/sharex/screenshot/folder"
```

When running for the first time, downloading the model (~400 MB) might take a few minutes. The OCR is ready to use once the "OCR ready" message appears in the logs.

To see other options, run in command line:
```
manga_ocr --help
```

If manga_ocr doesn't work, you might also try replacing it with python -m manga_ocr.
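
The directory mode described above can be approximated in your own scripts with a simple polling loop. This is a rough sketch under stated assumptions, not the actual implementation of the `manga_ocr` command: `find_new_images`, `watch_folder`, `on_text`, `delay` and `max_polls` are hypothetical names, and `ocr` is assumed to be any callable that takes an image path and returns text.

```python
import time
from pathlib import Path


def find_new_images(folder, seen, extensions=(".png", ".jpg", ".jpeg")):
    """Return image paths in `folder` that are not in `seen`, oldest first.

    The returned paths are added to `seen`, so each file is reported once.
    """
    candidates = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions and p not in seen
    ]
    candidates.sort(key=lambda p: (p.stat().st_mtime, p.name))
    seen.update(candidates)
    return candidates


def watch_folder(ocr, folder, on_text, delay=0.5, max_polls=None):
    """Poll `folder` and pass the text of each new image to `on_text`.

    Files already present at start-up are ignored (an assumption of this
    sketch). `max_polls=None` means the loop runs until interrupted.
    """
    seen = set(Path(folder).iterdir())
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_images(folder, seen):
            on_text(ocr(str(path)))
        polls += 1
        time.sleep(delay)
```

For example, `watch_folder(MangaOcr(), "/path/to/sharex/screenshot/folder", print)` would print the text of each new screenshot. A more careful version would also wait until a file has been fully written before reading it.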

Usage tips

The OCR supports multi-line text, but the longer the text, the more likely errors become. If recognition fails for part of a longer text, try running it on a smaller portion of the image.
The model was trained specifically to handle manga well, but it should do a decent job on other types of printed text, such as novels or video games. It probably won't be able to handle handwritten text, though.
The model always attempts to recognize some text in the image, even if there is none. Because it uses a transformer decoder (and therefore has some understanding of the Japanese language), it might even "dream up" some realistic-looking sentences! This shouldn't be a problem for most use cases, but it might be improved in the next version.
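
The first tip, running the OCR on a smaller portion of the image, can be sketched as follows. This is an illustration only: `column_boxes` and `ocr_in_columns` are hypothetical helpers, `image` is assumed to be a PIL image (or any object with a `size` attribute and a `crop(box)` method), and `ocr` is assumed to accept such an image, as `MangaOcr` does in the Python API section. The right-to-left ordering assumes vertical text.

```python
def column_boxes(width, height, n_columns):
    """Split a width x height area into n_columns vertical strips.

    Boxes are (left, upper, right, lower) tuples, ordered right to left,
    which is the reading order of vertical Japanese text.
    """
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    # Strip boundaries, rounded to whole pixels.
    edges = [round(i * width / n_columns) for i in range(n_columns + 1)]
    boxes = [(edges[i], 0, edges[i + 1], height) for i in range(n_columns)]
    return boxes[::-1]


def ocr_in_columns(ocr, image, n_columns):
    """OCR each strip of `image` separately and return the texts in order."""
    width, height = image.size
    return [ocr(image.crop(box)) for box in column_boxes(width, height, n_columns)]
```

Equal-width strips can cut through characters, so in practice you would pick crop boundaries that fall in the gaps between text columns, or simply capture a smaller region by hand.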

Examples

Here are some cherry-picked examples showing the capability of the model.

Manga OCR results:

素直にあやまるしか
立川で見た〝穴〟の下の巨大な眼は：
実戦剣術も一流です
第３０話重苦しい闇の奥で静かに呼吸づきながら
よかったじゃないわよ！何逃げてるのよ！！早くあいつを退治してよ！
ぎゃっ
ピンポーーン
ＬＩＮＫ！私達７人の力でガノンの塔の結界をやぶります
ファイアパンチ
少し黙っている
わかるかな〜？
警察にも先生にも町中の人達に！！

Acknowledgments

This project was developed using the Manga109-s dataset.

Owner

Maciej Budyś
