Natural language detection

Last update: Jan 02, 2023

Overview

Detect the language of text.

What’s so cool about franc?

franc can support more languages† than any other library
franc is packaged with support for 82, 187, or 406 languages
franc has a CLI

† - Based on the UDHR, the most translated document in the world.

What’s not so cool about franc?

franc supports many languages, which means it’s easily confused on small samples. Make sure to pass it big documents to get reliable results.

Install

npm:

npm install franc

This installs the franc package, with support for 187 languages (languages which have 1M or more speakers). franc-min (82 languages, 8M or more speakers) and franc-all (all 406 possible languages) are also available. Finally, use franc-cli to install the CLI.

Browser builds for franc-min, franc, and franc-all are available on GitHub Releases.

Use

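// CommonJS, for franc 5 and earlier. From 6.0.0 the package is ESM:
// import {franc, francAll} from 'franc'
var franc = require('franc')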

franc('Alle menslike wesens word vry') // => 'afr'
franc('এটি একটি ভাষা একক IBM স্ক্রিপ্ট') // => 'ben'
franc('Alle menneske er fødde til fridom') // => 'nno'

franc('') // => 'und' (language code that stands for undetermined)

// You can change what’s too short (default: 10):
franc('the') // => 'und'
franc('the', {minLength: 3}) // => 'sco'

`.all`

console.log(franc.all('O Brasil caiu 26 posições'))

Yields:

[ [ 'por', 1 ],
  [ 'src', 0.8797557538750587 ],
  [ 'glg', 0.8708313762329732 ],
  [ 'snn', 0.8633161108501644 ],
  [ 'bos', 0.8172851103804604 ],
  ... 116 more items ]

`only`

console.log(franc.all('O Brasil caiu 26 posições', {only: ['por', 'spa']}))

Yields:

[ [ 'por', 1 ], [ 'spa', 0.799906059182715 ] ]

`ignore`

console.log(franc.all('O Brasil caiu 26 posições', {ignore: ['src', 'glg']}))

Yields:

[ [ 'por', 1 ],
  [ 'snn', 0.8633161108501644 ],
  [ 'bos', 0.8172851103804604 ],
  [ 'hrv', 0.8107092531705026 ],
  [ 'lav', 0.810239549084077 ],
  ... 114 more items ]

CLI

Install:

npm install franc-cli --global

Use:

CLI to detect the language of text

Usage: franc [options] <string>

Options:

  -h, --help                    output usage information
  -v, --version                 output version number
  -m, --min-length <number>     minimum length to accept
  -o, --only <string>           allow languages
  -i, --ignore <string>         disallow languages
  -a, --all                     display all guesses

Usage:

# output language
$ franc "Alle menslike wesens word vry"
# afr

# output language from stdin (expects utf8)
$ echo "এটি একটি ভাষা একক IBM স্ক্রিপ্ট" | franc
# ben

# ignore certain languages
$ franc --ignore por,glg "O Brasil caiu 26 posições"
# src

# output language from stdin with only
$ echo "Alle mennesker er født frie og" | franc --only nob,dan
# nob

Supported languages

Package	Languages	Speakers
`franc-min`	82	8M or more
`franc`	187	1M or more
`franc-all`	406	-

Language code

Note that franc returns ISO 639-3 codes (three-letter codes), not ISO 639-1 or ISO 639-2 codes. See also GH-10 and GH-30.

To get more info about the languages represented by ISO 639-3, use iso-639-3. There is also an index available to map ISO 639-3 to ISO 639-1 codes, iso-639-3/to-1.json, but note that not all 639-3 codes can be represented in 639-1.
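A minimal sketch of such a mapping, assuming a small hand-written table. The `to1` object and the `toIso6391` helper are hypothetical names used for illustration only; a real project would load the full index from iso-639-3 instead. Codes without a 639-1 equivalent yield `undefined`.

```javascript
// Hypothetical excerpt of an ISO 639-3 -> ISO 639-1 index (illustration only).
const to1 = {
  afr: 'af',
  ben: 'bn',
  nno: 'nn',
  por: 'pt',
  spa: 'es'
}

// Return the two-letter code, or undefined when this table has no entry
// (for example 'und', or a language such as 'sco' that has no 639-1 code).
function toIso6391(code) {
  return Object.prototype.hasOwnProperty.call(to1, code) ? to1[code] : undefined
}

console.log(toIso6391('por')) // 'pt'
console.log(toIso6391('sco')) // undefined
```

Callers should decide up front what to do with `undefined`, for instance keeping the three-letter code as a fallback.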

Ports

Franc has been ported to several other programming languages.

Elixir — paasaa
Erlang — efranc
Go — franco, whatlanggo
R — franc
Rust — whatlang-rs
Dart — francd

The works franc is derived from have themselves also been ported to other languages.

Derivation

Franc is a derivative work from guess-language (Python, LGPL), guesslanguage (C++, LGPL), and Language::Guess (Perl, GPL). Their creators granted me the rights to distribute franc under the MIT license: respectively, Kent S. Johnson, Jacob R. Rideout, and Maciej Ceglowski.

License

MIT © Titus Wormer

Comments

Add support for BCP 47 and output IANA language subtags

By default, Franc returns ISO-639-3 three-letter language tags, as listed in the Supported Languages table.

We would like Franc to alternatively support outputting IANA language subtags as an option, in compliance with the W3C recommendation for specifying the value of the lang attribute in HTML (and the xml:lang attribute in XML) documents.

(Two- and three-letter) IANA language codes are used as the primary language subtags in the language tag syntax defined by the IETF’s BCP 47, which may be further specified by adding subtags for “extended language”, script, region, dialect variants, etc. (RFC 5646 describes the syntax in full). Adding such fine-grained secondary qualifiers is, I guess, out of Franc’s scope, but it would nevertheless be very helpful if Franc could at least return the IANA primary language subtags, which, used stand-alone, suffice to comply with the spec.

On the Web — as the IETF and W3C agree — IANA language subtags and BCP 47 seem to be the de facto industry standard (at least more so than ISO 639-3). Moreover, the naming convention for TeX hyphenation pattern files (such as used by i.a. OpenOffice) use ISO-8859-2 codes, which overlap better with IANA language subtags, too.

If Franc would output IANA language subtags, then the return values could be used as-is, and without any further post-processing or re-mapping, in, for example CSS rules, specifying hyphenation:

@media print { :lang(nl) { hyphenate-patterns: url(hyphenation/hyph-nl.pat); } }

@wooorm :

What is the rationale for Franc to default on ISO-639-3 (only)? Is it a “better” standard, and, if so, why?

If you would agree it would be a good idea for Franc to support BCP 47 and outputting IANA language subtags as an available option, then how would you prefer it to be implemented and accept a PR? (We’d happily contribute.) Would it suffice to add and map them in data/support.json?
opened by rhythmus 12

Reference of source document

It seems that NONE of the languages have sources to the data.json 3-gram model. Is it possible to provide document sources for each language such that we can review the material, and possibly generate 2-grams and 4-grams (or 2/3 or 3/4 or 2/3/4-gram combos) models?

opened by DonaldTsang 10

Problems with franc and Uzbek (uzb, uzn, uzs)

I have implemented it and found that the Uzbek language (my native language) does not work properly; I tested with large data sets. Can I contribute? Also, there is an issue with the naming convention of the language codes here: 'uzn' (Northern Uzbek) has never been a language recognized in linguistics, and I wonder how it became an ISO 639 identifier.

opened by muminoff 10

BUG: Basic tests show that franc is extremely inaccurate

> franc.all('Hola amiga', { only: [ 'eng', 'spa', 'por', 'ita', 'fra' ] })
[
  [ 'spa', 1 ],
  [ 'ita', 0.9323770491803278 ],
  [ 'fra', 0.5942622950819672 ],
  [ 'por', 0.5368852459016393 ],
  [ 'eng', 0 ]
]
> franc.all('Hola mi amiga', { only: [ 'eng', 'spa', 'por', 'ita', 'fra' ] })
[
  [ 'ita', 1 ],
  [ 'spa', 0.6840958605664488 ],
  [ 'fra', 0.6318082788671024 ],
  [ 'por', 0.08714596949891062 ],
  [ 'eng', 0 ]
]
> franc.all('Ciao amico!', { only: [ 'eng', 'spa', 'por', 'ita', 'fra' ] })
[
  [ 'spa', 1 ],
  [ 'por', 0.9940758293838863 ],
  [ 'ita', 0.9170616113744076 ],
  [ 'eng', 0.6232227488151658 ],
  [ 'fra', 0.46563981042654023 ]
]

These are all completely incorrect accuracies.

opened by niftylettuce 8

Make MAX_LENGTH an options parameter

Hello!

First of all, thank you for this wonderful project.

It seems that franc limits the text sample to analyse to a hard-coded 2048 chars in these lines

https://github.com/wooorm/franc/blob/5842af9c1a74ffb47ebe3307bfc61cf29b6e842e/packages/franc/index.js#L21 https://github.com/wooorm/franc/blob/5842af9c1a74ffb47ebe3307bfc61cf29b6e842e/packages/franc/index.js#L93

Could this MAX_LENGTH const be part of options? It seems to me this is due to speed reasons, but I care more about accuracy than speed.

I am reading web pages that have parts in more than one language, and need to detect the most used language, but maybe the first 2048 characters are in the less used language.

Sorry if I misinterpreted the code and it is not doing what I thought.
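One possible workaround, sketched under assumptions: split the page into chunks no longer than the 2048-character limit described in this issue, detect each chunk separately, and take a majority vote. The `detectMostUsed` helper and its `detect` parameter are hypothetical; `detect` stands for any function that maps a string to a language code, such as franc.

```javascript
// Assumed chunk size, taken from the 2048-character limit mentioned above.
const CHUNK_SIZE = 2048

// `detect` is injected so this sketch has no dependencies of its own.
function detectMostUsed(text, detect, chunkSize = CHUNK_SIZE) {
  const counts = new Map()
  for (let start = 0; start < text.length; start += chunkSize) {
    const code = detect(text.slice(start, start + chunkSize))
    if (code === 'und') continue // skip chunks that could not be determined
    counts.set(code, (counts.get(code) || 0) + 1)
  }

  // Pick the code that won the most chunks; 'und' when nothing was detected.
  let best = 'und'
  let bestCount = 0
  for (const [code, count] of counts) {
    if (count > bestCount) {
      best = code
      bestCount = count
    }
  }
  return best
}
```

Slicing at fixed offsets can cut a word (or a surrogate pair) in half at chunk boundaries, so splitting on paragraph or sentence breaks would likely be a better fit for real pages.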

opened by porkopek 8

Explain the output of 'all'

The results of 'all' consist of the language code and a score number. I've guessed that the lowest number is the detected language, but what can be learned from the score number? Doesn't seem to be documented.

I'm looking to detect the language of job titles in English and French only (because Canada) and I was getting results all over the place using just franc(jobTitle) but whitelisting english and french then applying a threshold to the score I was able to tune in a much more accurate result (still a 3.92% error rate over 1020 job titles, but it was in the 25% range before the threshold). Is this a good use for the score or am I just getting lucky?
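The issue does not say how the threshold was applied, so the following is only one possible reading: accept the top guess when it leads the runner-up by a minimum gap, and otherwise report 'und'. The `pickConfident` helper and the `minGap` parameter are hypothetical names; the input has the `[code, score]` shape that `franc.all` prints above.

```javascript
// `guesses` is sorted from most to least likely, as in the franc.all output.
function pickConfident(guesses, minGap) {
  if (guesses.length === 0) return 'und'
  const [topCode, topScore] = guesses[0]
  if (guesses.length === 1) return topCode
  const runnerUpScore = guesses[1][1]
  // Only trust the top guess when it clearly beats the second one.
  return topScore - runnerUpScore >= minGap ? topCode : 'und'
}

// Scores copied from the examples earlier in this document.
console.log(pickConfident([['por', 1], ['spa', 0.799906059182715]], 0.1)) // 'por'
console.log(pickConfident([['spa', 1], ['ita', 0.9323770491803278]], 0.1)) // 'und'
```

The gap of 0.1 is arbitrary; a suitable value would have to be tuned on labelled samples, as the issue author did with job titles.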

opened by stockholmux 8

Problems with latin alphabet languages

A term like yellow flicker beat suggests German, with English (correct) quite far below.

Can you explain how this would work?

I would like to use franc in combination with a spell checker, first detecting the language and then looking up correct words with a spell checker using the identified language.

opened by djui 8

Some Japanese are detected as Chinese mandarin

Hi, I see something strange about Japanese detection,

if I put a translated text from google translate to Japanese: 裁判の周辺のラオスにUターンした元元兵士

the lib detects it and returns 'jpn', but if I put a Japanese text from yahoo japan or amazon japan: ここ最近、よく拡散されたつぶやきや画像をまとめてご紹介。気になるも

it returns 'cmn', does anyone know why?

opened by ThisIsRoy1 7

Consistency on ISO standards for easier integration.

Revisiting #10, I think it's great that you support other languages not found in any of the ISO standards.

But for those that can be found, the fact that Franc sometimes returns the 2T code and other times the 2B code makes it really hard to map without huge lists.

For instance:

arm matches 2B for Armenian but not 2T nor 3 which are 'hye'

ces, on the other hand, matches 2T and 3 while 2B is 'cze'

So it makes for difficult integration with standards that you return one or the other without consistency.

I agree that with languages you wouldn't find, then we must find a solution and it is great! But for those that match, adhering to one or the other would be very helpful.

Thanks, best regards, Rafa.
opened by RafaPolit 6

Getting weird results

Hey @wooorm am I doing something wrong here?

> apps.forEach(app => console.log(franc(app.description), app.description))

eng A universal clipboard managing app that makes it easy to access your clipboard from anywhere on any device
fra 5EPlay CSGO Client
nob Open-source Markdown editor built for desktop
eng Communication tool to optimize the connection between people
vmw Wireless HDMI
eng An RSS and Atom feed aggregator
eng A work collaboration product that brings conversation to your files.
src Pristine Twitter app
dan A Simple Friendly Markdown Note.
nno An open source trading platform
eng A hackable text editor for the 21 st Century
eng One workspace open to all designers and developers
nya A place to work + a way to work
cat An experimental P2P browser
sco Focused team communications
sco Bitbloq is a tool to help children to learn and create programs for a microcontroller or robot, and to load them easily.
eng A simple File Encryption application for Windows. Encrypt your bits.
eng Markdown editor witch clarity +1
eng Text editor with the power or Markdown
eng Open-sourced note app for programmers
sco Web browser that automatically blocks ads and trackers
bug Facebook Messenger app
dan Markdown editor for Mac / Windows / Linux
fra Desktop build status notifications
sco Group chat for global teams
src Your rubik's cube solves
sco Orthodox web file manager with console and editor
cat Game development tools
sco RPG style coding application
deu Modern browser without tabs
eng Your personal galaxy of inspiration
sco A menubar/taskbar Gmail App for Windows, macOS and Linux.

opened by zeke 6

Inaccurate detection examples

Here are just a few inaccuracies I've come across testing this package:

franc('iphone unlocked') // returns 'ibb' instead of 'eng'
franc('new refrigerator') // returns 'dan' instead of 'eng'
franc('макбук копмьютер очень хороший') // returns 'kir' instead of 'rus'

opened by demisx 6

Improved accuracy for small documents

I'd like to play with patching franc, or making some alternative to it, that can detect the language of small documents much more accurately.

First of all is this something that could be interesting to merge into franc itself?

Secondly I'm almost clueless about language classification, could trying the following things make sense?

Storing more than 300 trigrams, maybe 400 or so.

Using quadgrams or bigrams rather than trigrams.

Extracting the trigrams from a longer and more diverse document than the UDHR.

From a shallow reading of this paper on n-grams it sounds to me like ngrams may be fundamentally not well suited for short documents because there just isn't enough data to reconstruct the top 300 or whatever ngrams reliably from that, maybe 🤔.

CLD3 seems to feed unigrams bigrams and trigrams to some neural network and that seems to work much better for smaller texts somehow, I'm not sure how or why, maybe that's the way to go.

Any other ideas that I should try?
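To make the concern about short documents concrete, here is a rough sketch of trigram extraction. It is not franc's implementation: the `topTrigrams` helper is hypothetical, the cleaning step (lowercasing, collapsing whitespace, padding with spaces) is an assumption, and the default of 300 comes from the number mentioned in this issue.

```javascript
// Count overlapping character trigrams and keep the `limit` most frequent.
function topTrigrams(text, limit = 300) {
  // Assumed preprocessing; franc's own cleaning may differ.
  const clean = ' ' + text.toLowerCase().replace(/\s+/g, ' ').trim() + ' '
  const counts = new Map()
  for (let i = 0; i + 3 <= clean.length; i++) {
    const gram = clean.slice(i, i + 3)
    counts.set(gram, (counts.get(gram) || 0) + 1)
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent first
    .slice(0, limit)
    .map(([gram]) => gram)
}

// A two-word sample yields only 10 distinct trigrams,
// far fewer than the 300 a language profile would hold.
console.log(topTrigrams('hola amiga').length) // 10
```

Swapping the window size from 3 to 2 or 4 in the loop would give the bigram or quadgram variants discussed above.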
opened by fabiospampinato 19

Probability normalization

Currently franc to me often returns a probability close to 1 for many languages, IMO all these probabilities should be normalized to add up to 1.

Also there seems to always be a language at the top with probability 1, this makes it difficult to judge how sure the "model" is about the detection, which would be another interesting point of data to have.
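A minimal sketch of the requested normalization, done as a post-processing step on output shaped like that of `franc.all`. The `normalize` helper is hypothetical, and dividing by the sum only rescales the scores; it does not by itself turn them into calibrated probabilities.

```javascript
// `guesses` has the shape [[code, score], ...].
function normalize(guesses) {
  const total = guesses.reduce((sum, [, score]) => sum + score, 0)
  // Avoid dividing by zero when every score is 0.
  if (total === 0) return guesses.map(([code]) => [code, 0])
  return guesses.map(([code, score]) => [code, score / total])
}

// Scores copied from the `only` example earlier in this document.
const normalized = normalize([['por', 1], ['spa', 0.799906059182715]])
console.log(normalized) // 'por' ends up at roughly 0.56, 'spa' at roughly 0.44
```

With the full list of guesses the normalized top score becomes very small, which may itself be a useful signal of how spread out the candidates are.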

opened by fabiospampinato 3

Some Chinese sentences are detected as Japanese

Sentence 1:

特別推薦的必訪店家「ヤマシロヤ」，雖然不在阿美橫町上，但就位於JR上野站廣小路口對面

franc result: jpn 1. Google Translate correctly detects Chinese.

Sentence 2:

特別推薦的必訪店家，雖然不在阿美橫町上，但就位於JR上野站廣小路口對面

franc result: cmn 1. Google Translate correctly detects Chinese.

Sentence 1 consists almost entirely of Chinese characters and contains 5 katakana characters, but its result is incorrectly jpn.

Sentence 2 consists entirely of Chinese characters, and its result is correctly cmn.

Maybe the result is related to #77.

opened by kewang 3

Use languages' alphabets to make detection more accurate

Что это за язык? is a Russian sentence, which is detected as Bulgarian (bul 1, rus 0.938953488372093, mkd 0.9353197674418605). However, neither Bulgarian nor Macedonian have the letters э and ы in their alphabets.

Same with Чекаю цієї хвилини., which is Ukrainian, but is detected as Northern Uzbek with probability 1 whereas Ukrainian gets only 0.33999999999999997. However, the letters є and ї are used only in Ukrainian whereas the Uzbek Cyrillic alphabet doesn't include as many as five letters from this sentence, namely: ю, ц, і, є and ї.

I know that Franc is supposed to be not good with short input strings, but taking alphabets into account seems to be a promising way to improve the accuracy.
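The idea can be sketched as a filter over output shaped like that of `franc.all`. The `lettersNotUsed` table and the `filterByAlphabet` helper are hypothetical; the table only encodes the claim made in this issue about Bulgarian and Macedonian, and a real version would need complete alphabets per language.

```javascript
// Letters that, per the report above, do not occur in a language's alphabet.
// Hand-written excerpt for illustration only.
const lettersNotUsed = {
  bul: ['э', 'ы'],
  mkd: ['э', 'ы']
}

// Drop guesses whose language cannot contain a letter found in the text.
function filterByAlphabet(guesses, text, table = lettersNotUsed) {
  const lower = text.toLowerCase()
  return guesses.filter(([code]) => {
    const excluded = table[code] || []
    return !excluded.some((letter) => lower.includes(letter))
  })
}

// Scores copied from the report above.
const guesses = [
  ['bul', 1],
  ['rus', 0.938953488372093],
  ['mkd', 0.9353197674418605]
]
console.log(filterByAlphabet(guesses, 'Что это за язык?')) // only the 'rus' entry remains
```

Filtering after scoring keeps the sketch independent of how the scores are computed; languages missing from the table are left untouched.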

opened by thorn0 15

Releases(6.1.0)

6.1.0(Nov 23, 2022)
9a00ced Regenerate

284872d Update unicode to v15

0d248a8 Fix snn

Full Changelog: https://github.com/wooorm/franc/compare/6.0.0...6.1.0
[email protected](Nov 23, 2022)

See 6.1.0
[email protected](Nov 23, 2022)

See 6.1.0
6.0.0(Aug 15, 2021)
017c33f 556d2a7 5ae9a9f 24e0496 Use ESM
// From CommonJS
var franc = require('franc')
franc('')
franc.all('')

// To ESM
import {franc, francAll} from 'franc'
franc('')
francAll('')

Learn more about ESM in this guide

af47a96 4f9a958 ec59466 5024a47 647a5d4 Improve data

[email protected](Aug 15, 2021)

See 6.0.0
[email protected](Aug 15, 2021)

See 6.0.0
[email protected](Aug 15, 2021)

See 6.0.0
[email protected](Oct 12, 2020)
8eb3545 update meow

[email protected](May 10, 2020)
c553076 Fix empty string seen as stdin

5.0.0(Jan 27, 2020)
5c2b249 Update script expressions

012bf50 Regenerate

[email protected](Jan 27, 2020)
5c2b249 Update script expressions

012bf50 Regenerate

[email protected](Jan 27, 2020)
895e61a cli: update franc

e7f7b43 cli: update meow

[email protected](Jan 27, 2020)
5c2b249 Update script expressions

012bf50 Regenerate

4.1.1(Oct 28, 2019)
b64e364 Fix incorrect results for Japanese

[email protected](Oct 28, 2019)

See [email protected]
fr[email protected](Oct 28, 2019)

See [email protected]
[email protected](Apr 30, 2019)
dbb8cfd Rename (dis)allow options to only and ignore

8320d96 Fix matching if a script can only be in one language

e830188 Add docs on how to use ISO 639-3 codes

3e647ae Regenerate

6dc7372 Fix incorrect udhr entry by removing rup

[email protected](Apr 30, 2019)
dbb8cfd Rename (dis)allow options to only and ignore

8320d96 Fix matching if a script can only be in one language

e830188 Add docs on how to use ISO 639-3 codes

3e647ae Regenerate

6dc7372 Fix incorrect udhr entry by removing rup

[email protected](Apr 30, 2019)
dbb8cfd Rename (dis)allow options to only and ignore

8320d96 Fix matching if a script can only be in one language

e830188 Add docs on how to use ISO 639-3 codes

3e647ae Regenerate

6dc7372 Fix incorrect udhr entry by removing rup

[email protected](Apr 30, 2019)
dbb8cfd Rename (dis)allow options to only and ignore

8320d96 Fix matching if a script can only be in one language

e830188 Add docs on how to use ISO 639-3 codes

3e647ae Regenerate

6dc7372 Fix incorrect udhr entry by removing rup

[email protected](Apr 30, 2018)
21a016a Update udhr

eb7c0a9 Refactor code-style

franc-all.js(515.88 KB)
franc-min.js(105.80 KB)
franc.js(240.21 KB)
[email protected](Apr 30, 2018)
21a016a Update udhr

eb7c0a9 Refactor code-style

franc-all.js(515.88 KB)
franc-min.js(105.80 KB)
franc.js(240.21 KB)
[email protected](Apr 30, 2018)
274f644 update meow

9a8a65b update franc

eb7c0a9 Refactor code-style

franc-all.js(515.88 KB)
franc-min.js(105.80 KB)
franc.js(240.21 KB)
[email protected](Apr 30, 2018)
21a016a Update udhr

eb7c0a9 Refactor code-style

franc-all.js(515.88 KB)
franc-min.js(105.80 KB)
franc.js(240.21 KB)
[email protected](Jan 21, 2018)
7950d65 update meow

franc-all.js(496.07 KB)
franc-min.js(103.93 KB)
franc.js(237.16 KB)
3.1.1(Jul 27, 2017)
4299aa4 Update trigram-utils

franc-all.js(496.07 KB)
franc-min.js(103.93 KB)
franc.js(237.16 KB)
[email protected](Jul 27, 2017)
9f4d350 Update udhr

franc-all.js(496.07 KB)
franc-min.js(103.93 KB)
franc.js(237.16 KB)
3.1.0(Apr 2, 2017)
a4d881e Add support for more languages

250f431 Fix bug where cli wasn’t linked

franc-all.js(492.07 KB)
franc-min.js(103.65 KB)
franc.js(236.89 KB)
3.0.0(Mar 6, 2017)
633d050 Use meow for CLI

791eb3d Rewrite project to use a monorepo

franc-all.js(480.73 KB)
franc-min.js(96.54 KB)
franc.js(220.34 KB)
2.0.0(Mar 31, 2016)
Add GitHub Releases deployment to Travis (d67575a)

Remove distribution files from source (9fb527a)

Remove Bower, Component, Duo support (00b4e51)

Update udhr, trigrams (7598e5a)

franc-all.js(433.34 KB)
franc-most.js(220.34 KB)
franc.js(96.54 KB)

Owner

Titus

🐧 Making it easier for developers to develop · core team @unifiedjs · full-time OSS · syntax trees, markdown, markup, natural language 🐧

GitHub Repository https://wooorm.com/franc/

