Text language identification using Wikipedia data

The aim of this project is to provide high-quality language detection for all of the web's languages, using Wikipedia as a proxy for that set. It currently supports 156 languages that have their own Wikipedia editions.

Usage

The main function is text-langs, which returns two values:

an alist of language-probability pairs (languages are represented by their ISO 639-1 codes)
a vector of tokens with their inferred languages

WILD> (text-langs "це тест")
((:UK . 0.5000003) (:RU . 0.4999998))
#(<це - UK:1.00> <тест - RU:1.00>)
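
To make the relation between the two return values easier to read, here is a minimal Python sketch that averages per-token language distributions into a text-level ranking. This is an illustration only and not necessarily how text-langs combines token scores: the function name average_token_langs and the dict-based input format are hypothetical, and the real output above (0.5000003 vs 0.4999998) is not an exact even split.

```python
from collections import defaultdict


def average_token_langs(token_langs):
    """Average per-token language distributions into one ranked list.

    token_langs is a list with one dict per token, mapping a language
    code to that token's probability (a hypothetical input format).
    """
    if not token_langs:
        return []
    totals = defaultdict(float)
    for dist in token_langs:
        for lang, prob in dist.items():
            totals[lang] += prob
    count = len(token_langs)
    averaged = [(lang, total / count) for lang, total in totals.items()]
    # Highest probability first; ties keep their first-seen order.
    return sorted(averaged, key=lambda pair: pair[1], reverse=True)


# Two tokens, one attributed fully to Ukrainian and one to Russian.
print(average_token_langs([{"uk": 1.0}, {"ru": 1.0}]))
# [('uk', 0.5), ('ru', 0.5)]
```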

Running as a service

Installation

Install SBCL
Get Quicklisp
Clone this repository with Git
$ cd wiki-lang-detect; sbcl --load run.lisp

Running in Docker

docker build -t wiki-lang-detect:latest .
docker run -it -p 5000:5000 wiki-lang-detect:latest

curl -X POST -H "Content-Type: application/json" -d '{"text": "Несе Галя"}' http://localhost:5000/detect | jq '.'
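
The same request can be sent from a script. The following is a minimal Python sketch using only the standard library; it assumes the service is listening on localhost:5000 as in the docker run command above. The helper names build_detect_request and detect are hypothetical, and because the response format is not described here, the decoded JSON is returned unchanged.

```python
import json
import urllib.request


def build_detect_request(text, base_url="http://localhost:5000"):
    """Build a POST request for the /detect endpoint used by the curl example."""
    # json.dumps emits valid JSON with double-quoted keys and strings.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/detect",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect(text, base_url="http://localhost:5000"):
    """Send text to a running service and return the decoded JSON reply.

    The reply structure is an unknown here, so it is returned as-is.
    """
    request = build_detect_request(text, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, calling detect("Несе Галя") should return whatever JSON the service produces for that text.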

Alternatively, you can use a prebuilt Docker image, which is maintained outside of this repository:

docker run -it -p 5000:5000 chaliy/wiki-lang-detect:latest

API

See the Swagger definition.

Owner

Vsevolod Dyomkin
