🎐 a python library for doing approximate and phonetic matching of strings.

Overview

jellyfish

https://coveralls.io/repos/jamesturk/jellyfish/badge.png?branch=master Documentation Status

Jellyfish is a python library for doing approximate and phonetic matching of strings.

Written by James Turk <[email protected]> and Michael Stephens.

See https://github.com/jamesturk/jellyfish/graphs/contributors for contributors.

See http://jellyfish.readthedocs.io for documentation.

Source is available at http://github.com/jamesturk/jellyfish.

Jellyfish >= 0.7 only supports Python 3, if you need Python 2 please use 0.6.x.

Included Algorithms

String comparison:

  • Levenshtein Distance
  • Damerau-Levenshtein Distance
  • Jaro Distance
  • Jaro-Winkler Distance
  • Match Rating Approach Comparison
  • Hamming Distance

Phonetic encoding:

  • American Soundex
  • Metaphone
  • NYSIIS (New York State Identification and Intelligence System)
  • Match Rating Codex

Example Usage

>>> import jellyfish
>>> jellyfish.levenshtein_distance(u'jellyfish', u'smellyfish')
2
>>> jellyfish.jaro_distance(u'jellyfish', u'smellyfish')
0.89629629629629637
>>> jellyfish.damerau_levenshtein_distance(u'jellyfish', u'jellyfihs')
1
>>> jellyfish.metaphone(u'Jellyfish')
'JLFX'
>>> jellyfish.soundex(u'Jellyfish')
'J412'
>>> jellyfish.nysiis(u'Jellyfish')
'JALYF'
>>> jellyfish.match_rating_codex(u'Jellyfish')
'JLLFSH'

Running Tests

If you are interested in contributing to Jellyfish, you may want to run tests locally. Jellyfish uses tox to run tests, which you can setup and run as follows:

pip install tox
# cd jellyfish/
tox
Owner
James Turk
Principal Architect of @openstates. Formerly of @OpenPrecincts, @pbs, @sunlightlabs.
James Turk
Natural Language Processing Best Practices & Examples

NLP Best Practices In recent years, natural language processing (NLP) has seen quick growth in quality and usability, and this has helped to drive bus

Microsoft 6.1k Dec 31, 2022
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch

Parallel WaveGAN implementation with Pytorch This repository provides UNOFFICIAL pytorch implementations of the following models: Parallel WaveGAN Mel

Tomoki Hayashi 1.2k Dec 23, 2022
Code for evaluating Japanese pretrained models provided by NTT Ltd.

japanese-dialog-transformers 日本語の説明文はこちら This repository provides the information necessary to evaluate the Japanese Transformer Encoder-decoder dialo

NTT Communication Science Laboratories 216 Dec 22, 2022
Python library for processing Chinese text

SnowNLP: Simplified Chinese Text Processing SnowNLP是一个python写的类库,可以方便的处理中文文本内容,是受到了TextBlob的启发而写的,由于现在大部分的自然语言处理库基本都是针对英文的,于是写了一个方便处理中文的类库,并且和TextBlob

Rui Wang 6k Jan 02, 2023
Random Directed Acyclic Graph Generator

DAG_Generator Random Directed Acyclic Graph Generator verison1.0 简介 工作流通常由DAG(有向无环图)来定义,其中每个计算任务$T_i$由一个顶点(node,task,vertex)表示。同时,任务之间的每个数据或控制依赖性由一条加权

Livion 17 Dec 27, 2022
A BERT-based reverse-dictionary of Korean proverbs

Wisdomify A BERT-based reverse-dictionary of Korean proverbs. 김유빈 : 모델링 / 데이터 수집 / 프로젝트 설계 / back-end 김종윤 : 데이터 수집 / 프로젝트 설계 / front-end Quick Start C

Eu-Bin KIM 94 Dec 08, 2022
Creating a python chatbot that Starbucks users can text to place an order + help cut wait time of a normal coffee.

Creating a python chatbot that Starbucks users can text to place an order + help cut wait time of a normal coffee.

2 Jan 20, 2022
Build Text Rerankers with Deep Language Models

Reranker is a lightweight, effective and efficient package for training and deploying deep languge model reranker in information retrieval (IR), question answering (QA) and many other natural languag

Luyu Gao 140 Dec 06, 2022
Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form.

Neural G2P to portuguese language Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written for

fluz 11 Nov 16, 2022
Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors"

SWRM Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors" Clone Clone th

14 Jan 03, 2023
Club chatbot

Chatbot Club chatbot Instructions to get the Chatterbot working Step 1. First make sure you are using a version of Python 3 or newer. To check your ve

5 Mar 07, 2022
Türkçe küfürlü içerikleri bulan bir yapay zeka kütüphanesi / An ML library for profanity detection in Turkish sentences

"Kötü söz sahibine aittir." -Anonim Nedir? sinkaf uygunsuz yorumların bulunmasını sağlayan bir python kütüphanesidir. Farkı nedir? Diğer algoritmalard

KaraGoz 4 Feb 18, 2022
Based on 125GB of data leaked from Twitch, you can see their monthly revenues from 2019-2021

Twitch Revenues Bu script'i kullanarak istediğiniz yayıncıların, Twitch'den sızdırılan 125 GB'lik veriye dayanarak, 2019-2021 arası aylık gelirlerini

4 Nov 11, 2021
Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch

N-Grammer - Pytorch Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch Install $ pip install n-grammer-pytorch Usage

Phil Wang 66 Dec 29, 2022
Python3 to Crystal Translation using Python AST Walker

py2cr.py A code translator using AST from Python to Crystal. This is basically a NodeVisitor with Crystal output. See AST documentation (https://docs.

66 Jul 25, 2022
Get list of common stop words in various languages in Python

Python Stop Words Table of contents Overview Available languages Installation Basic usage Python compatibility Overview Get list of common stop words

Alireza Savand 142 Dec 21, 2022
Label data using HuggingFace's transformers and automatically get a prediction service

Label Studio for Hugging Face's Transformers Website • Docs • Twitter • Join Slack Community Transfer learning for NLP models by annotating your textu

Heartex 135 Dec 29, 2022
Repository for Project Insight: NLP as a Service

Project Insight NLP as a Service Contents Introduction Features Installation Setup and Documentation Project Details Demonstration Directory Details H

Abhishek Kumar Mishra 286 Dec 06, 2022
🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained mo

Hugging Face 77.2k Jan 03, 2023
Simple telegram bot to convert files into direct download link.you can use telegram as a file server 🪁

TGCLOUD 🪁 Simple telegram bot to convert files into direct download link.you can use telegram as a file server 🪁 Features Easy to Deploy Heroku Supp

Mr.Acid dev 6 Oct 18, 2022