Python binding for Morfologik

Morfologik is a Polish morphological analyzer. For more information, see http://github.com/morfologik/morfologik-stemming/ and http://www.morfologik.blogspot.com/

Requirements

This binding works with Python 2 and Python 3.

Installation

Install it with pip:

pip install pyMorfologik

or get it directly from GitHub:

git clone https://github.com/dmirecki/pyMorfologik.git

Usage

Currently, only simple stemming is supported:

>>> from pymorfologik import Morfologik
>>> from pymorfologik.parsing import ListParser
>>>
>>> parser = ListParser()
>>> stemmer = Morfologik()
>>> stemmer.stem(['Ala ma kota'], parser)
[(u'Ala',
  {u'Al': [u'subst:sg:acc:m1+subst:sg:gen:m1'],
   u'Ala': [u'subst:sg:nom:f'],
   u'Alo': [u'subst:sg:acc:m1+subst:sg:gen:m1']}),
 (u'ma',
  {u'mieć': [u'verb:fin:sg:ter:imperf:refl.nonrefl'],
   u'mój': [u'adj:sg:nom.voc:f:pos']}),
 (u'kota', {u'kot': [u'subst:sg:acc:m1'], u'kota': [u'subst:sg:nom:f']})]
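
The result above is a list of (word, analyses) pairs, where each analyses dict maps a candidate lemma to a list of tag strings. The following is a minimal sketch of how such a result might be post-processed. It does not call pyMorfologik; it works on a copy of the sample output, written with Python 3 string literals. The helper names (lemmas_by_word, split_tags, lemmas_with_pos) are hypothetical and not part of the library, and the reading of '+' as a separator between alternative interpretations and ':' as a separator between tag fields is an assumption based on the sample output.

```python
# Sample result copied from the usage session above (Python 3 literals).
result = [
    ('Ala', {'Al': ['subst:sg:acc:m1+subst:sg:gen:m1'],
             'Ala': ['subst:sg:nom:f'],
             'Alo': ['subst:sg:acc:m1+subst:sg:gen:m1']}),
    ('ma', {'mieć': ['verb:fin:sg:ter:imperf:refl.nonrefl'],
            'mój': ['adj:sg:nom.voc:f:pos']}),
    ('kota', {'kot': ['subst:sg:acc:m1'], 'kota': ['subst:sg:nom:f']}),
]


def lemmas_by_word(result):
    """Map each surface form to the sorted list of its candidate lemmas."""
    return {word: sorted(analyses) for word, analyses in result}


def split_tags(tag_strings):
    """Split tag strings into lists of fields.

    Assumption: '+' separates alternative interpretations and ':' separates
    the fields within one interpretation.
    """
    return [alt.split(':') for tags in tag_strings for alt in tags.split('+')]


def lemmas_with_pos(result, pos):
    """Return (word, lemma) pairs where some interpretation starts with pos."""
    pairs = []
    for word, analyses in result:
        for lemma, tags in analyses.items():
            if any(fields[0] == pos for fields in split_tags(tags)):
                pairs.append((word, lemma))
    return pairs


if __name__ == '__main__':
    print(lemmas_by_word(result))
    print(lemmas_with_pos(result, 'verb'))
```

Note that lemmas_by_word keeps only one entry per surface form, so a word that occurs several times in the input collapses into a single key.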

Acknowledgements

This repo is based on Morfologik, a great contribution by Marcin Miłkowski (http://marcinmilkowski.pl) and Dawid Weiss (http://www.dawidweiss.com).

Contributions

Damian Mirecki

Adrian Bohdanowicz
