GrammarTagger — A Neural Multilingual Grammar Profiler for Language Learning

Last update: Jan 05, 2023

Related tags

Overview

GrammarTagger — A Neural Multilingual Grammar Profiler for Language Learning

GrammarTagger is an open-source toolkit for grammatical profiling for language learning. It can analyze text in English and Chinese and show you grammatical items included in the input, along with its estimated difficulty.

Usage

GrammarTagger is written in Python (3.7+) and AllenNLP (2.1.0+). If you have conda installed, you can set up the environment as follows:

git clone https://github.com/octanove/grammartagger.git
cd grammartagger
conda create -n grammartagger python=3.7
conda activate grammartagger
pip install -r requirements.txt

Also, download the pretrained models (see below). After these steps, you can run GrammarTagger as follows:

English:

echo 'He loves to learn new languages, and last month he practiced some lessons in Spanish.' | python scripts/predict.py model-en-multi.tar.gz | jq
{
  "spans": [
    {
      "span": [0, 3],
      "tokens": ["[CLS]", "he", "loves", "to"],
      "label": "194:VP.SV.AFF"
    },
    {
      "span": [2, 2],
      "tokens": ["loves"],
      "label": "60:TA.PRESENT.does.AFF"
    },
    {
      "span": [2, 4],
      "tokens": ["loves", "to", "learn"],
      "label": "101:TO.VV_to_do"
    },
    ...
  ],
  "tokens": [
      "[CLS]", "he", "loves", "to", "learn", "new", "languages", ",",
      "and", "last", "month", "he", "practiced", "some", "lessons", "in", "spanish", ".", "[SEP]"
  ],
  "level_probs": {
    "c2": 0.008679441176354885,
    "b2": 0.005526999477297068,
    "c1": 0.05267713591456413,
    "b1": 0.06360447406768799,
    "a2": 0.06990284472703934,
    "a1": 0.7954732775688171
  }
}

Chinese:

$ echo '她住得很远，我想送她回去。' | python scripts/predict.py model-zh-multi.tar.gz | jq
{
  "spans": [
    {
      "span": [2, 5],
      "tokens": ["住", "得", "很", "远"],
      "label": "2.12.1:V 得 A:(using adverbs)"
    },
    {
      "span": [4, 4]
      "tokens": ["很"],
      "label": "1.06.2:很:very"
    },
    {
      "span": [8, 8],
      "tokens": ["想"],
      "label": "1.08.1:想:to want"
    }
  ],
  "tokens": ["[CLS]", "她", "住", "得", "很", "远", "，", "我", "想", "送", "她", "回", "去", "。", "[SEP]"],
  "level_probs": {
    "HSK 6": 9.971807230613194e-06,
    "HSK 5": 0.0011904890416190028,
    "HSK 3": 0.005279902834445238,
    "HSK 4": 0.00014815296162851155,
    "HSK 2": 0.9917035102844238,
    "HSK 1": 0.0016456041485071182
  }
}

Technical details

GrammarTagger is based on pretrained contextualizers, namely BERT (Devlin et al. 2019), and span classification. See the following paper for more details.

Hagiwara et al. 2021. GrammarTagger: A Multilingual, Minimally-Supervised Grammar Profiler for Language Education

Pretrained models

English: model-en-multi.tar.gz (387 MB)
Chinese: model-zh-multi.tar.gz (363 MB)

These pretrained models are licensed under CC BY-NC-ND 4.0 for academic/personal uses. If you are interested in a commercial license, please contact [email protected]. We are also working on improved models with wider grammar coverage and higher accuracy.

GrammarTagger — A Neural Multilingual Grammar Profiler for Language Learning

Related tags

Overview

GrammarTagger — A Neural Multilingual Grammar Profiler for Language Learning

Usage

Technical details

Pretrained models

Owner

Octanove Labs

Unsupervised Language Model Pre-training for French

ChatterBot is a machine learning, conversational dialog engine for creating chat bots

Natural Language Processing with transformers

Google's Meena transformer chatbot implementation

Code for CVPR 2021 paper: Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

AI Assistant for Building Reliable, High-performing and Fair Multilingual NLP Systems

spaCy plugin for Transformers , Udify, ELmo, etc.

Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2021).

Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

This is a simple item2vec implementation using gensim for recbole

A notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository

Train 🤗-transformers model with Poutyne.

ByT5: Towards a token-free future with pre-trained byte-to-byte models

This library is testing the ethics of language models by using natural adversarial texts.

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation (SIGGRAPH Asia 2021)

Searching keywords in PDF file folders

apple's universal binaries BUT MUCH WORSE (PRACTICAL SHITPOST) (NOT PRODUCTION READY)

Nested Named Entity Recognition for Chinese Biomedical Text

Linking data between GBIF, Biodiverse, and Open Tree of Life