A text augmentation tool for named entity recognition.

Last update: Oct 11, 2022

Overview

neraug

This Python library helps you augment text data for named entity recognition.

Augmentation Example

Source: An Analysis of Simple Data Augmentation for Named Entity Recognition

Installation

To install the library:

pip install neraug

Usage

One of the available algorithms is DictionaryReplacement:

>>> from neraug.augmentator import DictionaryReplacement
>>> from neraug.scheme import IOBES

>>> ne_dic = {'Tokyo Big Sight': 'LOC'}
>>> augmentator = DictionaryReplacement(ne_dic, str.split, IOBES)
>>> x = ['I', 'went', 'to', 'Tokyo']
>>> y = ['O', 'O', 'O', 'S-LOC']
>>> x_augs, y_augs = augmentator.augment(x, y, n=1)
>>> x_augs
[['I', 'went', 'to', 'Tokyo', 'Big', 'Sight']]
>>> y_augs
[['O', 'O', 'O', 'B-LOC', 'I-LOC', 'E-LOC']]
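
In the output above, the single-token mention (S-LOC) is replaced by a three-token dictionary entry, so its labels are rewritten as B-LOC, I-LOC, E-LOC to stay valid under IOBES. The following is a minimal pure-Python sketch of that re-tagging step, not neraug's actual implementation; the helper names `iobes_tags` and `replace_mention` are hypothetical and exist only for illustration.

```python
def iobes_tags(num_tokens, entity_type):
    """Return IOBES tags for one mention made of `num_tokens` tokens."""
    if num_tokens < 1:
        raise ValueError("a mention needs at least one token")
    if num_tokens == 1:
        # A single-token mention is tagged S- (single).
        return [f"S-{entity_type}"]
    # Longer mentions are tagged B- (begin), I- (inside)..., E- (end).
    middle = [f"I-{entity_type}"] * (num_tokens - 2)
    return [f"B-{entity_type}", *middle, f"E-{entity_type}"]


def replace_mention(tokens, tags, start, end, new_mention, entity_type):
    """Replace tokens[start:end] with `new_mention` and re-tag the new span."""
    new_tokens = tokens[:start] + new_mention + tokens[end:]
    new_tags = tags[:start] + iobes_tags(len(new_mention), entity_type) + tags[end:]
    return new_tokens, new_tags


x = ['I', 'went', 'to', 'Tokyo']
y = ['O', 'O', 'O', 'S-LOC']
# Hypothetical call: swap the mention at index 3 for a dictionary entry.
x_new, y_new = replace_mention(x, y, 3, 4, 'Tokyo Big Sight'.split(), 'LOC')
print(x_new)
print(y_new)
```

The sketch takes the mention boundaries as explicit arguments; a real augmenter would first have to locate mentions from the tag sequence and choose the replacement at random.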

The library supports the following algorithms:

DictionaryReplacement
LabelWiseTokenReplacement
MentionReplacement
ShuffleWithinSegment

and the following tagging schemes:

IOB2
IOBES
BILOU
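
The three schemes differ only in how they mark a token's position inside a mention: IOB2 uses B/I/O, IOBES adds E (end) and S (single), and BILOU uses L (last) and U (unit) for the same two roles. The sketch below converts IOB2 tags to the other two schemes under those conventional definitions. `convert_iob2` is a hypothetical helper, not part of neraug, and it assumes the input is a well-formed IOB2 sequence.

```python
def convert_iob2(tags, scheme="IOBES"):
    """Convert a well-formed IOB2 tag sequence to IOBES or BILOU."""
    # Prefixes used for the last token of a mention and for one-token mentions.
    end, single = {"IOBES": ("E", "S"), "BILOU": ("L", "U")}[scheme]
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # The mention continues only if the next tag is I- with the same label.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            converted.append(f"B-{label}" if continues else f"{single}-{label}")
        else:  # prefix == "I"
            converted.append(f"I-{label}" if continues else f"{end}-{label}")
    return converted


iob2 = ["O", "B-LOC", "I-LOC", "I-LOC", "O", "B-PER"]
print(convert_iob2(iob2, "IOBES"))
print(convert_iob2(iob2, "BILOU"))
```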

Reference

With thanks to the authors of the following research:

An Analysis of Simple Data Augmentation for Named Entity Recognition

Citation

@misc{neraug,
  title={neraug: A data augmentation tool for named entity recognition},
  author={Hiroki Nakayama},
  url={https://github.com/Hironsan/neraug},
  year={2021}
}

Releases

v0.1.1 (Jul 22, 2021)

Remove tokenizer from MentionReplacement

v0.1.0 (Jul 22, 2021)

Owner

Hiroki Nakayama
