Machine learning models from Singapore's NLP research community

Last update: Dec 17, 2022

Overview

SG-NLP

Machine learning models from Singapore's natural language processing (NLP) research community.

sgnlp is a Python package that lets you get started quickly with various NLP models implemented with the PyTorch and Transformers frameworks.

We have an accompanying demo site where you can interact with our models and get a better understanding of how they work.

Installation

Python >= 3.8

pip install sgnlp

Documentation

Visit our documentation for tutorials.

License

Code and models from this project are released under the MIT License unless otherwise stated. If a model's code is under a separate license, it can be found in the respective model's folder.

Comments

Change demo api to use gevent worker
Using multiple workers of the default 'sync' type in gunicorn does not work on Kubernetes.

Workers are constantly terminated by signal 9.

Try the gevent worker type to see whether it resolves the problem.
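
A minimal sketch of what the switch could look like in a gunicorn configuration file, assuming the demo API is served by gunicorn and the `gevent` package is installed in the image. The file name, worker count, bind address, and timeout are illustrative values, not the project's actual settings.

```python
# gunicorn.conf.py (hypothetical file name and values, for illustration only)

# Use the gevent worker class instead of the default "sync" worker.
# This requires the `gevent` package to be installed alongside gunicorn.
worker_class = "gevent"

# Number of worker processes; tune to the container's CPU and memory limits.
workers = 2

# Address the server listens on inside the container.
bind = "0.0.0.0:8000"

# Seconds of silence before a worker is killed and restarted.
timeout = 120
```

The file would be passed to gunicorn with the `-c` option, for example `gunicorn -c gunicorn.conf.py api:app`, where `api:app` is a placeholder for the real application module.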
opened by jonheng 2
UFD use case tutorial and usability improvement
Added a tutorial on how to use UFD to train and evaluate on a custom dataset

Bug fix for UFD parse_args_and_load_config util function

Added feature to create folder if folder doesn't exist

Added some training argument parameters to the evaluation arguments to improve usability

Made caching optional

Added validation to make debugging easier

Added links to config file examples for RECCON models
opened by vincenttzc 1
Wrong assert comparison for SenticGCN dataclass
This concerns the latest SenticGCN implementation on the dev branch. In dataclass.py, the __post_init__ method of SenticGCNTrainArgs contains the following assertions:

assert self.repeats > 1, "Repeats value must be at least 1."
assert self.patience > 1, "Patience value must be at least 1."

The comparison operator should be >= instead, so that a value of 1 is accepted as the error messages state.
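
The proposed fix can be sketched as follows. `TrainArgsSketch` is a hypothetical stand-in with only the two fields named in the issue; the real SenticGCNTrainArgs class has more fields and may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class TrainArgsSketch:
    """Hypothetical stand-in for SenticGCNTrainArgs with only two fields."""

    repeats: int = 1
    patience: int = 1

    def __post_init__(self):
        # ">=" accepts the boundary value 1, which matches the error messages.
        assert self.repeats >= 1, "Repeats value must be at least 1."
        assert self.patience >= 1, "Patience value must be at least 1."


# A value of 1 is accepted with ">=" but would be rejected with ">".
args = TrainArgsSketch(repeats=1, patience=1)
```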
bug
opened by raymondng76 0
47 centralized logging
Create a centralized logger for 'sgnlp' base logger

The 'sgnlp' logger is created from a JSON config and initialized in the 'sgnlp' module's __init__.py

Replace all logging method calls with calls on script-specific loggers
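
One way such a setup could look, using only the standard library. The config contents, the `setup_base_logger` helper, and the `sgnlp.models.example` logger name are assumptions made for illustration; the pull request's actual config file and layout are not shown in this description.

```python
import json
import logging
import logging.config

# Hypothetical config contents. In the package this would be a JSON file
# that is read when the 'sgnlp' module is imported.
LOGGING_CONFIG_JSON = """
{
  "version": 1,
  "disable_existing_loggers": false,
  "formatters": {
    "simple": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"}
  },
  "handlers": {
    "console": {"class": "logging.StreamHandler", "formatter": "simple"}
  },
  "loggers": {
    "sgnlp": {"level": "INFO", "handlers": ["console"], "propagate": false}
  }
}
"""


def setup_base_logger(config_text: str = LOGGING_CONFIG_JSON) -> logging.Logger:
    """Configure logging from JSON text and return the 'sgnlp' base logger."""
    logging.config.dictConfig(json.loads(config_text))
    return logging.getLogger("sgnlp")


base_logger = setup_base_logger()

# A script-specific logger. Inside the package this would usually be
# logging.getLogger(__name__), which yields a dotted name under 'sgnlp'.
script_logger = logging.getLogger("sgnlp.models.example")
script_logger.info("Handled by the handlers attached to the 'sgnlp' logger.")
```

Because logger names are hierarchical, every logger whose name starts with `sgnlp.` inherits the level and handlers configured once on the base logger.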
opened by raymondng76 0
Add parent class for preprocessor
[x] Create a module named sgnlp.base

[x] Add abstractmethods for preprocess, save, load

[x] Add batch iteration to parent __call__

[x] Parent __call__ should return a dictionary
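
The checklist above can be sketched roughly as follows. The class names, the `batch_size` parameter, and the method signatures are assumptions for illustration and are not taken from the sgnlp.base module itself.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class PreprocessorBase(ABC):
    """Hypothetical parent class; names and signatures are illustrative."""

    def __init__(self, batch_size: int = 2):
        self.batch_size = batch_size

    @abstractmethod
    def preprocess(self, batch: List[Any]) -> Dict[str, List[Any]]:
        """Convert one batch of raw inputs into model-ready features."""

    @abstractmethod
    def save(self, path: str) -> None:
        """Persist any preprocessor state to the given path."""

    @abstractmethod
    def load(self, path: str) -> None:
        """Restore preprocessor state from the given path."""

    def __call__(self, data: List[Any]) -> Dict[str, List[Any]]:
        # Iterate over the input in batches and merge the per-batch
        # dictionaries into a single dictionary of lists.
        output: Dict[str, List[Any]] = {}
        for start in range(0, len(data), self.batch_size):
            batch_output = self.preprocess(data[start:start + self.batch_size])
            for key, values in batch_output.items():
                output.setdefault(key, []).extend(values)
        return output


class WhitespacePreprocessor(PreprocessorBase):
    """Toy subclass that splits text on whitespace."""

    def preprocess(self, batch: List[str]) -> Dict[str, List[Any]]:
        return {"tokens": [text.split() for text in batch]}

    def save(self, path: str) -> None:
        pass  # Nothing to persist in this toy example.

    def load(self, path: str) -> None:
        pass  # Nothing to restore in this toy example.


result = WhitespacePreprocessor(batch_size=2)(["a b", "c", "d e f"])
```

Keeping the batch loop in the parent `__call__` means each subclass only has to describe how a single batch is processed.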

enhancement
opened by jonheng 0
46 senticgcn bugfix
Add multi-word aspect support

Update documentation to reflect multi-word support

Update unit tests

Update usage example to include multi-word support
opened by raymondng76 0
Fix multi-word aspect issue with Sentic-GCN preprocessor

The current implementation of the preprocessor matches a single aspect index for the purpose of matching postprocessor output. The aspect index field in the process_input payload should be expanded to handle aspects with multiple indexes.
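
To illustrate what multiple indexes per aspect means, the sketch below finds every token position covered by an aspect phrase. The function name `find_aspect_indexes` and the whitespace tokenization are assumptions for illustration; this is not the Sentic-GCN preprocessor's actual code.

```python
from typing import List


def find_aspect_indexes(tokens: List[str], aspect: str) -> List[List[int]]:
    """Return the token indexes of each occurrence of a (multi-word) aspect."""
    aspect_tokens = aspect.split()
    width = len(aspect_tokens)
    if width == 0:
        return []
    matches = []
    # Slide a window of the aspect's length across the sentence tokens.
    for start in range(len(tokens) - width + 1):
        if tokens[start:start + width] == aspect_tokens:
            matches.append(list(range(start, start + width)))
    return matches


tokens = "the battery life is great but the screen is dim".split()
multi_word = find_aspect_indexes(tokens, "battery life")
single_word = find_aspect_indexes(tokens, "screen")
```

A single-word aspect yields a one-element index list, so the same field can carry both cases.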
bug

opened by raymondng76 0
Add Sentic-GCN demo_api to SGNlp
Close #43

This pull request adds the Sentic-GCN demo_api models to sgnlp. It includes the following components:

model_card

api.py

dockerfiles

requirements.txt

usage.py
opened by K-WeiMing 0
Add Sentic-GCN to SGNlp
close #41

This pull request adds the Sentic-GCN models to sgnlp. It includes the following components:

Models

Configs

Tokenizers

Embedding models

Trainer/Evaluator

Unit test

documentation

Does not include demo_api, as it is covered in a separate issue ticket.
opened by raymondng76 0
download_pretrained for demo API does not cache downloaded files/models
To allow the containers to start up quicker, models and files were downloaded and cached during build time.

Recent changes in the Hugging Face transformers package have broken this functionality:

Released in v4.22.0

Issue

Possible choices moving forward:

Write a simple caching utility function

Stick to versions of transformers before 4.22.0
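
The first option could be sketched as below. `cached_download`, its parameters, and the rule of naming the cached file after the last URL path segment are all assumptions for illustration; this does not reproduce the behaviour of the transformers package or of sgnlp's download_pretrained.

```python
import urllib.request
from pathlib import Path
from typing import Callable


def _fetch_with_urllib(url: str, destination: Path) -> None:
    """Default fetcher: download the URL to the destination path."""
    urllib.request.urlretrieve(url, str(destination))


def cached_download(
    url: str,
    cache_dir: str,
    fetch: Callable[[str, Path], None] = _fetch_with_urllib,
) -> Path:
    """Download a file once and reuse the local copy on later calls."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)
    # Assumption: the last URL path segment is a usable, unique file name.
    target = cache_path / url.rstrip("/").split("/")[-1]
    if not target.exists():
        fetch(url, target)
    return target
```

Calling this during the image build would populate the cache directory, so that container start-up finds the files already present. The `fetch` parameter exists so the download step can be replaced, for example in tests.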
opened by jonheng 0
Add Stance Detection model

Paper: https://aclanthology.org/2020.emnlp-main.108.pdf

Prof: Jiang Jing from SMU

Repo: GitHub - jefferyYu/DualHierarchicalTransformer: Predicting Stance and Rumor Veracity via Dual Hierarchical Transformer

opened by atenzer 0

Releases(v0.4.0)

v0.4.0(Oct 7, 2022)

New model: Coherence Momentum Model
v0.3.0(Apr 22, 2022)
New models:

Sentic GCN

LIF

UFD

v0.2.0(Oct 19, 2021)
New models:

RST Pointer

GEC

v0.1.1(Aug 26, 2021)

Bug fix on rumour detection module paths
v0.1.0(Aug 26, 2021)

Removed UFD for further review.

Refactoring and improvements to LSR and Rumour detection models.
v0.0.1(Aug 5, 2021)
Initial release of sgnlp.

Models included:

RECCON

LSR

UFD

Rumour detection twitter


Owner

AI Singapore | AI Makerspace

Growing local AI talent and empowering start-ups, SMEs and enterprises with AI components, frameworks, platforms and advisory services.
