Machine learning models from Singapore's NLP research community

SG-NLP

Machine learning models from Singapore's natural language processing (NLP) research community.

sgnlp is a Python package that lets you get started quickly with various NLP models implemented using the PyTorch and Transformers frameworks.

We have an accompanying demo site where you can interact with our models and get a better understanding of how they work.

Installation

  • Python >= 3.8
pip install sgnlp
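
To confirm that an environment meets the requirement above, a small check along these lines may help. The function name `check_environment` is hypothetical and not part of sgnlp; only the standard library is used.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the installation requirements


def check_environment(package: str = "sgnlp") -> dict:
    """Report whether the interpreter is new enough and the package can be found."""
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "package_installed": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `package_installed` is false, run `pip install sgnlp` in the same interpreter's environment.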

Documentation

Visit our documentation for tutorials.

License

Code and models from this project are released under the MIT License unless otherwise stated. If a model's code is under a separate license, it can be found in the respective model's folder.

Comments
  • Change demo api to use gevent worker

    • Using multiple workers of the default type 'sync' in gunicorn is not working on Kubernetes
    • Workers constantly terminated due to signal 9
    • Try gevent to see if it works out
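
As a rough illustration of the change described above, a gunicorn configuration file can select the gevent worker class. The file name, bind address, worker count, and timeout are assumptions made for this sketch and are not taken from the pull request.

```python
# gunicorn.conf.py (hypothetical file; all values are illustrative)

bind = "0.0.0.0:8000"    # assumed address and port
workers = 2              # assumed worker count
worker_class = "gevent"  # replaces the default "sync" worker; needs the gevent package
timeout = 120            # assumed; seconds before an unresponsive worker is restarted
```

gunicorn reads this file when started with `gunicorn -c gunicorn.conf.py <module>:<app>`, where the application path depends on the project.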
    opened by jonheng 2
  • UFD use case tutorial and usability improvement

    • Added additional tutorial on how to use UFD to train and evaluate on custom dataset
    • Bug fix for UFD parse_args_and_load_config util function
    • Added feature to create folder if folder doesn't exist
    • Added some training-argument parameters to the evaluation arguments to improve usability
    • Made caching optional
    • Added validation to make debugging easier
    • Added links to config file examples for reccon models
    opened by vincenttzc 1
  • Wrong assert comparison for SenticGCN dataclass

    This concerns the latest SenticGCN implementation on the dev branch. In dataclass.py, the __post_init__ method of SenticGCNTrainArgs contains the following assertions:

    assert self.repeats > 1, "Repeats value must be at least 1."
    assert self.patience > 1, "Patience value must be at least 1." 
    

    The comparison operator should be >= instead.
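
A minimal sketch of the proposed fix follows. `TrainArgsSketch` is a hypothetical stand-in that carries only the two fields discussed; the real SenticGCNTrainArgs has other fields that are not shown here, and the default values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TrainArgsSketch:
    """Hypothetical stand-in for SenticGCNTrainArgs with only the two fields at issue."""

    repeats: int = 1   # assumed default, for illustration only
    patience: int = 1  # assumed default, for illustration only

    def __post_init__(self) -> None:
        # ">=" accepts a value of 1, which is what the error messages promise.
        assert self.repeats >= 1, "Repeats value must be at least 1."
        assert self.patience >= 1, "Patience value must be at least 1."
```

With `>`, `TrainArgsSketch(repeats=1)` would fail even though its message says 1 is allowed; with `>=`, only values below 1 are rejected.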

    bug 
    opened by raymondng76 0
  • 47 centralized logging

    • Create a centralized logger for 'sgnlp' base logger
    • The 'sgnlp' logger is created from a JSON config and initialized in the 'sgnlp' module's __init__.py
    • Replace all direct logging method calls with script-specific loggers
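
The pattern described in this pull request can be sketched with the standard library alone. The JSON content and the module logger name `sgnlp.models.example` are assumptions for illustration; the actual config file in the project may differ.

```python
import json
import logging
import logging.config

# Hypothetical config; the pull request loads an equivalent structure from a JSON file.
LOGGING_CONFIG_JSON = """
{
  "version": 1,
  "disable_existing_loggers": false,
  "formatters": {"plain": {"format": "%(name)s - %(levelname)s - %(message)s"}},
  "handlers": {"console": {"class": "logging.StreamHandler", "formatter": "plain"}},
  "loggers": {"sgnlp": {"level": "INFO", "handlers": ["console"]}}
}
"""


def setup_base_logger() -> logging.Logger:
    """Configure the 'sgnlp' base logger once, e.g. from the package's __init__.py."""
    logging.config.dictConfig(json.loads(LOGGING_CONFIG_JSON))
    return logging.getLogger("sgnlp")


base_logger = setup_base_logger()

# A script-specific logger; inside a real module this is usually logging.getLogger(__name__).
module_logger = logging.getLogger("sgnlp.models.example")
module_logger.info("training started")
```

Because the module logger's name starts with `sgnlp.`, it inherits the level and handlers of the base logger, so each script only needs to request its own named logger.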
    opened by raymondng76 0
  • Add parent class for preprocessor

    • [x] Create a module named sgnlp.base
    • [x] Add abstractmethods for preprocess, save, load
    • [x] Add batch iteration to parent __call__
    • [x] Parent __call__ should return a dictionary
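
The checklist above could translate into a base class roughly like the following. The class names, the `batch_size` parameter, and the merge strategy are assumptions made for this sketch, not the actual contents of sgnlp.base.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class PreprocessorBaseSketch(ABC):
    """Hypothetical parent class: subclasses implement preprocess, save and load."""

    def __init__(self, batch_size: int = 2) -> None:
        self.batch_size = batch_size

    @abstractmethod
    def preprocess(self, batch: List[str]) -> Dict[str, List[Any]]:
        """Turn one batch of raw inputs into a dictionary of features."""

    @abstractmethod
    def save(self, path: str) -> None:
        """Persist the preprocessor state."""

    @abstractmethod
    def load(self, path: str) -> None:
        """Restore the preprocessor state."""

    def __call__(self, data: List[str]) -> Dict[str, List[Any]]:
        # Iterate over the input in batches and merge the per-batch dictionaries.
        merged: Dict[str, List[Any]] = {}
        for start in range(0, len(data), self.batch_size):
            batch_output = self.preprocess(data[start:start + self.batch_size])
            for key, values in batch_output.items():
                merged.setdefault(key, []).extend(values)
        return merged


class WhitespacePreprocessor(PreprocessorBaseSketch):
    """Toy subclass used only to exercise the parent __call__."""

    def preprocess(self, batch: List[str]) -> Dict[str, List[Any]]:
        return {"tokens": [text.split() for text in batch]}

    def save(self, path: str) -> None:
        pass  # nothing to persist in this toy example

    def load(self, path: str) -> None:
        pass  # nothing to restore in this toy example
```

Keeping batching in the parent `__call__` means each concrete preprocessor only has to handle a single batch.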
    enhancement 
    opened by jonheng 0
  • 46 senticgcn bugfix

    • Add multi-word aspect support
    • Update documentation to reflect multi-word support
    • Update unit tests
    • Update usage example to include multi-word support
    opened by raymondng76 0
  • Fix multi-word aspect issue with Sentic-GCN preprocessor

    The current preprocessor implementation matches a single aspect index when matching postprocessor output. The aspect index field in the process_input payload should be expanded to handle aspects that span multiple indexes.
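
To illustrate what handling multiple indexes could mean, the sketch below returns every token index covered by an aspect, whether it is one word or several. The function `find_aspect_indexes` and its whitespace tokenization are assumptions for illustration and do not reflect the actual Sentic-GCN preprocessor.

```python
from typing import List


def find_aspect_indexes(tokens: List[str], aspect: str) -> List[List[int]]:
    """Return the token indexes of every occurrence of a (possibly multi-word) aspect."""
    aspect_tokens = aspect.split()
    span = len(aspect_tokens)
    if span == 0:
        return []
    matches = []
    for start in range(len(tokens) - span + 1):
        # Compare a window of the same length as the aspect against the aspect tokens.
        if tokens[start:start + span] == aspect_tokens:
            matches.append(list(range(start, start + span)))
    return matches
```

For the tokens of "the battery life is great", the aspect "battery life" maps to indexes 1 and 2 rather than to a single index.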

    bug 
    opened by raymondng76 0
  • Add Sentic-GCN demo_api to SGNlp

    Close #43

    This pull request adds the Sentic-GCN demo_api to sgnlp. It includes the following components:

    • model_card
    • api.py
    • dockerfiles
    • requirements.txt
    • usage.py
    opened by K-WeiMing 0
  • Add Sentic-GCN to SGNlp

    close #41

    This pull request adds the Sentic-GCN models to sgnlp. It includes the following components:

    • Models
    • Configs
    • Tokenizers
    • Embedding models
    • Trainer/Evaluator
    • Unit test
    • documentation

    It does not include the demo_api, which is covered in a separate issue ticket.

    opened by raymondng76 0
  • download_pretrained for demo API does not cache downloaded files/models

    To allow the containers to start up quicker, models and files were downloaded and cached during build time.

    Recent changes in the Hugging Face transformers package have broken this functionality:

    • Released in v4.22.0
    • Issue

    Possible choices moving forward:

    • Write a simple caching utility function
    • Stick to versions of transformers before 4.22.0
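
The first option could look something like the sketch below. `cached_download`, its parameters, and the hashing of the URL into a file name are assumptions for illustration; this is not the project's download_pretrained and it does not use the transformers cache.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable


def cached_download(
    url: str,
    cache_dir: str,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> Path:
    """Download url into cache_dir once and reuse the local copy afterwards."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)
    # Derive a stable file name from the URL so repeated calls hit the same file.
    file_name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    target = cache_path / file_name
    if not target.exists():
        fetch(url, str(target))
    return target
```

Calling this during the image build would place the files in the cache directory, so the same call at container start finds them locally. The `fetch` parameter exists only so the download step can be swapped out, for example in tests.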
    opened by jonheng 0
  • Add Stance Detection model

    opened by atenzer 0
Releases (v0.4.0)
Owner
AI Singapore | AI Makerspace
Growing local AI talent and empowering start-ups, SMEs and enterprises with AI components, frameworks, platforms and advisory services.