A repo for materials relating to the tutorial of CS-332 NLP

Last update: Feb 15, 2022

Overview

CS-332-NLP

Tutorial 1:
- Introduction
- Corpus
- Regular expressions
- Tokenization

Tutorial 2:
- Normalization
- Parsing
- Morphemes
- Stemming
- Lemmatization
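
As a rough illustration of how a few of the listed topics fit together (regular expressions, tokenization, normalization, stemming), here is a minimal sketch using only the Python standard library. It is not taken from the tutorial materials: the pattern `TOKEN_RE`, the suffix list, and the helper names `tokenize`, `normalize`, and `toy_stem` are all assumptions made up for this example, and the stemmer is a deliberately naive suffix stripper rather than an established algorithm such as Porter's.

```python
import re

# Hypothetical token pattern (an assumption, not from the tutorials):
# words with an optional internal apostrophe, numbers, or single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+(?:\.\d+)?|[^\w\s]")

# Hypothetical suffix list for the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split raw text into word, number, and punctuation tokens."""
    return TOKEN_RE.findall(text)


def normalize(tokens):
    """Case-fold every token (one simple form of normalization)."""
    return [t.lower() for t in tokens]


def toy_stem(token):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


if __name__ == "__main__":
    tokens = normalize(tokenize("The cats aren't walking."))
    print([toy_stem(t) for t in tokens])
    # prints ['the', 'cat', "aren't", 'walk', '.']
```

Lemmatization is left out on purpose: unlike suffix stripping, it needs a lexicon or a morphological analyzer to map a word form to its dictionary entry.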

Acknowledgements

Jurafsky, D., & Martin, J. H. Speech and Language Processing (2nd and 3rd editions).
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1994). Building a large annotated corpus of English: The Penn Treebank. Using Large Corpora, 273.
http://su.diva-portal.org/smash/record.jsf?pid=diva2%3A686162&dswid=9114

Owner

Alok singh

GitHub Repository

Practical Natural Language Processing Tools for Humans is built on top of Senna Natural Language Processing (NLP)

Practical Natural Language Processing Tools for Humans is built on top of Senna Natural Language Processing (NLP) predictions: part-of-speech (POS) tags, chunking (CHK), named entity recognition (

20 Apr 30, 2022

Source code for the paper WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach.

WhiteningBERT Source code and data for paper WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach. Preparation git clone https://github.com

49 Dec 17, 2022

Binaural Speech Synthesis

Binaural Speech Synthesis This repository contains code to train a mono-to-binaural neural sound renderer. If you use this code or the provided datase

135 Dec 18, 2022

Chinese version of GPT2 training code, using BERT tokenizer.

GPT2-Chinese Description Chinese version of GPT2 training code, using BERT tokenizer or BPE tokenizer. It is based on the extremely awesome repository

5.6k Jan 04, 2023

Create secure authentication tokens with integrity using UToken.

UToken - Secure tokens. UToken (or Unhandleable Token) is a library created for generating secure tokens with integrity, that is, not…

0 Nov 29, 2022

Maix Speech AI lib, including ASR, chat, TTS etc.

Maix-Speech Chinese | English Brief Currently only Chinese is supported; see the Chinese version. Build Clone the code with: git clone https://github.com/sipeed/Maix-Speech Compile x86x64 c

267 Dec 25, 2022

Nested Named Entity Recognition

Nested Named Entity Recognition Training Dataset: CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark url: https://tianchi.aliyun.

8 Dec 25, 2022

Ceaser-Cipher - The Caesar Cipher is one of the earliest and simplest encryption techniques

Ceaser-Cipher The Caesar Cipher technique is one of the earliest and simplest me

2 May 12, 2022

A list of NLP (Natural Language Processing) tutorials built on TensorFlow 2.0.

335 Jan 04, 2023

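CCF BDCI BERT system tuning competition baseline (PyTorch version)

CCF BDCI BERT system tuning competition baseline (PyTorch version). This version is implemented with Hugging Face on the PyTorch backend. Because this implementation uses OneFlow's dataloader to read data, OneFlow must also be installed. For data loading in other frameworks, refer to OneflowDataloade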

9 Oct 13, 2022

The model trains a single, large neural network to predict the correct translation of a given sentence.

Neural Machine Translation communication system The model converts a source language directly into a target language using encode

7 Sep 22, 2022

Study German declensions (dER nettE Mann, ein nettER Mann, mit dEM nettEN Mann, ohne dEN nettEN Mann ...). Generate as many exercises as you want using the incredible power of spaCy!

4 Jul 20, 2022

Final Project for the Intel AI Readiness Boot Camp NLP (Jan)

Generate a cool README/About me page for your Github Profile

DziriBERT: a Pre-trained Language Model for the Algerian Dialect

🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

Official Stanford NLP Python Library for Many Human Languages

This library tests the ethics of language models using natural adversarial texts.

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

Data preprocessing rosetta parser for python
