NLP Algorithms

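Description

This repository covers five mainstream NLP tasks: text classification, sequence labeling, relation extraction, text matching, and text similarity matching. It includes 22 related models and algorithms.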

Framework Structure

File Structure

all_models
├── Base_line
│   ├── __init__.py
│   ├── base_data_process.py
│   ├── base_evaluation.py
│   └── single_tokenizer.py
│
├── Texts_Classification
│   ├── 机器学习_文本分类
│   ├── fasttext_文本分类
│   ├── textcnn_文本分类
│   ├── lstm_文本分类
│   ├── han_文本分类
│   ├── bert_文本分类
│   └── 数据准备
│
├── Sequence_Labeling
│   ├── crf_suite
│   ├── lstm_crf
│   ├── bert_lstm_crf
│   ├── bert_mrc
│   └── 数据准备
│
├── Relation_Extraction
│   ├── CasRel
│   ├── multihead_joint_extraction
│   ├── R-bert_relation_recognition
│   ├── attention_lstm_relation_recognition
│   ├── attention_lstm_relation_recognition_for_single_sentence
│   ├── tagging_scheme_joint_extraction
│   ├── entity_extraction_bert_lstm_crf
│   └── 数据准备
│
├── Text_Matching
│   ├── DSSM
│   ├── ARC-II
│   ├── ESIM
│   ├── bert
│   └── 数据准备
│
├── Text_Similarity_Matching
│   ├── tfidf
│   ├── BM25
│   ├── pysparnn
│   └── commodity_title.txt
│
├── 记录
├── .gitignore
└── README.md
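
The Text_Similarity_Matching directory lists tfidf, BM25, and pysparnn next to commodity_title.txt. As a rough illustration of what BM25-style similarity matching does, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is not the repository's implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, and the sample titles are all assumptions made for this example, and the input is assumed to be tokenized already.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document in corpus_tokens.

    Hypothetical helper for illustration; k1 and b are common defaults,
    not values taken from the repository.
    """
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            # Smoothed IDF; the "+ 1.0" keeps the value non-negative.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            tf = term_freq[term]
            # Length normalization: longer documents are penalized via b.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical, pre-tokenized product titles (not from commodity_title.txt).
titles = [
    ["red", "cotton", "shirt"],
    ["blue", "cotton", "jeans"],
    ["wireless", "bluetooth", "headphones"],
]
scores = bm25_scores(["cotton", "shirt"], titles)
best = max(range(len(titles)), key=scores.__getitem__)
```

Here `best` is the index of the highest-scoring title for the query. For Chinese titles, a word segmenter would be needed before scoring; that step is left out of this sketch.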

Owner

zuxinqi
