NLP subtasks I taught myself during my master's studies, shared for learning and reference

Last update: May 31, 2022

Overview

NLP_Chinese_down_stream_task

Self-taught NLP subtasks, shared for learning and reference.

Task 1: Short Text Classification

(1) Dataset: THUCNews Chinese text dataset (10 classes)

(2) Model: BERT + FC / LSTM, implemented in PyTorch
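As a rough illustration of the "BERT + FC" variant: a common design (assumed here, not confirmed by this repository) takes BERT's final hidden state for the [CLS] token and passes it through one fully connected layer and a softmax over the 10 THUCNews classes. The sketch is plain Python so it runs without PyTorch; `classify_cls_vector` and the toy weights are hypothetical. In the real model the weights are learned and the [CLS] vector is 768-dimensional for a base-size BERT.

```python
import math
from typing import List, Sequence, Tuple

NUM_CLASSES = 10  # THUCNews subset used here: 10 categories


def linear(x: Sequence[float], weight: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Fully connected layer: logits[k] = sum_j weight[k][j] * x[j] + bias[k]."""
    return [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_cls_vector(cls_vec: Sequence[float], weight: Sequence[Sequence[float]],
                        bias: Sequence[float]) -> Tuple[int, List[float]]:
    """Hypothetical helper: map a [CLS] vector to (predicted class index, probabilities)."""
    probs = softmax(linear(cls_vec, weight, bias))
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred, probs


# Toy example: hidden size 4 instead of 768, with hand-set (not learned) weights.
cls_vec = [0.5, -1.0, 2.0, 0.0]
weight = [[0.0] * 4 for _ in range(NUM_CLASSES)]
weight[3] = [0.0, 0.0, 1.0, 0.0]  # class 3 reacts to the third feature
bias = [0.0] * NUM_CLASSES
pred, probs = classify_cls_vector(cls_vec, weight, bias)
print(pred)  # 3
```

During training, the same logits would be fed to a cross-entropy loss and BERT's weights fine-tuned along with the FC layer.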

(3) Usage:

The pretrained model is Chinese BERT-wwm (download: https://github.com/ymcui/Chinese-BERT-wwm). After downloading, extract it into the [bert_pretrain] folder, then run "main.py".

(4) Training results:

Task 2: Named Entity Recognition (NER)

(1) Dataset: china-people-daily-ner-corpus (People's Daily NER corpus)

(2) Model: BiLSTM + CRF, TensorFlow (CPU) >= 2.1

Uses 100-dimensional word vectors pretrained on Chinese Wikipedia; then simply run main.py.
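In a BiLSTM + CRF tagger, the BiLSTM produces a score for every tag at every character, and the CRF adds learned tag-to-tag transition scores; at prediction time the best tag sequence is usually found with Viterbi decoding. The following is a minimal pure-Python sketch of that decoding step, not the repository's TensorFlow code; the three-tag O / B-PER / I-PER set and all scores are made up for illustration.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score under a linear-chain CRF.

    emissions[t][j]:   score of tag j at token t (e.g. the BiLSTM output)
    transitions[i][j]: score of moving from tag i to tag j (the learned CRF matrix)
    """
    num_tags = len(transitions)
    score = list(emissions[0])          # best path score ending in each tag so far
    backpointers: List[List[int]] = []  # best previous tag for each (t, tag)
    for t in range(1, len(emissions)):
        new_score, best_prev = [], []
        for j in range(num_tags):
            # Choose the previous tag i that maximizes (path score + transition i -> j).
            i = max(range(num_tags), key=lambda k: score[k] + transitions[k][j])
            new_score.append(score[i] + transitions[i][j] + emissions[t][j])
            best_prev.append(i)
        score = new_score
        backpointers.append(best_prev)
    # Start from the best final tag and follow the backpointers to the first token.
    last = max(range(num_tags), key=score.__getitem__)
    path = [last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical tag set: 0 = O, 1 = B-PER, 2 = I-PER.
transitions = [[0.0, 0.0, -10.0],  # O -> I-PER is heavily penalized (invalid in BIO)
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[1.0, 0.5, 0.0],  # token 1: O scores slightly above B-PER
             [0.0, 0.0, 2.0]]  # token 2: I-PER scores highest
path, best = viterbi_decode(emissions, transitions)
print(path, best)  # [1, 2] 2.5
```

Taking the per-token argmax of the emissions alone would give the invalid sequence O, I-PER; the transition penalty makes Viterbi choose B-PER, I-PER instead, which is the main reason to put a CRF on top of the BiLSTM.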

(3) Training results:

(4) F1-score results:

Task 3: Text Matching (Semantic Textual Similarity)

(1) Dataset: fake-news-pair-classification-challenge (a Kaggle competition on classifying pairs of fake-news titles; each pair has one of three labels: 'unrelated', 'agreed', 'disagreed')

(2) Model: Siamese LSTM combined with a text-similarity matching function of your choice, TensorFlow (CPU) >= 2.1
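A Siamese network encodes both texts with the same encoder (shared weights) and then compares the two vectors with a similarity function, which is what makes the matching method swappable. A minimal pure-Python sketch under these assumptions: `manhattan_similarity` follows the MaLSTM-style exp(-L1 distance) score, `cosine_similarity` is a common alternative, and `toy_encoder` is a hypothetical stand-in for the shared LSTM (unlike an LSTM, it ignores character order).

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def manhattan_similarity(h1: Vector, h2: Vector) -> float:
    """MaLSTM-style score exp(-||h1 - h2||_1): 1.0 for identical vectors, toward 0 as they diverge."""
    return math.exp(-sum(abs(a - b) for a, b in zip(h1, h2)))


def cosine_similarity(h1: Vector, h2: Vector) -> float:
    """Cosine of the angle between the two vectors, in [-1, 1] (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2))
    return dot / norm if norm else 0.0


def siamese_score(encode: Callable[[str], List[float]], text_a: str, text_b: str,
                  similarity: Callable[[Vector, Vector], float]) -> float:
    """Encode both texts with the SAME encoder (shared weights), then compare the vectors."""
    return similarity(encode(text_a), encode(text_b))


def toy_encoder(text: str) -> List[float]:
    """Hypothetical stand-in for the shared LSTM: counts of a few characters (order-insensitive)."""
    return [float(text.count(ch)) for ch in "abc"]


print(siamese_score(toy_encoder, "abc", "cab", manhattan_similarity))  # 1.0
print(siamese_score(toy_encoder, "aab", "c", cosine_similarity))       # 0.0
```

Because this dataset has three labels rather than a binary similar/dissimilar target, the similarity output (or the pair of encodings) would typically feed a small classifier instead of being thresholded; how the repository handles this is not stated here.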

(3) Usage:

Simply run "main.py".

