NLP dataset compilation (updating)
2022-04-22 12:16:00 【Zhang mouwen's blog_ Lambda】
NLP dataset compilation
A search platform for Chinese and English NLP datasets: click to search
1. Sentiment analysis
| ID | Title | Last updated | Dataset provider | Description | Keywords | Category | Notes |
|---|---|---|---|---|---|---|---|
| 1 | weibo_senti_100k | N/A | N/A | Sina Weibo posts labeled with sentiment; about 50,000 each of positive and negative comments | Weibo, sentiment | Binary classification | N/A |
| 2 | Weibo Emotion Corpus | 2016 | The Hong Kong Polytechnic University | Microblog corpus annotated with 7 emotion classes: like, disgust, happiness, sadness, anger, surprise, fear. Size: more than 40,000 microblogs | Weibo, sentiment | Multi-class classification | Source paper |
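Before training on a sentiment dataset like those in the table, it helps to check how balanced the classes are. The following is a minimal sketch only: the `label,review` column layout, the 1 = positive / 0 = negative convention, and the sample rows are all assumptions made for illustration, not taken from the datasets themselves.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in an ASSUMED "label,review" layout
# (1 = positive, 0 = negative). Real files may use other column names.
SAMPLE_CSV = """label,review
1,今天心情很好
0,这个产品太差了
1,非常喜欢这部电影
"""


def count_labels(csv_file) -> Counter:
    """Count how many rows carry each sentiment label."""
    reader = csv.DictReader(csv_file)
    return Counter(row["label"] for row in reader)


counts = count_labels(io.StringIO(SAMPLE_CSV))
print(counts)
```

For a file on disk, the same function can be given `open(path, encoding="utf-8", newline="")`, where `path` is wherever the downloaded CSV was saved.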
2. Text classification
3. Text matching
4. Text summarization
5. Machine translation
6. NER
7. QA
8. Knowledge graph
9. Corpora
10. Reading comprehension
Chinese character decomposition dictionary
- English text can use character embeddings; for Chinese, try a character decomposition dictionary.
- Alternatively, use a pre-trained BERT model to split Chinese text into individual characters.
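The character-level splitting mentioned above can be approximated without any model. This is a rough sketch of the idea, not BERT's actual tokenizer: it isolates each Chinese character as its own token and leaves other runs (such as Latin words) intact. The function names are hypothetical, and the check covers only the main CJK Unified Ideographs block, which is a simplification.

```python
from typing import List


def is_cjk(ch: str) -> bool:
    """Return True for characters in the main CJK Unified Ideographs block.

    Simplification: extension blocks and compatibility ideographs are ignored.
    """
    return 0x4E00 <= ord(ch) <= 0x9FFF


def char_tokenize(text: str) -> List[str]:
    """Make each CJK character its own token; keep other runs whole."""
    pieces = []
    for ch in text:
        # Surround CJK characters with spaces so split() isolates them.
        pieces.append(f" {ch} " if is_cjk(ch) else ch)
    return "".join(pieces).split()


tokens = char_tokenize("我爱NLP数据集")
print(tokens)  # ['我', '爱', 'NLP', '数', '据', '集']
```

A real BERT tokenizer would additionally apply WordPiece to the non-Chinese runs and map tokens to vocabulary IDs; this sketch only shows the character-splitting step.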
Chinese dataset platforms
- Sogou Labs provides some high-quality Chinese text datasets, mostly from before 2012: Portal
- Zhongke (Chinese Academy of Sciences) natural language processing and information retrieval sharing platform: Portal
Small Chinese corpora
- Small datasets for Chinese named entity recognition, Chinese relation extraction, Chinese reading comprehension, and similar tasks: Portal
- Wikipedia dataset: Portal
NLP Tools
(1) THULAC: https://github.com/thunlp/THULAC (Chinese word segmentation and part-of-speech tagging)
(2) HanLP: https://github.com/hankcs/HanLP
(3) Harbin Institute of Technology LTP: https://github.com/HIT-SCIR/ltp
(4) NLPIR: https://github.com/NLPIR-team/NLPIR
(5) jieba (C++ version, cppjieba): https://github.com/yanyiwu/cppjieba
(6) Baidu Qianyan datasets: https://github.com/luge-ai/luge-ai
Reference article: from Jianshu
Copyright notice
This article was written by [Zhang mouwen's blog_ Lambda]. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/04/202204221209542747.html