A collection of natural language processing models for Classical Chinese, gathering the Classical Chinese models and related resources that are publicly available on the Internet.

Overview






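Pre-trained language models for Classical Chinese

Pre-trained language models are the foundation for the various Classical Chinese tasks. They need to be fine-tuned on downstream task data to reach their full potential. This list aims to cover every pre-trained Classical Chinese language model that is publicly available on the Internet: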

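| Name | Simplified/Traditional | Download link | Notes |
|---|---|---|---|
| guwenbert-base | | Hugging Face | Trained on the Daizhige corpus, starting from a Chinese model |
| guwenbert-large | | Hugging Face | |
| guwenbert-fs-base | | One Drive | Trained from scratch on the Daizhige corpus |
| roberta-classical-chinese-base-char | Simplified and Traditional | Hugging Face | Trained from guwenbert, with the vocabulary extended to traditional characters |
| roberta-classical-chinese-large-char | Simplified and Traditional | Hugging Face | |
| sikubert | | Hugging Face | Trained on the Siku Quanshu corpus, starting from a Chinese model |
| sikuroberta | | Hugging Face | |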
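
As a minimal sketch of how one of the checkpoints above might be loaded with the Transformers library: the Hub identifier `ethanyt/guwenbert-base` is an assumption (the table only gives the link text "Hugging Face"), and `load_encoder` and `embed` are hypothetical helper names. PyTorch is assumed to be installed for `return_tensors="pt"`.

```python
# Assumed Hub identifier; confirm it on the model's download page.
DEFAULT_MODEL_ID = "ethanyt/guwenbert-base"


def load_encoder(model_id: str = DEFAULT_MODEL_ID):
    """Return (tokenizer, model) for a pre-trained checkpoint."""
    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


def embed(text: str, tokenizer, model):
    """Encode one sentence and return the last-layer hidden states."""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs)
    return outputs.last_hidden_state
```

Calling `load_encoder()` downloads the weights on first use. The returned encoder is a starting point for fine-tuning rather than a finished application model.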

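Application models for Classical Chinese

Application models are obtained by fine-tuning a pre-trained Classical Chinese model on task-specific data, and they cover practical Classical Chinese applications. The training data used by the guwen-X models can be downloaded from CCLUE. If your input contains traditional characters, convert it first with one of the tools listed at the bottom of this page.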

Sentence segmentation

guwen-seg: a sentence segmentation model based on guwenbert-fs-base.
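
A segmentation model of this kind is typically a token classifier that emits one label per character. The sketch below shows how such labels could be turned back into segmented text. The label scheme is an assumption (`"E"` marks the last character of a clause, `"O"` everything else) and `apply_breaks` is a hypothetical helper. The labels guwen-seg actually uses should be read from the model's configuration.

```python
from typing import Sequence


def apply_breaks(
    text: str,
    labels: Sequence[str],
    end_label: str = "E",  # assumed label for "clause ends here"
    sep: str = "/",
) -> str:
    """Insert `sep` after every character whose label is `end_label`."""
    if len(text) != len(labels):
        raise ValueError("need exactly one label per character")
    pieces = []
    for ch, label in zip(text, labels):
        pieces.append(ch)
        if label == end_label:
            pieces.append(sep)
    return "".join(pieces)


segmented = apply_breaks(
    "学而时习之不亦说乎",
    ["O", "O", "O", "O", "E", "O", "O", "O", "E"],
)
# segmented == "学而时习之/不亦说乎/"
```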

Punctuation

guwen-punc: a punctuation model based on guwenbert-fs-base.

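Quotation mark detection

guwen-quote: a quotation mark detection model based on guwenbert-fs-base.

Note: as the accompanying figure shows, the sequence labeling model that ships with Transformers makes some errors. In real-world use, decode with a CRF model instead. For the code, see crf_example.ipynb.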
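
To illustrate why CRF decoding can differ from taking the best label at each position, here is a minimal sketch of Viterbi decoding, the standard decoding algorithm for a linear-chain CRF. It is not the code from crf_example.ipynb. The label set (0 = O, 1 = B, 2 = I) and all scores are made up for illustration.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label sequence.

    emissions[t][j]   : score of label j at position t
    transitions[i][j] : score of moving from label i to label j
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # best[j] = best score of any path that ends in label j at the current step
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for step in emissions[1:]:
        new_best, pointers = [], []
        for j in range(num_labels):
            # previous label that maximizes the path score into label j
            prev = max(range(num_labels), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + step[j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # trace back from the best final label
    last = max(range(num_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Illustrative scores: the transition O -> I is heavily penalized.
EMISSIONS = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.5], [0.0, 0.0, 2.0]]
TRANSITIONS = [[0.0, 0.0, -1e4], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

path = viterbi_decode(EMISSIONS, TRANSITIONS)
# path == [0, 1, 2] (O, B, I); a per-position argmax would give [0, 2, 2] (O, I, I)
```

Because the transition scores are taken into account, the decoder avoids the invalid O to I jump that independent per-position predictions would produce.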

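Named entity recognition

guwen-ner: a named entity recognition model based on guwenbert-base.

Note: for best results, decoding with a CRF model is recommended in real-world use. For the code, see crf_example.ipynb.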
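
Once per-character tags have been decoded, they usually need to be grouped into entity spans. The sketch below assumes a BIO tagging scheme with hypothetical entity types `PER` and `LOC`. The tag set guwen-ner actually uses may differ, and `bio_to_spans` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def bio_to_spans(text: str, tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (entity_text, entity_type) pairs."""
    if len(text) != len(tags):
        raise ValueError("need exactly one tag per character")
    spans: List[Tuple[str, str]] = []
    start, kind = None, None
    # A trailing "O" sentinel closes an entity that runs to the end of the text.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and kind == tag[2:]:
            continue  # still inside the current entity
        if kind is not None:
            spans.append((text[start:i], kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
        # An "I-" tag with no matching open entity is ignored, like "O".
    return spans


entities = bio_to_spans("项羽至鸿门", ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"])
# entities == [("项羽", "PER"), ("鸿门", "LOC")]
```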

Text classification

guwen-cls: a Classical Chinese text classification model based on guwenbert-fs-base.

Classical poetry sentiment classification

guwen-sent: a sentiment classification model for classical poetry, based on guwenbert-base.

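Other Classical Chinese resources

  • OpenCC: a Simplified/Traditional Chinese conversion tool
  • zhconv: a Simplified/Traditional Chinese conversion tool (note: use the zh-hans option, which converts individual characters only and avoids converting region-specific words)
  • Jiayan (甲言): an NLP toolkit for Classical Chinese, with tools for word segmentation, part-of-speech tagging, sentence segmentation, punctuation, and more
  • UD-Kanbun: word segmentation, part-of-speech tagging, and syntactic parsing for Classical Chinese
  • daizhigev20: the Daizhige (殆知阁) corpus of ancient texts, v2.0
  • chinese-poetry: the most comprehensive database of classical Chinese poetry and literary collections
  • Classical-Chinese: a parallel corpus of Classical Chinese and modern Chinese for translation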
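
The conversion step recommended for the guwen-X models could look like the sketch below. It uses `zhconv.convert` with the `zh-hans` locale noted in the list above. The `convert` parameter is a hypothetical hook, not part of either tool, that lets a different converter such as OpenCC be plugged in.

```python
from typing import Callable, Optional


def to_simplified(
    text: str,
    convert: Optional[Callable[[str], str]] = None,
) -> str:
    """Convert traditional characters to simplified ones before inference."""
    if convert is None:
        # Imported lazily so zhconv is only needed for the default path.
        import zhconv

        def convert(s: str) -> str:
            # "zh-hans" converts single characters only and leaves
            # region-specific vocabulary untouched.
            return zhconv.convert(s, "zh-hans")

    return convert(text)
```

With zhconv installed, `to_simplified("學而時習之")` runs the default path. Passing `convert=` swaps in any other string-to-string converter.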

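About

This repository aims to collect the open-source Classical Chinese NLP models available on the Internet. Copyright belongs to the original authors. Additions of further resources are welcome. If you have questions, raise them in the Issues section or email ethanyt at qq.com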

Owner
Ethan
Natural Language Processing, Deep Learning, Information Retrieval, Full-stack Development.