Mengzi Pretrained Models

Last update: Jan 04, 2023

Overview

中文 | English

Mengzi

尽管预训练语言模型在 NLP 的各个领域里得到了广泛的应用，但是其高昂的时间和算力成本依然是一个亟需解决的问题。这要求我们在一定的算力约束下，研发出各项指标更优的模型。

我们的目标不是追求更大的模型规模，而是轻量级但更强大，同时对部署和工业落地更友好的模型。

基于语言学信息融入和训练加速等方法，我们研发了 Mengzi 系列模型。由于与 BERT 保持一致的模型结构，Mengzi 模型可以快速替换现有的预训练模型。

详细的技术报告请参考:

Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese

快速上手

Mengzi-BERT

# 使用 Huggingface transformers 加载
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("Langboat/mengzi-bert-base")
model = BertModel.from_pretrained("Langboat/mengzi-bert-base")

Mengzi-T5

# 使用 Huggingface transformers 加载
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
model = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base")

Mengzi-Oscar

参考文档

依赖安装

pip install transformers

下游任务

CLUE 分数

Model	AFQMC	TNEWS	IFLYTEK	CMNLI	WSC	CSL	CMRC2018	C3	CHID
RoBERTa-wwm-ext	74.30	57.51	60.80	80.70	67.20	80.67	77.59	67.06	83.78
Mengzi-BERT-base	74.58	57.97	60.68	82.12	87.50	85.40	78.54	71.70	84.16

RoBERTa-wwm-ext 的分数来自 CLUE baseline

对应超参

Task	Learning rate	Batch size	Epochs
AFQMC	3e-5	32	10
TNEWS	3e-5	128	10
IFLYTEK	3e-5	64	10
CMNLI	3e-5	512	10
WSC	8e-6	64	50
CSL	5e-5	128	5
CMRC2018	5e-5	8	5
C3	1e-4	240	3
CHID	5e-5	256	5

下载链接

联系方式

微信讨论群

邮箱

wangyulong[at]chuangxin[dot]com

免责声明

该项目中的内容仅供技术研究参考，不作为任何结论性依据。使用者可以在许可证范围内任意使用该模型，但我们不对因使用该项目内容造成的直接或间接损失负责。技术报告中所呈现的实验结果仅表明在特定数据集和超参组合下的表现，并不能代表各个模型的本质。实验结果可能因随机数种子，计算设备而发生改变。

使用者以各种方式使用本模型（包括但不限于修改使用、直接使用、通过第三方使用）的过程中，不得以任何方式利用本模型直接或间接从事违反所属法域的法律法规、以及社会公德的行为。使用者需对自身行为负责，因使用本模型引发的一切纠纷，由使用者自行承担全部法律及连带责任。我们不承担任何法律及连带责任。

我们拥有对本免责声明的解释、修改及更新权。

文献引用

@misc{zhang2021mengzi,
      title={Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese}, 
      author={Zhuosheng Zhang and Hanqing Zhang and Keming Chen and Yuhang Guo and Jingyun Hua and Yulong Wang and Ming Zhou},
      year={2021},
      eprint={2110.06696},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Mengzi Pretrained Models

Related tags

Overview

Mengzi

导航

快速上手

Mengzi-BERT

Mengzi-T5

Mengzi-Oscar

依赖安装

下游任务

CLUE 分数

对应超参

下载链接

联系方式

微信讨论群

邮箱

免责声明

文献引用

Owner

Langboat

SegNet-like Autoencoders in TensorFlow

DANet for Tabular data classification/ regression.

Semiconductor Machine learning project

This repository is an implementation of our NeurIPS 2021 paper (Stylized Dialogue Generation with Multi-Pass Dual Learning) in PyTorch.

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Powerful and efficient Computer Vision Annotation Tool (CVAT)

Hardware-accelerated DNN model inference ROS2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU

PyTorch implementation of 1712.06087 "Zero-Shot" Super-Resolution using Deep Internal Learning

Code for the paper: Adversarial Machine Learning: Bayesian Perspectives

Implementation of paper "DCS-Net: Deep Complex Subtractive Neural Network for Monaural Speech Enhancement"

Code for the paper "JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design"

Implementation of the algorithm shown in the article "Modelo de Predicción de Éxito de Canciones Basado en Descriptores de Audio"

🔥 TensorFlow Code for technical report: "YOLOv3: An Incremental Improvement"

This repository is based on Ultralytics/yolov5, with adjustments to enable polygon prediction boxes.

利用yolov5和TensorRT从0到1实现目标检测的模型训练到模型部署全过程

Code for Learning to Segment The Tail (LST)

Encoding Causal Macrovariables

This is the code for CVPR 2021 oral paper: Jigsaw Clustering for Unsupervised Visual Representation Learning

Predicting Tweet Sentiment Maching Learning and streamlit

Implementation based on Paper - Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling