Code for the ACL 2021 paper "Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter"

Last update: Dec 06, 2022

Overview

Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter

Code and checkpoints for the ACL 2021 paper "Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter"

arXiv link to the paper: https://arxiv.org/abs/2105.07148

Requirements

Python 3.7.0
transformers 3.4.0
Numpy 1.18.5
Packaging 17.1
scikit-learn 0.23.2
torch 1.6.0+cu92
tqdm 4.50.2
multiprocess 0.70.10
tensorflow 2.3.1
tensorboardX 2.1
seqeval 1.2.1

Input Format

CoNLL format (the BIOES tag scheme is preferred), with one character and its label per line. Sentences are separated by a blank line.

美   B-LOC  
国   E-LOC  
的   O  
华   B-PER  
莱   I-PER  
士   E-PER  

我   O  
跟   O  
他   O  
谈   O  
笑   O  
风   O  
生   O
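
The format above can be read with a few lines of Python. The following is a minimal sketch, not part of the repository: the helper names `parse_conll` and `bioes_spans` are hypothetical, and it assumes whitespace-separated columns with the character first and the label last.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]  # (character, label) pairs


def parse_conll(text: str) -> List[Sentence]:
    """Split CoNLL-style text into sentences of (character, label) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        # Assumption: first column is the character, last column is the label.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def bioes_spans(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (surface text, type) spans from BIOES labels."""
    spans, buffer, span_type = [], [], None
    for char, label in sentence:
        prefix, _, tag = label.partition("-")
        if prefix == "S":  # single-character span
            spans.append((char, tag))
            buffer, span_type = [], None
        elif prefix == "B":  # open a new span
            buffer, span_type = [char], tag
        elif prefix == "I" and span_type == tag:
            buffer.append(char)
        elif prefix == "E" and span_type == tag:  # close the open span
            buffer.append(char)
            spans.append(("".join(buffer), tag))
            buffer, span_type = [], None
        else:  # "O" or an inconsistent tag discards any open span
            buffer, span_type = [], None
    return spans


sample = "美 B-LOC\n国 E-LOC\n的 O\n华 B-PER\n莱 I-PER\n士 E-PER\n\n我 O\n跟 O\n他 O\n"
sentences = parse_conll(sample)
print(len(sentences))
print(bioes_spans(sentences[0]))
```

Extracting spans this way is a quick sanity check that a data file really follows BIOES before it is converted for training.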

Chinese BERT, Chinese Word Embedding, and Checkpoints

Chinese BERT

Chinese BERT: https://cdn.huggingface.co/bert-base-chinese-pytorch_model.bin

Chinese word embedding:

Word Embedding: https://ai.tencent.com/ailab/nlp/en/data/Tencent_AILab_ChineseEmbedding.tar.gz
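
As a rough illustration of how such an embedding file might be read, here is a minimal sketch. It assumes the unpacked file is plain text with an optional `count dimension` header line followed by one `word v1 v2 ...` row per line (the word2vec text layout); check the actual file before relying on this. The function name `read_text_embeddings` and the demo data are hypothetical, not the repository's loader.

```python
import io
from typing import Dict, Iterable, List


def read_text_embeddings(lines: Iterable[str], max_words: int = 0) -> Dict[str, List[float]]:
    """Read 'word v1 v2 ...' rows; max_words=0 means no limit."""
    vectors: Dict[str, List[float]] = {}
    dim = None
    for index, line in enumerate(lines):
        parts = line.split()
        if not parts:
            continue  # ignore empty lines
        # Assumed optional header: "<vocabulary size> <dimension>".
        if index == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            dim = int(parts[1])
            continue
        word, values = parts[0], parts[1:]
        if dim is None:
            dim = len(values)  # infer the dimension from the first row
        if len(values) != dim:
            continue  # skip rows with an unexpected number of values
        vectors[word] = [float(v) for v in values]
        if max_words and len(vectors) >= max_words:
            break
    return vectors


# Tiny in-memory stand-in for the real file.
demo = io.StringIO("3 2\n美国 0.1 0.2\n华莱士 0.3 0.4\n谈笑风生 0.5 0.6\n")
table = read_text_embeddings(demo)
vocab = list(table)  # words in file order
print(vocab)
```

For the real file, passing an open file handle (for example `open(path, encoding="utf-8")`) with a `max_words` cap keeps memory use manageable while experimenting.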

Checkpoints and Shells

Directory Structure of data

berts
- bert
  - config.json
  - vocab.txt
  - pytorch_model.bin
dataset
- NER
  - weibo
  - note4
  - msra
  - resume
- POS
  - ctb5
  - ctb6
  - ud1
  - ud2
- CWS
  - ctb6
  - msr
  - pku
vocab
- tencent_vocab.txt, the vocabulary of the pre-trained word embedding table.
embedding
- word_embedding.txt
result
- NER
  - weibo
  - note4
  - msra
  - resume
- POS
  - ctb5
  - ctb6
  - ud1
  - ud2
- CWS
  - ctb6
  - msr
  - pku
log
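
The layout above can be created in one go. This is a minimal sketch, assuming the folders live under a data root of your choosing; the function name `make_layout` is hypothetical, and the files themselves (config.json, vocab.txt, the embedding table, the datasets) still have to be copied in by hand.

```python
import tempfile
from pathlib import Path
from typing import List

# Dataset names per task, copied from the directory listing.
TASKS = {
    "NER": ["weibo", "note4", "msra", "resume"],
    "POS": ["ctb5", "ctb6", "ud1", "ud2"],
    "CWS": ["ctb6", "msr", "pku"],
}


def make_layout(root: str) -> List[Path]:
    """Create the empty directory skeleton under root and return the paths."""
    base = Path(root)
    targets = [base / "berts" / "bert", base / "vocab", base / "embedding", base / "log"]
    for task, names in TASKS.items():
        for name in names:
            targets.append(base / "dataset" / task / name)
            targets.append(base / "result" / task / name)
    for path in targets:
        path.mkdir(parents=True, exist_ok=True)
    return targets


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = make_layout(tmp)
    print(len(created), "directories created")
```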

Run

1. Convert the .char.bmes files to .json files: python3 to_json.py
2. Run the shell script: sh run_ner.sh
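
To give a feel for what step 1 involves, here is a minimal sketch of turning character-per-line data into one JSON record per sentence. The repository's `to_json.py` is the authoritative converter; the function name `bmes_to_json_lines` and the `text` / `label` keys used here are assumptions made for illustration and may not match its actual output.

```python
import json
from typing import Iterable, Iterator


def bmes_to_json_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield one JSON string per sentence from 'char label' lines."""
    chars, labels = [], []
    for line in lines:
        parts = line.split()
        if parts:
            chars.append(parts[0])    # first column: the character
            labels.append(parts[-1])  # last column: its label
        elif chars:  # a blank line ends the sentence
            yield json.dumps({"text": chars, "label": labels}, ensure_ascii=False)
            chars, labels = [], []
    if chars:  # flush the last sentence if the input has no trailing blank line
        yield json.dumps({"text": chars, "label": labels}, ensure_ascii=False)


sample = ["美 B-LOC\n", "国 E-LOC\n", "\n", "我 O\n"]
for record in bmes_to_json_lines(sample):
    print(record)
```

Passing `ensure_ascii=False` keeps the Chinese characters readable in the output instead of escaping them.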

If you want to load my checkpoints, you need to make some revisions to your transformers installation.

My model was trained in distributed mode, so it cannot be loaded directly in single-GPU mode. Follow the steps below to revise transformers before loading my checkpoints.

1. Enter the transformers source directory: cd source/transformers-master
2. Find modeling_utils.py and go to around line 995.
3. Change the code at that position (the modified code is not included in this copy).
4. Compile and install the revised source code: python3 setup.py install
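
A commonly used alternative to patching the library, which is not the author's method, is to rename the checkpoint keys before loading. The sketch below assumes the mismatch comes from the `module.` prefix that PyTorch's data-parallel wrappers add to parameter names; whether that is the only difference in these checkpoints is not confirmed by this README. The helper name `strip_module_prefix` is hypothetical.

```python
from collections import OrderedDict
from typing import Any, Mapping


def strip_module_prefix(state_dict: Mapping[str, Any], prefix: str = "module.") -> "OrderedDict[str, Any]":
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned: "OrderedDict[str, Any]" = OrderedDict()
    for key, value in state_dict.items():
        # Only a leading prefix is removed; other keys are copied unchanged.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Plain values stand in for tensors in this demo.
wrapped = {"module.bert.embeddings.weight": 1, "module.classifier.bias": 2, "step": 3}
print(list(strip_module_prefix(wrapped)))
```

With PyTorch, the dictionary would typically come from `torch.load(path, map_location="cpu")`, and the cleaned result would be passed to the model's `load_state_dict`.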

Cite

@misc{liu2021lexicon,
      title={Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter}, 
      author={Wei Liu and Xiyan Fu and Yue Zhang and Wenming Xiao},
      year={2021},
      eprint={2105.07148},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
