Knowledge-Inheritance

Source code paper: Knowledge Inheritance for Pre-trained Language Models (preprint). The trained model parameters (in Fairseq format) can be downloaded from Tsinghua Cloud. You can use convert_fairseq_to_huggingface.py to convert the Fairseq format into Huggingface's transformers format easily.

We refer the downstream performance evaluation to the implementation of Fairseq (GLUE tasks) and Don't Stop Pre-training (ACL-ARC / CHEMPROT).

If you have any question, feel free to contact us ([email protected]).

1. Available Pretrained Models

WB domain: Wikipedia + BookCorpus; CS domain: computer science papers; BIO domain: biomedical papers;

Models trained by self-learning

RoBERTa_WB_H_4
RoBERTa_WB_H_6
RoBERTa_WB_H_8
RoBERTa_WB_H_10
RoBERTa_WB_D_288
RoBERTa_WB_D_384
RoBERTa_WB_D_480
RoBERTa_WB_D_576
RoBERTa_WB_D_672
RoBERTa_WB_BASE
RoBERTa_WB_MEDIUM
RoBERTa_WB_BASE_PLUS
RoBERTa_WB_LARGE
GPT_WB_MEDIUM
GPT_WB_BASE
GPT_WB_BASE_PLUS
RoBERTa_CS_MEDIUM
RoBERTa_CS_BASE
RoBERTa_BIO_MEDIUM
RoBERTa_BIO_BASE

Models trained by Knowledge Inheritance

RoBERTa_WB_BASE -> RoBERTa_WB_BASE_PLUS
RoBERTa_WB_BASE -> RoBERTa_WB_LARGE
RoBERTa_WB_BASE_PLUS -> RoBERTa_WB_LARGE
RoBERTa_WB_BASE -> RoBERTa_WB_BASE_PLUS -> RoBERTa_WB_LARGE

Source code for paper: Knowledge Inheritance for Pre-trained Language Models

Related tags

Overview

Knowledge-Inheritance

1. Available Pretrained Models

Models trained by self-learning

Models trained by Knowledge Inheritance

Owner

THUNLP

[ACL-IJCNLP 2021] "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets"

TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction

Keras attention models including botnet,CoaT,CoAtNet,CMT,cotnet,halonet,resnest,resnext,resnetd,volo,mlp-mixer,resmlp,gmlp,levit

SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

tf2onnx - Convert TensorFlow, Keras and Tflite models to ONNX.

This repository contains the implementation of the HealthGen model, a generative model to synthesize realistic EHR time series data with missingness

Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN

Python TFLite scripts for detecting objects of any class in an image without knowing their label.

The official codes for the ICCV2021 Oral presentation "Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework"

No-reference Image Quality Assessment(NIQA) Algorithms (BRISQUE, NIQE, PIQE, RankIQA, MetaIQA)

Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.

Code for "Sparse Steerable Convolutions: An Efficient Learning of SE(3)-Equivariant Features for Estimation and Tracking of Object Poses in 3D Space"

A lightweight library to compare different PyTorch implementations of the same network architecture.

Segcache: a memory-efficient and scalable in-memory key-value cache for small objects

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

A Lightweight Experiment & Resource Monitoring Tool 📺

Zeyuan Chen, Yangchao Wang, Yang Yang and Dong Liu.

Winning solution of the Indoor Location & Navigation Kaggle competition

YouRefIt: Embodied Reference Understanding with Language and Gesture

SSD: Single Shot MultiBox Detector pytorch implementation focusing on simplicity