Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

This repository contains the code in both PyTorch and TensorFlow for our paper

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)

Preprint 2018

TensorFlow

The source code is in the tf/ folder, supporting (1) single-node multi-gpu training, and (2) multi-host TPU training.
Besides the source code, we also provide pretrained "TensorFlow" models with state-of-the-art (SoTA) performances reported in the paper.
Please refer to tf/README.md for details.

PyTorch

The source code is in the pytorch/ folder, supporting single-node multi-gpu training via the module nn.DataParallel.
Please refer to pytorch/README.md for details.

Results

Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. Transformer-XL is also the first to break through the 1.0 barrier on char-level language modeling. Below is a summary.

Method	enwiki8	text8	One Billion Word	WT-103	PTB (w/o finetuning)
Previous Best	1.06	1.13	23.7	20.5	55.5
Transformer-XL	0.99	1.08	21.8	18.3	54.5

Acknowledgement

A large portion of the getdata.sh script comes from the awd-lstm repo. Happy Language Modeling :)

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Related tags

Overview

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

TensorFlow

PyTorch

Results

Acknowledgement

Owner

Zhilin Yang

Searching keywords in PDF file folders

A high-level yet extensible library for fast language model tuning via automatic prompt search

Need: Image Search With Python

apple's universal binaries BUT MUCH WORSE (PRACTICAL SHITPOST) (NOT PRODUCTION READY)

Entity Disambiguation as text extraction (ACL 2022)

My Implementation for the paper EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks using Tensorflow

wxPython app for converting encodings, modifying and fixing SRT files

YACLC - Yet Another Chinese Learner Corpus

Translate - a PyTorch Language Library

Global Rhythm Style Transfer Without Text Transcriptions

Dust model dichotomous performance analysis

pytorch implementation of Attention is all you need

Speach Recognitions

Utilities for preprocessing text for deep learning with Keras

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

Chinese NewsTitle Generation Project by GPT2.带有超级详细注释的中文GPT2新闻标题生成项目。

The aim of this task is to predict someone's English proficiency based on a text input.

FedNLP: A Benchmarking Framework for Federated Learning in Natural Language Processing

Text Normalization（文本正则化）

Must-read papers on improving efficiency for pre-trained language models.