Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context Code in both PyTorch and TensorFlow

Last update: Jan 06, 2023

Overview

This repository contains the code in both PyTorch and TensorFlow for our paper

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)

Preprint 2018

TensorFlow

The source code is in the tf/ folder and supports (1) single-node multi-GPU training and (2) multi-host TPU training.
In addition to the source code, we provide pretrained TensorFlow models that achieve the state-of-the-art (SoTA) performance reported in the paper.
Please refer to tf/README.md for details.

PyTorch

The source code is in the pytorch/ folder and supports single-node multi-GPU training via nn.DataParallel.
Please refer to pytorch/README.md for details.

Results

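Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. It is also the first model to break through the 1.0 bits-per-character barrier on character-level language modeling. The table below summarizes the results: enwiki8 and text8 are reported in bits per character (bpc), the remaining datasets in perplexity, and lower is better in every column.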

Method	enwiki8	text8	One Billion Word	WT-103	PTB (w/o finetuning)
Previous Best	1.06	1.13	23.7	20.5	55.5
Transformer-XL	0.99	1.08	21.8	18.3	54.5
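
The character-level and word-level columns use different units, so it can help to see how both relate to the average cross-entropy loss that a training script typically logs in nats. The following is a minimal sketch of the standard conversions, not code from this repository; the function names are made up for illustration.

```python
import math


def bits_per_char(loss_nats: float) -> float:
    """Average per-character cross-entropy in nats -> bits per character."""
    return loss_nats / math.log(2)


def perplexity(loss_nats: float) -> float:
    """Average per-token cross-entropy in nats -> perplexity."""
    return math.exp(loss_nats)


def loss_from_bpc(bpc: float) -> float:
    """Inverse of bits_per_char: bpc -> nats per character."""
    return bpc * math.log(2)


def loss_from_ppl(ppl: float) -> float:
    """Inverse of perplexity: perplexity -> nats per token."""
    return math.log(ppl)


if __name__ == "__main__":
    # Values taken from the Transformer-XL row of the table above.
    print(round(loss_from_bpc(0.99), 3))  # enwiki8: 0.686 nats per character
    print(round(loss_from_ppl(18.3), 3))  # WT-103: 2.907 nats per token
```

Under these conversions, the 1.0 bpc barrier corresponds to an average loss of ln 2 (about 0.693) nats per character.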

Acknowledgement

A large portion of the getdata.sh script comes from the awd-lstm repo. Happy Language Modeling :)

Owner

Zhilin Yang
