Modified GPT that uses average pooling to reduce the memory cost of softmax attention.

Last update: Dec 03, 2021

Overview

NLP-GPT-Upsampling

This repository contains an implementation of OpenAI's GPT model. Taking inspiration from the Nyströmformer, it approximates the full softmax attention matrix so that longer sequences can be modeled in NLP language modeling tasks. It does this with a simple strided average pooling of the input text sequence, which reduces the sequence length. The attention output at the reduced length is then upsampled back to the original sequence length using bilinear interpolation.
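The mechanism can be illustrated with a minimal NumPy sketch of a single attention head. It assumes that the pooling window equals the stride (a hypothetical `pool` factor), that the sequence length is divisible by `pool`, and that linear interpolation along the sequence axis stands in for the bilinear resize. It also omits causal masking, multiple heads and the output projection. The helper names (`avg_pool_1d`, `linear_upsample_1d`, `pooled_self_attention`) are illustrative and are not taken from the repository's TensorFlow code.

```python
import numpy as np


def avg_pool_1d(x, pool):
    """Strided average pooling over the sequence axis (window = stride = pool).

    x has shape (seq_len, d); seq_len is assumed to be divisible by pool.
    """
    seq_len, d = x.shape
    return x.reshape(seq_len // pool, pool, d).mean(axis=1)


def linear_upsample_1d(x, out_len):
    """Linearly interpolate x from (m, d) to (out_len, d) along the sequence axis.

    This is the 1-D analogue of bilinear resizing, with the first and last
    positions aligned (an assumption; other resize conventions differ slightly).
    """
    m, d = x.shape
    src = np.linspace(0, m - 1, out_len)  # output positions mapped onto input positions
    grid = np.arange(m)
    return np.stack([np.interp(src, grid, x[:, j]) for j in range(d)], axis=1)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def pooled_self_attention(x, wq, wk, wv, pool):
    """Single-head self-attention computed on an average-pooled sequence."""
    xp = avg_pool_1d(x, pool)                    # (L / pool, d)
    q, k, v = xp @ wq, xp @ wk, xp @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (L / pool, L / pool) instead of (L, L)
    out = softmax(scores) @ v                    # (L / pool, d_v)
    return linear_upsample_1d(out, x.shape[0])   # back to (L, d_v)


rng = np.random.default_rng(0)
L, d, pool = 16, 8, 4
x = rng.standard_normal((L, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
y = pooled_self_attention(x, wq, wk, wv, pool)
print(y.shape)  # (16, 8)
```

With `pool=1` the sketch reduces to ordinary full softmax attention, which makes a convenient sanity check; with `pool=4` the score matrix shrinks from 16 by 16 to 4 by 4.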

It should be noted that, due to the simplicity of this implementation, the model is not expected to perform as well as the original GPT model, which uses the full attention matrix. The tradeoff is that this naive strided averaging allows the model to handle longer sequences than the original GPT implementation within the same memory budget.
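The memory side of this tradeoff can be estimated with simple arithmetic. The sketch below assumes float32 scores (4 bytes each), a sequence length divisible by the pooling factor, and a pooling factor of 4 chosen purely for illustration. It counts only the softmax score matrix for a single head, layer and example, not activations or weights, and the function name `attention_score_bytes` is hypothetical.

```python
def attention_score_bytes(seq_len, pool=1, bytes_per_elem=4):
    """Size in bytes of one attention score matrix (one head, one layer, one example).

    With pooling, queries and keys both have length seq_len // pool.
    """
    reduced = seq_len // pool
    return reduced * reduced * bytes_per_elem


MIB = 2 ** 20
for seq_len in (1024, 4096):
    full = attention_score_bytes(seq_len) / MIB
    pooled = attention_score_bytes(seq_len, pool=4) / MIB
    print(seq_len, full, pooled)
# 1024 4.0 0.25
# 4096 64.0 4.0
```

Under these assumptions, pooling by a factor k shrinks the score matrix by a factor of k squared, so a 4096-token sequence pooled by 4 needs the same score-matrix memory as full attention over 1024 tokens.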

Fig. 1: GPT Model Architecture (obtained from the GPT paper)

Data

This repository includes code to process the Movie Dialogue dataset (the data preparation closely follows this script) as well as the Reddit Jokes dataset.

To prepare the data prior to training the model(s), run

python process_movie_dialogue_subword.py

for the Movie Dialogue dataset, or

python process_reddit_jokes_subword_v1.py

for the Reddit Jokes dataset.

Training and Model Inference

Having processed the data into sub-word tokens, run

python train_movie_dialogue_sw_tf_ver2_gpt_keras_upsampled.py
python infer_movie_dialogue_sw_tf_ver2_gpt_keras_upsampled.py

python train_reddit_jokes_sw_tf_ver2_gpt_keras_upsampled.py
python infer_reddit_jokes_sw_tf_ver2_gpt_keras_upsampled.py

to train the model on the corresponding dataset and to run inference with the trained model.

Owner

WD