Codebase to experiment with a hybrid Transformer that combines conditional sequence generation with regression

Last update: Jan 05, 2023

Related tags

Overview

Regression Transformer

Codebase to experiment with a hybrid Transformer that combines conditional sequence generation with regression

Development setup

conda env create -f conda.yml
conda activate terminator
pip install -e .

Generate some data

Example data for QED can be generated using scripts/generate_example_data.py.

python scripts/generate_example_data.py examples/example.smi examples/qed_property_example.txt

If you need to create a new vocabulary for a dataset you can use scripts/create_vocabulary.py it will also automatically add some special tokens at the top of your vocabulary file.

python scripts/create_vocabulary.py examples/qed_property_example.txt examples/vocab.txt

At this point the folder containing the vocabulary file can be used to load a tokenizer compatible with any ExpressionBertTokenizer:

>>> from terminator.tokenization import ExpressionBertTokenizer
>>> tokenizer = ExpressionBertTokenizer.from_pretrained('examples')
>>> text = '
   
    0.3936|CBr'
   
>>> tokens = tokenizer.tokenize(text)
>>> print(tokens)
['
   
    '
   , '_0_0_', '_._', '_3_-1_', '_9_-2_', '_3_-3_', '_6_-4_', '|', 'C', 'Br']
>>> token_indexes = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text))
>>> print(token_indexes)
[16, 17, 18, 28, 45, 34, 35, 19, 15, 63]
>>> tokenizer.build_inputs_with_special_tokens(token_indexes)
[12, 16, 17, 18, 28, 45, 34, 35, 19, 15, 63, 13]

Prepare some train/eval data line by line:

head -n 900 examples/qed_property_example.txt > examples/train.txt
tail -n +901 examples/qed_property_example.txt > examples/eval.txt

Launch the training:

python scripts/run_language_modeling.py --output_dir examples/models/xlnet_selfies \
    --config_name configs/xlnet_selfies.json --tokenizer_name ./examples/vocab.txt \
    --do_train --do_eval --learning_rate 1e-4 --num_train_epochs 5 --save_total_limit 2 \
    --save_steps 500 --per_gpu_train_batch_size 16 --evaluate_during_training --eval_data_file ./examples/eval.txt \
    --train_data_file ./examples/train.txt --line_by_line --block_size 510 --seed 42 --logging_steps 250

Exemplary model configurations (number of heads, layers, etc.) can be found in the configs folder.

Codebase to experiment with a hybrid Transformer that combines conditional sequence generation with regression

Related tags

Overview

Regression Transformer

Development setup

Generate some data

Owner

International Business Machines

Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds of Large Scenes with Learned Virtual View Visibility ICCV2021

PyTorch implementation of Trust Region Policy Optimization

Editing a classifier by rewriting its prediction rules

This is the official pytorch implementation of Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation(TESKD)

Retinal vessel segmentation based on GT-UNet

Code from Daniel Lemire, A Better Alternative to Piecewise Linear Time Series Segmentation

Gluon CV Toolkit

Neural-PIL: Neural Pre-Integrated Lighting for Reflectance Decomposition - NeurIPS2021

OpenMMLab Text Detection, Recognition and Understanding Toolbox

For storing the complete exploration of Visual Question Answering for our B.Tech Project

Learn about quantum computing and algorithm on quantum computing

Multi-modal co-attention for drug-target interaction annotation and Its Application to SARS-CoV-2

Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"

Winning solution of the Indoor Location & Navigation Kaggle competition

Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

Official PyTorch implementation of the ICRA 2021 paper: Adversarial Differentiable Data Augmentation for Autonomous Systems.

Easy-to-use,Modular and Extendible package of deep-learning based CTR models .

CBKH: The Cornell Biomedical Knowledge Hub

Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

Build upon neural radiance fields to create a scene-specific implicit 3D semantic representation, Semantic-NeRF