Codebase to experiment with a hybrid Transformer that combines conditional sequence generation with regression

Last update: Jan 05, 2023

Regression Transformer

Development setup

conda env create -f conda.yml
conda activate terminator
pip install -e .

Generate some data

Example data for QED (quantitative estimate of drug-likeness) can be generated with scripts/generate_example_data.py:

python scripts/generate_example_data.py examples/example.smi examples/qed_property_example.txt

If you need to create a new vocabulary for a dataset, use scripts/create_vocabulary.py. It also automatically adds some special tokens at the top of the vocabulary file.

python scripts/create_vocabulary.py examples/qed_property_example.txt examples/vocab.txt
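
As a rough illustration of what such a vocabulary file might contain, the sketch below collects the unique tokens of an already tokenized corpus and places the special tokens first. The function name `build_vocabulary` and the special tokens listed are hypothetical placeholders chosen for this example; they are not taken from scripts/create_vocabulary.py, which may order and name things differently.

```python
from collections import Counter
from typing import Iterable, List, Sequence

# Hypothetical special tokens; the real script may use a different set.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocabulary(
    tokenized_lines: Iterable[Sequence[str]],
    special_tokens: Sequence[str] = SPECIAL_TOKENS,
) -> List[str]:
    """Return special tokens first, then corpus tokens by descending frequency."""
    counts = Counter(token for line in tokenized_lines for token in line)
    # Ties are broken alphabetically so the output is deterministic.
    ordered = sorted(counts, key=lambda token: (-counts[token], token))
    return list(special_tokens) + [t for t in ordered if t not in special_tokens]


vocab = build_vocabulary([["C", "Br"], ["C", "C", "O"]])
print("\n".join(vocab))  # one token per line
```

Writing one token per line means a token's line number can serve as its index, which is why the position of the special tokens at the top of the file matters.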

The folder containing the vocabulary file can now be used to load an ExpressionBertTokenizer:

>>> from terminator.tokenization import ExpressionBertTokenizer
>>> tokenizer = ExpressionBertTokenizer.from_pretrained('examples')
>>> text = '<qed>0.3936|CBr'
>>> tokens = tokenizer.tokenize(text)
>>> print(tokens)
['<qed>', '_0_0_', '_._', '_3_-1_', '_9_-2_', '_3_-3_', '_6_-4_', '|', 'C', 'Br']
>>> token_indexes = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text))
>>> print(token_indexes)
[16, 17, 18, 28, 45, 34, 35, 19, 15, 63]
>>> tokenizer.build_inputs_with_special_tokens(token_indexes)
[12, 16, 17, 18, 28, 45, 34, 35, 19, 15, 63, 13]
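
The numeric tokens in the output above appear to pair each digit with the power of ten of its position (for example, `_3_-1_` for a 3 in the tenths place). The sketch below reproduces that pattern in plain Python. It is inferred from the printed tokens only and is not the tokenizer's actual implementation; the function names `tokenize_number` and `detokenize_number` are hypothetical, and the handling of multi-digit integer parts is an assumption.

```python
from typing import List


def tokenize_number(text: str) -> List[str]:
    """Split a decimal string into `_digit_exponent_` tokens.

    Each digit is paired with the power of ten of its position, and the
    decimal point becomes `_._`.
    """
    integer_part, _, fraction_part = text.partition(".")
    tokens = []
    # Integer digits: the leftmost digit has the highest exponent (assumed).
    for offset, digit in enumerate(integer_part):
        tokens.append(f"_{digit}_{len(integer_part) - 1 - offset}_")
    if fraction_part:
        tokens.append("_._")
        # Fractional digits: exponents run -1, -2, ...
        for offset, digit in enumerate(fraction_part, start=1):
            tokens.append(f"_{digit}_-{offset}_")
    return tokens


def detokenize_number(tokens: List[str]) -> float:
    """Recover the value by summing digit * 10**exponent over the tokens."""
    total = 0.0
    for token in tokens:
        if token == "_._":
            continue
        digit, exponent = token.strip("_").split("_")
        total += int(digit) * 10 ** int(exponent)
    return total


print(tokenize_number("0.3936"))
# ['_0_0_', '_._', '_3_-1_', '_9_-2_', '_3_-3_', '_6_-4_']
```

Because every token carries its own decimal place, the value can be rebuilt from the tokens alone, as `detokenize_number` shows.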

Split the data line by line into training and evaluation files (the first 900 lines for training, the remainder for evaluation):

head -n 900 examples/qed_property_example.txt > examples/train.txt
tail -n +901 examples/qed_property_example.txt > examples/eval.txt
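
Where `head` and `tail` are not available, the same split can be sketched in Python. The function names `split_lines` and `split_file` are hypothetical helpers written for this example and are not part of the repository.

```python
from pathlib import Path
from typing import List, Tuple


def split_lines(lines: List[str], n_train: int) -> Tuple[List[str], List[str]]:
    """Return the first n_train lines and the remaining lines."""
    return lines[:n_train], lines[n_train:]


def split_file(source: str, train_path: str, eval_path: str, n_train: int = 900) -> Tuple[int, int]:
    """Write the first n_train lines of source to train_path and the rest to eval_path."""
    # keepends=True preserves the original line endings in both outputs.
    lines = Path(source).read_text().splitlines(keepends=True)
    train, evaluation = split_lines(lines, n_train)
    Path(train_path).write_text("".join(train))
    Path(eval_path).write_text("".join(evaluation))
    return len(train), len(evaluation)
```

Calling `split_file("examples/qed_property_example.txt", "examples/train.txt", "examples/eval.txt")` would then mirror the two shell commands above.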

Launch the training:

python scripts/run_language_modeling.py --output_dir examples/models/xlnet_selfies \
    --config_name configs/xlnet_selfies.json --tokenizer_name ./examples/vocab.txt \
    --do_train --do_eval --learning_rate 1e-4 --num_train_epochs 5 --save_total_limit 2 \
    --save_steps 500 --per_gpu_train_batch_size 16 --evaluate_during_training --eval_data_file ./examples/eval.txt \
    --train_data_file ./examples/train.txt --line_by_line --block_size 510 --seed 42 --logging_steps 250
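
To get a feel for how the step-based flags interact with a dataset of this size, the following back-of-the-envelope sketch estimates the number of optimization steps. It assumes a single GPU, no gradient accumulation, exactly 900 training lines with one example per line, and that checkpoints and logs are triggered every `save_steps` and `logging_steps` steps respectively; `training_schedule` is a hypothetical helper, not part of the training script.

```python
import math
from typing import Dict


def training_schedule(
    n_examples: int,
    batch_size: int,
    epochs: int,
    save_steps: int,
    logging_steps: int,
) -> Dict[str, int]:
    """Estimate step counts for a step-based checkpoint and logging schedule."""
    # The last, partially filled batch is assumed to count as a step.
    steps_per_epoch = math.ceil(n_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "checkpoints": total_steps // save_steps,
        "log_events": total_steps // logging_steps,
    }


print(training_schedule(900, 16, 5, save_steps=500, logging_steps=250))
# {'steps_per_epoch': 57, 'total_steps': 285, 'checkpoints': 0, 'log_events': 1}
```

Under these assumptions the run finishes before the first step-based checkpoint at step 500, so smaller values for --save_steps and --logging_steps may be worth considering for a dataset this small.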

Example model configurations (number of heads, layers, etc.) can be found in the configs folder.

Owner

International Business Machines
