Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation of In-context Learning as Implicit Bayesian Inference"

Last update: Dec 19, 2022

Related tags

Deep Learning incontext-learning

Overview

GINC small-scale in-context learning dataset

GINC (Generative In-Context learning Dataset) is a small-scale synthetic dataset for studying in-context learning. The pretraining data is generated by a mixture of HMMs, and the in-context learning prompt examples are also generated from HMMs (either from the mixture or not). The prompts are out-of-distribution with respect to the pretraining data because each prompt concatenates independent examples separated by delimiters. We provide code to generate GINC-style datasets with different vocabulary sizes, numbers of HMMs, and other parameters.
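As a rough illustration of the generation scheme described above, the following sketch samples a pretraining document from a mixture of toy HMMs and builds a prompt by concatenating independent examples with a delimiter token. It is not the repo's generator: the randomly drawn HMM parameters, the uniform mixture weights, the number of hidden states, and every name here (make_hmm, sample_prompt, DELIMITER, and so on) are assumptions made for the example.

```python
import random


def random_distribution(rng, size):
    """Return a random probability vector of the given size."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def make_hmm(rng, n_states, vocab_size):
    """Build one toy HMM: start, transition, and emission distributions."""
    return {
        "start": random_distribution(rng, n_states),
        "trans": [random_distribution(rng, n_states) for _ in range(n_states)],
        "emit": [random_distribution(rng, vocab_size) for _ in range(n_states)],
    }


def sample_sequence(rng, hmm, length):
    """Sample `length` token ids in [0, vocab_size) from a single HMM."""
    states = range(len(hmm["start"]))
    vocab = range(len(hmm["emit"][0]))
    state = rng.choices(states, weights=hmm["start"])[0]
    tokens = []
    for _ in range(length):
        tokens.append(rng.choices(vocab, weights=hmm["emit"][state])[0])
        state = rng.choices(states, weights=hmm["trans"][state])[0]
    return tokens


def sample_pretraining_doc(rng, hmms, length):
    """Pick one HMM uniformly (an assumption), then sample a whole document from it."""
    return sample_sequence(rng, rng.choice(hmms), length)


def sample_prompt(rng, hmm, n_examples, example_length, delimiter):
    """Concatenate independently sampled examples, separated by a delimiter token."""
    prompt = []
    for i in range(n_examples):
        if i > 0:
            prompt.append(delimiter)
        prompt.extend(sample_sequence(rng, hmm, example_length))
    return prompt


rng = random.Random(0)
VOCAB_SIZE, N_HMMS, N_STATES = 50, 5, 10  # N_STATES is an arbitrary choice
DELIMITER = VOCAB_SIZE  # hypothetical: one extra token id reserved as the delimiter

hmms = [make_hmm(rng, N_STATES, VOCAB_SIZE) for _ in range(N_HMMS)]
doc = sample_pretraining_doc(rng, hmms, length=20)
prompt = sample_prompt(rng, hmms[0], n_examples=3, example_length=4, delimiter=DELIMITER)
print(len(doc), len(prompt))
```

Each example in the prompt restarts the HMM from its start distribution, so the concatenated prompt is not a sequence the pretraining mixture would produce in one pass; this is the sense in which prompts are out-of-distribution.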

Quickstart

Create a conda environment or virtualenv using the information in conda-env.yml, then go into the transformers/ directory and run pip install -e . to install transformers. Modify consts.sh to change the default output locations and to insert the code that activates your environment of choice. Run scripts/runner.sh to submit all the experiments with sbatch.

Explore the data

The default dataset has a vocabulary size of 50, and the pretraining data is generated as a mixture of 5 HMMs. The pretraining dataset is in data/GINC_trans0.1_start10.0_nsymbols50_nvalues10_nslots10_vic0.9_nhmms10/train.json, and the in-context prompts are in data/GINC_trans0.1_start10.0_nsymbols50_nvalues10_nslots10_vic0.9_nhmms10/id_prompts_randomsample_*.json.
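A minimal sketch for a first look at these files, under two assumptions that the text above does not confirm: that the dataset directory name encodes its parameters as key-then-number segments joined by underscores, and that each .json file is either a single JSON document or JSON Lines. The helper names parse_dataset_name and load_records are hypothetical and are not part of the repo.

```python
import json
import re
from pathlib import Path


def parse_dataset_name(name):
    """Split a name like 'GINC_trans0.1_nsymbols50' into a {key: number} dict."""
    params = {}
    for part in name.split("_")[1:]:  # skip the leading 'GINC'
        match = re.fullmatch(r"([a-z]+)(\d+(?:\.\d+)?)", part)
        if match:
            key, value = match.groups()
            params[key] = float(value) if "." in value else int(value)
    return params


def load_records(path):
    """Load a file that is either one JSON document or JSON Lines (format is assumed)."""
    text = Path(path).read_text()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return [json.loads(line) for line in text.splitlines() if line.strip()]


DATA_DIR = Path("data/GINC_trans0.1_start10.0_nsymbols50_nvalues10_nslots10_vic0.9_nhmms10")
print(parse_dataset_name(DATA_DIR.name))

if DATA_DIR.exists():  # only runs once the dataset is present locally
    train = load_records(DATA_DIR / "train.json")
    print("train records:", len(train))
    for prompt_file in sorted(DATA_DIR.glob("id_prompts_randomsample_*.json")):
        print(prompt_file.name, len(load_records(prompt_file)))
```

Printing one or two records from each file is a quick way to check the actual field names before writing any further analysis code.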

This repo contains the experiments for the paper "An Explanation of In-context Learning as Implicit Bayesian Inference". If you find this repo useful, please cite:

@article{xie2021incontext,
  author = {Sang Michael Xie and Aditi Raghunathan and Percy Liang and Tengyu Ma},
  journal = {arXiv preprint arXiv:2111.02080},
  title = {An Explanation of In-context Learning as Implicit Bayesian Inference},
  year = {2021},
}

Owner

P-Lambda
