General-purpose program synthesiser

Overview

DeepSynth

General-purpose program synthesiser.

This is the repository for the code of the paper "Scaling Neural Program Synthesis with Distribution-based Search".

Authors: Anonymous

Figure

Abstract

We consider the problem of automatically constructing computer programs from input-output examples. We investigate how to augment probabilistic and neural program synthesis methods with new search algorithms, proposing a framework called distribution-based search. Within this framework, we introduce two new search algorithms: HEAP SEARCH, an enumerative method, and SQRT SAMPLING, a probabilistic method. We prove certain optimality guarantees for both methods, show how they integrate with probabilistic and neural techniques, and demonstrate how they can operate at scale across parallel compute environments. Collectively these findings offer theoretical and applied studies of search algorithms for program synthesis that integrate with recent developments in machine-learned program synthesizers.

Usage

Installation

# clone this repository
git clone https://github.com/nathanael-fijalkow/DeepSynth.git

# create your new env
conda create -n deep_synth python>=3.7 
# activate it
conda activate deep_synth
# install pip
yes | conda install pip
# install this package and the dependencies
pip install torch cython tqdm numpy matplotlib
pip install git+https://github.com/MaxHalford/vose
# For flashfill dataset
pip install sexpdata
# If you want to do the parallel experiments
pip install ray

# You are good to go :)
# To test your installation you can run the following tests:
python unit_test_algorithms.py
python unit_test_programs.py
python unit_test_algorithms.py
python unit_test_predictions.py
# Only if you installed ray
python unit_test_parallel.py

File structure

./
        Algorithms/      # the search algorithms + parallel pipeline
        DSL/             # DSL: dreamcoder, deepcoder, flashfill
        list_dataset/    # DreamCoder dataset in pickle format
        Predictions/     # all files related to the ANN for prediction of the grammars 

Reproducing the experiments

All of the files mentioned in this section are located in the root folder and follow this pattern run_*_experiments*.py.

Here is a short summary of each experiment:

  • run_random_PCFG_search.py produce a list of all programs generated under Xsec of search time by all algorithms.
  • run_random_PCFG_search_parallel.py same experiment but iwth the grammar_splitter and multiple CPUs.
  • run_experiments_ .py try to find solutions using an ANN to predict the grammar and for each algorithm logs the search data for the corresponding . The suffix parallel can also be found indicating that the algorithms are run in parallel. The semantics experiments in the paper used a trained model thatn can be obtained using produce_network.py or directly in the repository. The results can be plotted using plot_results_semantics.py.

Note that for the DreamCoder experiment in our paper, we did not use the cached evaluation of HeapSearch, this can be reproduced by setting use_heap_search_cached_eval to False in run_experiment.py.

Quick guide to using ANN to predict a grammar

Is it heavily inspired by the file model_loader.py.

First we create a prediction model:

############################
##### Hyperparameters ######
############################

max_program_depth = 4

size_max = 10  # maximum number of elements in a list (input or output)
nb_inputs_max = 2  # maximum number of inputs in an IO
lexicon = list(range(30))  # all elements of a list must be from lexicon
# only useful for VariableSizeEncoding
encoding_output_dimension = 30  # fixing the dimension

embedding_output_dimension = 10
# only useful for RNNEmbedding
number_layers_RNN = 1

size_hidden = 64

############################
######### PCFG #############
############################

deepcoder = DSL(semantics, primitive_types)
type_request = Arrow(List(INT), List(INT))
deepcoder_cfg = deepcoder.DSL_to_CFG(
    type_request, max_program_depth=max_program_depth)
deepcoder_pcfg = deepcoder_cfg.CFG_to_Uniform_PCFG()

############################
###### IO ENCODING #########
############################

# IO = [[I1, ...,Ik], O]
# I1, ..., Ik, O are lists
# IOs = [IO1, IO2, ..., IOn]
# task = (IOs, program)
# tasks = [task1, task2, ..., taskp]

#### Specification: #####
# IOEncoder.output_dimension: size of the encoding of one IO
# IOEncoder.lexicon_size: size of the lexicon
# IOEncoder.encode_IO: outputs a tensor of dimension IOEncoder.output_dimension
# IOEncoder.encode_IOs: inputs a list of IO of size n
# and outputs a tensor of dimension n * IOEncoder.output_dimension

IOEncoder = FixedSizeEncoding(
    nb_inputs_max=nb_inputs_max,
    lexicon=lexicon,
    size_max=size_max,
)


# IOEncoder = VariableSizeEncoding(
#     nb_inputs_max = nb_inputs_max,
#     lexicon = lexicon,
#     output_dimension = encoding_output_dimension,
#     )

############################
######### EMBEDDING ########
############################

# IOEmbedder = SimpleEmbedding(
#     IOEncoder=IOEncoder,
#     output_dimension=embedding_output_dimension,
#     size_hidden=size_hidden,
# )
 
IOEmbedder = RNNEmbedding(
    IOEncoder=IOEncoder,
    output_dimension=embedding_output_dimension,
    size_hidden=size_hidden,
    number_layers_RNN=number_layers_RNN,
)

#### Specification: #####
# IOEmbedder.output_dimension: size of the output of the embedder
# IOEmbedder.forward_IOs: inputs a list of IOs
# and outputs the embedding of the encoding of the IOs
# which is a tensor of dimension
# (IOEmbedder.input_dimension, IOEmbedder.output_dimension)
# IOEmbedder.forward: same but with a batch of IOs

############################
######### MODEL ############
############################

model = RulesPredictor(
    cfg=deepcoder_cfg,
    IOEncoder=IOEncoder,
    IOEmbedder=IOEmbedder,
    size_hidden=size_hidden,
)

# model = LocalRulesPredictor(
#     cfg = deepcoder_cfg,
#     IOEncoder = IOEncoder,
#     IOEmbedder = IOEmbedder,
#     # size_hidden = size_hidden,
#     )

Now we can produce the grammars:

dsl = DSL(semantics, primitive_types)
batched_grammars = model(batched_examples)
if isinstance(model, RulesPredictor):
    batched_grammars = model.reconstruct_grammars(batched_grammars)

Quick guide to train a neural network

Just copy the model initialisation used in your experiment in the file produce_network.py or use the ones provided that correspond to our experiments. You can change the hyperparameters, then run the script. A .weights file should appear at the root folder. This will train a neural network on random generated programs as described in Appendix F in the paper.

Quick guide to using a search algorithm for a grammar

There are already functions for that in run_experiment.py, namely run_algorithm and run_algorithm_parallel. The former enables you to run the specified algorithm in a single thread while the latter in parallel with a grammar splitter. To produce a is_correct function you can use make_program_checker in experiment_helper.py.

How to download the DeepCoder dataset?

First, download the archive from here (Deepcoder repo): https://storage.googleapis.com/deepcoder/dataset.tar.gz in a folder deepcoder_dataset at the root of DeepSynth. Then you simply need to:

gunzip dataset.tar.gz
tar -xf dataset.tar

You should see a few JSON files.

You might also like...
A simple python program that can be used to implement user authentication tokens into your program...

token-generator A simple python module that can be used by developers to implement user authentication tokens into your program... code examples creat

Worktory is a python library created with the single purpose of simplifying the inventory management of network automation scripts.

Worktory is a python library created with the single purpose of simplifying the inventory management of network automation scripts.

VGGFace2-HQ - A high resolution face dataset for face editing purpose
VGGFace2-HQ - A high resolution face dataset for face editing purpose

The first open source high resolution dataset for face swapping!!! A high resolution version of VGGFace2 for academic face editing purpose

MAME is a multi-purpose emulation framework.

MAME's purpose is to preserve decades of software history. As electronic technology continues to rush forward, MAME prevents this important "vintage" software from being lost and forgotten.

A general 3D Object Detection codebase in PyTorch.

Det3D is the first 3D Object Detection toolbox which provides off the box implementations of many 3D object detection algorithms such as PointPillars, SECOND, PIXOR, etc, as well as state-of-the-art methods on major benchmarks like KITTI(ViP) and nuScenes(CBGS).

Scikit-learn compatible estimation of general graphical models
Scikit-learn compatible estimation of general graphical models

skggm : Gaussian graphical models using the scikit-learn API In the last decade, learning networks that encode conditional independence relationships

(CVPR2021) ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic

ClassSR (CVPR2021) ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic Paper Authors: Xiangtao Kong, Hengyuan

Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch
Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch

Perceiver - Pytorch Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch Install $ pip install perceiver-pytorch Usage

Comments
  • Questions about the installation instructions.

    Questions about the installation instructions.

    Hi Nathanaël,

    I started to review your JOSS submission and have some questions about the installation part in the README.

    Quote the version specification

    conda create -n deep_synth python>=3.7 
    

    should be changed to the following, otherwise, it's not accepted by some shells such as zsh.

    conda create -n deep_synth "python>=3.7"
    

    How to install PyTorch

    I would recommend providing the compatible PyTorch version requirements and some potential commands to install the compatible versions (such as different CUDA/CPU versions). Since conda env is already created, one can also install PyTorch via conda.

    > pip install torch cython tqdm numpy matplotlib
    
    ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
    ERROR: No matching distribution found for torch
    

    Missing pip package

    pip install scipy  # required by unit_tests_algorithms.py
    

    Correct the script names

    python unit_test_algorithms.py
    python unit_test_programs.py
    python unit_test_algorithms.py
    python unit_test_predictions.py
    # Only if you installed ray
    python unit_test_parallel.py
    

    The script name should be corrected.

    python unit_tests_algorithms.py
    python unit_tests_programs.py
    python unit_tests_algorithms.py
    python unit_tests_predictions.py
    

    Missing file for unit_test_parallel.py.

    Fail to run the tests

    > python unit_tests_algorithms.py
    Traceback (most recent call last):
      File "/myapps/research/synthesis/DeepSynth/unit_tests_algorithms.py", line 11, in <module>
        from dsl import DSL
      File "/myapps/research/synthesis/DeepSynth/dsl.py", line 6, in <module>
        from cfg import CFG
      File "/myapps/research/synthesis/DeepSynth/cfg.py", line 4, in <module>
        from pcfg_logprob import LogProbPCFG
      File "/myapps/research/synthesis/DeepSynth/pcfg_logprob.py", line 7, in <module>
        import vose
      File "/home/aplusplus/anaconda3/envs/deep_synth/lib/python3.9/site-packages/vose/__init__.py", line 1, in <module>
        from .sampler import Sampler
      File "vose/sampler.pyx", line 1, in init vose.sampler
    ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 232 from C header, got 216 from PyObject
    

    A specific package version may be needed.

    Best, Shengwei

    opened by njuaplusplus 5
Releases(joss-release)
  • joss-release(Oct 13, 2022)

    What's Changed

    • More documentation and addition of guide to use the software.
    • Install requirements by @bzz in https://github.com/nathanael-fijalkow/DeepSynth/pull/3
    Source code(tar.gz)
    Source code(zip)
Owner
Nathanaël Fijalkow
Computer science researcher
Nathanaël Fijalkow
Amazing-Python-Scripts - 🚀 Curated collection of Amazing Python scripts from Basics to Advance with automation task scripts.

📑 Introduction A curated collection of Amazing Python scripts from Basics to Advance with automation task scripts. This is your Personal space to fin

Avinash Ranjan 1.1k Dec 29, 2022
[CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator

involution Official implementation of a neural operator as described in Involution: Inverting the Inherence of Convolution for Visual Recognition (CVP

Duo Li 1.3k Dec 28, 2022
A pytorch implementation of Detectron. Both training from scratch and inferring directly from pretrained Detectron weights are available.

Use this instead: https://github.com/facebookresearch/maskrcnn-benchmark A Pytorch Implementation of Detectron Example output of e2e_mask_rcnn-R-101-F

Roy 2.8k Dec 29, 2022
[ICCV2021] 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

3DVG-Transformer This repository is for the ICCV 2021 paper "3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds" Our method "3DV

22 Dec 11, 2022
Rust bindings for the C++ api of PyTorch.

tch-rs Rust bindings for the C++ api of PyTorch. The goal of the tch crate is to provide some thin wrappers around the C++ PyTorch api (a.k.a. libtorc

Laurent Mazare 2.3k Dec 30, 2022
Finite difference solution of 2D Poisson equation. Can handle Dirichlet, Neumann and mixed boundary conditions.

Poisson-solver-2D Finite difference solution of 2D Poisson equation Current version can handle Dirichlet, Neumann, and mixed (combination of Dirichlet

Mohammad Asif Zaman 34 Dec 23, 2022
Code accompanying our paper Feature Learning in Infinite-Width Neural Networks

Empirical Experiments in "Feature Learning in Infinite-width Neural Networks" This repo contains code to replicate our experiments (Word2Vec, MAML) in

Edward Hu 37 Dec 14, 2022
A deep learning model for style-specific music generation.

DeepJ: A model for style-specific music generation https://arxiv.org/abs/1801.00887 Abstract Recent advances in deep neural networks have enabled algo

Henry Mao 704 Nov 23, 2022
Band-Adaptive Spectral-Spatial Feature Learning Neural Network for Hyperspectral Image Classification

Band-Adaptive Spectral-Spatial Feature Learning Neural Network for Hyperspectral Image Classification

258 Dec 29, 2022
Resources for the Ki testnet challenge

Ki Testnet Challenge This repository hosts ki-testnet-challenge. A set of scripts and resources to be used for the Ki Testnet Challenge What is the te

Ki Foundation 23 Aug 08, 2022
Official Pytorch and JAX implementation of "Efficient-VDVAE: Less is more"

The Official Pytorch and JAX implementation of "Efficient-VDVAE: Less is more" Arxiv preprint Louay Hazami   ·   Rayhane Mama   ·   Ragavan Thurairatn

Rayhane Mama 144 Dec 23, 2022
rastrainer is a QGIS plugin to training remote sensing semantic segmentation model based on PaddlePaddle.

rastrainer rastrainer is a QGIS plugin to training remote sensing semantic segmentation model based on PaddlePaddle. UI TODO Init UI. Add Block. Add l

deepbands 5 Mar 04, 2022
Official implementation of CVPR2020 paper "Deep Generative Model for Robust Imbalance Classification"

Deep Generative Model for Robust Imbalance Classification Deep Generative Model for Robust Imbalance Classification Xinyue Wang, Yilin Lyu, Liping Jin

9 Nov 01, 2022
[CVPR'22] Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast

wseg Overview The Pytorch implementation of Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast. [arXiv] Though image-level weakly

Ye Du 96 Dec 30, 2022
OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages

OCR-Streamlit-App OCR Streamlit App is used to extract text from images using python's easyocr, pytorch and streamlit packages OCR app gets an image a

Siva Prakash 5 Apr 05, 2022
PERIN is Permutation-Invariant Semantic Parser developed for MRP 2020

PERIN: Permutation-invariant Semantic Parsing David Samuel & Milan Straka Charles University Faculty of Mathematics and Physics Institute of Formal an

ÚFAL 40 Jan 04, 2023
Partial implementation of ODE-GAN technique from the paper Training Generative Adversarial Networks by Solving Ordinary Differential Equations

ODE GAN (Prototype) in PyTorch Partial implementation of ODE-GAN technique from the paper Training Generative Adversarial Networks by Solving Ordinary

Somshubra Majumdar 15 Feb 10, 2022
ConformalLayers: A non-linear sequential neural network with associative layers

ConformalLayers: A non-linear sequential neural network with associative layers ConformalLayers is a conformal embedding of sequential layers of Convo

Prograf-UFF 5 Sep 28, 2022
Explaining in Style: Training a GAN to explain a classifier in StyleSpace

Explaining in Style: Official TensorFlow Colab Explaining in Style: Training a GAN to explain a classifier in StyleSpace Oran Lang, Yossi Gandelsman,

Google 197 Nov 08, 2022
Source code, datasets and trained models for the paper Learning Advanced Mathematical Computations from Examples (ICLR 2021), by François Charton, Amaury Hayat (ENPC-Rutgers) and Guillaume Lample

Maths from examples - Learning advanced mathematical computations from examples This is the source code and data sets relevant to the paper Learning a

Facebook Research 171 Nov 23, 2022