PyTorch implementation of Integrating Tree Path in Transformer for Code Representation

Last update: Dec 23, 2022

Overview

This is the official PyTorch implementation of the approaches proposed in:

Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, Zhi Jin “Integrating Tree Path in Transformer for Code Representation”

which appeared at NeurIPS 2021 [Paper Link] [Poster] [Slides].

In this paper, we investigate the interaction between absolute and relative path encoding, and propose a novel code representation model, TPTrans, and its variants. These models introduce a path-encoding inductive bias into the attention module of the Transformer, enabling it to capture the structure of source code.

Please cite our paper if you use the model, experimental results, or our code in your own work.

1.1 Raw data

To run experiments with TPTrans and its variants, first create the datasets from the raw code snippets of the CodeSearchNet (CSN) dataset. Download and unzip the raw jsonl data of CSN into the raw_data dir as follows:

├── raw_data        
│   ├── python         
│   │   ├── train    
│   │   │   ├── XXXX.jsonl...
│   │   ├── test    
│   │   ├── valid   
│   ├── ruby          
│   ├── go        
│   ├── javascript
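
As a quick sanity check on this layout, the following minimal sketch walks one split and yields one decoded JSON record per line. The helper name `iter_raw_records` is hypothetical and not part of this repository; it only assumes that the unzipped files end in `.jsonl` and sit under `raw_data/<language>/<split>/`.

```python
import json
from pathlib import Path
from typing import Dict, Iterator


def iter_raw_records(raw_dir: str, language: str, split: str) -> Iterator[Dict]:
    """Yield one dict per non-empty line of every .jsonl file in a split.

    Hypothetical helper for inspecting the raw data; not part of this repository.
    """
    split_dir = Path(raw_dir) / language / split
    # Sort so that files are visited in a reproducible order.
    for jsonl_path in sorted(split_dir.glob("*.jsonl")):
        with jsonl_path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)
```

For example, `sum(1 for _ in iter_raw_records("raw_data", "python", "train"))` counts the Python training snippets that were found.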

1.2 Tree-Sitter

Tree-sitter is an open-source parser that supports multiple programming languages. Please install it, then download the grammar files for the four programming languages into the vendor dir as follows:

├── vendor        
│   ├── tree-sitter-python  (from https://github.com/tree-sitter/tree-sitter-python)         
│   ├── tree-sitter-javascript  (from https://github.com/tree-sitter/tree-sitter-javascript)     
│   ├── tree-sitter-go  (from https://github.com/tree-sitter/tree-sitter-go)
│   ├── tree-sitter-ruby  (from https://github.com/tree-sitter/tree-sitter-ruby)

After that, run multi_language_parse.py in the parser dir to parse the raw code snippets into the data dir.
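
Before running the parser, it may help to confirm that all four grammar directories are in place. The sketch below is a small stand-alone check; the function name `missing_grammars` is hypothetical and not part of this repository, and the default location assumes the script is run from the directory that contains vendor.

```python
from pathlib import Path
from typing import List

# Grammar directories listed in the vendor layout above.
EXPECTED_GRAMMARS = (
    "tree-sitter-python",
    "tree-sitter-javascript",
    "tree-sitter-go",
    "tree-sitter-ruby",
)


def missing_grammars(vendor_dir: str = "vendor") -> List[str]:
    """Return the expected grammar directories that are absent from vendor_dir."""
    root = Path(vendor_dir)
    return [name for name in EXPECTED_GRAMMARS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_grammars()
    if missing:
        print("Missing grammars:", ", ".join(missing))
    else:
        print("All four grammars found.")
```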

1.3 Training

After preprocessing, run _main.py_ to train the model.

To run TPTrans, specify relation_path=True and absolute_path=False.

To run TPTrans-α, specify relation_path=True and absolute_path=True.
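
The two settings above can be summarized as a lookup table. This is only an illustrative sketch: the flag names and values come from the text above, but the dictionary, the key `"TPTrans-alpha"`, and the function `flags_for` are hypothetical, and how the flags are actually passed to _main.py_ is not shown here.

```python
from typing import Dict

# Flag values taken from the two settings described above. The dictionary and
# its keys are illustrative only and do not exist in this repository.
VARIANT_FLAGS: Dict[str, Dict[str, bool]] = {
    "TPTrans": {"relation_path": True, "absolute_path": False},
    "TPTrans-alpha": {"relation_path": True, "absolute_path": True},
}


def flags_for(variant: str) -> Dict[str, bool]:
    """Return a copy of the flag settings for a named variant (hypothetical helper)."""
    if variant not in VARIANT_FLAGS:
        known = ", ".join(sorted(VARIANT_FLAGS))
        raise ValueError(f"Unknown variant {variant!r}; expected one of: {known}")
    return dict(VARIANT_FLAGS[variant])
```

Both variants enable the relative path encoding; they differ only in whether the absolute path encoding is also switched on.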

For the other options, please refer to the inline comments for details.

Contact

If you have any questions, please contact me via email ([email protected]) or open an issue on GitHub.

Owner

Han Peng
