Pytorch implementation of Integrating Tree Path in Transformer for Code Representation

Related tags

Deep LearningTPTrans
Overview

This is an official Pytorch implementation of the approaches proposed in:

Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, Zhi Jin “Integrating Tree Path in Transformer for Code Representation”

which appeared at NeurIPS 2021[Paper Link][Poster][Slides].

In this paper, we investigate the interaction between the absolute and relative path encoding, and propose novel code representation model TPTrans and its variants, which introduce path encoding inductive bias into the attention module of Transformer and power Transformer to know the structure of source codes.

Please cite our paper if you use the model, experimental results, or our code in your own work.

1.1 Raw data

To run experiments with TPTrans and its variants, please first create datasets from raw code snippets of CodeSearchNet dataset. Download and unzip the raw jsonl data of CSN into the raw_data dir like that

├── raw_data        
│   ├── python         
│   │   ├── train    
│   │   │   ├── XXXX.jsonl...
│   │   ├── test    
│   │   ├── valid   
│   ├── ruby          
│   ├── go        
│   ├── javascript        

1.2 Tree-Sitter

The Tree-Sitter is a open-source parser for multi-language programming languages. Please install it and then download the grammer files into vendor dir for four different programming languages like that

├── vendor        
│   ├── tree-sitter-python  (from https://github.com/tree-sitter/tree-sitter-python)         
│   ├── tree-sitter-javascript  (from https://github.com/tree-sitter/tree-sitter-javascript)     
│   ├── tree-sitter-go  (from https://github.com/tree-sitter/tree-sitter-go)
│   ├── tree-sitter-ruby  (from https://github.com/tree-sitter/tree-sitter-ruby)

After that, run the multi_language_parse.py in parser dir to parse the raw code snippets into the data dir.

1.3 Training

After preprocessing, run the _main.py_ to train the model.

To run the TPTrans, please specify the relation_path=True and absolute_path=False.

To run the TPTrans-\alpha, please specify the relation_path=True and absolute_path=True.

For other command triggers, please refer the comment inline for details.

Contact If you have any questions, please contact me via email: [email protected] or open issue on Github.

Owner
Han Peng
Han Peng
A fast python implementation of Ray Tracing in One Weekend using python and Taichi

ray-tracing-one-weekend-taichi A fast python implementation of Ray Tracing in One Weekend using python and Taichi. Taichi is a simple "Domain specific

157 Dec 26, 2022
This repo in the implementation of EMNLP'21 paper "SPARQLing Database Queries from Intermediate Question Decompositions" by Irina Saparina, Anton Osokin

SPARQLing Database Queries from Intermediate Question Decompositions This repo is the implementation of the following paper: SPARQLing Database Querie

Yandex Research 20 Dec 19, 2022
Learning Neural Network Subspaces

Learning Neural Network Subspaces Welcome to the codebase for Learning Neural Network Subspaces by Mitchell Wortsman, Maxwell Horton, Carlos Guestrin,

Apple 117 Nov 17, 2022
Iranian Cars Detection using Yolov5s, PyTorch

Iranian Cars Detection using Yolov5 Train 1- git clone https://github.com/ultralytics/yolov5 cd yolov5 pip install -r requirements.txt 2- Dataset ../

Nahid Ebrahimian 22 Dec 05, 2022
HAT: Hierarchical Aggregation Transformers for Person Re-identification

HAT: Hierarchical Aggregation Transformers for Person Re-identification

11 Sep 05, 2022
BasicNeuralNetwork - This project looks over the basic structure of a neural network and how machine learning training algorithms work

BasicNeuralNetwork - This project looks over the basic structure of a neural network and how machine learning training algorithms work. For this project, I used the sigmoid function as an activation

Manas Bommakanti 1 Jan 22, 2022
This repository contains the code and models necessary to replicate the results of paper: How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective

Black-Box-Defense This repository contains the code and models necessary to replicate the results of our recent paper: How to Robustify Black-Box ML M

OPTML Group 2 Oct 05, 2022
Attention over nodes in Graph Neural Networks using PyTorch (NeurIPS 2019)

Intro This repository contains code to generate data and reproduce experiments from our NeurIPS 2019 paper: Boris Knyazev, Graham W. Taylor, Mohamed R

Boris Knyazev 242 Jan 06, 2023
Joint Discriminative and Generative Learning for Person Re-identification. CVPR'19 (Oral)

Joint Discriminative and Generative Learning for Person Re-identification [Project] [Paper] [YouTube] [Bilibili] [Poster] [Supp] Joint Discriminative

NVIDIA Research Projects 1.2k Dec 30, 2022
A "gym" style toolkit for building lightweight Neural Architecture Search systems

A "gym" style toolkit for building lightweight Neural Architecture Search systems

Jack Turner 12 Nov 05, 2022
Python Single Object Tracking Evaluation

pysot-toolkit The purpose of this repo is to provide evaluation API of Current Single Object Tracking Dataset, including VOT2016 VOT2018 VOT2018-LT OT

348 Dec 22, 2022
[PyTorch] Official implementation of CVPR2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency". https://arxiv.org/abs/2103.05465

PointDSC repository PyTorch implementation of PointDSC for CVPR'2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency",

153 Dec 14, 2022
U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."

Xuebin Qin 6.5k Jan 09, 2023
DeepStruc is a Conditional Variational Autoencoder which can predict the mono-metallic nanoparticle from a Pair Distribution Function.

ChemRxiv | [Paper] XXX DeepStruc Welcome to DeepStruc, a Deep Generative Model (DGM) that learns the relation between PDF and atomic structure and the

Emil Thyge Skaaning Kjær 13 Aug 01, 2022
Code for KDD'20 "Generative Pre-Training of Graph Neural Networks"

GPT-GNN: Generative Pre-Training of Graph Neural Networks GPT-GNN is a pre-training framework to initialize GNNs by generative pre-training. It can be

Ziniu Hu 346 Dec 19, 2022
Implementation of SSMF: Shifting Seasonal Matrix Factorization

SSMF Implementation of SSMF: Shifting Seasonal Matrix Factorization, Koki Kawabata, Siddharth Bhatia, Rui Liu, Mohit Wadhwa, Bryan Hooi. NeurIPS, 2021

Koki Kawabata 9 Jun 10, 2022
Deep Ensemble Learning with Jet-Like architecture

Ransomware analysis using DEL with jet-like architecture comprising two CNN wings, a sparse AE tail, a non-linear PCA to produce a diverse feature space, and an MLP nose

Ahsen Nazir 2 Feb 06, 2022
VACA: Designing Variational Graph Autoencoders for Interventional and Counterfactual Queries

VACA Code repository for the paper "VACA: Designing Variational Graph Autoencoders for Interventional and Counterfactual Queries (arXiv)". The impleme

Pablo Sánchez-Martín 16 Oct 10, 2022
Code for "On the Effects of Batch and Weight Normalization in Generative Adversarial Networks"

Note: this repo has been discontinued, please check code for newer version of the paper here Weight Normalized GAN Code for the paper "On the Effects

Sitao Xiang 182 Sep 06, 2021
An open source Python package for plasma science that is under development

PlasmaPy PlasmaPy is an open source, community-developed Python 3.7+ package for plasma science. PlasmaPy intends to be for plasma science what Astrop

PlasmaPy 444 Jan 07, 2023