Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

Last update: Dec 30, 2022

Overview

PLBART

Code pre-release of our work, Unified Pre-training for Program Understanding and Generation, accepted at NAACL 2021.

Note: Detailed documentation is coming soon.

Pre-training data

PLBART is pre-trained on Java and Python functions and natural language descriptions collected from GitHub and Stack Overflow.

Evaluation tasks

We evaluated PLBART on five tasks.

Code summarization [REF]
Code generation [REF]
Code translation [REF]
Clone detection [REF]
Vulnerability detection [REF]

Notes

We will publish the pre-trained PLBART checkpoint soon.
We list all the files in this repository here.

Acknowledgement

PLBART uses Fairseq, CodeXGLUE, and TransCoder, and we thank the authors of these works for their contributions.

Citation

@inproceedings{ahmad2020summarization,
    author = {Ahmad, Wasi Uddin and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
    booktitle = {Proceedings of the 2021 Conference of the North {A}merican Chapter of the Association for Computational Linguistics},
    title = {Unified Pre-training for Program Understanding and Generation},
    year = {2021}
}

Owner

Wasi Ahmad
