Catbird is an open source paraphrase generation toolkit based on PyTorch.

Last update: Dec 15, 2022

Overview

Catbird is an open source paraphrase generation toolkit based on PyTorch.

Quick Start

Requirements and Installation

The project is based on PyTorch 1.5+ and Python 3.6+.
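The minimum versions above can be checked before installing anything else. The following is a minimal sketch, not part of Catbird: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the only assumption about PyTorch is that `torch.__version__` is a string beginning with `major.minor`.

```python
import re
import sys


def parse_version(text):
    """Return the leading major.minor of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(text))
    return tuple(int(part) for part in match.groups())


def meets_minimum(version, minimum):
    """True if the version string is at least the (major, minor) minimum."""
    return parse_version(version) >= minimum


def check_environment():
    """Collect human-readable problems; an empty list means the minimums are met."""
    problems = []
    if sys.version_info[:2] < (3, 6):
        problems.append("Python 3.6+ is required")
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch.__version__, (1, 5)):
            problems.append("PyTorch 1.5+ is required, found " + torch.__version__)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Local build suffixes such as `+cu117` are ignored because only the leading `major.minor` is compared.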

Install Catbird

The package can be installed using pip:

pip install catbird

The pip package does not include the configuration files or tools. Alternatively, you can run Catbird from the source code:

a. Clone the repository.

git clone https://github.com/AfonsoSalgadoSousa/catbird.git

b. Install dependencies. This project uses Poetry as its package manager, so make sure you have it installed. For more information, check Poetry's official documentation. To install the dependencies, run:

poetry install

Dataset Preparation

For now, Catbird only works with the Quora Question Pairs (QQP) dataset. We recommend downloading and extracting the dataset somewhere outside the project directory and symlinking the dataset root to $CATBIRD/data, as shown below. If your folder structure is different, you may need to change the corresponding paths in the config files.

catbird
├── catbird
├── tools
├── configs
├── data
│   ├── quora
│   │   ├── quora_duplicate_questions.tsv
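The symlink step can be sketched as follows. This is an illustration, not a tool shipped with Catbird: `link_dataset` is a hypothetical helper, and the paths in the usage note are placeholders for wherever you extracted the dataset and cloned the repository.

```python
from pathlib import Path


def link_dataset(dataset_dir, project_root, name="quora"):
    """Symlink an extracted dataset directory to <project_root>/data/<name>."""
    data_dir = Path(project_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    link = data_dir / name
    # Leave anything that is already there (a link or a real directory) untouched.
    if link.is_symlink() or link.exists():
        return link
    link.symlink_to(Path(dataset_dir).resolve(), target_is_directory=True)
    return link
```

For example, `link_dataset("/path/to/datasets/quora", "/path/to/catbird")` would produce the `data/quora` entry shown in the tree above. Creating symlinks on Windows may require elevated privileges.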

We use the HuggingFace Datasets library to load the datasets.
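To confirm that the file is in place and readable before training, a quick standard-library check can help. This sketch does not reproduce how Catbird loads the data (that goes through HuggingFace Datasets); `read_paraphrase_pairs` is a hypothetical helper, and it assumes the TSV has the `question1`, `question2`, and `is_duplicate` columns of the public Quora release.

```python
import csv


def read_paraphrase_pairs(tsv_path):
    """Return (question1, question2) tuples for rows marked as duplicates."""
    pairs = []
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            # Only rows labeled as duplicates are usable as paraphrase pairs.
            if row.get("is_duplicate") == "1":
                pairs.append((row["question1"], row["question2"]))
    return pairs


if __name__ == "__main__":
    pairs = read_paraphrase_pairs("data/quora/quora_duplicate_questions.tsv")
    print(len(pairs), "paraphrase pairs found")
```

Rows with `is_duplicate` equal to 0 are skipped, since two questions that are not duplicates of each other are not paraphrases.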

Train

poetry run python tools/train.py ${CONFIG_FILE} [optional arguments]

Example:

Train T5 on QQP.

poetry run python tools/train.py configs/t5_quora.yaml

Contributors

Afonso Sousa


Owner

Afonso Salgado de Sousa
