ASCEND Chinese-English code-switching dataset

Last update: Dec 09, 2022

Overview

ASCEND

ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Chinese-English code-switching corpus of spontaneous multi-turn conversational dialogue collected in Hong Kong.

Download the dataset

You can find ASCEND on Hugging Face.

Download ASCEND

git lfs install
git clone https://huggingface.co/datasets/CAiRE/ASCEND
# Alternatively, to clone without the large files (only their LFS pointers),
# prefix the git clone command with this environment variable:
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/CAiRE/ASCEND
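
As an alternative to cloning with git, the dataset can probably also be loaded directly from Python. The following is a minimal sketch, not an official loading recipe: it assumes the Hugging Face `datasets` package is installed (`pip install datasets`) and that the repository id `CAiRE/ASCEND`, taken from the clone URL, works with `load_dataset`. The helper name `load_ascend` is hypothetical, and the available split names are not stated in this README, so none are assumed.

```python
# Repository id taken from the clone URL (huggingface.co/datasets/CAiRE/ASCEND).
DATASET_ID = "CAiRE/ASCEND"


def load_ascend(split=None):
    """Load ASCEND from the Hugging Face Hub (requires network access).

    Hypothetical helper, not part of the ASCEND repository.
    With split=None, load_dataset returns every split the repository defines.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)
```

Calling `load_ascend()` and printing the result should list the splits and columns the repository actually provides. Behaviour may vary with the installed `datasets` version, so treat this as a starting point.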

Cite us

@misc{lovenia2021ascend,
  title={ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation},
  author={Holy Lovenia and Samuel Cahyawijaya and Genta Indra Winata and Peng Xu and Xu Yan and Zihan Liu and Rita Frieske and Tiezheng Yu and Wenliang Dai and Elham J. Barezi and Pascale Fung},
  year={2021},
  eprint={2112.06223},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Owner

CAiRE
