Explore different ways to mix speech models (wav2vec2, HuBERT) and NLP models (BART, T5, GPT) together

Last update: Nov 07, 2022

Overview

SpeechMix

Explore different ways to mix speech models (wav2vec2, HuBERT) and NLP models (BART, T5, GPT) together.

Introduction

For the same input:

from datasets import load_dataset
import soundfile as sf


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds['text'][0]
speech = ds["speech"][0]
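The loader above discards the sampling rate returned by `sf.read`. The `facebook/wav2vec2-base-960h` checkpoint used in the examples was pretrained on 16 kHz speech, so a small guard can catch mismatched audio early. This is a minimal sketch: `check_sample_rate` and `EXPECTED_RATE` are hypothetical names and are not part of SpeechMix.

```python
# Sampling rate that wav2vec2-base-960h was pretrained on.
EXPECTED_RATE = 16_000


def check_sample_rate(rate, expected=EXPECTED_RATE):
    """Return `rate` unchanged, or raise if it differs from `expected`."""
    if rate != expected:
        raise ValueError(
            f"expected {expected} Hz audio, got {rate} Hz; resample before encoding"
        )
    return rate


# Inside map_to_array this could be used as:
#     speech, rate = sf.read(batch["file"])
#     check_sample_rate(rate)
check_sample_rate(16_000)
```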

Speech encoder NLP decoder

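# Assumption: SpeechMixED is exported from the top-level speechmix package.
from speechmix import SpeechMixED

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")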

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP decoder, fine-tuning only the cross-attention, projection, and decoder embedding

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
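Restricting fine-tuning to a few components generally comes down to freezing every other parameter. The sketch below shows one common way to do that by matching parameter names. It is not SpeechMix's own implementation of `ftl`. The helper `freeze_except` is hypothetical, the decoder names follow the Hugging Face BART layout, and `projection.weight` is an assumed name for the layer that maps speech features to the decoder's hidden size. Stand-in objects replace real tensors so the example runs without loading a model.

```python
from types import SimpleNamespace


def freeze_except(named_parameters, trainable_keywords):
    """Freeze every parameter whose name contains none of the keywords.

    `named_parameters` is any iterable of (name, parameter) pairs, such as
    the result of torch.nn.Module.named_parameters().
    Returns the names that remain trainable.
    """
    trainable = []
    for name, param in named_parameters:
        param.requires_grad = any(key in name for key in trainable_keywords)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Stand-in parameters (hypothetical names, see the note above).
names = [
    "encoder.layers.0.attention.q_proj.weight",
    "projection.weight",
    "decoder.embed_tokens.weight",
    "decoder.layers.0.self_attn.q_proj.weight",
    "decoder.layers.0.encoder_attn.q_proj.weight",
    "decoder.layers.0.fc1.weight",
]
params = [(name, SimpleNamespace(requires_grad=True)) for name in names]

# Keep only cross-attention, projection, and decoder embedding trainable.
trainable = freeze_except(params, ["encoder_attn", "projection", "embed_tokens"])
print(trainable)
```

With a real PyTorch module, the same helper could be called as `freeze_except(model.named_parameters(), [...])`, with the keywords adjusted to the actual parameter names.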

Speech encoder NLP encoder decoder

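# Assumption: SpeechMixEED is exported from the top-level speechmix package.
from speechmix import SpeechMixEED

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")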

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder, fine-tuning only the layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder, fine-tuning only the speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
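When comparing partial fine-tuning options such as the ones above, it can help to check how many parameters actually remain trainable. The following is a minimal sketch, assuming the model exposes PyTorch-style parameters with `numel()` and `requires_grad`. `trainable_summary` and `FakeParam` are hypothetical names used only for illustration, and the counts are made up.

```python
from dataclasses import dataclass


@dataclass
class FakeParam:
    """Stand-in for torch.nn.Parameter with only the attributes used below."""

    size: int
    requires_grad: bool

    def numel(self):
        return self.size


def trainable_summary(parameters):
    """Return (trainable, total) parameter counts."""
    trainable = total = 0
    for p in parameters:
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return trainable, total


# Illustrative numbers only, not measurements of any real model.
params = [FakeParam(1000, True), FakeParam(3000, False), FakeParam(1000, False)]
trainable, total = trainable_summary(params)
print(f"{trainable}/{total} trainable ({trainable / total:.0%})")
# prints: 1000/5000 trainable (20%)
```

If the SpeechMix model is a `torch.nn.Module` (an assumption), `trainable_summary(model.parameters())` would report the real counts for each flag.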

Installation

pip install

pip install speechmix

Build from source

Clone this repository with git, cd into the project directory, then run:

pip install -e .
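After either installation route, a quick way to confirm that the package is visible to the current interpreter is to look it up on the import path. This sketch uses only the standard library. It assumes the import name matches the pip name, `speechmix`, and `is_installed` is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


print("speechmix installed:", is_installed("speechmix"))
```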

Owner

Eric Lam
