Python code for Lite Audio-Visual Speech Enhancement.

Last update: Dec 01, 2022

Overview

Lite Audio-Visual Speech Enhancement (Interspeech 2020)

Introduction

This is the PyTorch implementation of Lite Audio-Visual Speech Enhancement (LAVSE).

We have also put some preprocessed sample data (including enhanced results) in this repository.

The TMSV (Taiwan Mandarin Speech with Video) dataset used in LAVSE is released here.

Please cite the following paper if you find the code useful in your research.

@inproceedings{chuang2020lite,
  title={Lite Audio-Visual Speech Enhancement},
  author={Chuang, Shang-Yi and Tsao, Yu and Lo, Chen-Chou and Wang, Hsin-Min},
  booktitle={Proc. Interspeech 2020}
}

Prerequisites

Ubuntu 18.04
Python 3.6
CUDA 10

You can use pip to install the Python dependencies.

pip install -r requirements.txt

Usage

Run the command below; the average PESQ and STOI results will be printed to your terminal.

Before running the script, start a visdom server (for example, with python -m visdom.server in a screen or tmux session) so that the training loss can be recorded.

bash run.sh

See run.sh for further details on the individual commands.
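
To illustrate what "average PESQ and STOI" means here, the sketch below averages per-utterance scores into one number per metric. It is not taken from this repository: the function name average_scores, the dictionary keys, and the example values are all hypothetical, and the scores themselves are assumed to have been computed elsewhere.

```python
from statistics import mean
from typing import Dict, Iterable, List


def average_scores(per_utterance: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over all utterances (hypothetical helper)."""
    rows: List[Dict[str, float]] = list(per_utterance)
    if not rows:
        raise ValueError("no scores to average")
    # Use the metric names of the first row; every row must provide them.
    metrics = list(rows[0].keys())
    return {name: mean(row[name] for row in rows) for name in metrics}


# Made-up per-utterance scores, for illustration only (not real results).
example_scores = [
    {"pesq": 2.0, "stoi": 0.70},
    {"pesq": 3.0, "stoi": 0.80},
]

averages = average_scores(example_scores)
print("PESQ: {pesq:.2f}  STOI: {stoi:.2f}".format(**averages))
```

Each utterance is weighted equally in this sketch; whether run.sh averages in exactly this way is an assumption.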

License

LAVSE is released under the MIT License.

See LICENSE for more details.

Acknowledgments

Bio-ASP Lab, CITI, Academia Sinica, Taipei, Taiwan
SLAM Lab, IIS, Academia Sinica, Taipei, Taiwan

Owner

Shang-Yi Chuang
