Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.

Overview

HiFi-GAN+

This project is an unofficial implementation of the HiFi-GAN+ model for audio bandwidth extension, from the paper Bandwidth Extension is All You Need by Jiaqi Su, Yunyun Wang, Adam Finkelstein, and Zeyu Jin.

The model takes a band-limited audio signal (usually 8/16/24kHz) and attempts to reconstruct the high frequency components needed to restore a full-band signal at 48kHz. This is useful for upsampling low-rate outputs from upstream tasks like text-to-speech, voice conversion, etc. or enhancing audio that was filtered to remove high frequency noise. For more information, please see this blog post.
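
To make "band-limited" concrete, the short sketch below works through the Nyquist arithmetic for the source rates mentioned above. It is only an illustration: the helper name `missing_band` is hypothetical and not part of the library.

```python
# A signal sampled at fs can only carry content below fs / 2 (the Nyquist limit),
# so everything between the source and target limits has to be reconstructed.
TARGET_RATE = 48_000  # full-band output rate stated above


def missing_band(source_rate: int, target_rate: int = TARGET_RATE) -> tuple[float, float]:
    """Return the (low, high) frequency range in Hz absent from the source."""
    return source_rate / 2, target_rate / 2


for rate in (8_000, 16_000, 24_000):
    low, high = missing_band(rate)
    print(f"{rate} Hz source: reconstruct {low:.0f} Hz to {high:.0f} Hz")
```

An 8kHz source, for example, contains nothing above 4kHz, so the model has to synthesize the entire 4kHz to 24kHz range.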

Usage

The example below uses a pretrained HiFi-GAN+ model to upsample a 1 second 24kHz sawtooth to 48kHz.

import torch
from hifi_gan_bwe import BandwidthExtender

model = BandwidthExtender.from_pretrained("hifi-gan-bwe-10-42890e3-vctk-48kHz")

fs = 24000
x = torch.full([fs], 261.63 / fs).cumsum(-1) % 1.0 - 0.5
y = model(x, fs)
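
The one-line test signal in the example is a phase accumulator: each sample advances the phase by 261.63 / fs (261.63 Hz is middle C), the running sum is wrapped to [0, 1), and the result is centered around zero. The following pure-Python sketch unpacks that step without torch; the `sawtooth` helper is hypothetical and exists only for illustration.

```python
from itertools import accumulate


def sawtooth(freq_hz: float, sample_rate: int, num_samples: int) -> list[float]:
    """Build a sawtooth in [-0.5, 0.5) by accumulating and wrapping the phase."""
    step = freq_hz / sample_rate               # phase advance per sample, in cycles
    phase = accumulate([step] * num_samples)   # running sum, like cumsum(-1)
    return [p % 1.0 - 0.5 for p in phase]      # wrap to one cycle, then center


# One second of a 261.63 Hz sawtooth at 24 kHz, matching the example above.
wave = sawtooth(261.63, 24_000, 24_000)
print(len(wave), min(wave), max(wave))
```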

There is a Gradio demo on Hugging Face Spaces where you can upload audio clips and run the model. You can also run the model on Colab with this notebook.

Running with pipx

The HiFi-GAN+ library can be run directly from PyPI if you have the pipx application installed. The following script uses a hosted pretrained model to upsample an MP3 file to 48kHz. The input audio can be in any format supported by the audioread library, and the output can be in any format supported by soundfile.

pipx run --python=python3.9 hifi-gan-bwe \
  hifi-gan-bwe-10-42890e3-vctk-48kHz \
  input.mp3 \
  output.wav

Running in a Virtual Environment

If you have a Python 3.9 virtual environment installed, you can install the HiFi-GAN+ library into it and run synthesis, training, etc. using it.

pip install hifi-gan-bwe

hifi-synth hifi-gan-bwe-10-42890e3-vctk-48kHz input.mp3 output.wav
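
If you have many files to process, one option is to drive the same command from a small script. The sketch below is an assumption about how you might do that, not part of the library: `build_synth_command` and `upsample_directory` are hypothetical helpers, and the only thing taken from this README is the `hifi-synth MODEL INPUT OUTPUT` argument order shown above.

```python
import subprocess
from pathlib import Path

MODEL = "hifi-gan-bwe-10-42890e3-vctk-48kHz"


def build_synth_command(source: Path, output_dir: Path, model: str = MODEL) -> list[str]:
    """Build the hifi-synth argument list for one input file."""
    target = output_dir / (source.stem + ".wav")
    return ["hifi-synth", model, str(source), str(target)]


def upsample_directory(input_dir: Path, output_dir: Path) -> None:
    """Run hifi-synth once per MP3 file found in input_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for source in sorted(input_dir.glob("*.mp3")):
        # check=True stops the batch at the first file that fails
        subprocess.run(build_synth_command(source, output_dir), check=True)


# Example: upsample_directory(Path("clips"), Path("upsampled"))
```

Each call starts a new process, so the model is reloaded for every file. For large batches, loading the model once through the Python API shown in the Usage section is likely faster.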

Pretrained Models

The following models can be loaded with BandwidthExtender.from_pretrained and used for audio upsampling. You can also download the model file from the link and use it offline.

Name | Sample Rate | Parameters | Wandb Metrics | Notes
hifi-gan-bwe-10-42890e3-vctk-48kHz | 48kHz | 1M | bwe-10-42890e3 | Same as bwe-05, but uses bandlimited interpolation for upsampling, for reduced noise and aliasing. Uses the same parameters as resampy's kaiser_best mode.
hifi-gan-bwe-11-d5f542d-vctk-8kHz-48kHz | 48kHz | 1M | bwe-11-d5f542d | Same as bwe-10, but trained only on 8kHz sources, for specialized upsampling.
hifi-gan-bwe-12-b086d8b-vctk-16kHz-48kHz | 48kHz | 1M | bwe-12-b086d8b | Same as bwe-10, but trained only on 16kHz sources, for specialized upsampling.
hifi-gan-bwe-13-59f00ca-vctk-24kHz-48kHz | 48kHz | 1M | bwe-13-59f00ca | Same as bwe-10, but trained only on 24kHz sources, for specialized upsampling.
hifi-gan-bwe-05-cd9f4ca-vctk-48kHz | 48kHz | 1M | bwe-05-cd9f4ca | Trained for 200K iterations on the VCTK speech dataset with noise augmentation from the DNS Challenge dataset.
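
Since three of the models are trained for a single source rate, you may want to choose a model name based on the rate of your input. The lookup below is a minimal sketch of that idea using only the names in the table; `pick_model` is a hypothetical helper, and whether a specialized model actually sounds better than the general one for your audio is something to verify by listening.

```python
GENERAL_MODEL = "hifi-gan-bwe-10-42890e3-vctk-48kHz"

# Models trained only on a single source rate, keyed by that rate in Hz.
SPECIALIZED_MODELS = {
    8_000: "hifi-gan-bwe-11-d5f542d-vctk-8kHz-48kHz",
    16_000: "hifi-gan-bwe-12-b086d8b-vctk-16kHz-48kHz",
    24_000: "hifi-gan-bwe-13-59f00ca-vctk-24kHz-48kHz",
}


def pick_model(source_rate: int) -> str:
    """Return the specialized model for source_rate, else the general model."""
    return SPECIALIZED_MODELS.get(source_rate, GENERAL_MODEL)


print(pick_model(16_000))
print(pick_model(22_050))
```

The returned name can then be passed to BandwidthExtender.from_pretrained or to the command line tools.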

Training

If you want to train your own model, you can use any of the methods above to install/run the library or fork the repo and run the script commands locally. The following commands are supported:

Name | Description
hifi-train | Starts a new training run. Pass in a name for the run.
hifi-clone | Clones an existing training run, either at a given checkpoint or at the latest one.
hifi-export | Optimizes a model for inference and exports it to a PyTorch model file (.pt).
hifi-synth | Runs model inference using a trained model on a source audio file.

For example, you might start a new training run called bwe-01 with the following command:

hifi-train 01

To train a model, you will first need to download the VCTK and DNS Challenge datasets. By default, these datasets are assumed to be in the ./data/vctk and ./data/dns directories. See train.py for how to specify your own training data directories. If you want to use a custom training dataset, you can implement a dataset wrapper in datasets.py.

The training scripts use wandb.ai for experiment tracking and visualization. Wandb metrics can be disabled by passing --no_wandb to the training script. All of my own experiment results are publicly available at wandb.ai/brentspell/hifi-gan-bwe.

Each training run is identified by a name and a git hash (e.g., bwe-01-8abbca9). The git hash is used for simple experiment tracking, reproducibility, and model provenance. Using git to manage experiments also makes it easy to change model hyperparameters: change the code, make a commit, and start the training run. This is why there is no hyperparameter configuration file in the project; I often end up having to change the code anyway to run interesting experiments.
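
As a rough illustration of this naming scheme, the sketch below composes a run identifier from a run name and a short git hash. It is not the project's actual code: `current_git_hash` and `make_run_name` are hypothetical helpers, and the "bwe" prefix is inferred from the example identifiers in this README.

```python
import subprocess


def current_git_hash() -> str:
    """Return the abbreviated hash of HEAD; must be called inside a git repo."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def make_run_name(name: str, git_hash: str, prefix: str = "bwe") -> str:
    """Join prefix, run name, and git hash into one run identifier."""
    return f"{prefix}-{name}-{git_hash}"


# With an explicit hash, so the example does not depend on a repository:
print(make_run_name("01", "8abbca9"))  # prints "bwe-01-8abbca9"

# Inside a checkout you could instead use:
#   make_run_name("01", current_git_hash())
```

Because the hash comes from HEAD, uncommitted changes are not reflected in the identifier, which is one reason to commit before starting a run.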

Development

Setup

The following script creates a virtual environment using pyenv for the project and installs dependencies.

pyenv install 3.9.10
pyenv virtualenv 3.9.10 hifi-gan-bwe
pip install -r requirements.txt

If you want to run the hifi-* scripts described above in development, you can install the package locally:

pip install -e .

You can then run tests, formatters, linters, and type checks as follows:

pytest --cov=hifi_gan_bwe
black .
isort --profile=black .
flake8 .
mypy .

These checks are also included in the pre-commit configuration for the project, so you can set them up to run automatically on commit by running:

pre-commit install

Acknowledgements

The original research on the HiFi-GAN+ model is not my own, and all credit goes to the paper's authors. I also referred to kan-bayashi's excellent Parallel WaveGAN implementation, specifically the WaveNet module. If you use this code, please cite the original paper:

@inproceedings{su2021bandwidth,
  title={Bandwidth extension is all you need},
  author={Su, Jiaqi and Wang, Yunyun and Finkelstein, Adam and Jin, Zeyu},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={696--700},
  year={2021},
  organization={IEEE},
  url={https://doi.org/10.1109/ICASSP39728.2021.9413575},
}

License

Copyright © 2022 Brent M. Spell

Licensed under the MIT License (the "License"). You may not use this package except in compliance with the License. You may obtain a copy of the License at

https://opensource.org/licenses/MIT

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
