A two-stage U-Net for high-fidelity denoising of historical recordings

Last update: Jan 05, 2023

Overview

Official repository of the paper (not submitted yet):

E. Moliner and V. Välimäki, "A two-stage U-Net for high-fidelity denoising of historical recordings," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Singapore, May 2022.

Abstract

Enhancing the sound quality of historical music recordings is a long-standing problem. This paper presents a novel denoising method based on a fully-convolutional deep neural network. A two-stage U-Net model architecture is designed to model and suppress the degradations with high fidelity. The method processes the time-frequency representation of audio, and is trained using realistic noisy data to jointly remove hiss, clicks, thumps, and other common additive disturbances from old analog discs. The proposed model outperforms previous methods in both objective and subjective metrics. The results of a formal blind listening test show that the method can denoise real gramophone recordings with an excellent quality. This study shows the importance of realistic training data and the power of deep learning in audio restoration.

Listen to our audio samples

Requirements

You will need at least Python 3.7, plus CUDA 10.1 if you want to use a GPU. See requirements.txt for the required package versions.
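The prerequisites above can be checked before creating the environment. The following is a minimal sketch, not part of the repository: it assumes python3 is on the PATH and that sort supports the -V (version sort) option, and version_ge is a hypothetical helper name. Finding nvidia-smi only indicates that an NVIDIA driver is installed; it does not confirm that CUDA 10.1 is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if dotted version $1 is >= $2.
# Relies on "sort -V" (version sort), e.g. from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Ask the interpreter for its own version; empty if python3 is missing.
py_version=$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || true)

if [ -n "$py_version" ] && version_ge "$py_version" "3.7"; then
    echo "Python $py_version: OK"
else
    echo "Python >= 3.7 not found"
fi

# nvidia-smi ships with the NVIDIA driver, so its presence is only a hint.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi found: an NVIDIA driver appears to be installed"
else
    echo "nvidia-smi not found: no NVIDIA driver detected"
fi
```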

To install the environment through Anaconda, run:

conda env update -f environment.yml
conda activate historical_denoiser

Denoising Recordings

Run the following commands to clone the repository and download the pretrained weights of the two-stage U-Net model:

git clone https://github.com/eloimoliner/denoising-historical-recordings.git
cd denoising-historical-recordings
wget https://github.com/eloimoliner/denoising-historical-recordings/releases/download/v0.0/checkpoint.zip
unzip checkpoint.zip -d experiments/trained_model/

If the environment is installed correctly, you can denoise an audio file by running:

bash inference.sh "file name"

".wav" files containing the denoised signal, the residual noise, and the original signal in mono will be generated in the same directory as the input file.
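To denoise several recordings in one go, the command above can be wrapped in a loop. The following is a minimal sketch, not part of the repository: the function name denoise_dir and the DENOISE_CMD variable are hypothetical, and it assumes that inference.sh is run from the repository root and takes a single file path as its only argument.

```shell
#!/usr/bin/env bash
# Sketch: run the denoiser on every .wav file in a directory.
# DENOISE_CMD is a hypothetical override hook (not part of the repository),
# useful for a dry run, e.g. DENOISE_CMD=echo.
denoise_dir() {
    local dir="$1"
    local cmd="${DENOISE_CMD:-bash inference.sh}"
    local f
    for f in "$dir"/*.wav; do
        # Skip the unexpanded pattern when the directory has no .wav files.
        [ -e "$f" ] || continue
        # $cmd is left unquoted on purpose so "bash inference.sh" splits into
        # two words; "$f" is quoted so names with spaces stay one argument.
        $cmd "$f"
    done
}
```

Calling DENOISE_CMD=echo denoise_dir /path/to/recordings prints the files that would be processed without running the model. Since the outputs are written next to the inputs, a second run over the same directory would likely pick up the previously generated files as well, so it may be safer to keep the originals in a separate directory.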

Training

TODO

Comments

Will it work in Windows without CUDA?

Hello, The readme says: "You will need at least python 3.7 and CUDA 10.1 if you want to use GPU."

Unfortunately, my first attempt to run it on Windows without a CUDA-capable GPU failed. Is there really no separate environment file for CPU-only use? Is it possible to make it work without massive changes to the code?

opened by vitacon 15

installation without conda

Hi,

could you leave some hints about how to install this without conda? Your readme appears to be very specific to this one case. Also, it seems that you develop under Linux, so you use bash to execute. Maybe a hint for Windows users would be cool here too.

I am just trying to get this to run under Windows and so far have had no success. I will update if I get further. All the best!

opened by GitHubGeniusOverlord 9

strange tensorflow version in requirements.txt

Hi,

when running python -m pip install tensorflow==2.3.0 as indicated in your requirements file, I get

ERROR: Could not find a version that satisfies the requirement tensorflow==2.3.0 (from versions: 2.5.0rc0, 2.5.0rc1, 2.5.0rc2, 2.5.0rc3, 2.5.0, 2.5.1, 2.5.2, 2.6.0rc0, 2.6.0rc1, 2.6.0rc2, 2.6.0, 2.6.1, 2.6.2, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.8.0rc0)
ERROR: No matching distribution found for tensorflow==2.3.0

It seems this version isn't even supported by pip anymore. Upgrade to 2.5.0?

The same is true for scipy==1.4.1. Not sure about which version to take there.

opened by GitHubGeniusOverlord 3

Update inference.sh

Small change to allow spaces in file names. Bash still expands the variable $1 when it is inside double quotes, and Python then receives a single argument instead of several when the file name contains spaces.
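The effect of the quotes can be reproduced in isolation. The following is a minimal sketch with hypothetical function names (count_args, forward_unquoted, forward_quoted); it does not reproduce the actual contents of inference.sh and only shows how Bash word splitting treats an unquoted versus a quoted $1.

```shell
#!/usr/bin/env bash
# Prints how many arguments it received.
count_args() { echo "$#"; }

# Hypothetical wrappers that forward their first argument to another command.
forward_unquoted() { count_args $1; }    # unquoted: split on spaces
forward_quoted()   { count_args "$1"; }  # quoted: passed as one word

forward_unquoted "my old record.wav"   # prints 3
forward_quoted "my old record.wav"     # prints 1
```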

opened by JorenSix 1

How to start training for denoising?

I would like to do a denoising task where I have clean signals (in the "clean" folder) and noisy signals (in the "noise" folder).

opened by listener17 1

Releases (v0.0)

v0.0 (Aug 31, 2021)

Uploading pretrained model
Source code (tar.gz)
Source code (zip)
checkpoint.zip (251.80 MB)

Owner

Eloi Moliner Juanpere

Doctoral candidate in audio signal processing at Aalto University.
