FAST-RIR: FAST NEURAL DIFFUSE ROOM IMPULSE RESPONSE GENERATOR

Overview

FAST-RIR: FAST NEURAL DIFFUSE ROOM IMPULSE RESPONSE GENERATOR

This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating roomimpulse responses (RIRs) for a given rectangular acoustic environment. Our model is inspired by StackGAN architecture. The audio examples and spectrograms of the generated RIRs are available here.

Requirements

Python3.6
Pytorch
python-dateutil
easydict
pandas
torchfile
gdown
pickle

Embedding

Each normalized embedding is created as follows: If you are using our trained model, you may need to use extra parameter Correction(CRR).

Listener Position = LP
Source Position = SP
Room Dimension = RD
Reverberation Time = T60
Correction = CRR

CRR = 0.1 if 0.5
   
    <0.6
CRR = 0.2 if T60>0.6
CRR = 0 otherwise

Embedding = ([LP_X,LP_Y,LP_Z,SP_X,SP_y,SP_Z,RD_X,RD_Y,RD_Z,(T60+CRR)] /5) + 1

   

Generete RIRs using trained model

Download the trained model using this command

source download_generate.sh

Create normalized embeddings list in pickle format. You can run following command to generate an example embedding list

 python3 example1.py

Run the following command inside code_new to generate RIRs corresponding to the normalized embeddings list. You can find generated RIRs inside code_new/Generated_RIRs

python3 main.py --cfg cfg/RIR_eval.yml --gpu 0

Range

Our trained NN-DAS is capable of generating RIRs with the following range accurately.

Room Dimension X --> 8m to 11m
Room Dimesnion Y --> 6m to 8m
Room Dimension Z --> 2.5m to 3.5m
Listener Position --> Any position within the room
Speaker Position --> Any position within the room
Reverberation time --> 0.2s to 0.7s

Training the Model

Run the following command to download the training dataset we created using a Diffuse Acoustic Simulator. You also can train the model using your dataset.

source download_data.sh

Run the following command to train the model. You can pass what GPUs to be used for training as an input argument. In this example, I am using 2 GPUs.

python3 main.py --cfg cfg/RIR_s1.yml --gpu 0,1

Related Works

  1. IR-GAN: Room Impulse Response Generator for Far-field Speech Recognition (INTERSPEECH2021)
  2. TS-RIR: Translated synthetic room impulse responses for speech augmentation (IEEE ASRU 2021)

Citations

If you use our FAST-RIR for your research, please consider citing

@misc{ratnarajah2021fastrir,
      title={FAST-RIR: Fast neural diffuse room impulse response generator}, 
      author={Anton Ratnarajah and Shi-Xiong Zhang and Meng Yu and Zhenyu Tang and Dinesh Manocha and Dong Yu},
      year={2021},
      eprint={2110.04057},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

Our work is inspired by

@inproceedings{han2017stackgan,
Author = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
Title = {StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
Year = {2017},
booktitle = {{ICCV}},
}

If you use our training dataset generated using Diffuse Acoustic Simulator in your research, please consider citing

@inproceedings{9052932,
  author={Z. {Tang} and L. {Chen} and B. {Wu} and D. {Yu} and D. {Manocha}},  
  booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},  
  title={Improving Reverberant Speech Training Using Diffuse Acoustic Simulation},   
  year={2020},  
  volume={},  
  number={},  
  pages={6969-6973},
}
Owner
Anton Jeran Ratnarajah
Anton Jeran Ratnarajah
Repository for the paper "Exploring the Sensory Spaces of English Perceptual Verbs in Natural Language Data"

Sensory Spaces of English Perceptual Verbs This repository contains the code and collocational data described in the paper "Exploring the Sensory Spac

David Peng 0 Sep 07, 2021
Official repository for Natural Image Matting via Guided Contextual Attention

GCA-Matting: Natural Image Matting via Guided Contextual Attention The source codes and models of Natural Image Matting via Guided Contextual Attentio

Li Yaoyi 349 Dec 26, 2022
Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR)

Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR) This is the official implementation of our paper Personalized Tran

Yongchun Zhu 81 Dec 29, 2022
All of the figures and notebooks for my deep learning book, for free!

"Deep Learning - A Visual Approach" by Andrew Glassner This is the official repo for my book from No Starch Press. Ordering the book My book is called

Andrew Glassner 227 Jan 04, 2023
Band-Adaptive Spectral-Spatial Feature Learning Neural Network for Hyperspectral Image Classification

Band-Adaptive Spectral-Spatial Feature Learning Neural Network for Hyperspectral Image Classification

258 Dec 29, 2022
根据midi文件演奏“风物之诗琴”的脚本 "Windsong Lyre" auto play

Genshin-lyre-auto-play 简体中文 | English 简介 根据midi文件演奏“风物之诗琴”的脚本。由Python驱动,在此承诺, ⚠️ 项目内绝不含任何能够引起安全问题的代码。 前排提示:所有键盘在动但是原神没反应的都是因为没有管理员权限,双击run.bat或者以管理员模式

御坂17032号 386 Jan 01, 2023
Pytorch port of Google Research's LEAF Audio paper

leaf-audio-pytorch Pytorch port of Google Research's LEAF Audio paper published at ICLR 2021. This port is not completely finished, but the Leaf() fro

Dennis Fedorishin 80 Oct 31, 2022
Tech Resources for Academic Communities

Free tech resources for faculty, students, researchers, life-long learners, and academic community builders for use in tech based courses, workshops, and hackathons.

Microsoft 2.5k Jan 04, 2023
TalkingHead-1KH is a talking-head dataset consisting of YouTube videos

TalkingHead-1KH Dataset TalkingHead-1KH is a talking-head dataset consisting of YouTube videos, originally created as a benchmark for face-vid2vid: On

173 Dec 29, 2022
Universal Probability Distributions with Optimal Transport and Convex Optimization

Sylvester normalizing flows for variational inference Pytorch implementation of Sylvester normalizing flows, based on our paper: Sylvester normalizing

Rianne van den Berg 172 Dec 13, 2022
deep_image_prior_extension

Code for "Is Deep Image Prior in Need of a Good Education?" Project page: https://jleuschn.github.io/docs.educated_deep_image_prior/. Supplementary Ma

riccardo barbano 7 Jan 09, 2022
Unofficial PyTorch implementation of MobileViT.

MobileViT Overview This is a PyTorch implementation of MobileViT specified in "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Tr

Chin-Hsuan Wu 348 Dec 23, 2022
TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction

TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction TSDF++ is a novel multi-object TSDF formulation that can encode mult

ETHZ ASL 130 Dec 29, 2022
Multiple paper open-source codes of the Microsoft Research Asia DKI group

📫 Paper Code Collection (MSRA DKI Group) This repo hosts multiple open-source codes of the Microsoft Research Asia DKI Group. You could find the corr

Microsoft 249 Jan 08, 2023
Original Pytorch Implementation of FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation

FLAME Original Pytorch Implementation of FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation, accepted at the 17th IEEE Internation Co

Neelabh Sinha 19 Dec 17, 2022
This is the official code of L2G, Unrolling and Recurrent Unrolling in Learning to Learn Graph Topologies.

Learning to Learn Graph Topologies This is the official code of L2G, Unrolling and Recurrent Unrolling in Learning to Learn Graph Topologies. Requirem

Stacy X PU 16 Dec 09, 2022
Codebase for the Summary Loop paper at ACL2020

Summary Loop This repository contains the code for ACL2020 paper: The Summary Loop: Learning to Write Abstractive Summaries Without Examples. Training

Canny Lab @ The University of California, Berkeley 44 Nov 04, 2022
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.

ENet This work has been published in arXiv: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. Packages: train contains too

e-Lab 344 Nov 21, 2022
Speech Recognition is an important feature in several applications used such as home automation, artificial intelligence

Speech Recognition is an important feature in several applications used such as home automation, artificial intelligence, etc. This article aims to provide an introduction on how to make use of the S

RISHABH MISHRA 1 Feb 13, 2022
A lightweight python AUTOmatic-arRAY library.

A lightweight python AUTOmatic-arRAY library. Write numeric code that works for: numpy cupy dask autograd jax mars tensorflow pytorch ... and indeed a

Johnnie Gray 62 Dec 27, 2022