[email protected] Reverb Database. | PythonRepo" /> [email protected] Reverb Database. | PythonRepo">

The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal and to generate far-field speech data using room impulse response data from BUT [email protected] Reverb Database.

Overview

Add_noise_and_rir_to_speech

The purpose of this code base is to add a specified signal-to-noise ratio noise from MUSAN dataset to a pure speech signal and to generate far-field speech data using room impulse response data from BUT [email protected] Reverb Database.

Noise and RIR dataset description:

  • BUT [email protected] Reverb Database:

    The database is being built with respect to collect a large number of various Room Impulse Responses, Room environmental noises (or "silences"), Retransmitted speech (for ASR and SID testing), and meta-data (positions of microphones, speakers etc.).

    The goal is to provide speech community with a dataset for data enhancement and distant microphone or microphone array experiments in ASR and SID.

    In this codebase, we only use the RIR data, which is used to synthesize far-field speech, the composition of the RIR dataset and citation details are as follows.

    Room Name Room Type Size (length, depth, height) (m) (microphone_num x loudspeaker_num)
    Q301 Office 10.7x6.9x2.6 31 x 3
    L207 Office 4.6x6.9x3.1 31 x 6
    L212 Office 7.5x4.6x3.1 31 x 5
    L227 Stairs 6.2x2.6x14.2 31 x 5
    R112 Hotel room 4.4x2.8x2.6 31 x 5
    CR2 Conference room 28.2x11.1x3.3 31 x 4
    E112 Lecture room 11.5x20.1x4.8 31 x 2
    D105 Lecture room 17.2x22.8x6.9 31 x 6
    C236 Meeting room 7.0x4.1x3.6 31 x 10
    @ARTICLE{8717722,
             author={Szöke, Igor and Skácel, Miroslav and Mošner, Ladislav and Paliesek, Jakub and Černocký, Jan},
             journal={IEEE Journal of Selected Topics in Signal Processing}, 
             title={Building and evaluation of a real room impulse response dataset}, 
             year={2019},
             volume={13},
             number={4},
             pages={863-876},
             doi={10.1109/JSTSP.2019.2917582}
     }
    
  • MUSAN database:

    The database consists of music from several genres, speech from twelve languages, and a wide assortment of technical and non-technical noises and we only use the noise data in this database. Citation details are as follows.

    @misc{snyder2015musan,
          title={MUSAN: A Music, Speech, and Noise Corpus}, 
          author={David Snyder and Guoguo Chen and Daniel Povey},
          year={2015},
          eprint={1510.08484},
          archivePrefix={arXiv},
          primaryClass={cs.SD}
    }
    

Before using the data-processing code:

  • If you do not want the original dataset to be overwritten, please download the dataset again for use

  • You need to create three files: 'training_list.txt', 'validation_list.txt', 'testing_list.txt', based on your training, validation and test data file paths respectively, and ensure the audio in the file paths can be read and written.

  • The content of the aforementioned '*_list.txt' files are in the following form:

    *_list.txt
    	/../...../*.wav
    	/../...../*.wav
    	/../...../*.wav
    

Instruction for using the following data-processing code:

  1. mix_cleanaudio_with_rir_offline.py: Generate far-field speech offline

    • two parameters are needed:

      • --data_root: the data path which you want to download and store the RIR dataset in.
      • --clean_data_list_path: the path of the folder in which 'training_list.txt', 'validation_list.txt', 'testing_list.txt' are stored in
    • 2 folders will be created in data_root: 'ReverDB_data (Removable if needed)', 'ReverDB_mix'

  2. download_and_extract_noise_file.py: Generate musan noise file

    • one parameters are needed:
      • --data_root: the data path which you want to download and store the noise dataset in.
    • 2 folder will be created in data_root: 'musan (Removable if needed)', 'noise'
  3. vad_torch.py: Voice activity detection when adding noise to the speech

    The noise data is usually added online according to the SNR requirements, several pieces of code are provided below, please add them in the appropriate places according to your needs!

    import torchaudio
    import numpy as np
    import torch
    import random
    from vad_torch import VoiceActivityDetector
    
    
    def _add_noise(speech_sig, vad_duration, noise_sig, snr):
        """add noise to the audio.
        :param speech_sig: The input audio signal (Tensor).
        :param vad_duration: The length of the human voice (int).
        :param noise_sig: The input noise signal (Tensor).
        :param snr: the SNR you want to add (int).
        :returns: noisy speech sig with specific snr.
        """
        if vad_duration != 0:
            snr = 10**(snr/10.0)
            speech_power = torch.sum(speech_sig**2)/vad_duration
            noise_power = torch.sum(noise_sig**2)/noise_sig.shape[1]
            noise_update = noise_sig / torch.sqrt(snr * noise_power/speech_power)
    
            if speech_sig.shape[1] > noise_update.shape[1]:
                # padding
                temp_wav = torch.zeros(1, speech_sig.shape[1])
                temp_wav[0, 0:noise_update.shape[1]] = noise_update
                noise_update = temp_wav
            else:
                # cutting
                noise_update = noise_update[0, 0:speech_sig.shape[1]]
    
            return noise_update + speech_sig
        
        else:
            return speech_sig
        
    def main():
        # loading speech file
        speech_file = './speech.wav'
    	waveform, sr = torchaudio.load(speech_file)
    	waveform = waveform - waveform.mean()
    	
        # loading noise file and set snr
    	snr = 0       
    	noise_file = random.randint(1, 930)
    	
        # Voice activity detection
    	v = VoiceActivityDetector(waveform, sr)
    	raw_detection = v.detect_speech()
    	speech_labels = v.convert_windows_to_readible_labels(raw_detection)
    	vad_duration = 0
        if not len(speech_labels) == 0:
            for i in range(len(speech_labels)):
                start = speech_labels[i]['speech_begin']
                end = speech_labels[i]['speech_end']
                vad_duration = vad_duration + end-start
                
    	# adding noise
        noise, _ = torchaudio.load('/notebooks/noise/' + str(noise_file) + '.wav')
        waveform = _add_noise(waveform, vad_duration, noise, snr)
    
    if __name__ == '__main__':
        main()
Owner
Yunqi Chen
3rd-year undergraduate student; Passionate about all kinds of sports and everything interesting!
Yunqi Chen
Program Input Nilai Mahasiswa Menggunakan Fungsi

PROGRAM INPUT NILAI MAHASISWA MENGGUNAKAN FUNGSI Nama : Maulana Reza Badrudin Nim : 312110510 Matkul : Bahas Pemograman DESKRIPSI Deklarasi dicti

Maulana Reza Badrudin 1 Jan 05, 2022
A submodule of rmcrkd/ODE-Uniqueness

Heston-ODE This repo contains the Heston-related code that accompanies the article One-sided maximal uniqueness for a class of spatially irregular ord

0 Jan 05, 2022
A Python software implementation of the Intel 4004 processor

Pyntel4004 A Python software implementation of the Intel 4004 processor. General Information Two pass assembler using the original mnemonics, directiv

alshapton 5 Oct 01, 2022
Kivy program for identification & rotation sensing of objects on multi-touch tables.

ObjectViz ObjectViz is a multitouch object detection solution, enabling you to create physical markers out of any reliable multitouch solution. It's e

TangibleDisplay 8 Apr 04, 2022
Quantity Takeoff with Python. Collecting groups of elements by filters

The free tool QuantityTakeoff allows you to group elements from Revit and IFC models (in BIMJSON-CSV format) with just a few filters and find the required volume values for the grouped elements.

OpenDataBIM 9 Jan 06, 2023
Find habits that genuinely increase your productivity

BiProductive Description This repository contains the application BiProductive, which analyzes the habits of the person, tests his productivity, and d

Rizvan Iskaliev 43 Jun 11, 2022
Um Script De Mensagem anonimas Para linux e Termux Feito em python

Um Script De Mensagem anonimas Para linux e Termux Feito em python feito em um celular

6 Sep 09, 2021
An end-to-end Python-based Infrastructure as Code framework for network automation and orchestration.

Nectl An end-to-end Python-based Infrastructure as Code framework for network automation and orchestration. Features Data modelling and validation. Da

Adam Kirchberger 15 Oct 14, 2022
OntoSeer is a tool to help users build better quality ontologies

Ontoseer This document provides documentation for the first version of OntoSeer.OntoSeer is a tool that monitors the ontology development process andp

Knowledgeable Computing and Reasoning Lab 9 Aug 15, 2022
A tool to improve Boolean satisfiability (SAT) solver user's life

SatHelper This is a tool to improve the Boolean satisfiability (SAT) and MaxSAT solver user's life. It helps you model various problems as SAT and Max

Tomas Balyo 1 Nov 16, 2021
2021华为软件精英挑战赛 程序输出分析器

AutoGrader 0.2.0更新:加入资源分配溢出检测,如果发生资源溢出会输出溢出发生的位置。 如果通过检测,会显示通过符号 如果没有通过检测,会显示警告,并输出溢出发生的位置和操作

54 Aug 14, 2022
Cairo-math-64x61 - Fixed point 64.61 math library for Cairo / Starknet

Cairo Math 64x61 A fixed point 64.61 math library for Cairo & Starknet Signed 64

Influence 63 Dec 05, 2022
Download and process GOES-16 and GOES-17 data from NOAA's archive on AWS using Python.

Download and display GOES-East and GOES-West data GOES-East and GOES-West satellite data are made available on Amazon Web Services through NOAA's Big

Brian Blaylock 88 Dec 16, 2022
Adjust the white point, gamma or make your XDR display darker without losing HDR peak luminance or the ability to adjust display brightness

XDR Tuner Adjust the white point, gamma or make your XDR display darker without losing HDR peak luminance or the ability to adjust display brightness

François Simond 16 Dec 28, 2022
Draw random mazes in python

a-maze Draw random mazes in python This program generates and draws a rectangular maze, with an entrance on one side and one on the opposite side. The

Andrea Pasquali 1 Nov 21, 2021
Generates images with semantic content from distribution A in the style of distribution B

A2B Generates images with semantic content from distribution A in the style of d

Richard Herbert 2 Dec 27, 2021
LinkML based SPARQL template library and execution engine

sparqlfun LinkML based SPARQL template library and execution engine modularized core library of SPARQL templates generic templates using common vocabs

Linked data Modeling Language 6 Oct 10, 2022
A small program to vote for Councilors at 42 Heilbronn.

This Docker container is build to run on server an provide an easy to use interface for every student to vote for their councillors. To run docker on

Kevin Hirsig 2 Jan 17, 2022
Audio2Face - a project that transforms audio to blendshape weights,and drives the digital human,xiaomei,in UE project

Audio2Face - a project that transforms audio to blendshape weights,and drives the digital human,xiaomei,in UE project

FACEGOOD 732 Jan 08, 2023
Airflow Operator for running Soda SQL scans

Airflow Operator for running Soda SQL scans

Todd de Quincey 7 Oct 18, 2022