kapre: Keras Audio Preprocessors

Last update: Dec 29, 2022

Overview

Kapre

Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU real-time.

Tested on Python 3.6 and 3.7

Why Kapre?

vs. Pre-computation

You can optimize DSP parameters
Your model deployment becomes much simpler and consistent.
Your code and model has less dependencies

vs. Your own implementation

Quick and easy!
Consistent with 1D/2D tensorflow batch shapes
Data format agnostic (channels_first and channels_last)
Less error prone - Kapre layers are tested against Librosa (stft, decibel, etc) - which is (trust me) trickier than you think.
Kapre layers have some extended APIs from the default tf.signals implementation such as..
- A perfectly invertible STFT and InverseSTFT pair
- Mel-spectrogram with more options
Reproducibility - Kapre is available on pip with versioning

Workflow with Kapre

Preprocess your audio dataset. Resample the audio to the right sampling rate and store the audio signals (waveforms).
In your ML model, add Kapre layer e.g. kapre.time_frequency.STFT() as the first layer of the model.
The data loader simply loads audio signals and feed them into the model
In your hyperparameter search, include DSP parameters like n_fft to boost the performance.
When deploying the final model, all you need to remember is the sampling rate of the signal. No dependency or preprocessing!

Installation

pip install kapre

API Documentation

Please refer to Kapre API Documentation at https://kapre.readthedocs.io

One-shot example

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer

# 6 channels (!), maybe 1-sec audio signal, for an example.
input_shape = (44100, 6)
sr = 44100
model = Sequential()
# A STFT layer
model.add(STFT(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model.add(Magnitude())
model.add(MagnitudeToDecibel())  # these three layers can be replaced with get_stft_magnitude_layer()
# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer() 

# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())

# Compile the model
model.compile('adam', 'categorical_crossentropy') # if single-label classification

# train it with raw audio sample inputs
# for example, you may have functions that load your data as below.
x = load_x() # e.g., x.shape = (10000, 6, 44100)
y = load_y() # e.g., y.shape = (10000, 10) if it's 10-class classification
# then..
model.fit(x, y)
# Done!

See the Jupyter notebook at the example folder

Citation

Please cite this paper if you use Kapre for your work.

@inproceedings{choi2017kapre,
  title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
  author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
  booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
  year={2017},
  organization={ICML}
}

Comments

Migrated functions to tf.keras

This PR addresses #52 by removing the dependency on keras and switching to tensorflow.keras

Proposed version is 0.1.6 because of pull request #56

In particular, #56 keeps the dependency on keras with from keras.utils.conv_utils import conv_output_length

opened by douglas125 27
Spectrogram?

I have an older version of Kapre that has time_frequency.Spectrogram, which is trainable.

However, the new version of Kapre doesn't have Spectrogram anymore. Why?

opened by turian 8
Melspectrogram cant be set 'trainable_fb=False'

Melspectrogram cant be set 'trainable_fb=False',after I set trainable_fb=False,trainable_kernel=False,but seems like it doesnot work.It is still trainable.

opened by zhh6 8
htk=true for mel frequencies

We noticed the current implenetation of the mel_frequencies function (based on Librosa) doesn't include the htk=True option, which is handy when training CNNs because then the frequency scale is fully logarithmic which, in principle, makes more sense for frequency invariant convolutional filters.

What was the motivation for removing this? Any chance it can be added?

opened by justinsalamon 8
Added parallel STFT implementation

Hi!

As I comented in #98, I added a parallel STFT implementation based on the map_fn function following the indications of @zaccharieramzi here.

I've added a use_parallel_stft param (disabled by default) that allows to use this function. I've put this param in as many functions as I can. I also added test cases for every function I can, including an specific test that checks that the result of the tf.signal.stft is equals to the result of the parallel_stft function.

Hope this could serve us well meanwhile tensorflow resolves its issues with the fft implementation.

opened by JPery 7
Amplitude-to-decibel conversion produces different results on different batches

Related to #16, I found another issue that contributes to different prediction results depending on batch size (and the batches themselves). In particular, it occurs when using converting spectrograms to decibels.

https://github.com/keunwoochoi/kapre/blob/master/kapre/backend_keras.py#L17

The maximum is taken over the entire tensor, instead of per example in the batch. This results in different normalization when the examples in a batch are different.
bug

opened by auroracramer 7
Inverse Spectrogram and Mel-Spectrogram Layer?

Namaste!

kapre has become an integral part of all my audio Deep Learning experiments. Powerful! Thanks for providing such a great software!

I was thinking... I guess it would make sense to have layers for inverse spectrogram and inverse mel-spectrogram. Thinking about Autoencoders, this would be even more powerful. I know that reconstructing samples from spectrograms is not the best, but it is possible to a certain degree.

What do you think about that feature request?

Best, Tristan

opened by AI-Guru 7
Hey! The input is too short!

Hi,

I'm encountering an assertion problem when calling your code with a Tensorflow backend.

input_shape = (44100,1)

Could this be a be a problem with "channels_first" / "channels_last"?

Best, Alex

opened by slychief 6
Pip?

It seems you were on pip, but are no longer. Is there anything I could do to help get kapre back on there? We want to use this library in a commercial application, and for our process pip packages are easier to support than a git repository.

opened by ff-rfeather 6

trainable_stft error

Following your example but missing layer definition trainable_stft or something, can you provide example with error resolution?

`# 6 channels (!), maybe 1-sec audio signal
input_shape = (6, 44100) 
sr = 44100
model = Sequential()
model.add(Melspectrogram(n_dft=512, n_hop=256, input_shape=src_shape,
                         border_mode='same', sr=sr, n_mels=128,
                         fmin=0.0, fmax=sr/2, power=1.0,
                         return_decibel=False, trainable_fb=False,
                         trainable_kernel=False
                         name='trainable_stft'))`

  File "<ipython-input-24-cea5588ddf1e>", line 13
    name='trainable_stft'))
       ^
SyntaxError: invalid syntax

opened by sildeag 6

`STFT` layer output shape deviates from `STFTTflite` layer in batch dimension
Use Case

I want to convert a STFT layer in my model to a STFTTflite to deploy it to my mobile device. In the documentation I found that another dimension is added to account for complex numbers. But I also encountered a behaviour that is not documented.

Expected Behaviour

input_shape = (2048, 1) # mono signal model = keras.models.Sequential() # TFLite incompatible model model.add(kapre.STFT(n_fft=1024, hop_length=512, input_shape=input_shape)) tflite_model = keras.models.Sequential() # TFLite compatible model tflite_model.add(kapre.STFTTflite(n_fft=1024, hop_length=512, input_shape=input_shape))

model has the output shape (None, 3, 513, 1). Therefore, tflite_model should have the output shape (None, 3, 513, 1, 2).

Observed Behaviour

The output shape of tflite_model is (1, 3, 513, 1, 2) instead of (None, 3, 513, 1, 2).

Problem Solution

If this behaviour is unwanted:

Change the model output format so that the batch dimension is correctly shaped.

Otherwise:

Explain in the documentation why the batch dimension is shaped to 1.

Explain in the documentation how to include this layer into models which expect the batch dimension to be shaped None.
opened by PhilippMatthes 5

Problem incorporating SpecAugument in the training process

Hi,

I'm trying to add a SpecAug layer in the training process of a CNN using the code below:


CLIP_DURATION = 5 
SAMPLING_RATE = 41000
NUM_CHANNELS = 1

INPUT_SHAPE = ((CLIP_DURATION * SAMPLING_RATE), NUM_CHANNELS)

melgram = get_melspectrogram_layer(input_shape = INPUT_SHAPE,
                          n_fft = 2048,
                          hop_length = 512,
                          return_decibel=True,
                          n_mels = 40,
                          mel_f_min = 500,
                          mel_f_max = 15000,
                          input_data_format='channels_last',
                          output_data_format='channels_last')

spec_augment = SpecAugment(freq_mask_param=5,
                          time_mask_param=10,
                          n_freq_masks=2,
                          n_time_masks=3,
                          mask_value=-100)   

model = Sequential()
model.add(melgram)
model.add(spec_augment)

The CNN summary looks like this:

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 melspectrogram (Sequential)  (None, 397, 40, 1)       0         
                                                                 
 spec_augment_1 (SpecAugment  (None, 397, 40, 1)       0         
 )                                                               
                                                                 
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________

Compiling and fitting the model

model.compile(loss = 'sparse_categorical_crossentropy', optimizer='adam', metrics = 'accuracy')

early_stop = EarlyStopping(monitor='loss', patience=5)

reduce_LR = ReduceLROnPlateau(monitor="val_loss",factor=0.1,patience=4)

checkpointer = ModelCheckpoint(filepath = 'saved_models/bird_song_classification.hdf5')

model.fit(X_train, y_train, validation_data = (X_val, y_val), epochs = 50, batch_size = 32, callbacks = [early_stop, checkpointer, reduce_LR])

Then I get the following error:

Epoch 1/50
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[<ipython-input-35-e58a056ab523>](https://localhost:8080/#) in <module>
      7 checkpointer = ModelCheckpoint(filepath = 'saved_models/bird_song_classification.hdf5')
      8 
----> 9 model.fit(X_train, y_train, validation_data = (X_val, y_val), epochs = 50, batch_size = 32, callbacks = [early_stop, checkpointer, reduce_LR])

6 frames
[/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py](https://localhost:8080/#) in tf___apply_masks_to_axis(self, x, axis, mask_param, n_masks)
     78                 try:
     79                     do_return = True
---> 80                     retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)
     81                 except:
     82                     do_return = False

TypeError: in user code:

    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 889, in train_step
        y_pred = self(x, training=True)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/tmp/__autograph_generated_filepzvfxhgz.py", line 63, in tf__call
        ag__.if_stmt((ag__.ld(training) in (None, False)), if_body_2, else_body_2, get_state_2, set_state_2, ('do_return', 'retval_'), 2)
    File "/tmp/__autograph_generated_filepzvfxhgz.py", line 58, in else_body_2
        retval_ = ag__.converted_call(ag__.ld(tf).map_fn, (), dict(elems=ag__.ld(x), fn=ag__.ld(self)._apply_spec_augment, dtype=ag__.ld(tf).float32, fn_output_signature=ag__.ld(tf).float32), fscope)
    File "/tmp/__autograph_generated_filef27o6c1f.py", line 44, in tf___apply_spec_augment
        ag__.if_stmt((ag__.ld(self).n_time_masks >= 1), if_body_1, else_body_1, get_state_1, set_state_1, ('x',), 1)
    File "/tmp/__autograph_generated_filef27o6c1f.py", line 39, in if_body_1
        x = ag__.converted_call(ag__.ld(self)._apply_masks_to_axis, (ag__.ld(x),), dict(axis=ag__.ld(time_axis), mask_param=ag__.ld(self).time_mask_param, n_masks=ag__.ld(self).n_time_masks), fscope)
    File "/tmp/__autograph_generated_file3vip8w4x.py", line 80, in tf___apply_masks_to_axis
        retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)

    TypeError: Exception encountered when calling layer "spec_augment_1" (type SpecAugment).
    
    in user code:
    
        File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 299, in call  *
            elems=x, fn=self._apply_spec_augment, dtype=tf.float32, fn_output_signature=tf.float32
        File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 273, in _apply_spec_augment  *
            x = self._apply_masks_to_axis(
        File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 254, in _apply_masks_to_axis  *
            return tf.where(mask, self.mask_value, x)
    
        TypeError: Input 'e' of 'SelectV2' Op has type float32 that does not match type int32 of argument 't'.
    
    
    Call arguments received by layer "spec_augment_1" (type SpecAugment):
      • x=tf.Tensor(shape=(None, 397, 40, 1), dtype=float32)
      • training=True
      • kwargs=<class 'inspect._empty'>

The shape of X_train is

(2182, 205000, 1)

I'm using Tensorflow 2.9.2, and Python 3.7.15

When I remove the SpecAug layer everything runs fine. I've tested using only the melspec + a mobile net at the end and it runs smooth. The problem is apparently related to SpecAug layer.

Do you have any idea what could be going wrong here? I appreciate any guidance related to the problem. Best regards.

opened by nnbuainain 2

Full-integer quantization and kapre layers
I am training a model which includes the mel-spectrogram block from get_melspectrogram_layer() right after the input layer. Training goes well, and I am able to change the specific mel-spec-layers to their TFLite-counterparts (STFTTflite, MagnitudeTflite) afterwards. I have checked also that the model performs as well as before.

The model also perfoms as expected when converting the model to .tflite using dynamic range quantization. However, when using full-integer quantization, the model loses its accuracy (see (https://www.tensorflow.org/lite/performance/post_training_quantization#integer_only).

I suppose the mel-spec starts to significantly differ as in full-integer quantization, the input values are projected to new range (int8). Is there any way to make it work with full-integer quantization?

I guess I need to separate the mel-spec-layer from the model as a preprocessing step in order to succeed with full-integer quantization, i.e., apply the input quantization to the output values of mel-spec layer. But then I would have to deploy two models to the edge device, where the input goes first to the mel-spec-block and then to the rest of the model (?).

I am using TensorFlow 2.7.0 and kapre 0.3.7.

Here is my code for testing the tflite-model:

preds = [] # Test and evaluate the TFLite-converted model on unseen test data for i, sample in enumerate(X_test_full_scaled): X = sample if input_details['dtype'] == np.int8: input_scale, input_zero_point = input_details["quantization"] X = sample / input_scale + input_zero_point X = X.reshape((1, 8000, 1)).astype(input_details["dtype"]) interpreter.set_tensor(input_index, X) interpreter.invoke() pred = interpreter.get_tensor(output_index) output_scale, output_zero_point = output_details['quantization'] if output_details['dtype'] == np.int8: pred = pred.astype(np.float32) pred = (pred - output_zero_point) * output_scale pred = np.argmax(pred, axis=1)[0] preds.append(pred) preds = np.array(preds)
opened by eppane 3
Calling Magnitude() and Phase() simultaneously

Hi,

I am looking to call Magnitude() and Phase() simultaneously for the same STFT input and concatenate the magnitude and phase before feeding into the convolution layers in my CNN sequential Keras model.

Is this possible?

Best,

Yang

opened by HsuanYang-Wang 1
about kapre.utiils

Hi, when i used "from kapre.utils import Normalization2D", I met this error which said No module named 'kapre.utils'. I see your package, and found that there is surely no utils.py. I am wondering how to slove it.

Best wishes, Daisy

opened by YiningWang2 1
Function missing in updated version

I noticed there is a functon "kapre.utils.Normalization2D" in the old version, while I cannot find it in the updated version. Why? Is there have any alternative functions?

opened by v3551G 1
trainable DSP parameters

hello contributers and community.

I love your repo! It's eases so much for me! Although having the precomputation in the model is already great I'd like to know how you can optimize DSP parameters. It looks like that this is a feature from old versions (e.g. 0.2) and by default I dont see any trainable params in this layer.

Could you please state if this is still available and how to use it?

happy hacking Paul

opened by bytosaur 2

Releases(Kapre-0.3.7)

Kapre-0.3.7(Jan 21, 2022)
Add SpecAugment layer

Source code(tar.gz)
Source code(zip)
Kapre-0.3.6(Nov 14, 2021)
bugfix (tflite)

Source code(tar.gz)
Source code(zip)
Kapre-0.3.5(Mar 18, 2021)
Add tflite-compatible stft layer

Source code(tar.gz)
Source code(zip)
Kapre-0.3.4(Sep 29, 2020)

Bugfix for get_window_fn()
Source code(tar.gz)
Source code(zip)
0.3.3(Sep 15, 2020)
kapre.augmentation is added

kapre.time_frequency.ConcatenateFrequencyMap is added

kapre.composed.get_frequency_aware_conv2d is added

In STFT and InverseSTFT, keyword arg window_fn is renamed to window_name and it expects string value, not function.

With this update, models with Kapre layers can be loaded with h5 file format.

kapre.backend.get_window_fn is added

Source code(tar.gz)
Source code(zip)

0.3.2(Aug 30, 2020)

- `kapre.signal.Frame` and `kapre.signal.Energy` are added
- `kapre.signal.LogmelToMFCC` is added
- `kapre.signal.MuLawEncoder` and `kapre.signal.MuLawDecoder` are added
- `kapre.composed.get_stft_magnitude_layer()` is added 
- doc is hosted at https://kapre.readthedocs.io/

Source code(tar.gz)
Source code(zip)

0.3.1(Aug 21, 2020)
InverseSTFT and etc.

Source code(tar.gz)
Source code(zip)
0.3.0(Aug 16, 2020)

Breaking and simplifying changes with Tensorflow 2.0 and more tests. Some features are removed. New layer - STFT(). New approach for more complicated representations - see kapre.composed.
Source code(tar.gz)
Source code(zip)
v0.1.8(May 18, 2020)

Added Delta layer
Source code(tar.gz)
Source code(zip)
kapre-master.zip(3.16 MB)
v0.1.7(Feb 20, 2020)

Source code(tar.gz)
Source code(zip)

Owner

Keunwoo Choi

MIR, machine learning, music recommendation.

GitHub Repository

Azion the best solution of Edge Computing in the world.

Azion Edge Function docker action Create or update an Edge Functions on Azion Edge Nodes. The domain name is the key for decision to a create or updat

8 Jul 16, 2022

[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)

Feel free to visit my homepage Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DIMP) [ECCVW2020 paper] Presentation

35 Oct 26, 2022

Volumetric Correspondence Networks for Optical Flow, NeurIPS 2019.

VCN: Volumetric correspondence networks for optical flow [project website] Requirements python 3.6 pytorch 1.1.0-1.3.0 pytorch correlation module (opt

144 Dec 06, 2022

STEM: An approach to Multi-source Domain Adaptation with Guarantees

STEM: An approach to Multi-source Domain Adaptation with Guarantees Introduction This is the official implementation of ``STEM: An approach to Multi-s

5 Dec 19, 2022

This is the official PyTorch implementation for "Mesa: A Memory-saving Training Framework for Transformers".

Mesa: A Memory-saving Training Framework for Transformers This is the official PyTorch implementation for Mesa: A Memory-saving Training Framework for

105 Dec 06, 2022

Locally Differentially Private Distributed Deep Learning via Knowledge Distillation (LDP-DL)

Locally Differentially Private Distributed Deep Learning via Knowledge Distillation (LDP-DL) A preprint version of our paper: Link here This is a samp

3 Jan 08, 2023

A self-supervised 3D representation learning framework named viewpoint bottleneck.

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck Paper Created by Liyi Luo, Beiwen Tian, Hao Zhao and Guyue Zhou from Institute for AI In

63 Aug 11, 2022

AgeGuesser: deep learning based age estimation system. Powered by EfficientNet and Yolov5

AgeGuesser AgeGuesser is an end-to-end, deep-learning based Age Estimation system, presented at the CAIP 2021 conference. You can find the related pap

5 Nov 10, 2022

Bling's Object detection tool

BriVL for Building Applications This repo is used for illustrating how to build applications by using BriVL model. This repo is re-implemented from fo

47 Nov 01, 2022

All course materials for the Zero to Mastery Deep Learning with TensorFlow course.

3.4k Jan 07, 2023

Codebase for "ProtoAttend: Attention-Based Prototypical Learning."

Codebase for "ProtoAttend: Attention-Based Prototypical Learning." Authors: Sercan O. Arik and Tomas Pfister Paper: Sercan O. Arik and Tomas Pfister,

2 May 17, 2022

Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

NeX: Real-time View Synthesis with Neural Basis Expansion Project Page | Video | Paper | COLAB | Shiny Dataset We present NeX, a new approach to novel

536 Dec 20, 2022

Object classification with basic computer vision techniques

naive-image-classification Object classification with basic computer vision techniques. Final assignment for the computer vision course I took at univ

2 Jul 01, 2022

DeepGNN is a framework for training machine learning models on large scale graph data.

DeepGNN Overview DeepGNN is a framework for training machine learning models on large scale graph data. DeepGNN contains all the necessary features in

45 Jan 01, 2023

An excellent hash algorithm combining classical sponge structure and RNN.

SHA-RNN Recurrent Neural Network with Chaotic System for Hash Functions Anonymous Authors [摘要] 在这次作业中我们提出了一种新的 Hash Function —— SHA-RNN。其以海绵结构为基础，融合了混

5 May 15, 2022

a simple, efficient, and intuitive text editor

Oxygen beta a simple, efficient, and intuitive text editor Overview oxygen is a simple, efficient, and intuitive text editor designed as more featured

1 Feb 23, 2022

A heterogeneous entity-augmented academic language model based on Open Academic Graph (OAG)

Library | Paper | Slack We released two versions of OAG-BERT in CogDL package. OAG-BERT is a heterogeneous entity-augmented academic language model wh

58 Dec 17, 2022

This repository is the official implementation of Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning (NeurIPS21).

Core-tuning This repository is the official implementation of ``Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regular

18 Dec 17, 2022

Implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

PRP Introduction This is the implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

39 Dec 29, 2022

NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns.

NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions Overview NUANCED is a user-centric conversational recommen

18 Dec 28, 2021