Implementation of "Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021" in PyTorch

Last update: Dec 07, 2022

Overview

Auditory Slow-Fast

This repository implements the model proposed in the paper:

Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen, Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021

Project's webpage

arXiv paper

Citing

When using this code, kindly reference:

@ARTICLE{Kazakos2021SlowFastAuditory,
   title     = {Slow-Fast Auditory Streams For Audio Recognition},
   author    = {Kazakos, Evangelos and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima},
   journal   = {CoRR},
   volume    = {abs/2103.03516},
   year      = {2021},
   ee        = {https://arxiv.org/abs/2103.03516},
}

Pretrained models

You can download our pretrained models on VGG-Sound and EPIC-KITCHENS-100:

Slow-Fast (EPIC-KITCHENS-100) link
Slow (EPIC-KITCHENS-100) link
Fast (EPIC-KITCHENS-100) link
Slow-Fast (VGG-Sound) link
Slow (VGG-Sound) link
Fast (VGG-Sound) link

Preparation

Requirements:
- PyTorch 1.7.1
- librosa: conda install -c conda-forge librosa
- h5py: conda install h5py
- wandb: pip install wandb
- fvcore: pip install 'git+https://github.com/facebookresearch/fvcore'
- simplejson: pip install simplejson
- psutil: pip install psutil
- tensorboard: pip install tensorboard

Add this repository to $PYTHONPATH:

export PYTHONPATH=/path/to/auditory-slow-fast/slowfast:$PYTHONPATH
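
If you launch runs from a Python script rather than a shell, the same setting can be passed through the child process environment. The following is only a minimal sketch: the helper name `env_with_pythonpath` is hypothetical and not part of this repository.

```python
import os


def env_with_pythonpath(package_dir, base_env=None):
    """Return a copy of the environment with package_dir prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Mirrors `export PYTHONPATH=<dir>:$PYTHONPATH`; no separator if nothing was set.
    env["PYTHONPATH"] = package_dir + (os.pathsep + existing if existing else "")
    return env


env = env_with_pythonpath("/path/to/auditory-slow-fast/slowfast")
print(env["PYTHONPATH"])
```

The returned dictionary can be given to `subprocess.run(..., env=env)` so that only the launched process sees the modified path.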

VGG-Sound:
1. Download the audio. For instructions, see here.
2. Download train.pkl (link) and test.pkl (link). I converted the original train.csv and test.csv (found here) to pickle files with column names for easier use.
EPIC-KITCHENS:
1. From the annotation repository of EPIC-KITCHENS-100 (link), download: EPIC_100_train.pkl, EPIC_100_validation.pkl, and EPIC_100_test_timestamps.pkl. EPIC_100_train.pkl and EPIC_100_validation.pkl will be used for training/validation, while EPIC_100_test_timestamps.pkl can be used to obtain the scores to submit in the AR challenge.
2. Download all the videos of EPIC-KITCHENS-100 using the download scripts found here, where you can also find detailed instructions on using the scripts.
3. Extract audio from the videos by running:
```
python audio_extraction/extract_audio.py /path/to/videos /output/path 
```
4. Save audio in HDF5 format by running:
```
python audio_extraction/wav_to_hdf5.py /path/to/audio /output/hdf5/EPIC-KITCHENS-100_audio.hdf5
```
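
For readers who want a feel for what the audio extraction step involves, here is a rough sketch that builds an ffmpeg command for a single video. It is not the code in audio_extraction/extract_audio.py, which may work differently. The function name, the mono setting, the sampling rate, and the example file name are all assumptions made for illustration.

```python
from pathlib import Path


def ffmpeg_extract_cmd(video_path, output_dir, sample_rate):
    """Build an ffmpeg command that writes a WAV file named after the video."""
    video = Path(video_path)
    wav = Path(output_dir) / (video.stem + ".wav")
    return [
        "ffmpeg",
        "-i", str(video),          # input video file
        "-vn",                     # discard the video stream
        "-ac", "1",                # mono output (assumption)
        "-ar", str(sample_rate),   # sampling rate in Hz (assumption)
        str(wav),
    ]


# Hypothetical file name and sampling rate, for illustration only.
cmd = ffmpeg_extract_cmd("/path/to/videos/P01_01.MP4", "/output/path", sample_rate=24000)
print(" ".join(cmd))
```

The list can be executed with `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.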

Training/validation on EPIC-KITCHENS-100

To train the model by fine-tuning from the VGG-Sound pretrained model, run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model

To train from scratch, remove the TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model override from the command.
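
The difference between fine-tuning and training from scratch is a single key/value override, which the following sketch makes explicit. The helper `train_command` is hypothetical and not part of the repository. It only assembles the argument list, using the config keys shown in this README.

```python
import shlex


def train_command(num_gpus, output_dir, audio_hdf5, annotations_dir,
                  checkpoint=None,
                  cfg="configs/EPIC-KITCHENS/SLOWFAST_R50.yaml"):
    """Assemble the run_net.py argument list for EPIC-KITCHENS-100 training."""
    overrides = {
        "NUM_GPUS": num_gpus,
        "OUTPUT_DIR": output_dir,
        "EPICKITCHENS.AUDIO_DATA_FILE": audio_hdf5,
        "EPICKITCHENS.ANNOTATIONS_DIR": annotations_dir,
    }
    # Only fine-tuning passes a pretrained checkpoint; omit it to train from scratch.
    if checkpoint is not None:
        overrides["TRAIN.CHECKPOINT_FILE_PATH"] = checkpoint
    args = ["python", "tools/run_net.py", "--cfg", cfg]
    for key, value in overrides.items():
        args += [key, str(value)]
    return args


scratch = train_command(1, "/path/to/output_dir",
                        "/path/to/EPIC-KITCHENS-100_audio.hdf5",
                        "/path/to/annotations")
print(shlex.join(scratch))
```

Passing `checkpoint="/path/to/VGG-Sound/pretrained/model"` reproduces the fine-tuning command, and a different `cfg` value selects another config file.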

You can also train the individual streams. For example, to train the Slow stream, run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOW_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model

To validate the model, run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

To obtain scores on the test set, run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth \
EPICKITCHENS.TEST_LIST EPIC_100_test_timestamps.pkl EPICKITCHENS.TEST_SPLIT test
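
The validation and test commands above differ only in their last two overrides. As an illustration, the sketch below collects the evaluation-specific overrides in one place. The function `eval_overrides` is a hypothetical helper, not something the repository provides.

```python
def eval_overrides(checkpoint, split="validation"):
    """Return the key/value overrides that switch run_net.py to evaluation."""
    overrides = [
        "TRAIN.ENABLE", "False",
        "TEST.ENABLE", "True",
        "TEST.CHECKPOINT_FILE_PATH", checkpoint,
    ]
    if split == "test":
        # The test split additionally needs the timestamps file and split name.
        overrides += [
            "EPICKITCHENS.TEST_LIST", "EPIC_100_test_timestamps.pkl",
            "EPICKITCHENS.TEST_SPLIT", "test",
        ]
    elif split != "validation":
        raise ValueError(f"unknown split: {split!r}")
    return overrides


ckpt = "/path/to/experiment_dir/checkpoints/checkpoint_best.pyth"
print(" ".join(eval_overrides(ckpt, split="test")))
```

These overrides are appended after the common arguments (config file, NUM_GPUS, OUTPUT_DIR, and the dataset paths).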

Training/validation on VGG-Sound

To train the model, run:

python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations

To validate the model, run:

python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

License

The code is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, found here.
