Implementation of "Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021" in PyTorch

Overview

Auditory Slow-Fast

This repository implements the model proposed in the paper:

Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen, Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021

Project's webpage

arXiv paper

Citing

When using this code, kindly reference:

@ARTICLE{Kazakos2021SlowFastAuditory,
   title={Slow-Fast Auditory Streams For Audio Recognition},
   author={Kazakos, Evangelos and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima},
           journal   = {CoRR},
           volume    = {abs/2103.03516},
           year      = {2021},
           ee        = {https://arxiv.org/abs/2103.03516},
}

Pretrained models

You can download our pretrained models on VGG-Sound and EPIC-KITCHENS-100:

  • Slow-Fast (EPIC-KITCHENS-100) link
  • Slow (EPIC-KITCHENS-100) link
  • Fast (EPIC-KITCHENS-100) link
  • Slow-Fast (VGG-Sound) link
  • Slow (VGG-Sound) link
  • Fast (VGG-Sound) link

Preparation

  • Requirements:
    • PyTorch 1.7.1
    • librosa: conda install -c conda-forge librosa
    • h5py: conda install h5py
    • wandb: pip install wandb
    • fvcore: pip install 'git+https://github.com/facebookresearch/fvcore'
    • simplejson: pip install simplejson
    • psutil: pip install psutil
    • tensorboard: pip install tensorboard
  • Add this repository to $PYTHONPATH.
export PYTHONPATH=/path/to/auditory-slow-fast/slowfast:$PYTHONPATH
  • VGG-Sound:
    1. Download the audio. For instructions see here
    2. Download train.pkl (link) and test.pkl (link). I converted the original train.csv and test.csv (found here) to pickle files with column names for easier use
  • EPIC-KITCHENS:
    1. From the annotation repository of EPIC-KITCHENS-100 (link), download: EPIC_100_train.pkl, EPIC_100_validation.pkl, and EPIC_100_test_timestamps.pkl. EPIC_100_train.pkl and EPIC_100_validation.pkl will be used for training/validation, while EPIC_100_test_timestamps.pkl can be used to obtain the scores to submit in the AR challenge.
    2. Download all the videos of EPIC-KITCHENS-100 using the download scripts found here, where you can also find detailed instructions on using the scripts.
    3. Extract audio from the videos by running:
    python audio_extraction/extract_audio.py /path/to/videos /output/path 
    
    1. Save audio in HDF5 format by running:
    python audio_extraction/wav_to_hdf5.py /path/to/audio /output/hdf5/EPIC-KITCHENS-100_audio.hdf5
    

Training/validation on EPIC-KITCHENS-100

To train the model run (fine-tuning from VGG-Sound pretrained model):

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model

To train from scratch remove TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model.

You can also train the individual streams. For example, for training Slow run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOW_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model

To validate the model run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True 
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

To obtain scores on the test set run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 
EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True 
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth 
EPICKITCHENS.TEST_LIST EPIC_100_test_timestamps.pkl EPICKITCHENS.TEST_SPLIT test

Training/validation on VGG-Sound

To train the model run:

python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/output_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset 
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations 

To validate the model run:

python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus 
OUTPUT_DIR /path/to/experiment_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset 
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True 
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

License

The code is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, found here.

Owner
Evangelos Kazakos
Evangelos Kazakos
Welcome to Nexus. Your personal virtual assistant

AI Voice Assistant Welcome to Nexus voice assistant Description Have you ever heard of voice assistants like Cortana, Siri, Google assistant, and Alex

Mustafah Zacs 1 Jan 10, 2022
A Python wrapper for the high-quality vocoder "World"

PyWORLD - A Python wrapper of WORLD Vocoder Linux Windows WORLD Vocoder is a fast and high-quality vocoder which parameterizes speech into three compo

Jeremy Hsu 583 Dec 15, 2022
NovaMusic is a music sharing robot. Users can get music and music lyrics using inline queries.

A music sharing telegram robot using Redis database and Telebot python library using Redis database.

Hesam Norin 7 Oct 21, 2022
TwitterMusicBot - A Twitter bot with Spotify integration.

A Twitter Music Bot 🤖 🎵 🎶 I created this project to learn more about APIs, so it only works for student purposes. Initially, delving into the Spoti

Gustavo Oliveira 2 Jan 02, 2022
Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals

Welcome to MARSYAS. MARSYAS is a software framework for rapid prototyping of audio applications, with flexibility and extensibility as primary concer

Marsyas Developers Group 364 Oct 31, 2022
commonfate 📦commonfate 📦 - Common Fate Model and Transform.

Common Fate Transform and Model for Python This package is a python implementation of the Common Fate Transform and Model to be used for audio source

Fabian-Robert Stöter 18 Jan 08, 2022
Mentos Music Bot With Python

Mentos Music Bot For Any Query Join Our Support Group 👥 Special Thanks - @OfficialYukki Hey Welcome To Here 💫 💫 You Can Make Your Own Music Bot Fo

Cyber Toxic 13 Oct 21, 2022
Python library for audio and music analysis

librosa A python package for music and audio analysis. Documentation See https://librosa.org/doc/ for a complete reference manual and introductory tut

librosa 5.6k Jan 06, 2023
Synthesia but open source, made in python and free

PyPiano Synthesia but open source, made in python and free Requirements are in requirements.txt If you struggle with installation of pyaudio, run : pi

DaCapo 11 Nov 06, 2022
Open-Source bot to play songs in your Telegram's Group Voice Chat. Powered by @Akki_ThePro

VcPlayer Telegram Voice-Chat Bot [PyTGCalls] ⇝ Requirements ⇜ Account requirements A Telegram account to use as the music bot, You cannot use regular

Akki ThePro 2 Dec 25, 2021
This Is Telegram Music UserBot To Play Music Without Being Admin

This Is Telegram Music UserBot To Play Music Without Being Admin

Krishna Kumar 36 Sep 13, 2022
A Python wrapper around the Soundcloud API

soundcloud-python A friendly wrapper around the Soundcloud API. Installation To install soundcloud-python, simply: pip install soundcloud Or if you'r

SoundCloud 84 Dec 31, 2022
Small Python application that links a Digico console and Reaper, handling automatic marker insertion and tracking.

Digico-Reaper-Link This is a small GUI based helper application designed to help with using Digico's Copy Audio function with a Reaper DAW used for re

Justin Stasiw 10 Oct 24, 2022
TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

TONet Introduction The official implementation of "TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music", in ICASSP 2022 We

Knut(Ke) Chen 29 Dec 01, 2022
Synchronize a local directory of songs' (MP3, MP4) metadata (genre, ratings) and playlists with a Plex server.

PlexMusicSync Synchronize a local directory of songs' (MP3, MP4) metadata (genre, ratings) and playlists (m3u, m3u8) with a Plex server. The song file

Tom Goetz 9 Jul 07, 2022
Conferencing Speech Challenge

ConferencingSpeech 2021 challenge This repository contains the datasets list and scripts required for the ConferencingSpeech challenge. For more detai

73 Nov 29, 2022
Official implementation of A cappella: Audio-visual Singing VoiceSeparation, from BMVC21

Y-Net Official implementation of A cappella: Audio-visual Singing VoiceSeparation, British Machine Vision Conference 2021 Project page: ipcv.github.io

Juan F. Montesinos 12 Oct 22, 2022
F.R.I.D.A.Y. ----- Female Replacement Intelligent Digital Assistant Youth

F.R.I.D.A.Y. Female Replacement Intelligent Digital Assistant Youth--Jarvis-- the virtual assistant made by python Overview This is a virtual assistan

JIB - Just Innovative Bro 4 Feb 26, 2022
Just-Music - Spotify API Driven Music Web app, that allows to listen and control and share songs

Just Music... Just Music Is A Web APP That Allows Users To Play Song Using Spoti

Ayush Mishra 3 May 01, 2022
An app made in Python using the PyTube and Tkinter libraries to download videos and MP3 audio.

yt-dl (GUI Edition) An app made in Python using the PyTube and Tkinter libraries to download videos and MP3 audio. How do I download this? Windows: Fi

1 Oct 23, 2021