YOLOX_AUDIO is an audio event detection model based on YOLOX

Overview

Introduction

YOLOX_AUDIO is an audio event detection model based on YOLOX, an anchor-free version of YOLO. This repo is implemented in PyTorch. The main goal of YOLOX_AUDIO is to detect and classify pre-defined audio events in the multi-spectrogram domain using image object detection frameworks.

Updates!!

  • 【2021/11/15】 We released YOLOX_AUDIO to the public.

Quick Start

Installation

Step1. Install YOLOX_AUDIO.

git clone https://github.com/intflow/YOLOX_AUDIO.git
cd YOLOX_AUDIO
pip3 install -U pip && pip3 install -r requirements.txt
pip3 install -v -e .  # or  python3 setup.py develop

Step2. Install pycocotools.

pip3 install cython; pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

Data Preparation

Step1. Prepare audio waveform files for training. AUDIO_DATAPATH/wav

Step2. Write audio annotation files for training. AUDIO_DATAPATH/label.json

{
    "00000.wav": {
        "speaker": [
            "W",
            "M",
            "C",
            "W"
        ],
        "on_offset": [
            [
                1.34425,
                2.4083125
            ],
            [
                4.0082708333333334,
                4.5560625
            ],
            [
                6.2560416666666665,
                7.956104166666666
            ],
            [
                9.756083333333333,
                10.876624999999999
            ]
        ]
    },
    "00001.wav": {
        "speaker": [
            "W",
            "M",
            "C",
            "M",
            "W",
            "C"
        ],
        "on_offset": [
            [
                1.4325416666666666,
                2.7918958333333332
            ],
            [
                2.1762916666666667,
                4.109729166666667
            ],
            [
                7.109708333333334,
                8.530916666666666
            ],
            [
                8.514125,
                9.306104166666668
            ],
            [
                12.606083333333334,
                14.3345625
            ],
            [
                14.148958333333333,
                15.362958333333333
            ]
        ]
    },
    ...
}
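Before converting the data, it can help to check that every file in the annotation has one label per on/offset pair. The following is a minimal sketch, not part of the repo: the function name `load_events` and the sanity rule (non-negative times, onset before offset) are assumptions made for illustration. Overlapping spans are allowed, as the `00001.wav` entry above shows. Note that the `...` in the example above is an elision and must not appear in a real label.json.

```python
import json


def load_events(label_json_text):
    """Flatten a label.json mapping into (wav_name, label, onset, offset) tuples."""
    annotations = json.loads(label_json_text)
    events = []
    for wav_name, entry in annotations.items():
        labels = entry["speaker"]
        spans = entry["on_offset"]
        # Each label must pair with exactly one [onset, offset] span.
        if len(labels) != len(spans):
            raise ValueError(f"{wav_name}: {len(labels)} labels vs {len(spans)} spans")
        for label, (onset, offset) in zip(labels, spans):
            # Assumed sanity rule: times are non-negative and onset < offset.
            if not 0 <= onset < offset:
                raise ValueError(f"{wav_name}: bad span [{onset}, {offset}]")
            events.append((wav_name, label, onset, offset))
    return events


# A shortened copy of the first entry shown above.
sample = """
{
    "00000.wav": {
        "speaker": ["W", "M"],
        "on_offset": [[1.34425, 2.4083125], [4.0082708333333334, 4.5560625]]
    }
}
"""
events = load_events(sample)
for event in events:
    print(event)
```

For a real dataset, pass the contents of AUDIO_DATAPATH/label.json instead of the inline sample.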

Step3. Convert audio files into spectrogram images.

python tools/json_gen_audio2coco.py

Change the dataset path and file names in the script to match your setup:

root = '/data/AIGC_3rd_2021/GIST_tr2_veryhard5000_all_tr2'
os.system('rm -rf '+root+'/img/')
os.system('mkdir '+root+'/img/')
wav_folder_path = os.path.join(root, 'wav')
img_folder_path = os.path.join(root, 'img')
train_label_path = os.path.join(root, 'tr2_devel_5000.json')
train_label_merge_out = os.path.join(root, 'label_coco_bbox.json')
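
To illustrate the idea of turning a time-domain annotation into an image-domain box, here is a minimal sketch. It is an assumption about how such a mapping could work, not the logic of tools/json_gen_audio2coco.py: it assumes the time axis runs linearly along the image width and that each event box spans the full image height. The function name, the 16-second clip length, and the 256-pixel image size are hypothetical. The output follows the COCO bbox convention of [x, y, width, height].

```python
def span_to_coco_bbox(onset, offset, clip_seconds, img_width, img_height):
    """Map an [onset, offset] span in seconds to a COCO [x, y, w, h] box.

    Assumes a linear time axis along the image width and a box covering
    the full image height (hypothetical mapping for illustration only).
    """
    if not 0 <= onset < offset <= clip_seconds:
        raise ValueError("span must lie inside the clip with onset < offset")
    px_per_sec = img_width / clip_seconds
    x = onset * px_per_sec
    width = (offset - onset) * px_per_sec
    return [x, 0.0, width, float(img_height)]


# Hypothetical 16-second clip rendered as a 256x256 spectrogram image.
bbox = span_to_coco_bbox(2.0, 4.0, clip_seconds=16.0, img_width=256, img_height=256)
print(bbox)
```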

Training

Step1. Change the data loading paths in exps/yolox_audio__tr2/yolox_x.py

        self.train_path = '/data/AIGC_3rd_2021/GIST_tr2_veryhard5000_all_tr2'
        self.val_path = '/data/AIGC_3rd_2021/tr2_set_01_tune'
        self.train_ann = "label_coco_bbox.json"
        self.val_ann = "label_coco_bbox.json"

Step2. Begin training:

python3 tools/train.py -expn yolox_audio__tr2 -n yolox_audio_x \
-f exps/yolox_audio__tr2/yolox_x.py -d 4 -b 32 --fp16 \
-c /data/pretrained/yolox_x.pth

  • -d: number of GPU devices
  • -b: total batch size; the recommended value is num-gpu * 8
  • -f: path to the experiment file
  • --fp16: mixed-precision training
  • --cache: cache images in RAM to accelerate training; this requires a large amount of system RAM
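
The batch-size recommendation above (num-gpu * 8) can be sketched as a small helper that assembles the training command. This is only an illustration: `recommended_batch_size` and `build_train_args` are hypothetical names, and the flags are limited to those shown in the command above. The sketch builds the argument list without running it.

```python
def recommended_batch_size(num_gpus, per_gpu=8):
    """Total batch size following the num-gpu * 8 recommendation."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return num_gpus * per_gpu


def build_train_args(num_gpus, exp_file, checkpoint):
    """Assemble the training command as an argument list (not executed here)."""
    return [
        "python3", "tools/train.py",
        "-expn", "yolox_audio__tr2",
        "-n", "yolox_audio_x",
        "-f", exp_file,
        "-d", str(num_gpus),
        "-b", str(recommended_batch_size(num_gpus)),
        "--fp16",
        "-c", checkpoint,
    ]


args = build_train_args(
    4, "exps/yolox_audio__tr2/yolox_x.py", "/data/pretrained/yolox_x.pth"
)
print(" ".join(args))
```

With 4 GPUs this yields a total batch size of 32, matching the command above.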

We encourage using a pretrained YOLOX model for training: https://github.com/Megvii-BaseDetection/YOLOX

Inference

Run demo_audio.py as follows:
python3 tools/demo.py --demo image -expn yolox_audio__tr2 -n yolox_audio_x \
-f exps/yolox_audio__tr2/yolox_x.py \
-c YOLOX_outputs/yolox_audio__tr2/best_ckpt.pth \
--path /data/AIGC_3rd_2021/GIST_tr2_100/img/ \
--save_folder /data/yolox_out \
--conf 0.2 --nms 0.65 --tsize 256 --save_result --device gpu

From demo_audio.py you can obtain the on/offset (VAD) times and the class of each audio chunk.
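
As a rough illustration of how detected boxes could be read back as timed events, the sketch below maps the horizontal extent of each box to seconds. It is not the code in demo_audio.py: the detection tuple layout (x1, y1, x2, y2, score, class_id), the class order ("W", "M", "C"), the 16-second clip length, and the 256-pixel image width are all assumptions made for this example. The score threshold mirrors the --conf 0.2 value used above.

```python
def boxes_to_events(detections, clip_seconds, img_width, class_names, min_score=0.2):
    """Convert image-domain boxes to (label, onset, offset, score) tuples.

    Assumes a linear time axis along the image width (hypothetical mapping).
    """
    sec_per_px = clip_seconds / img_width
    events = []
    for x1, _y1, x2, _y2, score, class_id in detections:
        if score < min_score:
            continue  # drop low-confidence detections
        # Clamp box edges to the image before converting to seconds.
        x1 = min(max(x1, 0.0), img_width)
        x2 = min(max(x2, 0.0), img_width)
        events.append((class_names[class_id], x1 * sec_per_px, x2 * sec_per_px, score))
    # Report events in chronological order.
    return sorted(events, key=lambda event: event[1])


# Hypothetical detections on a 256-pixel-wide image of a 16-second clip.
detections = [
    (128.0, 0.0, 192.0, 256.0, 0.9, 2),
    (32.0, 0.0, 64.0, 256.0, 0.8, 1),
    (0.0, 0.0, 10.0, 256.0, 0.1, 0),  # below the score threshold
]
timed = boxes_to_events(detections, 16.0, 256, ("W", "M", "C"))
for label, onset, offset, score in timed:
    print(f"{label}: {onset:.2f}s to {offset:.2f}s (score {score:.2f})")
```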

References

  • YOLOX baseline implemented in PyTorch: YOLOX
@article{yolox2021,
  title={YOLOX: Exceeding YOLO Series in 2021},
  author={Ge, Zheng and Liu, Songtao and Wang, Feng and Li, Zeming and Sun, Jian},
  journal={arXiv preprint arXiv:2107.08430},
  year={2021}
}
  • Librosa for audio feature extraction: librosa
McFee, Brian, Colin Raffel, Dawen Liang, Daniel PW Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. “librosa: Audio and music signal analysis in python.” In Proceedings of the 14th python in science conference, pp. 18-25. 2015.

Acknowledgement

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00014).

Owner
intflow Inc.
Official Code Repositories of intflow.ai