The official repository for Audio ALBERT

Last update: Dec 11, 2022

AALBERT

This is the official repository of AALBERT, a PyTorch Lightning reimplementation of the paper "Audio ALBERT: A Lite BERT for Self-Supervised Learning of Audio Representation". The original code is in the AlbertNew branch of the s3prl repo. In the paper, we proposed Audio ALBERT, which achieves performance comparable to massive pre-trained networks on downstream tasks while having 91% fewer parameters.

Dependencies

Python 3.8
Computing power (a high-end GPU) and memory (both system RAM and GPU RAM) are extremely important if you'd like to train your own model.
Required packages and their uses are listed in requirements.txt.
pip install -r requirements.txt

Pretrain Stage

We use LibriSpeech as the dataset for the pretraining stage. You can download the dataset from this link.

Stage 1: modify dataset path to your local dataset path:

AALBERT: config path: upstream/aalbert/pretrain_config.yaml

    line 16: datarc:
            {Your dataset key name}: {your local dataset path}

Mockingjay: config path: upstream/mockingjay/pretrain_config.yaml

    line 16: datarc:
            {Your dataset key name}: {your local dataset path}
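
Before launching a run, it can help to confirm that the paths entered under `datarc` actually exist. The following is a minimal sketch and not part of this repository: `missing_dataset_paths` is a hypothetical helper. It assumes the config has already been loaded into a Python dict (for example with a YAML parser) and that every value under `datarc` is a filesystem path, which may not hold for the real config files.

```python
from pathlib import Path
from typing import Dict, List


def missing_dataset_paths(config: Dict) -> List[str]:
    """Return the `datarc` keys whose paths do not exist on disk.

    Assumption: config["datarc"] is a flat mapping of dataset key
    names to local paths, as in the config snippets above.
    """
    datarc = config.get("datarc") or {}
    return [key for key, path in datarc.items() if not Path(str(path)).exists()]


if __name__ == "__main__":
    # Hypothetical key name and path, for illustration only.
    example = {"datarc": {"my_librispeech": "/path/that/does/not/exist"}}
    print(missing_dataset_paths(example))
```

An empty list means every configured path was found; any key that is returned points to a path that should be corrected before pretraining.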

Stage 2: run pretraining script

python run_pretrain.py -n aalbert_pretrained -u aalbert
- -n : experiment name
- -u : upstream model (two options: aalbert / mockingjay)
- The model is saved in the result folder after the pretraining stage finishes.
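
Once pretraining has produced checkpoints, one of them has to be chosen for the downstream stage. The sketch below is one possible way to pick the most recently written one; it is not part of this repository. `latest_checkpoint` is a hypothetical helper, and it assumes the checkpoints are `.ckpt` files stored together in a single directory. Note that the newest file is not necessarily the best-performing one.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(checkpoint_dir: str) -> Optional[Path]:
    """Return the most recently modified .ckpt file in checkpoint_dir, or None."""
    candidates = list(Path(checkpoint_dir).glob("*.ckpt"))
    if not candidates:
        return None
    # Compare by modification time; the newest file wins.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path can then be supplied wherever a checkpoint path is required.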

Downstream Stage

Here, we take VoxCeleb1 speaker classification as our downstream task. You can download the dataset from the official website.

After pretraining, we can extract features from the pretrained model and use them for different downstream tasks.

Stage 1: modify dataset path to your local dataset path
- voxceleb1_speaker: config path: downstream/voxceleb1_speaker/train_config.yaml
```
line  9: datarc:
line 10:    file_path: {your dataset folder path}
line 11:    meta_path: {your label file path}
```

Stage 2: run downstream script

voxceleb1_speaker:

python run_downstream.py \
-c downstream/voxceleb1_speaker/train_config.yaml \
-g result/pretrain/{your_pretrained_model_folder}/model_config.yaml  \
-t result/pretrain/{your_pretrained_model_folder}/pretrained_config.yaml \
-u aalbert \
-d voxceleb1_speaker \
-k result/pretrained/{your_pretrained_model_folder}/checkpoints/{checkpoint_you_want_to_use.ckpt} \
-n voxceleb1_result

-n: experiment name
-c: downstream training config
-g: model config of the pretrained model
-t: pretraining config of the pretrained model
-u: upstream model (two options: aalbert / mockingjay)
-d: downstream task name
-k: model checkpoint path
-f: whether to fine-tune the pretrained model (default: False)
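
Because the downstream command takes many path arguments, assembling it programmatically can reduce copy-paste mistakes. The following is a minimal sketch under stated assumptions, not part of this repository: `build_downstream_command` is a hypothetical helper that uses only the flags listed above. The `-f` flag is left out because its argument form is not documented here, and the folder and checkpoint names in the example are placeholders.

```python
import shlex
from typing import List

# The two upstream options named in this README.
UPSTREAMS = ("aalbert", "mockingjay")


def build_downstream_command(
    name: str,
    downstream_config: str,
    model_config: str,
    pretrain_config: str,
    upstream: str,
    task: str,
    checkpoint: str,
) -> List[str]:
    """Build the argument list for run_downstream.py from the documented flags."""
    if upstream not in UPSTREAMS:
        raise ValueError(f"upstream must be one of {UPSTREAMS}, got {upstream!r}")
    return [
        "python", "run_downstream.py",
        "-c", downstream_config,
        "-g", model_config,
        "-t", pretrain_config,
        "-u", upstream,
        "-d", task,
        "-k", checkpoint,
        "-n", name,
    ]


if __name__ == "__main__":
    run_dir = "result/pretrain/my_run"  # hypothetical folder name
    cmd = build_downstream_command(
        name="voxceleb1_result",
        downstream_config="downstream/voxceleb1_speaker/train_config.yaml",
        model_config=f"{run_dir}/model_config.yaml",
        pretrain_config=f"{run_dir}/pretrained_config.yaml",
        upstream="aalbert",
        task="voxceleb1_speaker",
        checkpoint="path/to/checkpoint.ckpt",  # placeholder checkpoint path
    )
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting problems when paths contain spaces.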

Owner

pohan
