STFT_Transformer

Code for the STFT Transformer used in the BirdCLEF 2021 competition.

The STFT Transformer is a new way to apply Transformers to audio data, similar to how Vision Transformers are applied to images. It was developed for the BirdCLEF 2021 competition hosted on Kaggle. The PDF document gives more context; it has been submitted to the BirdCLEF 2021 workshop.

The code is provided as is; it has not been rewritten. Because competitions are done in a hurry, the code may not meet the usual open source standards.

The code assumes this directory structure:

<base_dir>/code

<base_dir>/input

<base_dir>/input/freefield1010

<base_dir>/checkpoints

<base_dir>/data
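
The layout above can be created by hand or with a short script. The following is a minimal sketch, not part of the repository: the function name `make_layout` and the `SUBDIRS` list are illustrative, and `base_dir` is whatever directory you choose.

```python
from pathlib import Path

# Sub-directories listed above, relative to <base_dir>.
SUBDIRS = ["code", "input", "input/freefield1010", "checkpoints", "data"]


def make_layout(base_dir):
    """Create the expected directory tree under base_dir and return the paths.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    base = Path(base_dir)
    created = []
    for sub in SUBDIRS:
        path = base / sub
        # parents=True also creates base_dir itself if it is missing;
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `make_layout("/path/to/base_dir")` only creates empty directories; the scripts still have to be placed in `code`, and the data still has to be downloaded into `input`.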

The code has to be run from the code directory. The competition data has to be downloaded into the input directory, and the freefield1010 data into the freefield1010 directory. data_final.py should be run first: it reads the audio files from input and stores the relevant parts in the data directory as numpy files.

Then stft_transformer_final.py can be run to train the model for one fold. During the competition I ran 5 folds by editing the FOLD global variable in the script (I know, this is substandard).
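
If editing the script for each fold is inconvenient, one possible alternative is to read the fold index from the environment. This is only a sketch under stated assumptions, not what the script does: the environment variable name `STFT_FOLD`, the helper `get_fold`, and the assumption that folds are numbered 0 to 4 are all hypothetical.

```python
import os

N_FOLDS = 5  # five folds were run during the competition


def get_fold(default=0):
    """Return the fold index taken from the STFT_FOLD environment variable.

    STFT_FOLD is a hypothetical name chosen for this sketch, and the
    0..N_FOLDS-1 numbering is an assumption; check the script's own convention.
    """
    fold = int(os.environ.get("STFT_FOLD", default))
    if not 0 <= fold < N_FOLDS:
        raise ValueError(f"fold must be in [0, {N_FOLDS - 1}], got {fold}")
    return fold
```

With a helper like this, the line that sets FOLD in the script could become `FOLD = get_fold()`, and the script could be launched five times with `STFT_FOLD` set to each fold index in turn.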

Once all 5 models are trained, one can upload the weights to a Kaggle dataset and use the submission notebook I used. This should give a score worth 15th place in the competition. Achieving this rank with a single model is significant, as all top teams used an ensemble of models.

Owner

Jean-François Puget
