Code for STFT Transformer used in BirdCLEF 2021 competition.

Overview

STFT_Transformer

Code for STFT Transformer used in BirdCLEF 2021 competition.

The STFT Transformer is a new way to use Transformers similar to Vision Transformers on audio data. It has been developed for the BirdCLEF 2021 competition hosted on Kaggle. The pdf document gives more context. It has been submitted to the BIRDCLEF 2021 workshop.

The code is provided as is, it has not been rewritten. Given competitions are done in a hurry, code may not meet usual open source standard.

The code assumes this directory structure:

<base_dir>/code

<base_dir>/input

<base_dir>/input/freefield1010

<base_dir>/checkpoints

<base_dir>/data

Code has to be run in the code directory. Competition data has to be downloaded in the input directory. freefield1010 data must also be downloaded in the freefield1010 directory. data_final.py should be run first. It reads audio files from input and stores the relevant part in data directory as numpy files.

Then stft_transformer_final.py can be run to train one fold model. During the competition I ran 5 folds, by editing the FOLD global variable in the script (I know, this is sub standard).

Once all 5 models are trained one can upload the weights to a kaggle dataset and use the submission notebook I used. This should get a score worth the 15th rank in the competition. Achieving this rank with a single model is significant, as all top teams used an ensemble of models.

Owner
Jean-François Puget
NVIDIA Distinguished engineer, and twice Kaggle Grandmaster (CPMP). Works on machine learning and competes on Kaggle.
Jean-François Puget
Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

ConSERT Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer Requirements torch==1.6.0

Yan Yuanmeng 478 Dec 25, 2022
Revealing and Protecting Labels in Distributed Training

Revealing and Protecting Labels in Distributed Training

Google Interns 0 Nov 09, 2022
PyTorch framework for Deep Learning research and development.

Accelerated DL & RL PyTorch framework for Deep Learning research and development. It was developed with a focus on reproducibility, fast experimentati

Catalyst-Team 29 Jul 13, 2022
Implementations of LSTM: A Search Space Odyssey variants and their training results on the PTB dataset.

An LSTM Odyssey Code for training variants of "LSTM: A Search Space Odyssey" on Fomoro. Check out the blog post. Training Install TensorFlow. Clone th

Fomoro AI 95 Apr 13, 2022
Recurrent Conditional Query Learning

Recurrent Conditional Query Learning (RCQL) This repository contains the Pytorch implementation of One Model Packs Thousands of Items with Recurrent C

Dongda 4 Nov 28, 2022
A trusty face recognition research platform developed by Tencent Youtu Lab

Introduction TFace: A trusty face recognition research platform developed by Tencent Youtu Lab. It provides a high-performance distributed training fr

Tencent 956 Jan 01, 2023
ArtEmis: Affective Language for Art

ArtEmis: Affective Language for Art Created by Panos Achlioptas, Maks Ovsjanikov, Kilichbek Haydarov, Mohamed Elhoseiny, Leonidas J. Guibas Introducti

Panos 268 Dec 12, 2022
simple demo codes for Learning to Teach with Dynamic Loss Functions

Learning to Teach with Dynamic Loss Functions This repo contains the simple demo for the NeurIPS-18 paper: Learning to Teach with Dynamic Loss Functio

Lijun Wu 15 Dec 30, 2021
Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

Softlearning Softlearning is a deep reinforcement learning toolbox for training maximum entropy policies in continuous domains. The implementation is

Robotic AI & Learning Lab Berkeley 997 Dec 30, 2022
A Python wrapper for Google Tesseract

Python Tesseract Python-tesseract is an optical character recognition (OCR) tool for python. That is, it will recognize and "read" the text embedded i

Matthias A Lee 4.6k Jan 05, 2023
Semantic Segmentation with Pytorch-Lightning

This is a simple demo for performing semantic segmentation on the Kitti dataset using Pytorch-Lightning and optimizing the neural network by monitoring and comparing runs with Weights & Biases.

Boris Dayma 58 Nov 18, 2022
Official Implementation of SWAGAN: A Style-based Wavelet-driven Generative Model

Official Implementation of SWAGAN: A Style-based Wavelet-driven Generative Model SWAGAN: A Style-based Wavelet-driven Generative Model Rinon Gal, Dana

55 Dec 06, 2022
Realtime YOLO Monster Detection With Non Maximum Supression

Realtime-YOLO-Monster-Detection-With-Non-Maximum-Supression Table of Contents In

5 Oct 07, 2022
Flower classification model that classifies flowers in 10 classes made using transfer learning (~85% accuracy).

flower-classification-inceptionV3 Flower classification model that classifies flowers in 10 classes. Training and validation are done using a pre-anot

Ivan R. Mršulja 1 Dec 12, 2021
Repo for flood prediction using LSTMs and HAND

Abstract Every year, floods cause billions of dollars’ worth of damages to life, crops, and property. With a proper early flood warning system in plac

1 Oct 27, 2021
A robotic arm that mimics hand movement through MediaPipe tracking.

La-Z-Arm A robotic arm that mimics hand movement through MediaPipe tracking. Hardware NVidia Jetson Nano Sparkfun Pi Servo Shield Micro Servos Webcam

Alfred 1 Jun 05, 2022
Books, Presentations, Workshops, Notebook Labs, and Model Zoo for Software Engineers and Data Scientists wanting to learn the TF.Keras Machine Learning framework

Books, Presentations, Workshops, Notebook Labs, and Model Zoo for Software Engineers and Data Scientists wanting to learn the TF.Keras Machine Learning framework

Google Cloud Platform 792 Dec 28, 2022
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ⚖️ 🏆 🧑‍🎓 👩‍⚖️ Dataset Summary Inspired by the recent widespread use of th

95 Dec 08, 2022
Allele-specific pipeline for unbiased read mapping(WIP), QTL discovery(WIP), and allelic-imbalance analysis

WASP2 (Currently in pre-development): Allele-specific pipeline for unbiased read mapping(WIP), QTL discovery(WIP), and allelic-imbalance analysis Requ

McVicker Lab 2 Aug 11, 2022
Predictive AI layer for existing databases.

MindsDB is an open-source AI layer for existing databases that allows you to effortlessly develop, train and deploy state-of-the-art machine learning

MindsDB Inc 12.2k Jan 03, 2023