Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition

Last update: Nov 12, 2022

Overview

Light-SERNet

This is the TensorFlow 2.x implementation of our paper "Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition", submitted to ICASSP 2022.

In this paper, we propose an efficient and lightweight fully convolutional neural network (FCNN) for speech emotion recognition in systems with limited hardware resources. In the proposed FCNN model, various feature maps are extracted via three parallel paths with different filter sizes. This helps the deep convolution blocks extract high-level features while ensuring sufficient separability. The extracted features are used to classify the emotion of the input speech segment. Although our model is smaller than the state-of-the-art models, it achieves higher performance on the IEMOCAP and EMO-DB datasets.
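The idea of three parallel paths with different filter sizes can be illustrated with a short Keras sketch. This is not the repository's model definition: the input shape, the number of filters, and the kernel sizes (one tall, one wide, one square) are assumptions chosen only to show how parallel paths are built and merged.

```python
import tensorflow as tf


def build_parallel_stem(input_shape=(40, 300, 1), filters=32):
    """Build three parallel convolution paths and concatenate their outputs.

    All sizes here are illustrative assumptions, not values from the repo.
    """
    inputs = tf.keras.Input(shape=input_shape)
    # Hypothetical kernel sizes: tall, wide, and square receptive fields.
    kernel_sizes = [(11, 1), (1, 9), (3, 3)]
    paths = []
    for kernel_size in kernel_sizes:
        x = tf.keras.layers.Conv2D(
            filters, kernel_size, padding="same", use_bias=False
        )(inputs)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.ReLU()(x)
        paths.append(x)
    # "same" padding keeps the spatial size identical across paths,
    # so the feature maps can be stacked along the channel axis.
    outputs = tf.keras.layers.Concatenate(axis=-1)(paths)
    return tf.keras.Model(inputs, outputs, name="parallel_stem")


model = build_parallel_stem()
print(model.output_shape)
```

With these assumed settings the concatenated feature map has 3 x 32 = 96 channels, and a deeper stack of convolution blocks would consume it.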

Run

1. Clone Repository

$ git clone https://github.com/AryaAftab/LIGHT-SERNET.git
$ cd LIGHT-SERNET/

2. Requirements

TensorFlow >= 2.3.0
NumPy >= 1.19.2
tqdm >= 4.50.2
Matplotlib >= 3.3.1
scikit-learn >= 0.23.2

$ pip install -r requirements.txt
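After installation, a quick check such as the following can confirm that the environment meets the minimum versions listed above. It is a minimal sketch that is not part of the repository. The helper names are hypothetical, and the comparison looks only at the first three numeric components of each version string, ignoring pre-release tags.

```python
import re
from importlib import metadata

# Minimum versions copied from the requirements list above.
MINIMUM_VERSIONS = {
    "tensorflow": "2.3.0",
    "numpy": "1.19.2",
    "tqdm": "4.50.2",
    "matplotlib": "3.3.1",
    "scikit-learn": "0.23.2",
}


def version_tuple(version):
    """Turn '2.3.0' (or '2.3rc1') into a comparable (major, minor, patch)."""
    parts = []
    for part in version.split(".")[:3]:
        match = re.match(r"\d+", part)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed, required):
    """Return True when the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


def check_environment(minimums=MINIMUM_VERSIONS):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, required in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, required) else "too old"
    return report


for name, status in check_environment().items():
    print(f"{name}: {status}")
```

Note that TensorFlow installed under a different distribution name (for example a CPU-only build) would be reported as missing by this sketch.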

3. Data

Download the EMO-DB and IEMOCAP (requires permission to access) datasets and extract them into the data folder.

4. Prepare datasets

Use the following command to convert each dataset into segments of the desired length (in seconds):

$ python utils/segment/segment_dataset.py -dp data/{dataset_folder} -ip utils/DATASET_INFO.json -d {datasetname_in_jsonfile} -l {desired_size(seconds)}

For example, for the EMO-DB dataset:

$ python utils/segment/segment_dataset.py -dp data/EMO-DB -ip utils/DATASET_INFO.json -d EMO-DB -l 3
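To make the idea of fixed-length segmentation concrete, here is a minimal NumPy sketch. It is not the code in utils/segment/segment_dataset.py, whose exact behavior is not described here. The function name and the choice to zero-pad the final segment without overlap are assumptions made for illustration.

```python
import numpy as np


def segment_signal(signal, sample_rate, seconds):
    """Split a 1-D waveform into equal-length segments (hypothetical helper).

    Assumption: segments do not overlap and the last one is zero-padded.
    """
    segment_len = int(round(sample_rate * seconds))
    if segment_len <= 0:
        raise ValueError("segment length must be positive")
    signal = np.asarray(signal, dtype=np.float32)
    # Always return at least one segment, even for very short input.
    n_segments = max(1, int(np.ceil(len(signal) / segment_len)))
    padded = np.zeros(n_segments * segment_len, dtype=np.float32)
    padded[: len(signal)] = signal
    return padded.reshape(n_segments, segment_len)


# A 7-second clip at 16 kHz cut into 3-second segments.
clip = np.random.default_rng(0).standard_normal(7 * 16000)
segments = segment_signal(clip, sample_rate=16000, seconds=3)
print(segments.shape)
```

Here the 7-second clip yields three segments of 48,000 samples each, with the last segment padded by two seconds of zeros.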

5. Set hyperparameters and training config

To set the hyperparameters and the training config, you only need to change the constants in hyperparameters.py.

6. Start training

Use the following command to train the model on the desired dataset with the desired cost function.

Note 1: The dataset name is the name of the dataset folder after segmentation.
Note 2: The confusion matrix results are saved in the result folder.

$ python train.py -dn {dataset_name_after_segmentation} -ln {cost_function_name}

For example, for the EMO-DB dataset:

$ python train.py -dn EMO-DB_3s_Segmented -ln focal
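The `focal` option in the example refers to a focal cost function. As a point of reference, the standard multi-class focal loss is FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The NumPy sketch below implements that textbook form. It is not taken from the repository, and the default gamma = 2.0 is an assumption rather than the value used in training.

```python
import numpy as np


def focal_loss(probs, labels, gamma=2.0, eps=1e-7):
    """Mean multi-class focal loss from class probabilities.

    probs:  array of shape (batch, classes), rows summing to 1.
    labels: integer class indices of shape (batch,).
    gamma:  focusing parameter (assumed default; gamma=0 gives cross-entropy).
    """
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0)
    labels = np.asarray(labels)
    # Probability assigned to the true class of each example.
    p_t = probs[np.arange(len(labels)), labels]
    # Well-classified examples (p_t near 1) are down-weighted by (1 - p_t)^gamma.
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


probs = np.array([[0.9, 0.1], [0.4, 0.6]])
labels = np.array([0, 1])
print(focal_loss(probs, labels, gamma=0.0))  # plain cross-entropy
print(focal_loss(probs, labels, gamma=2.0))  # easy examples contribute less
```

Setting gamma to zero recovers ordinary cross-entropy, which makes the effect of the focusing term easy to compare.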

Citation

If you find our code useful for your research, please consider citing:

@article{aftab2021light,
  title={Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition},
  author={Aftab, Arya and Morsali, Alireza and Ghaemmaghami, Shahrokh and Champagne, Benoit},
  journal={arXiv preprint arXiv:2110.03435},
  year={2021}
}

Owner

Arya Aftab
