Identify the emotion of multiple speakers in an Audio Segment

Overview

PR MIT License made-with-python


Logo

MevonAI - Speech Emotion Recognition

Identify the emotion of multiple speakers in a Audio Segment
Report Bug · Request Feature

Try the Demo Here

Open In Colab

Table of Contents

About The Project

Logo

The main aim of the project is to Identify the emotion of multiple speakers in a call audio as a application for customer satisfaction feedback in call centres.

Built With

Getting Started

Follow the Below Instructions for setting the project up on your local Machine.

Installation

  1. Create a python virtual environment
sudo apt install python3-venv
mkdir mevonAI
cd mevonAI
python3 -m venv mevon-env
source mevon-env/bin/activate
  1. Clone the repo
git clone https://github.com/SuyashMore/MevonAI-Speech-Emotion-Recognition.git
  1. Install Dependencies
cd MevonAI-Speech-Emotion-Recognition/
cd src/
sudo chmod +x setup.sh
./setup.sh

Running the Application

  1. Add audio files in .wav format for analysis in src/input/ folder

  2. Run Speech Emotion Recognition using

python3 speechEmotionRecognition.py
  1. By Default , the application will use the Pretrained Model Available in "src/model/"

  2. Diarized files will be stored in "src/output/" folder

  3. Predicted Emotions will be stored in a separate .csv file in src/ folder

Here's how it works:

Speaker Diarization

  • Speaker diarisation (or diarization) is the process of partitioning an input audio stream into homogeneous segments according to the speaker identity. It can enhance the readability of an automatic speech transcription by structuring the audio stream into speaker turns and, when used together with speaker recognition systems, by providing the speaker’s true identity. It is used to answer the question "who spoke when?" Speaker diarisation is a combination of speaker segmentation and speaker clustering. The first aims at finding speaker change points in an audio stream. The second aims at grouping together speech segments on the basis of speaker characteristics.

Logo

Feature Extraction

  • When we do Speech Recognition tasks, MFCCs is the state-of-the-art feature since it was invented in the 1980s.This shape determines what sound comes out. If we can determine the shape accurately, this should give us an accurate representation of the phoneme being produced. The shape of the vocal tract manifests itself in the envelope of the short time power spectrum, and the job of MFCCs is to accurately represent this envelope.

Logo

The Above Image represents the audio Waveform , the below image shows the converted MFCC Output on which we will Run our CNN Model.

CNN Model

  • Use Convolutional Neural Network to recognize emotion on the MFCCs with the following Architecture
model = Sequential()

#Input Layer
model.add(Conv2D(32, 5,strides=2,padding='same',
                 input_shape=(13,216,1)))
model.add(Activation('relu'))
model.add(BatchNormalization())

#Hidden Layer 1
model.add(Conv2D(64, 5,strides=2,padding='same',))
model.add(Activation('relu'))
model.add(BatchNormalization())

#Hidden Layer 2
model.add(Conv2D(64, 5,strides=2,padding='same',))
model.add(Activation('relu'))
model.add(BatchNormalization())

#Flatten Conv Net
model.add(Flatten())

#Output Layer
model.add(Dense(7))
model.add(Activation('softmax'))

Training the Model

Contributing

Contributions are what make the open source community such an amazing place to be learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgements

FAQ

  • How do I do specifically so and so?
    • Create an Issue to this repo , we will respond to the query
Muzic: Music Understanding and Generation with Artificial Intelligence

Muzic is a research project on AI music that empowers music understanding and generation with deep learning and artificial intelligence.

Microsoft 2.6k Dec 30, 2022
Graphical interface to control granular sound synthesis.

Granular sound synthesis interface SoundGrain is a graphical interface where users can draw and edit trajectories to control granular sound synthesis

Olivier Bélanger 122 Dec 10, 2022
Music Streaming Platform based on full implementation of DBSM

Symphony Music Streaming Platform based on full implementation of DBSM List of Commands Insert User (INSERT) Function to implement input in USER Get a

Parth Maradia 1 Nov 12, 2021
Audio processor to map oracle notes in the VoG raid in Destiny 2 to call outs.

vog_oracles Audio processor to map oracle notes in the VoG raid in Destiny 2 to call outs. Huge thanks to mzucker on GitHub for the note detection cod

19 Sep 29, 2022
SomaFM Plugin for Kodi

SomaFM XBMC Plugin This description is a bit outdated. You can simply install this addon by browsing the official repositories from within Kodi. Insta

7 Jan 21, 2022
An 8D music player made to enjoy Halloween this year!🤘

HAPPY HALLOWEEN buddy! Split Player Hello There! Welcome to SplitPlayer... Supposed To Be A 8DPlayer.... You Decide.... It can play the ordinary audio

Akshat Kumar Singh 1 Nov 04, 2021
cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding for Python

audioread Decode audio files using whichever backend is available. The library currently supports: Gstreamer via PyGObject. Core Audio on Mac OS X via

beetbox 419 Dec 26, 2022
ᴀ ʙᴏᴛ ᴛʜᴀᴛ ᴄᴀɴ ᴘʟᴀʏ ᴍᴜꜱɪᴄ ɪɴ ᴛᴇʟᴇɢʀᴀᴍ ɢʀᴏᴜᴘ ᴏɴ ᴠᴏɪᴄᴇ ᴄᴀʟʟ

GJ516 LOVER'S ııllıllı ♥️ ➤⃝Gᴊ516_ᴍᴜꜱɪᴄ_ʙᴏᴛ ♥️ ıllıllı ᴀ ʙᴏᴛ ᴛʜᴀᴛ ᴄᴀɴ ᴘʟᴀʏ ᴍᴜꜱɪᴄ ɪɴ ᴛᴇʟᴇɢʀᴀᴍ ɢʀᴏᴜᴘ ᴏɴ ᴠᴏɪᴄᴇ ᴄᴀʟʟ Requirements 📝 FFmpeg NodeJS nodesou

1 Nov 22, 2021
LibXtract is a simple, portable, lightweight library of audio feature extraction functions.

LibXtract LibXtract is a simple, portable, lightweight library of audio feature extraction functions. The purpose of the library is to provide a relat

Jamie Bullock 215 Nov 16, 2022
Dataset and baseline code for the VocalSound dataset (ICASSP2022).

VocalSound: A Dataset for Improving Human Vocal Sounds Recognition Introduction Citing Download VocalSound Dataset Details Baseline Experiment Contact

Yuan Gong 58 Jan 03, 2023
🎵 A music bot for discord servers!

music bot A music bot for Discord Servers Features Play songs in your discord server Get the lyrics without going on a web explorer Commands Command P

1 Jul 25, 2022
DaisyXmusic ❤ A bot that can play music on Telegram Group and Channel Voice Chats

DaisyXmusic ❤ is the best and only Telegram VC player with playlists, Multi Playback, Channel play and more

TeamOfDaisyX 34 Oct 22, 2022
Speech Algorithms Collections

Speech Algorithms Collections

Ryuk 498 Jan 06, 2023
Okaeri-Music is a telegram music bot project, allow you to play music on voice chat group telegram.

🗄️ PROJECT MUSIC,THIS IS MAINTAINED Okaeri-Music is a telegram bot project that's allow you to play music on telegram voice chat group Features 🔥 Th

Okaeri-Project 2 Dec 23, 2021
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection

Basic Pitch is a Python library for Automatic Music Transcription (AMT), using lightweight neural network developed by Spotify's Audio Intelligence La

Spotify 1.4k Jan 01, 2023
Library for working with sound files of the format: .ogg, .mp3, .wav

Library for working with sound files of the format: .ogg, .mp3, .wav. By work is meant - playing sound files in a straight line and in the background, obtaining information about the sound file (auth

Romanin 2 Dec 15, 2022
Audio book player for senior visually impaired.

PI Zero W Audio Book Motivation and requirements My dad is practically blind and at 80 years has trouble hearing and operating tiny or more complicate

Andrej Hosna 29 Dec 25, 2022
A python program for visualizing MIDI files, and displaying them in a spiral layout

SpiralMusic_python A python program for visualizing MIDI files, and displaying them in a spiral layout For a hardware version using Teensy & LED displ

Gavin 6 Nov 23, 2022
Real-Time Spherical Microphone Renderer for binaural reproduction in Python

ReTiSAR Implementation of the Real-Time Spherical Microphone Renderer for binaural reproduction in Python [1][2]. Contents: | Requirements | Setup | Q

Division of Applied Acoustics at Chalmers University of Technology 51 Dec 17, 2022
Omniscient Mozart, being able to transcribe everything in the music, including vocal, drum, chord, beat, instruments, and more.

OMNIZART Omnizart is a Python library that aims for democratizing automatic music transcription. Given polyphonic music, it is able to transcribe pitc

MCTLab 1.3k Jan 08, 2023