Animal Sound Classification (Cats Vrs Dogs Audio Sentiment Classification)

Overview

Animal Sound Classification (Cats Vrs Dogs Audio Sentiment Classification)

This is a simple audio classification api build to classify the sound of an audio, weather it is the cat or dog sound.

alt

Response

Given a .wav audio the model will classify what does the sound the audio belongs to either cat or dog.

{
  "predictions": {
    "class": "dog",
    "label": 1,
    "probability": 1.0
  },
  "success": true
}

Starting the server

To start server and start audio classification first you need to make sure you are in the server folder and run the following commands:

  1. creating a virtual environment
virtualenv venv && .\venv\Scripts\activate.bat
  1. installing packages
pip install -r requirements.txt
  1. Starting the server
python api/app.py

The server will start on a default port of 3001 and you will be able to make api request to the server to do audio classification.

Model Metrics

The following table shows all the metrics summary we get after training the model for few 15 epochs.

model name model description test accuracy validation accuracy train accuracy test loss validation loss train loss
cats-dogs-sound-cnn.pt audio sentiment classification for dogs and cats CNN. 90.7% 90.7% 93.5% 0.621 0.218 0.209

Classification report

The following is the classification report for the model on the test dataset.

# precision recall f1-score support
accuracy - - 90% 2305
macro avg 91% 90% 90% 2305
weighted avg 92% 89% 90% 2305

Confusion matrix

The following figure shows a confusion matrix for the classification model.

Audio Sentiment classification

If you hit the server at http://localhost:3001/classify you will be able to get the following expected response that is if the request method is POST and you provide the file expected by the server.

Expected Response

The expected response at http://localhost:3001/classify with a file audio of the right format will yield the following json response to the client.

{
  "predictions": {
    "class": "dog",
    "label": 1,
    "probability": 1.0
  },
  "success": true
}

Using curl

Make sure that you have the audio named cat.wav in the current folder that you are running your cmd otherwise you have to provide an absolute or relative path to the audio.

To make a curl POST request at http://localhost:3001/classify with the file cat.wav we run the following command.

# for cat
curl -X POST -F [email protected] http://127.0.0.1:3001/classify

# for dog
curl -X POST -F [email protected] http://127.0.0.1:3001/classify

Using Postman client

To make this request with postman we do it as follows:

  1. Change the request method to POST at http://127.0.0.1:3001/classify
  2. Click on form-data
  3. Select type to be file on the KEY attribute
  4. For the KEY type audio and select the audio you want to predict under value
  5. Click send

If everything went well you will get the following response depending on the face you have selected:

{
  "predictions": { "class": "dog", "label": 1, "probability": 1.0 },
  "success": true
}

Using JavaScript fetch api.

  1. First you need to get the input from html
  2. Create a formData object
  3. make a POST requests
res.json()) .then((data) => console.log(data));">
const input = document.getElementById("input").files[0];
let formData = new FormData();
formData.append("audio", input);
fetch("http://127.0.0.1:3001/classify", {
  method: "POST",
  body: formData,
})
  .then((res) => res.json())
  .then((data) => console.log(data));

If everything went well you will be able to get expected response.

{
  "predictions": { "class": "dog", "label": 1, "probability": 1.0 },
  "success": true
}

Notebooks

  • All notebooks for training and saving the models are found in the notebooks folder of this repository.
Owner
crispengari
ai || software development. (creator of initialiseur)
crispengari
Code + pre-trained models for the paper Keeping Your Eye on the Ball Trajectory Attention in Video Transformers

Motionformer This is an official pytorch implementation of paper Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers. In this rep

Facebook Research 192 Dec 23, 2022
ONNX-GLPDepth - Python scripts for performing monocular depth estimation using the GLPDepth model in ONNX

ONNX-GLPDepth - Python scripts for performing monocular depth estimation using the GLPDepth model in ONNX

Ibai Gorordo 18 Nov 06, 2022
Implementing Graph Convolutional Networks and Information Retrieval Mechanisms using pure Python and NumPy

Implementing Graph Convolutional Networks and Information Retrieval Mechanisms using pure Python and NumPy

Noah Getz 3 Jun 22, 2022
GitHub repository for "Improving Video Generation for Multi-functional Applications"

Improving Video Generation for Multi-functional Applications GitHub repository for "Improving Video Generation for Multi-functional Applications" Pape

Bernhard Kratzwald 328 Dec 07, 2022
Sharpness-Aware Minimization for Efficiently Improving Generalization

Sharpness-Aware-Minimization-TensorFlow This repository provides a minimal implementation of sharpness-aware minimization (SAM) (Sharpness-Aware Minim

Sayak Paul 54 Dec 08, 2022
Data and extra materials for the food safety publications classifier

Data and extra materials for the food safety publications classifier The subdirectories contain detailed descriptions of their contents in the README.

1 Jan 20, 2022
Code for our paper "MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction" published at ICCV 2021.

MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction This repository contains the code for the p

Sven 30 Jan 05, 2023
Pytorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Perspective"

Graph Neural Topic Model (GNTM) This is the pytorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Persp

Dazhong Shen 8 Sep 14, 2022
links and status of cool gradio demos

awesome-demos This is a list of some wonderful demos & applications built with Gradio. Here's how to contribute yours! 🖊️ Natural language processing

Gradio 96 Dec 30, 2022
Mengzi Pretrained Models

中文 | English Mengzi 尽管预训练语言模型在 NLP 的各个领域里得到了广泛的应用,但是其高昂的时间和算力成本依然是一个亟需解决的问题。这要求我们在一定的算力约束下,研发出各项指标更优的模型。 我们的目标不是追求更大的模型规模,而是轻量级但更强大,同时对部署和工业落地更友好的模型。

Langboat 424 Jan 04, 2023
[AAAI 2021] MVFNet: Multi-View Fusion Network for Efficient Video Recognition

MVFNet: Multi-View Fusion Network for Efficient Video Recognition (AAAI 2021) Overview We release the code of the MVFNet (Multi-View Fusion Network).

Wenhao Wu 114 Nov 27, 2022
Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.

Optimum Transformers Accelerated NLP pipelines for fast inference 🚀 on CPU and GPU. Built with 🤗 Transformers, Optimum and ONNX runtime. Installatio

Aleksey Korshuk 115 Dec 16, 2022
tf2onnx - Convert TensorFlow, Keras and Tflite models to ONNX.

tf2onnx converts TensorFlow (tf-1.x or tf-2.x), tf.keras and tflite models to ONNX via command line or python api.

Open Neural Network Exchange 1.8k Jan 08, 2023
PushForKiCad - AISLER Push for KiCad EDA

AISLER Push for KiCad Push your layout to AISLER with just one click for instant

AISLER 31 Dec 29, 2022
Collection of sports betting AI tools.

sports-betting sports-betting is a collection of tools that makes it easy to create machine learning models for sports betting and evaluate their perf

George Douzas 109 Dec 31, 2022
PyTorch code for the "Deep Neural Networks with Box Convolutions" paper

Box Convolution Layer for ConvNets Single-box-conv network (from `examples/mnist.py`) learns patterns on MNIST What This Is This is a PyTorch implemen

Egor Burkov 515 Dec 18, 2022
Selective Wavelet Attention Learning for Single Image Deraining

SWAL Code for Paper "Selective Wavelet Attention Learning for Single Image Deraining" Prerequisites Python 3 PyTorch Models We provide the models trai

Bobo 9 Jun 17, 2022
A Simulation Environment to train Robots in Large Realistic Interactive Scenes

iGibson: A Simulation Environment to train Robots in Large Realistic Interactive Scenes iGibson is a simulation environment providing fast visual rend

Stanford Vision and Learning Lab 493 Jan 04, 2023
Robotics environments

Robotics environments Details and documentation on these robotics environments are available in OpenAI's blog post and the accompanying technical repo

Farama Foundation 121 Dec 28, 2022
Source code for our paper "Empathetic Response Generation with State Management"

Source code for our paper "Empathetic Response Generation with State Management" this repository is maintained by both Jun Gao and Yuhan Liu Model Ove

Yuhan Liu 3 Oct 08, 2022