Build a real-time environment that divides webcam frames with OpenCV and classifies the cropped images using a vision transformer fine-tuned on samples from hybrid datasets for facial emotion recognition.

Last update: Dec 12, 2022

Overview

Visual Transformer for Facial Emotion Recognition (FER)

This project aims to build an efficient Visual Transformer for the Facial Emotion Recognition (FER) task. The project is developed entirely in a Python notebook hosted on Google Colab, with a runtime environment backed by an NVIDIA P100 GPU.

Dataset

The dataset covers 8 classes and integrates 3 different subsets:

FER-2013: It contains approximately 35,000 grayscale facial images of different expressions, with size restricted to 48×48. Its labels fall into 7 types: 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral. Disgust is the smallest class with about 600 images, while the other labels have nearly 5,000 samples each.
CK+: The Extended Cohn-Kanade (CK+) dataset contains images extracted from 593 video sequences of 123 different subjects, ranging from 18 to 50 years of age with a variety of genders and heritage. Each video shows a facial shift from the neutral expression to a targeted peak expression, recorded at 30 frames per second (FPS) with a resolution of either 640x490 or 640x480 pixels. We do not have the entire dataset; we stored only 1,000 high-variance images from a Kaggle repository.
AffectNet: A large facial expression dataset with 41,000 images classified into eight categories (neutral, happy, angry, sad, fear, surprise, disgust, contempt), along with the intensity of valence and arousal.

Data loading, integration and analysis are in the first part of the ViT-Emotion-Recognition.ipynb notebook. The resulting dataset is split into two subsets (train and val folders), each with 8 subfolders named after the class labels.
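The train/val folder layout described above can be sanity-checked with a short script. This is a minimal sketch and not the notebook's own code: the function name `count_images_per_class` and the accepted file extensions are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust to the files actually present.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(split_dir):
    """Return {class_folder_name: image_count} for one split folder (train or val)."""
    counts = Counter()
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return dict(counts)
```

Calling it once on the train folder and once on the val folder gives a quick view of class balance in each split.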

Data Management

Because we fine-tune a transformer on a heterogeneous dataset, we had to manage some image features:

Data Scaling: The pre-trained models are transformers in different configurations, trained on the ImageNet dataset for image classification with 224x224 images. We use the same scale and resize input data to this size.
Data Channels: We use RGB channels for every image, for the same reason as in the previous point.
Data Augmentation: We use brightness, rotation, scaling, translation and zooming augmentation to increase the number of samples and balance the variation across dataset classes.
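The scaling and channel steps above can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions, not the notebook's pipeline: the function name `to_model_input`, the nearest-neighbour interpolation, and the mean/std of 0.5 are all assumptions (the release notes mention normalization parameters of 0.5, but the exact usage is not shown here).

```python
import numpy as np

TARGET_SIZE = 224  # input resolution stated in the Data Scaling point


def to_model_input(image, size=TARGET_SIZE):
    """Convert an HxW or HxWx3 uint8 image into a normalized 3 x size x size float array."""
    image = np.asarray(image)
    if image.ndim == 2:
        # Grayscale input: repeat the single channel three times.
        image = np.stack([image] * 3, axis=-1)
    height, width = image.shape[:2]
    # Nearest-neighbour resize (assumed): pick a source row/column per target pixel.
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    resized = image[rows][:, cols]
    # Scale to [0, 1], then normalize with mean 0.5 and std 0.5 (assumed values).
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - 0.5) / 0.5
    # Channels-first layout, the usual convention for PyTorch models.
    return normalized.transpose(2, 0, 1)
```

A 48x48 grayscale sample and a larger RGB sample both come out with shape (3, 224, 224), so images from the different subsets can share one input format.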

Model

Overview of the model: The input image is split into fixed-size patches by a convolutional layer with a 16x16 kernel and a stride of 16x16, which precedes the embedding phase. The output of the convolution is then used for the embedding phase, where the resulting vector is the sum of the position embedding and a linear embedding in a 768-dimensional projection space. The embedded patches are then processed by a stack of 11 sequential Transformer encoders. For the classification task, the final layer is a linear layer with an 8-dimensional output for our eight emotions. The model we rely on is pre-trained on ImageNet and fine-tuned on the dataset described above.
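The sizes implied by the patch-splitting convolution can be worked out with plain arithmetic. The sketch below assumes the 224x224 RGB input from the Data Management section and a convolution with one bias term per output channel; the function name `patch_embedding_shapes` is hypothetical.

```python
def patch_embedding_shapes(image_size=224, patch_size=16, channels=3, embed_dim=768):
    """Return the sizes implied by a patch convolution whose stride equals its kernel."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    values_per_patch = patch_size * patch_size * channels
    # Convolution weights plus one bias per output channel (assumed).
    conv_parameters = values_per_patch * embed_dim + embed_dim
    return {
        "patches_per_side": patches_per_side,
        "num_patches": num_patches,
        "values_per_patch": values_per_patch,
        "conv_parameters": conv_parameters,
    }


shapes = patch_embedding_shapes()
print(shapes)
```

With these defaults the image yields a 14x14 grid of 196 patches, and each patch holds 16 x 16 x 3 = 768 raw values, which happens to match the 768-dimensional projection space.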

Source: https://github.com/google-research/vision_transformer

Authors

Andrea Gurioli (@andreagurioli1995)
Mario Sessa (@kode-git)

License


Comments

Pre-processing phase removes some images
After the data analysis on AVFER, the data from the splitting phase differs after pre-processing, so we need to check:

Check whether removing PNG files influences the number of images

Check whether anything changes after the reshaping

Be careful about a possible mis-indentation of os.remove(fl)

I need to run the data integration and data analysis of AVFER again before testing feature variation in the pre-processing phase.
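The indentation concern around os.remove(fl) can be illustrated with a small sketch. It is not the notebook's code: the helper name `remove_rejected_files` and the `should_remove` predicate are hypothetical, and returning the removed names is an added convenience for auditing how many images a pre-processing pass drops.

```python
import os


def remove_rejected_files(folder, should_remove):
    """Delete only the files for which should_remove(path) is True; return their names."""
    removed = []
    for name in sorted(os.listdir(folder)):
        fl = os.path.join(folder, name)
        if not os.path.isfile(fl):
            continue
        if should_remove(fl):
            # os.remove sits inside the `if`: one indentation level less
            # would delete every file in the folder.
            os.remove(fl)
            removed.append(name)
    return removed
```

Comparing the length of the returned list with the difference in image counts before and after pre-processing shows whether files are being removed unintentionally.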
bug
opened by kode-git 2

Releases(0.3.12)

0.3.12(May 16, 2022)
Adding presentation and official documentation

Splitting notebook per sections

Adding additional comments to the code

0.3.11(May 14, 2022)
Adding ViT-B/16/S model on 25 epochs with constant learning rate

Checking on training and validation accuracy/loss parameters according to the training log

Display results on standalone plots

vfer_small_25.pth(327.37 MB)
vfer_small_25_history_loss.pkl(490 bytes)
vfer_small_25_history_train.pkl(233 bytes)
vfer_small_25_history_val.pkl(233 bytes)
0.3.10(May 13, 2022)
Adding evaluation for ResNet18

Debugging on SAM model evaluation

Improvement: the training plot supports curves on N < 5 lines

Model adaptation during loading for standalone evaluation, with adaptation to the backbones

0.3.9(May 12, 2022)
Adding ResNet 18 (11M parameters)

Upload history for loss and accuracy

Upload epoch 20 dump

Upload final model checkpoint

resnet18_25.pth(42.72 MB)
resnet18_25_history_loss.pkl(490 bytes)
resnet18_25_history_train.pkl(7.05 KB)
resnet18_25_history_val.pkl(7.05 KB)
0.3.8(May 11, 2022)
Adding ViT-B/16/SG

Gradual learning rate every 10 epochs

SGD optimization

Adding loss and accuracy histories

vfer_grad_25.pth(327.37 MB)
vfer_grad_25_history_loss.pkl(490 bytes)
vfer_grad_25_history_train.pkl(233 bytes)
vfer_grad_25_history_val.pkl(233 bytes)
0.3.7(May 11, 2022)
Adding VIT-B/16 model checkpoint using customized learning rate scheduler

Adding SAM to the model as an optimization algorithm to smooth the loss landscape

Adding history for training and validation loss

Adding history for training and validation accuracy

vfer_sam_25.pth(327.37 MB)
vfer_sam_25_history_loss.pkl(490 bytes)
vfer_sam_25_history_train.pkl(233 bytes)
vfer_sam_25_history_val.pkl(233 bytes)
0.3.6(May 9, 2022)
Configuration of resnet18 with gradual learning rate

Starting learning rate at 0.01

Epochs 50 with plateau at 25

Loading training and validation accuracy histories

resnet18.pth(44.69 MB)
resnet18_25_history_loss.pkl(490 bytes)
resnet18_history_train.pkl(14.17 KB)
resnet18_history_val.pkl(14.17 KB)
0.3.5(May 9, 2022)
Adding SAM optimization for VIT-B/16

Defining closure for sharpness-aware minimization efficiency

Debugging model loader for the checkpoints recovery

0.2.5(May 7, 2022)
Upload optimal model on AffectNet

Defines evaluation plots on accuracy and loss values

vfer_grad_25.pth(327.37 MB)
vfer_grad_25_history_loss.pkl(130 bytes)
vfer_grad_25_history_train.pkl(1.48 KB)
vfer_grad_25_history_val.pkl(1.48 KB)
0.2.4(May 6, 2022)
Adding gradual learning rate

Modify dataset with AffectNet in validation and testing set

Adding scheduler for learning rate adjustment

vfer_grad_50.pth(327.37 MB)
vfer_grad_50_history_train.pkl(2.86 KB)
vfer_grad_50_history_val.pkl(2.86 KB)
0.2.3(Apr 29, 2022)
Extends data analysis for the AffectNet, CK+48 and FER-2013

Creation of AVFER with the following features

Splitting initial dataset in training and testing set with ratio 80/20

Splitting validation and training set with ratio 90/10

Testing and validation set contains only samples from AffectNet (RGB and high quality images)

Drive of AVFER: https://drive.google.com/drive/folders/1-8WG_CNrU3chL_OHpkM8EYx3Bm129cnE?usp=sharing
0.2.2(Apr 27, 2022)
Adjust train and test splitting

Balancing augmentation over 150,000 samples

Removing augmentation on validation to increase variability

Loading of vfer for 5, 15 and 25 epochs of training on the result dataset

Loading history for training and validation accuracy/loss

epoch_15_vfer_small_50(327.37 MB)
epoch_15_vfer_small_50.pth(327.37 MB)
epoch_25_vfer_small.pth(327.37 MB)
epoch_25_vfer_small_50(327.37 MB)
epoch_5_vfer_small_50(327.37 MB)
vfer_small_15_on_50_history_loss.pkl(220 bytes)
vfer_small_15_on_50_history_train.pkl(3.00 KB)
vfer_small_15_on_50_history_val.pkl(3.00 KB)
vfer_small_25_on_50_history_loss.pkl(220 bytes)
vfer_small_25_on_50_history_train.pkl(3.00 KB)
vfer_small_25_on_50_history_val.pkl(3.00 KB)
0.2.1(Apr 24, 2022)
Adding integration with partial training during the transformer weights improvements (best-fit)

Updating the VFER model at 5/50 training epochs with 62% accuracy (state of the art for AffectNet visual transformers)

Integrating with fluid system for face detection in the cropping phase

epoch_5_vfer_small_50(327.37 MB)
0.2.0(Apr 22, 2022)
Adjust normalization parameters from [0.48, 0.28] to 0.5

Balancing the dataset with no augmented elements in validation

Resizing the training set to double capacity for fewer epochs in the training phase

Adding featuring and inference with video capture tools in OpenCV for model applications

0.1.0(Apr 18, 2022)
Model dump for batch 50 on 12 epochs for the VFER transformer, accuracy of 69%

Model dump for batch 60 on 24 epochs for the VFER transformer, accuracy of 70%

Model dump for batch 60 on 50 epochs for the VFER transformer, accuracy of 71%

Debugging notebook for the loss evaluation

Adding every section until the evaluation

Integration of the dataset available here

vfer_base_12.zip(304.26 MB)
vfer_base_24.zip(304.25 MB)
vfer_base_50.zip(608.51 MB)

Owner

Mario Sessa

Computer Scientist for /dev/null. Master Student in Computer Science.

