Unimodal Face Classification with Multimodal Training

This is a PyTorch implementation of the following paper:

Unimodal Face Classification with Multimodal Training

Wenbin Teng (Boston University), Chongyang Bai (Dartmouth College)

Abstract: We propose a Multimodal Training Unimodal Test (MTUT) framework for robust face classification, which exploits the cross-modality relationship during training and applies it as a complement to the imperfect single-modality input during testing. Technically, during training, the framework (1) builds both intra-modality and cross-modality autoencoders with the aid of facial attributes to learn latent embeddings as multimodal descriptors, and (2) proposes a novel multimodal embedding divergence loss to align the heterogeneous features from different modalities, which also adaptively prevents a useless modality (if any) from confusing the model. This way, the learned autoencoders can generate robust embeddings for single-modality face classification at test time. We evaluate our framework on two face classification datasets and two kinds of testing input: (1) poor-condition images and (2) point clouds or 3D face meshes, when both 2D and 3D modalities are available for training.

The proposed method applies a 2D encoder and a 3D encoder to extract the embedding of each modality. The divergence between the two embeddings is minimized adaptively, guided by the classification loss. Depending on the testing modality, the appropriate decoder is used to reconstruct the 2D and 3D inputs from the feature embeddings. An overview of the proposed network is shown in the following picture:

[Figure: overview of the proposed network]
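
The adaptive divergence idea described above can be illustrated with a small sketch. This is not the authors' loss: the gating rule (an exponential of the classification-loss gap), the `beta` parameter, and all function names below are assumptions chosen for illustration. It is written in plain Python rather than PyTorch so it runs without dependencies. The intent is only to show how a modality that classifies worse can be pulled toward the better one, while the better one is left untouched.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adaptive_weight(loss_self, loss_other, beta=2.0):
    """Hypothetical gate: positive only when the other modality classifies better.

    The exponential form and `beta` are assumptions, not taken from the paper.
    """
    gap = loss_self - loss_other
    return math.exp(beta * gap) - 1.0 if gap > 0 else 0.0


def divergence_losses(emb_2d, emb_3d, cls_loss_2d, cls_loss_3d, beta=2.0):
    """Return the divergence penalty for the 2D branch and for the 3D branch."""
    dist = squared_distance(emb_2d, emb_3d)
    w_2d = adaptive_weight(cls_loss_2d, cls_loss_3d, beta)
    w_3d = adaptive_weight(cls_loss_3d, cls_loss_2d, beta)
    return w_2d * dist, w_3d * dist


# Toy values: the 2D branch currently has the higher classification loss.
loss_2d, loss_3d = divergence_losses(
    emb_2d=[0.2, 0.9],
    emb_3d=[0.5, 0.5],
    cls_loss_2d=1.2,
    cls_loss_3d=0.4,
)
print(loss_2d, loss_3d)
```

In this toy case only the 2D branch receives a nonzero penalty, because its classification loss is higher. Under this assumed gate, a weak or uninformative modality never drags the stronger one toward its embedding.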
