Source code and notebooks to reproduce experiments and benchmarks on Bias Faces in the Wild (BFW).

Overview

Face Recognition: Too Bias, or Not Too Bias?

Robinson, Joseph P., Gennady Livitz, Yann Henon, Can Qin, Yun Fu, and Samson Timoner. "Face recognition: too bias, or not too bias? " In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 0-1. 2020.
@inproceedings{robinson2020face,
               title={Face recognition: too bias, or not too bias?},
               author={Robinson, Joseph P and Livitz, Gennady and Henon, Yann and Qin, Can and Fu, Yun and Timoner, Samson},
               booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
               pages={0--1},
               year={2020}
             }
    

Robinson, Joseph P., Can Qin, Yann Henon, Samson Timoner, and Yun Fu. "Balancing Biases and Preserving Privacy on Balanced Faces in the Wild." In CoRR arXiv:2103.09118, (2021).
@article{robinson2021balancing,
        title={Balancing Biases and Preserving Privacy on Balanced Faces in the Wild},
        author={Robinson, Joseph P and Qin, Can and Henon, Yann and Timoner, Samson and Fu, Yun},
        journal={arXiv preprint arXiv:2103.09118},
        year={2021}
       }
    

Teaser

Balanced Faces in the Wild (BFW): Data, Code, Evaluations

version: 0.4.5 (following Semantic Versioning Scheme-- learn more here, https://semver.org)

Intended to address problems of bias in facial recognition, we built BFW as a labeled data resource made available for evaluating recognition systems on a corpus of facial imagery made-up of EQUAL face count for all subjects: EQUAL across demographics, and, thus, face data balanced in faces per subject, individuals per ethnicity, and ethnicities per gender or vise versa.

Data can be accessed via Google form or Microsft form. Do not hesitate to report an issue for any and all inquiries.

Project Overview

This project investigates bias in automatic facial recognition (FR). Specifically, subjects are grouped into predefined subgroups based on gender, ethnicity, and soon-to-be age. For this, we propose a novel image collection called Balanced Faces in the Wild (BFW), which is balanced across eight subgroups (i.e., 800 face images of 100 subjects, each with 25 face samples). Thus, along with the name (i.e., identification) labels and task protocols (e.g., list of pairs for face verification, pre-packaged data-table with additional metadata and labels, etc.), BFW clearly groups into ethnicities (i.e., Asian (A), Black (B), Indian (I), and White (W)) and genders (i.e., Females (F) and Males (M)). Thus, the motivation and intent are that BFW will provide a proxy to characterize FR systems with demographic-specific analysis now possible. For instance, various confusion metrics, along with the predefined criteria (i.e., score threshold), are fundamental when characterizing performance ratings of FR systems. The following visualization summarizes the confusion metrics in a way that relates to the different measurements.

metrics

As discussed, the motivation for designing, building, and releasing BFW for research purposes has been discussed. We expect the data, all-in-all, will continue to evolve. Nonetheless, as is, there are vast options on ways to advance technology and our understanding thereof. Let us now focus on the contents of the repo (i.e., code-base) for which was created to support the data of BFW (i.e., data proxy), making all experiments in paper easily reproducible and, thus, the work more friendly for getting started.

Experimental-based contributions and findings

Several observations were made that widened our understanding of bias in FR. Views were demonstrated experimentally, with all code used in experiments added as a part of this repo.

Score sensitivity

For instance, it is shown that the scoring sensitivity within different subgroups verifies. That is, faces of the same identity tend to shift in expected values (e.g., given a correct pair of Black faces, on average, have similarity scores smaller than a true pair of White, and the middle range of scores for Males compared to Females). This is demonstrated using fundamental signal detection models (SDM), along with detection error trade-off (DET) curves.

Global threshold

Once an FR system is deployed, a criterion (i.e., threshold) is set (or tunable) such that similarity scores that do not pass are assumed false matches and are filtered out of the candidate pool for potential true pairs. In other words, thresholds act as decision boundaries that map scores (or distances) to nominal values such as genuine or imposter. Considering the variable sensitivity found prior, intuition tells us that a variable threshold is optimal. Thus, returning to the fundamental concepts of signal detection theory, we show that using a single, global threshold yields skewed performance ratings across different subgroups. For this, we demonstrate that subgroup-specific thresholds are optimal in terms of overall performance and balance across subgroups.

All-in-all

All of this and more (i.e., evaluation and analysis of FR systems on BFW data, along with data structures and implementation schemes optimized for the problems at hand, are included in modules making up the project and demonstrated in notebook tutorials). We will continue to add tools for a fair analysis of FR systems. Thus, not only the experiments but also the data we expect to grow. All contributions are not only welcome but are entirely encouraged.

Here are quick links to key aspects of this resource.

Register and download via this form.

Final note. Thee repo is a work-in-progress. Certainly, it is ready to be cloned and used; however, expect regular improvements, both in the implementation and documentation (i.e., getting started instructions will be enhanced). For now, it is recommended to begin with README files listed just above, along with the tutorial notebooks found in code-> notebooks with brief descriptions in README and more detail inline of each notebook. Again, PRs are more than welcome :)

Paper abstract

We reveal critical insights into bias problems in state-of-the-art facial recognition (FR) systems using a novel Balanced Faces In the Wild (BFW) dataset: data balanced for gender and ethnic groups. We show variations in the optimal scoring threshold for face pairs across different subgroups. Thus, the conventional approach of learning a global threshold for all pairs results in performance gaps between subgroups. By learning subgroup-specific thresholds, we reduce performance gaps and show a notable boost in overall performance. Furthermore, we do a human evaluation to measure human bias, which supports the hypothesis that an analogous bias exists in human perception. For the BFW database, source code, and more, visit https://github.com/visionjo/facerec-bias-bfw.

To Do

  • Begin Template
  • Create demo notebooks
  • Add manuscript
  • Documentation (sphinx)
  • Update README (this)
  • Pre-commit, formatter (Black) and .gitignore
  • Complete test harness
  • Modulate (refactor) code
  • Complete datatable (i.e., extend pandas.DataFrame)
  • Add scripts and CLI

License

All source code is made available under a BSD 3-clause license. You can freely use and modify the code without warranty, so long as you provide attribution to the authors. See LICENSE.md (LICENSE) for the full license text.

The manuscript text is not open source. The authors reserve the rights to the article content, which is currently submitted for publication in the 2020 IEEE Conference on AMFG.

Acknowledgement

We would like to thank the PINGA organization on Github for the project template used to structure this project.

Comments
  • About the MTCNN face detections and preprocessing

    About the MTCNN face detections and preprocessing

    Hi,

    It would be great if you could clarify a few questions regarding this dataset please.

    1. Is it possible for you to provide the MTCNN output face detections (bounding boxes and facial landmarks) for the face samples in BFW?

    2. Am I right in assuming MTCNN takes as input the images in "face-samples" folder of the dataset? If yes, what settings do we use with MTCNN in order for us to detect a face correctly on all the facial images provided in face-samples? If not, can you help us reproduce your face detection results by providing us with the original images on which the MTCNN was run to obtain the results in face-samples?

    3. Are the images in facial-samples actually crops which are aligned?

    Thanks in advance for your help.

    opened by manisoftwartist 2
  • Create a wrapper function to unify pipeline that produces the 3 figures (detailed below) from embedding data

    Create a wrapper function to unify pipeline that produces the 3 figures (detailed below) from embedding data

    3 Figures based on the paper "Face Recognition: Too Bias, or Not Too Bias" are

    1. DET curves: FPR versus FNR by moving threshold
    2. Score distributions for genuine and imposter using violin plots
    3. Confusion matrix for Rank 1 and any Rank.
    opened by suchanv 2
  • Devel

    Devel

    Develop branch-- prepare for next version release.

    Aim for the following for version 0.1.1:

    • [ ] Notebook is updated to use interface recently modulated (#21)
    • [ ] Update Documentation to explain steps to run (#21)
      • [ ] add to README in root
      • [x] move results from README in root to the README in results/
      • [x] move data section from README in root to the README in data/
        • [x] save curve data along with PDF (i.e., in results/)
      • [ ] Add simple (brief) docstring where missing)
      • [ ] A sample (toy) set is run end-to-end (demonstrate in README)
        • [ ] if small enough, add to repo (i.e., < 40 MB or so)
    • [ ] Finish script to generate Tar @ Far table
    • [ ] Improve annotation in notebooks; more description, i.e., tutorial-like.
    • [ ] create pdf versions of notebooks and add to project in notebooks/pdfs (or create nbviewer and point to it)
    • [ ] add assertions (and tests) where appropriate-- at least critical cases, such a specific type is expected.
    • [ ] Consider moving some of the analysis functions to visualizations.
      • [ ] modulate the handling of plt.axes objects
      • [ ] add optional input arguments for the title and other figure cosmetics or settings
    • [x] Add benchmarks for sphereface features. Make these the results showcased throughout.
    documentation enhancement Benchmark Project-level 
    opened by visionjo 1
  • Questions on verification_RFW and training procedure

    Questions on verification_RFW and training procedure

    Hi, Thanks for your great work and sharing of the code on these two papers ! It takes me days to read the paper and go through the repository and I have a few questions:

    (2) Do you have the code for training the features (asian_females, asian_males, black_females, black_males, indian_females, indian_males,...). Since I have a hard time finding something like train.py (e.g. the loss function and training process). (I suppose the released code is mainly on image pre-processing and result analysis) (Since BFW dataset is not as large as other face dataset and it may possible for me to train it from scratch on one GPU)

    (3) I am little confused about how the BFW is used in two papers, as I understand:

    in paper Face Recognition: Too Bias, or Not Too Bias? , the train and test model are as follows: train: CASIA_webface trained using Sphereface loss test: LFW where does BFW dataset not used in training in this set of experiments?

    in paper Balancing Biases and Preserving Privacy on Balanced Faces in the Wild the train, test model are as follows: tain: (1) MS1M trained using Arcface loss --> to get 512-dim embedding (f_in in Fig.6) (2) BFW dataset is used to train the encoder and two classifiers in Fig 6 test: 4-folds used for training and 1-fold used for testing (using the best threshold chosen)

    is that right?

    (4) There are some difference from "bfw-v0.1.5-datatable.csv" and the TABLE-2 in paper 2: for example: there are 921379 records in TABLE-2 while ther are 923898 records from the csv file? and there is no "{dir_meta}thresholds.pkl" file.

    Thanks for your time and any help would be appreciated !

    opened by lizhenstat 7
  • Regarding face identification

    Regarding face identification

    Hey,

    Thanks for the awesome work!

    I wanted to know how I can modify the repo to use for face identification task instead of verification.

    Any help would be highly appreciated.

    opened by shivmgg 1
  • Sphinx documentation

    Sphinx documentation

    Setup the project for sphinx.

    Include clear instruction on how to maintain (i.e., once in place, we'll include as part of the build process (see in docs/)

    Setup for tutorials on the different concepts and experiments done as part of this line of work (i.e., facial bias and BFW database)

    documentation enhancement 
    opened by visionjo 0
  • Create plan for Dash interface

    Create plan for Dash interface

    Project plan (lead: Dylan; support: Rohan):

    • [ ] what features to include
    • [ ] Specifications
    • [ ] Interface layout (use lucidchart or equivalent)
    • [ ] Division of tasks and proposed timeline
    Plan and design Project-level 
    opened by visionjo 0
Releases(v0.0.3)
Owner
Joseph P. Robinson
Ph.D., Northeastern, 2020. Focus: applied machine learning, mostly vision. At Vicarious Surgical's ASDAI group, an AI Engineer working on our surgical robot.
Joseph P. Robinson
AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning

AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning (NeurIPS 2020) Introduction AdaShare is a novel and differentiable approach fo

94 Dec 22, 2022
Prometheus Exporter for data scraped from datenplattform.darmstadt.de

darmstadt-opendata-exporter Scrapes data from https://datenplattform.darmstadt.de and presents it in the Prometheus Exposition format. Pull requests w

Martin Weinelt 2 Apr 12, 2022
The implementation of the paper "HIST: A Graph-based Framework for Stock Trend Forecasting via Mining Concept-Oriented Shared Information".

The HIST framework for stock trend forecasting The implementation of the paper "HIST: A Graph-based Framework for Stock Trend Forecasting via Mining C

Wentao Xu 110 Dec 27, 2022
Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks

SA-Net: Shuffle Attention for Deep Convolutional Neural Networks (paper) By Qing-Long Zhang and Yu-Bin Yang [State Key Laboratory for Novel Software T

Qing-Long Zhang 199 Jan 08, 2023
Semantic Image Synthesis with SPADE

Semantic Image Synthesis with SPADE New implementation available at imaginaire repository We have a reimplementation of the SPADE method that is more

NVIDIA Research Projects 7.3k Jan 07, 2023
Temporally Coherent GAN SIGGRAPH project.

TecoGAN This repository contains source code and materials for the TecoGAN project, i.e. code for a TEmporally COherent GAN for video super-resolution

Duc Linh Nguyen 2 Jan 18, 2022
Gapmm2: gapped alignment using minimap2 (align transcripts to genome)

gapmm2: gapped alignment using minimap2 This tool is a wrapper for minimap2 to r

Jon Palmer 2 Jan 27, 2022
Customer-Transaction-Analysis - This analysis is based on a synthesised transaction dataset containing 3 months worth of transactions for 100 hypothetical customers.

Customer-Transaction-Analysis - This analysis is based on a synthesised transaction dataset containing 3 months worth of transactions for 100 hypothetical customers. It contains purchases, recurring

Ayodeji Yekeen 1 Jan 01, 2022
Punctuation Restoration using Transformer Models for High-and Low-Resource Languages

Punctuation Restoration using Transformer Models This repository contins official implementation of the paper Punctuation Restoration using Transforme

Tanvirul Alam 142 Jan 01, 2023
Companion code for the paper "An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their Asymptotic Overconfidence" (NeurIPS 2021)

ReLU-GP Residual (RGPR) This repository contains code for reproducing the following NeurIPS 2021 paper: @inproceedings{kristiadi2021infinite, title=

Agustinus Kristiadi 4 Dec 26, 2021
Multi Camera Calibration

Multi Camera Calibration 'modules/camera_calibration/app/camera_calibration.cpp' is for calculating extrinsic parameter of each individual cameras. 'm

7 Dec 01, 2022
dualPC.R contains the R code for the main functions.

dualPC.R contains the R code for the main functions. dualPC_sim.R contains an example run with the different PC versions; it calls dualPC_algs.R whic

3 May 30, 2022
Official implementation of the paper "AAVAE: Augmentation-AugmentedVariational Autoencoders"

AAVAE Official implementation of the paper "AAVAE: Augmentation-AugmentedVariational Autoencoders" Abstract Recent methods for self-supervised learnin

Grid AI Labs 48 Dec 12, 2022
Global-Local Context Network for Person Search

Global-Local Context Network for Person Search Abstract: Person search aims to jointly localize and identify a query person from natural, uncropped im

Peng Zheng 15 Oct 17, 2022
tf2onnx - Convert TensorFlow, Keras and Tflite models to ONNX.

tf2onnx converts TensorFlow (tf-1.x or tf-2.x), tf.keras and tflite models to ONNX via command line or python api.

Open Neural Network Exchange 1.8k Jan 08, 2023
Code associated with the paper "Towards Understanding the Data Dependency of Mixup-style Training".

Mixup-Data-Dependency Code associated with the paper "Towards Understanding the Data Dependency of Mixup-style Training". Running Alternating Line Exp

Muthu Chidambaram 0 Nov 11, 2021
CVPRW 2021: How to calibrate your event camera

E2Calib: How to Calibrate Your Event Camera This repository contains code that implements video reconstruction from event data for calibration as desc

Robotics and Perception Group 104 Nov 16, 2022
Code and Data for the paper: Molecular Contrastive Learning with Chemical Element Knowledge Graph [AAAI 2022]

Knowledge-enhanced Contrastive Learning (KCL) Molecular Contrastive Learning with Chemical Element Knowledge Graph [ AAAI 2022 ]. We construct a Chemi

Fangyin 58 Dec 26, 2022
BTC-Generator - BTC Generator With Python

Что такое BTC-Generator? Это генератор чеков всеми любимого @BTC_BANKER_BOT Для

DoomGod 3 Aug 24, 2022
Seeing if I can put together an interactive version of 3b1b's Manim in Streamlit

streamlit-manim Seeing if I can put together an interactive version of 3b1b's Manim in Streamlit Installation I had to install pango with sudo apt-get

Adrien Treuille 6 Aug 03, 2022