The aim of the game, as in the original one, is to find a specific image from a group of different images of a person's face

Overview

GUESS WHO

Main Links: [Github] [App]

Related Links: [CLIP] [Celeba]

The aim of the game, as in the original one, is to find a specific image from a group of different images of a person's face. To discover the image, the player must ask questions that can be answered with a binary response, such as "Yes and No". After every question made by the player, the images that don't share the same answer that the winning one are discarded automatically. The answer to the player's questions, and thus, the process of discarding the images will be established by CLIP. When all the images but one have been discarded, the game is over.

The "Guess Who?" game has a handicap when it uses real images, because it is necessary to always ensure that the same criteria are applied when the images are discarded. The original game uses images with characters that present simple and limited features like a short set of different types of hair colors, what makes it very easy to answer true or false when a user asks for a specific hair color. However, with real images it is possible to doubt about if a person is blond haired or brown haired, for example, and it is necessary to apply a method which ensures that the winning image is not discarded by mistake. To solve this problem, CLIP is used to discard the images that do not coincide with the winner image after each prompt. In this way, when the user asks a question, CLIP is used to classify the images in two groups: the set of images that continue because they have the same prediction than the winning image, and the discarded set that has the opposite prediction. The next figure shows the screen that is prompted after calling CLIP on each image in the game board, where the discarded images are highlighted in red and the others in green. CLIP

Select Images

The first step of the game is to select the images to play. The player can press a button to randomly change the used images, which are taken from the CelebA data set. This data set contains 202,599 face images of the size 178×218 from 10,177 celebrities, each annotated with 40 binary labels indicating facial attributes like hair color, gender and age. (see next figure). CLIP

Ask Questions

The game will allow the player to ask the questions in 4 different ways:

1. Default Question

This option consist on select a question from a list. A drop-down list allows the player to select the question to be asked from a group of pre-set questions, taken from the set of binary labels of the Celeba data set. Under the hood, each question is translated into a pair of textual prompts for the CLIP model to allow for the binary classification based on that question. When they are passed to CLIP along with an image, the model responds by giving a greater value to the prompt that is most related to the image. (see next figure). CLIP

2. Write your own prompt

This option is used to allow the player introducing a textual prompt for CLIP with his/her own words. The player text will be then confronted with the neutral prompt, "A picture of a person", and the pair of prompts will be passed to CLIP as in the previous case. (see next figure) CLIP

3. Write your own two prompts

In this case two text input are used to allow the player write two sentences. The player must use two opposite sentences, that is, with an opposite meaning. (see next figure). CLIP

4. Select a winner

This option does not use the CLIP model to make decisions, the player can simply choose one of the images as the winner and if the player hits the winning image, the game is over. (see next figure). CLIP

Punctuation

To motivate the players in finding the winning image with the minimum number of questions, a scoring system is established so that it begins with a certain number of points (100 in the example), and decreases with each asked question. The score is decreased by subtracting the number of remaining images after each question. Furthermore, there are two extra penalties. The first is applied when the player uses the option "Select a winner". This penalty depends on the number of remaining images, so that the fewer images are left, the bigger will be the penalty. Finally, the score is also decreased by two extra points if, after the player makes a question, no image can be discarded.

Acknowledgements

This work has been supported by the company Dimai S.L and next research projects: FightDIS (PID2020-117263GB-100), IBERIFIER (2020-EU-IA-0252:29374659), and the CIVIC project (BBVA Foundation Grants For Scientific Research Teams SARS-CoV-2 and COVID-19).

Owner
Arnau - DIMAI
Arnau - DIMAI
4st place solution for the PBVS 2022 Multi-modal Aerial View Object Classification Challenge - Track 1 (SAR) at PBVS2022

A Two-Stage Shake-Shake Network for Long-tailed Recognition of SAR Aerial View Objects 4st place solution for the PBVS 2022 Multi-modal Aerial View Ob

LinpengPan 5 Nov 09, 2022
Kaggle DSTL Satellite Imagery Feature Detection

Kaggle DSTL Satellite Imagery Feature Detection

Konstantin Lopuhin 206 Oct 29, 2022
Scripts for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation and a convolutional neural network (CNN) for image classification

About subwAI subwAI - a project for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation

82 Jan 01, 2023
fastgradio is a python library to quickly build and share gradio interfaces of your trained fastai models.

fastgradio is a python library to quickly build and share gradio interfaces of your trained fastai models.

Ali Abdalla 34 Jan 05, 2023
GPU-accelerated Image Processing library using OpenCL

pyclesperanto pyclesperanto is a python package for clEsperanto - a multi-language framework for GPU-accelerated image processing. clEsperanto uses Op

17 Dec 25, 2022
RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

[Paper] [Хабр] [Model Card] [Colab] [Kaggle] RuDOLPH 🦌 🎄 ☃️ One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP Russian Diffusio

AI Forever 232 Jan 04, 2023
Benchmarking Pipeline for Prediction of Protein-Protein Interactions

B4PPI Benchmarking Pipeline for the Prediction of Protein-Protein Interactions How this benchmarking pipeline has been built, and how to use it, is de

Loïc Lannelongue 4 Jun 27, 2022
The code used for the free [email protected] Webinar series on Reinforcement Learning in Finance

Reinforcement Learning in Finance [email protected] Webinar This repository provides the code f

Yves Hilpisch 62 Dec 22, 2022
modelvshuman is a Python library to benchmark the gap between human and machine vision

modelvshuman is a Python library to benchmark the gap between human and machine vision. Using this library, both PyTorch and TensorFlow models can be evaluated on 17 out-of-distribution datasets with

Bethge Lab 244 Jan 03, 2023
Official implementation of the ICCV 2021 paper "Joint Inductive and Transductive Learning for Video Object Segmentation"

JOINT This is the official implementation of Joint Inductive and Transductive learning for Video Object Segmentation, to appear in ICCV 2021. @inproce

Yunyao 35 Oct 16, 2022
A custom DeepStack model for detecting 16 human actions.

DeepStack_ActionNET This repository provides a custom DeepStack model that has been trained and can be used for creating a new object detection API fo

MOSES OLAFENWA 16 Nov 11, 2022
Code for "Layered Neural Rendering for Retiming People in Video."

Layered Neural Rendering in PyTorch This repository contains training code for the examples in the SIGGRAPH Asia 2020 paper "Layered Neural Rendering

Google 154 Dec 16, 2022
PaRT: Parallel Learning for Robust and Transparent AI

PaRT: Parallel Learning for Robust and Transparent AI This repository contains the code for PaRT, an algorithm for training a base network on multiple

Mahsa 0 May 02, 2022
This is the official github repository of the Met dataset

The Met dataset This is the official github repository of the Met dataset. The official webpage of the dataset can be found here. What is it? This cod

Nikolaos-Antonios Ypsilantis 35 Dec 17, 2022
This repository is the official implementation of Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models

Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models Link to paper Abstract We study prediction of future out

Rickard Karlsson 2 Aug 19, 2022
A unified 3D Transformer Pipeline for visual synthesis

Overview This is the official repo for the paper: "NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion". NÜWA is a unified multimodal

Microsoft 2.6k Jan 03, 2023
OntoProtein: Protein Pretraining With Ontology Embedding

OntoProtein This is the implement of the paper "OntoProtein: Protein Pretraining With Ontology Embedding". OntoProtein is an effective method that mak

ZJUNLP 80 Dec 14, 2022
It's final year project of Diploma Engineering. This project is based on Computer Vision.

Face-Recognition-Based-Attendance-System It's final year project of Diploma Engineering. This project is based on Computer Vision. Brief idea about ou

Neel 10 Nov 02, 2022
adversarial_multi_armed_bandit_variable_plays

Adversarial Multi-Armed Bandit with Variable Plays This code is for paper: Adversarial Online Learning with Variable Plays in the Evasion-and-Pursuit

Yiyang Wang 1 Oct 28, 2021
MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images

MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images This repository contains the implementation of our paper MetaAvatar: Learni

sfwang 96 Dec 13, 2022