Towards End-to-end Video-based Eye Tracking

The code accompanying our ECCV 2020 publication and dataset, EVE.

Setup

Preferably, set up a Docker image or virtual environment (virtualenvwrapper is recommended) for this repository. Please note that we have tested this code-base in the following environments:

  • Ubuntu 18.04 / A Linux-based cluster system (CentOS 7.8)
  • Python 3.6 / Python 3.7
  • PyTorch 1.5.1

Clone this repository somewhere with:

git clone git@github.com:swook/EVE
cd EVE/

Then from the base directory of this repository, install all dependencies with:

pip install -r requirements.txt

Please refer to the official PyTorch installation guide for setting up the torch and torchvision packages on your specific system.

You will also need to set up ffmpeg for video decoding. On Linux, we recommend installing distribution-specific packages (usually named ffmpeg). If necessary, check out the official download page or compilation instructions.
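
A quick way to confirm that ffmpeg is reachable before running anything is sketched below. It uses only the Python standard library; the helper name ffmpeg_available is an illustrative assumption and is not part of this repository.

```python
import shutil


def ffmpeg_available(executable="ffmpeg"):
    """Return True if an executable with the given name is found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found at:", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found on PATH; install it before decoding videos.")
```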

Usage

Information on the code framework

Configuration file system

All available configuration parameters are defined in src/core/config_default.py.

In order to override the default values, one can do:

  1. Pass the parameter via a command-line parameter to train.py or inference.py. Note that in this case, replace all _ characters with -. E.g. the config. parameter refine_net_enabled becomes --refine-net-enabled 1. Note that boolean parameters can be passed in via either 0/no/false or 1/yes/true.
  2. Create a JSON file such as src/configs/eye_net.json or src/configs/refine_net.json.
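
The naming and boolean rules from option 1 can be sketched as follows. The helpers to_cli_flag and parse_bool are hypothetical names used for illustration, and treating the boolean spellings case-insensitively is an assumption; the repository's own argument parser may behave differently.

```python
def to_cli_flag(param_name):
    """Turn a config parameter name into its command-line flag.

    Underscores become hyphens and the flag is prefixed with "--".
    """
    return "--" + param_name.replace("_", "-")


def parse_bool(text):
    """Interpret the accepted boolean spellings: 0/no/false and 1/yes/true.

    Case-insensitive matching is an assumption made for this sketch.
    """
    value = str(text).strip().lower()
    if value in ("1", "yes", "true"):
        return True
    if value in ("0", "no", "false"):
        return False
    raise ValueError("Not a recognised boolean value: %r" % (text,))
```

For example, to_cli_flag("refine_net_enabled") gives "--refine-net-enabled", which would then be followed by a value such as 1.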

The order of application is:

  1. Default parameters
  2. JSON-provided parameters, in order of JSON file declaration. For instance, in the command python train.py config1.json config2.json, config2.json overrides config1.json entries should there be any overlap.
  3. CLI-provided parameters.
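
The precedence described above can be illustrated with a minimal sketch. It assumes each JSON file holds a flat object of parameter names and values; resolve_config is a hypothetical helper, not the loader used by train.py.

```python
import json


def resolve_config(defaults, json_paths=(), cli_overrides=None):
    """Merge settings in order: defaults, JSON files as listed, then CLI values.

    Later sources overwrite earlier ones whenever keys overlap.
    """
    config = dict(defaults)
    for path in json_paths:
        with open(path) as f:
            # Assumption: each file contains a flat JSON object.
            config.update(json.load(f))
    config.update(cli_overrides or {})
    return config
```

With this ordering, a value set in config2.json replaces the same key from config1.json, and a value passed on the command line replaces both.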

Automatic logging to Google Sheets

This framework automatically logs all parameters, loss terms, and metrics to a Google Sheets document using the gspread library. To enable this feature, follow these instructions:

  1. Follow the instructions at https://gspread.readthedocs.io/en/latest/oauth2.html#for-end-users-using-oauth-client-id
  2. Set --gsheet-secrets-json-file to a path to the credentials JSON file, and set --gsheet-workbook-key to the document key. This key is the part after https://docs.google.com/spreadsheets/d/ and before any query or hash parameters.

An example config JSON file can be found at src/configs/sample_gsheet.json.
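
As a small illustration of where the document key sits in a Google Sheets URL, the sketch below extracts it with the standard library. The function name workbook_key_from_url and the key used in the usage note are placeholders, not part of this repository.

```python
from urllib.parse import urlparse

SHEETS_PATH_PREFIX = "/spreadsheets/d/"


def workbook_key_from_url(url):
    """Return the document key from a Google Sheets URL.

    The key is the path segment right after /spreadsheets/d/; query and
    hash parameters are dropped by urlparse before the path is inspected.
    """
    path = urlparse(url).path
    if not path.startswith(SHEETS_PATH_PREFIX):
        raise ValueError("Not a Google Sheets document URL: %r" % (url,))
    key = path[len(SHEETS_PATH_PREFIX):].split("/")[0]
    if not key:
        raise ValueError("No document key found in: %r" % (url,))
    return key
```

For instance, a URL of the form https://docs.google.com/spreadsheets/d/abc123/edit#gid=0 yields the placeholder key abc123, which is the value to pass to --gsheet-workbook-key.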

Training a model

To train a model, run python train.py from src/ with the desired configuration changes (see "Configuration file system" above).

Note that in order to resume the training of an existing model, you must provide the path to its output folder via the --resume-from argument.

Also, at every fresh run of train.py, a unique identifier is generated to produce a unique output folder in outputs/EVE/. Hence, it is recommended to use the Google Sheets logging feature (see "Automatic logging to Google Sheets") to keep track of your models.

Running inference

The single-sample inference script at src/inference.py takes in the same arguments as train.py but expects two arguments in particular:

  • --input-path is the path to a basler.mp4 or webcam_l.mp4 or webcam_c.mp4 or webcam_r.mp4 that exists in the EVE dataset.
  • --output-path is a path to a desired output location (ending in .mp4).

This script works for training, validation, and test samples, and shows the reference point-of-gaze ground-truth when available.
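
The two arguments can be checked and assembled into a command line as in the sketch below. The helper build_inference_command and the example paths are hypothetical; the only details taken from this README are the accepted video file names, the .mp4 output suffix, the two flags, and the fact that JSON configuration files are passed as positional arguments.

```python
import os

# Video file names that exist in the EVE dataset, as listed above.
KNOWN_VIDEO_NAMES = ("basler.mp4", "webcam_l.mp4", "webcam_c.mp4", "webcam_r.mp4")


def build_inference_command(input_path, output_path, config_files=()):
    """Build the argument list for running inference.py from src/."""
    if os.path.basename(input_path) not in KNOWN_VIDEO_NAMES:
        raise ValueError("Unexpected input video name: %r" % (input_path,))
    if not output_path.endswith(".mp4"):
        raise ValueError("Output path must end in .mp4: %r" % (output_path,))
    return [
        "python", "inference.py", *config_files,
        "--input-path", input_path,
        "--output-path", output_path,
    ]
```

The resulting list could then be handed to subprocess.run(command, cwd="src", check=True), for example with an input such as /path/to/sample/webcam_l.mp4 and an output such as output.mp4.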

Citation

If using this code-base and/or the EVE dataset in your research, please cite the following publication:

@inproceedings{Park2020ECCV,
  author    = {Seonwook Park and Emre Aksan and Xucong Zhang and Otmar Hilliges},
  title     = {Towards End-to-end Video-based Eye-Tracking},
  year      = {2020},
  booktitle = {European Conference on Computer Vision (ECCV)}
}

Q&A

Q: How do I use this code for screen-based eye tracking?

A: This code does not offer actual eye tracking. Rather, it concerns the benchmarking of the video-based gaze estimation methods outlined in the original paper. Extending this code into easy-to-use software for screen-based eye tracking is non-trivial, as it requires camera calibration (intrinsics, extrinsics) and an efficient pipeline for accurate and stable real-time eye or face patch extraction. Thus, we consider this to be beyond the scope of this code repository.

Q: Where are the test set labels?

A: Our public evaluation server and leaderboard are hosted by Codalab at https://competitions.codalab.org/competitions/28954. This allows evaluations on our test set to be consistent and reliable, and encourages competition in the field of video-based gaze estimation. Please note that the performance reported by Codalab is not, strictly speaking, comparable to the original paper's results, as we only perform evaluation on a large subset of the full test set. We recommend acquiring the updated performance figures from the leaderboard.

Comments
  • use against new dataset

    Hi,

    Can this code be used at inference time on in-the-wild mp4 files that do not necessarily come with an accompanying H5 file? The more I work with this codebase, the more obvious it seems that, without the mp4 being Tobii-generated, this will not work. Is this true?

    thank you

    opened by inisar 0
  • File name parser

    The file name parser could be made more robust to custom dataset files. It currently does not work for both webcam_l.mp4 and webcam_l_eyes.mp4. Below are the file and the correction I made to make it work.

    src/core/inference.py

    try:
        camera_type = components[-1][:-4]
    except AssertionError:
        camera_type = camera_type[:-5]

    opened by inisar 0
  • How to synchronize the data from camera and eye tracker?

    Hi, @swook. I use OpenCV to capture the frames. What bothers me is that I don't know how to attach a timestamp to each frame and ensure that the interval between timestamps is nearly constant. By using datetime.time(), I can get the current time and use it as the timestamp, but the intervals between timestamps vary and sometimes show large gaps. Could you share some details about the method you used to synchronize the data? It would also be very nice if you could share the source code. Thanks.

    opened by Kihensarn 0
  • How to get the 3D gaze origin

    Hi, @swook. Thanks for your great work. I have a question about how to obtain the 3D gaze origin (determined during data pre-processing). The paper says, "In pre-processing the EVE dataset, we apply a 3DMM fitting approach with interocular-distance-based scale-normalization to alleviate these issues". However, I'm not sure about the specific process of this step. What should I do if I want to convert from landmarks to a 3D gaze origin? Also, would it be possible to release the code for this part? Thanks a lot!

    opened by TeresaKumo 0
  • About the result

    I trained the EVE model with the EVE data, ran eval_codalab.py, and got a pkl file as a result. I also ran eval_codalab.py and got a pkl file from the pretrained model weights (from https://github.com/swook/EVE/releases/tag/v0.0 - eve_refinenet_CGRU_oa_skip.pt). Then, I compared these two results and the numbers seem to match. For example, from the pretrained model, I got [960. 540.] for PoG_px_final and got [963.0835 650.5635] for my model.

    However, in the EVE paper, Table 3 shows that the PoG_px of the GRU model with oa+skip is 95.59. The numbers in the paper are about 1/10 of the numbers I got from eval_codalab.py, and I am not sure what went wrong. Are they supposed to match? If they are not supposed to match, how do you calculate the numbers?

    Also, the Codalab results page shows the gaze direction (angular error), but eval_codalab.py doesn't store the gaze direction. (Keys_to_store=['left pupil size' , 'right pupil', 'pog__px_initial', 'pog_px_final', 'timestamp']) How should I get the gaze direction error in degrees?

    opened by chaeyoun 1
Owner
Seonwook Park