Data Competition: automated systems that can detect whether people are not wearing masks or are wearing masks incorrectly

Overview

Table of contents

  1. Introduction
  2. Dataset
  3. Model & Metrics
  4. How to Run

DATA COMPETITION

The COVID-19 pandemic, which is caused by the SARS-CoV-2 virus, is still continuing strong, infecting hundreds of millions of people and killing millions. Face masks reduce transmission by preventing aerosols and droplets from spreading too far into the atmosphere. As a result, there is a growing demand for automated systems that can detect whether people are not wearing masks or are wearing masks incorrectly. This competition was designed in order to solve the problem mentioned above. This competition is unlike any other that has come before it. With a fixed model, participants will receive model code and configuration code that organizers use to train models. The candidate's task is to use data processing and generation techniques to improve the model's performance, then submit the dataset to the organizing team for training and evaluation on the private test set. The winner is the team with the highest score on the private test set.

Dataset

  • A dataset of 1100 images will be sent to you. This is an object detection dataset consisting of employee images at the office. The dataset has been assigned 3 labels by us which are no mask, mask, and incorrect mask, with the numbers 0,1,2 corresponding to each.

  • The dataset has been divided into three parts for you: train, valid, and public test. We have prepared a private test to be able to evaluate the candidate's model. This private test will be made public after the contest ends. In the public test, you can get a basic idea of the private test. Download the dataset here

  • To improve the model's performance, you can re-label it and employ data augmentation to generate more images (up to 3000).

The number of each label in each part is shown below:

No mask Mask incorrect mask
Train 308 882 51
Val 97 190 9
Public_test 47 95 13

Model & Metrics

  • The challenge is defined as object detection challenge. In the competition, We use YOLOv5s and also use a pre-trained model trained with easy mask dataset to greatly reduce training time.

  • We fix all hyperparameters of the model and do not use any augmentation tips in the source code. Therefore, each participant need to build the best possible dataset by relabeling incorrect labels, splitting train/val, augmentation tips, adding new dataset, etc.

  • In training process, Early Stopping method with patience setten to 100 iterations is used to keep track of validation set's [email protected]. Detail about [email protected] metric:

[email protected] = [email protected] = 0.2 * AP50_w + 0.3 * AP50_nw + 0.5 * AP50_wi

Where,
AP50_w: AP50 on valid mask boxes
AP50_nw: AP50 on non-mask boxes
AP50_wi: AP50 on invalid mask boxes

  • The [email protected] metric is also used as the main metric to evaluate participant's submission on private testing set.

How to Run

QuickStart

Click the image below

Open In Colab

Install requirements

  • All requirements are included in requirements.txt

  • Run the script below to clone and install all requirements

git clone https://github.com/fsoft-ailab/Data-Competition
cd Data-Competition
pip3 install -r requirements.txt

Training

  • Put your dataset into the Data-Competition folder. The structure of dataset folder is followed as folder structure below:
folder-name
├── images
│   ├── train
│   │   ├── train_img1.jpg
│   │   ├── train_img2.jpg
│   │   └── ...
│   │   
│   └── val
│       ├── val_img1.jpg
│       ├── val_img2.jpg
│       └── ...
│   
└── labels
    ├── train
    │   ├── train_img1.txt
    │   ├── train_img2.txt
    │   └── ...
    │   
    └── val
        ├── val_img1.txt
        ├── val_img2.txt
        └── ...
  • Change relative paths to train and val images folder in config/data_cfg.yaml file

  • train_cfg.yaml where we set up the model during training. You should not change such hyperparameters because it will result in incorrect results. The training results are saved in the results/train/ .

  • Run the script below to train the model. Specify particular name to identify your experiment:

python3 train.py --batch-size 64 --device 0 --name 
    

   

Note: If you get out of memory error, you can decrease batch-size to multiple of 2 as 32, 16.

Evaluation

  • Run script below to evaluate on particular dataset.
  • The --task's value is only one of train, val, or test, respectively evaluating on the training set, validation set, or public testing set.
  • Note: Specify relative path to images folder which you evaluate in config/data_cfg.yaml file.
python3 val.py --weights 
   
     --task test --name 
    
      --batch-size 64 --device 0
                                                 val
                                                 train

    
   
  • Results are saved at results/evaluate/ / .

Detection

  • You can use this script to make inferences on particular folder

  • Results are saved at .

python3 detect.py --weights 
   
     --source 
    
      --dir 
     
       --device 0

     
    
   
  • You can find more default arguments at detect.py

References

Owner
Thanh Dat Vu
Thanh Dat Vu
BinTuner is a cost-efficient auto-tuning framework, which can deliver a near-optimal binary code that reveals much more differences than -Ox settings.

BinTuner is a cost-efficient auto-tuning framework, which can deliver a near-optimal binary code that reveals much more differences than -Ox settings. it also can assist the binary code analysis rese

BinTuner 42 Dec 16, 2022
TheMachineScraper 🐱‍👤 is an Information Grabber built for Machine Analysis

TheMachineScraper 🐱‍👤 is a tool made purely for analysing machine data for any reason.

doop 5 Dec 01, 2022
Finding project directories in Python (data science) projects, just like there R rprojroot and here packages

Find relative paths from a project root directory Finding project directories in Python (data science) projects, just like there R here and rprojroot

Daniel Chen 102 Nov 16, 2022
WaveFake: A Data Set to Facilitate Audio DeepFake Detection

WaveFake: A Data Set to Facilitate Audio DeepFake Detection This is the code repository for our NeurIPS 2021 (Track on Datasets and Benchmarks) paper

Chair for Sys­tems Se­cu­ri­ty 27 Dec 22, 2022
A Python package for the mathematical modeling of infectious diseases via compartmental models

A Python package for the mathematical modeling of infectious diseases via compartmental models. Originally designed for epidemiologists, epispot can be adapted for almost any type of modeling scenari

epispot 12 Dec 28, 2022
Techdegree Data Analysis Project 2

Basketball Team Stats Tool In this project you will be writing a program that reads from the "constants" data (PLAYERS and TEAMS) in constants.py. Thi

2 Oct 23, 2021
BAyesian Model-Building Interface (Bambi) in Python.

Bambi BAyesian Model-Building Interface in Python Overview Bambi is a high-level Bayesian model-building interface written in Python. It's built on to

861 Dec 29, 2022
This is a repo documenting the best practices in PySpark.

Spark-Syntax This is a public repo documenting all of the "best practices" of writing PySpark code from what I have learnt from working with PySpark f

Eric Xiao 447 Dec 25, 2022
Statistical package in Python based on Pandas

Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy. Some of its main features are listed below. F

Raphael Vallat 1.2k Dec 31, 2022
nrgpy is the Python package for processing NRG Data Files

nrgpy nrgpy is the Python package for processing NRG Data Files Website and source: https://github.com/nrgpy/nrgpy Documentation: https://nrgpy.github

NRG Tech Services 23 Dec 08, 2022
Spectral Analysis in Python

SPECTRUM : Spectral Analysis in Python contributions: Please join https://github.com/cokelaer/spectrum contributors: https://github.com/cokelaer/spect

Thomas Cokelaer 280 Dec 16, 2022
bigdata_analyse 大数据分析项目

bigdata_analyse 大数据分析项目 wish 采用不同的技术栈,通过对不同行业的数据集进行分析,期望达到以下目标: 了解不同领域的业务分析指标 深化数据处理、数据分析、数据可视化能力 增加大数据批处理、流处理的实践经验 增加数据挖掘的实践经验

Way 2.4k Dec 30, 2022
Calculate multilateral price indices in Python (with Pandas and PySpark).

IndexNumCalc Calculate multilateral price indices using the GEKS-T (CCDI), Time Product Dummy (TPD), Time Dummy Hedonic (TDH), Geary-Khamis (GK) metho

Dr. Usman Kayani 3 Apr 27, 2022
Collections of pydantic models

pydantic-collections The pydantic-collections package provides BaseCollectionModel class that allows you to manipulate collections of pydantic models

Roman Snegirev 20 Dec 26, 2022
DataPrep — The easiest way to prepare data in Python

DataPrep — The easiest way to prepare data in Python

SFU Database Group 1.5k Dec 27, 2022
A data structure that extends pyspark.sql.DataFrame with metadata information.

MetaFrame A data structure that extends pyspark.sql.DataFrame with metadata info

Invent Analytics 8 Feb 15, 2022
Integrate bus data from a variety of sources (batch processing and real time processing).

Purpose: This is integrate bus data from a variety of sources such as: csv, json api, sensor data ... into Relational Database (batch processing and r

1 Nov 25, 2021
PipeChain is a utility library for creating functional pipelines.

PipeChain Motivation PipeChain is a utility library for creating functional pipelines. Let's start with a motivating example. We have a list of Austra

Michael Milton 2 Aug 07, 2022
Kennedy Institute of Rheumatology University of Oxford Project November 2019

TradingBot6M Kennedy Institute of Rheumatology University of Oxford Project November 2019 Run Change api.txt to binance api key: https://www.binance.c

Kannan SAR 2 Nov 16, 2021
Python Library for learning (Structure and Parameter) and inference (Statistical and Causal) in Bayesian Networks.

pgmpy pgmpy is a python library for working with Probabilistic Graphical Models. Documentation and list of algorithms supported is at our official sit

pgmpy 2.2k Dec 25, 2022