Fuzzy Overclustering (FOC)

Overview

Fuzzy Overclustering (FOC)

In real-world datasets, we need consistent annotations between annotators to give a certain ground-truth label. However, in many applications these consistent annotations can not be given due to issues like intra- and interobserver variability. We call these inconsistent label fuzzy. Our method Fuzzy Overclustering overclusters the data and can therefore handle these fuzzy labels better than Out-of-the-Box Semi-Supervised Methods.

More details are given in the accpeted full paper at https://doi.org/10.3390/s21196661 or in the preprint https://arxiv.org/abs/2012.01768

The main idea is illustrated below. The graphic and caption are taken from the original work.

main idea of paper

Illustration of fuzzy data and overclustering -- The grey dots represent unlabeled data and the colored dots labeled data from different classes. The dashed lines represent decision boundaries. For certain data, a clear separation of the different classes with one decision boundary is possible and both classes contain the same amount of data (top). For fuzzy data determining a decision boundary is difficult because of intermediate datapoints between the classes (middle). These fuzzy datapoints can often not be easily sorted into one consistent class between annotators. If you overcluster the data, you get smaller but more consistent substructures in the fuzzy data (bottom). The images illustrate possible examples for \certain data (cat & dog) and \fuzzy plankton data (trichodesmium puff and tuft). The center plankton image was considered to be trichodesmium puff or tuft by around half of the annotators each. The left and right plankton image were consistently annotated as their respective class.

Installation

We advise to use docker for the experiments. We recommend a python3 container with tesnorflow 1.14 preinstalled. Additionally the following commands need to be executed:

apt-get update
apt-get install -y libsm6 libxext6 libxrender-dev libgl1-mesa-glx

After this ensure that the requirements from requirements.txt are installed. The most important packages are keras, scipy and opencv.

Usage

The parameters are given in arguments.yaml with their description. Most of the parameters can be left at the default value. Especially the dataset, batch size and epoch related parameters are imported.

As a rule of thumb the following should be applied:

  • overcluster_k = 5-6 * the number of classes
  • batch_size = repetition * overcluster_k * 2-3

You need to define three directories for the execution with docker:

  • DATASET_ROOT, this folder contains a folder with the dataset name. This folder contains a trainand val folder. It needs a folder unlabeled if the parameter unlabeled_data is used. Each folder contains subfolder with the given classes.
  • LOG_ROOT, inside a subdiretory logs all experimental results will be stored with regard to the given IDs and a time stamp
  • SRC_ROOT root of the this project source code

The DOCKER_IMAGE is the above defined image.

You can visualize the results with tensorboard --logdir . from inside the log_dir

Example Usages

bash % test pipeline running docker run -it --rm -v :/data-ssd -v :/data1 -v :/src -w="/src" python main.py --IDs foc experiment_name not_use_mi --dataset [email protected] --unlabeled_data [email protected] --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1 --normal_epoch 2 --frozen_epoch 1 % training FOC-Light docker run -it --rm -v :/data-ssd -v :/data1 -v :/home -w="/home" python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1 % training FOC (no warmup) % needs multiple GPUs or very large ones (change num gpu to 1 in this case) docker run -it --rm -v :/data-ssd -v :/data1 -v :/home -w="/home" python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 390 --batch_size 390 --overcluster_k 60 --num_gpus 3 --lambda_m 1 --sample_repetition 3 ">
% test container
docker run -it --rm -v 
               
                :/data-ssd -v 
                
                 :/data1   -v 
                 
                  :/src -w="/src" 
                  
                    bash


% test pipeline running
docker run -it --rm -v 
                   
                    :/data-ssd -v 
                    
                     :/data1 -v 
                     
                      :/src -w="/src" 
                      
                        python main.py --IDs foc experiment_name not_use_mi --dataset [email protected] --unlabeled_data [email protected] --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1 --normal_epoch 2 --frozen_epoch 1 % training FOC-Light docker run -it --rm -v 
                       
                        :/data-ssd -v 
                        
                         :/data1 -v 
                         
                          :/home -w="/home" 
                          
                            python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1 % training FOC (no warmup) % needs multiple GPUs or very large ones (change num gpu to 1 in this case) docker run -it --rm -v 
                           
                            :/data-ssd -v 
                            
                             :/data1 -v 
                             
                              :/home -w="/home" 
                              
                                python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 390 --batch_size 390 --overcluster_k 60 --num_gpus 3 --lambda_m 1 --sample_repetition 3 
                              
                             
                            
                           
                          
                         
                        
                       
                      
                     
                    
                   
                  
                 
                
               
ReAct: Out-of-distribution Detection With Rectified Activations

ReAct: Out-of-distribution Detection With Rectified Activations This is the source code for paper ReAct: Out-of-distribution Detection With Rectified

38 Dec 05, 2022
High-fidelity 3D Model Compression based on Key Spheres

High-fidelity 3D Model Compression based on Key Spheres This repository contains the implementation of the paper: Yuanzhan Li, Yuqi Liu, Yujie Lu, Siy

5 Oct 11, 2022
Reinforcement Learning for Portfolio Management

qtrader Reinforcement Learning for Portfolio Management Why Reinforcement Learning? Learns the optimal action, rather than models the market. Adaptive

Angelos Filos 406 Jan 01, 2023
Simple reference implementation of GraphSAGE.

Reference PyTorch GraphSAGE Implementation Author: William L. Hamilton Basic reference PyTorch implementation of GraphSAGE. This reference implementat

William L Hamilton 861 Jan 06, 2023
The Official Implementation of Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose [NIPS 2021].

Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose Release Notes The offical PyTorch implementation of Neural View Sy

Angtian Wang 20 Oct 09, 2022
Jetson Nano-based smart camera system that measures crowd face mask usage in real-time.

MaskCam MaskCam is a prototype reference design for a Jetson Nano-based smart camera system that measures crowd face mask usage in real-time, with all

BDTI 212 Dec 29, 2022
Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery (ICCV 2021 Oral) Run this model on Replicate Optimization: Global directions: Mapper: Check ou

3.3k Jan 05, 2023
Fuse radar and camera for detection

SAF-FCOS: Spatial Attention Fusion for Obstacle Detection using MmWave Radar and Vision Sensor This project hosts the code for implementing the SAF-FC

ChangShuo 18 Jan 01, 2023
Weakly-supervised object detection.

Wetectron Wetectron is a software system that implements state-of-the-art weakly-supervised object detection algorithms. Project CVPR'20, ECCV'20 | Pa

NVIDIA Research Projects 342 Jan 05, 2023
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR2021)

NExT-QA We reproduce some SOTA VideoQA methods to provide benchmark results for our NExT-QA dataset accepted to CVPR2021 (with 1 'Strong Accept' and 2

Junbin Xiao 50 Nov 24, 2022
Lecture materials for Cornell CS5785 Applied Machine Learning (Fall 2021)

Applied Machine Learning (Cornell CS5785, Fall 2021) This repo contains executable course notes and slides for the Applied ML course at Cornell and Co

Volodymyr Kuleshov 103 Dec 31, 2022
Code base for NeurIPS 2021 publication titled Kernel Functional Optimisation (KFO)

KernelFunctionalOptimisation Code base for NeurIPS 2021 publication titled Kernel Functional Optimisation (KFO) We have conducted all our experiments

2 Jun 29, 2022
Pytorch based library to rank predicted bounding boxes using text/image user's prompts.

pytorch_clip_bbox: Implementation of the CLIP guided bbox ranking for Object Detection. Pytorch based library to rank predicted bounding boxes using t

Sergei Belousov 50 Nov 27, 2022
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals.

Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals This repo contains the Pytorch implementation of our paper: Unsupervised Seman

Wouter Van Gansbeke 335 Dec 28, 2022
working repo for my xumx-sliCQ submissions to the ISMIR 2021 MDX

Music Demixing Challenge - xumx-sliCQ This repository is the GitHub mirror of my working submission repository for the AICrowd ISMIR 2021 Music Demixi

4 Aug 25, 2021
Deep Video Matting via Spatio-Temporal Alignment and Aggregation [CVPR2021]

Deep Video Matting via Spatio-Temporal Alignment and Aggregation [CVPR2021] Paper: https://arxiv.org/abs/2104.11208 Introduction Despite the significa

76 Dec 07, 2022
Pytorch implementation of SELF-ATTENTIVE VAD, ICASSP 2021

SELF-ATTENTIVE VAD: CONTEXT-AWARE DETECTION OF VOICE FROM NOISE (ICASSP 2021) Pytorch implementation of SELF-ATTENTIVE VAD | Paper | Dataset Yong Rae

97 Dec 23, 2022
A framework for annotating 3D meshes using the predictions of a 2D semantic segmentation model.

Semantic Meshes A framework for annotating 3D meshes using the predictions of a 2D semantic segmentation model. Paper If you find this framework usefu

Florian 40 Dec 09, 2022
Code for the paper: Sketch Your Own GAN

Sketch Your Own GAN Project | Paper | Youtube | Slides Our method takes in one or a few hand-drawn sketches and customizes an off-the-shelf GAN to mat

677 Dec 28, 2022
Keras like implementation of Deep Learning architectures from scratch using numpy.

Mini-Keras Keras like implementation of Deep Learning architectures from scratch using numpy. How to contribute? The project contains implementations

MANU S PILLAI 5 Oct 10, 2021