The Fundamental Clustering Problems Suite (FCPS) summaries 54 state-of-the-art clustering algorithms, common cluster challenges and estimations of the number of clusters as well as the testing for cluster tendency.

Overview

CRAN_Status_Badge DOI CRAN RStudio mirror downloads CRAN RStudio mirror downloads

FCPS

Fundamental Clustering Problems Suite

The package provides over sixty state-of-the-art clustering algorithms for unsupervised machine learning published in [Thrun and Stier 2021].

Table of contents

  1. Description
  2. Installation
  3. Tutorial Examples
  4. Manual
  5. Use cases
  6. Additional information
  7. References

Description

The Fundamental Clustering Problems Suite (FCPS) summaries over sixty state-of-the-art clustering algorithms available in R language. An important advantage is that the input and output of clustering algorithms is simplified and consistent in order to enable users a swift execution of cluster analysis. By combining mirrored-density plots (MD plots) with statistical testing FCPS provides a tool to investigate the cluster tendency quickly prior to the cluster analysis itself [Thrun 2020]. Common clustering challenges can be generated with arbitrary sample size [Thrun and Ultsch 2020a]. Additionally, FCPS sums 26 indicators with the goal to estimate the number of clusters up and provides an appropriate implementation of the clustering accuracy for more than two clusters [Thrun and Ultsch 2021]. A subset of methods was used in a benchmarking of algorithms published in [Thrun and Ultsch 2020b].

Installation

Installation using CRAN

Install automatically with all dependencies via

install.packages("FCPS",dependencies = T)

# Optionally, for the automatic installation
# of all suggested packages:
Suggested=c("kernlab", "cclust", "dbscan", "kohonen",
            "MCL", "ADPclust", "cluster", "DatabionicSwarm",
            "orclus", "subspace", "flexclust", "ABCanalysis",
            "apcluster", "pracma", "EMCluster", "pdfCluster", "parallelDist",
            "plotly", "ProjectionBasedClustering", "GeneralizedUmatrix",
            "mstknnclust", "densityClust", "parallel", "energy", "R.utils",
            "tclust", "Spectrum", "genie", "protoclust", "fastcluster", 
			"clusterability", "signal", "reshape2", "PPCI", "clustrd", "smacof",
			"rgl", "prclust", "dendextend",
            "moments", "prabclus", "VarSelLCM", "sparcl", "mixtools",
            "HDclassif", "clustvarsel", "knitr", "rmarkdown")

for(i in 1:length(Suggested)) {
  if (!requireNamespace(Suggested[i], quietly = TRUE)) {
    message(paste("Installing the package", Suggested[i]))
    install.packages(Suggested[i], dependencies = T)
  }
}

Installation using Github

Please note, that dependecies have to be installed manually.

remotes::install_github("Mthrun/FCPS")

Installation using R Studio

Please note, that dependecies have to be installed manually.

Tools -> Install Packages -> Repository (CRAN) -> FCPS

Tutorial Examples

The tutorial with several examples can be found on in the vignette on CRAN:

https://cran.r-project.org/web/packages/FCPS/vignettes/FCPS.html

Manual

The full manual for users or developers is available here: https://cran.r-project.org/web/packages/FCPS/FCPS.pdf

Use Cases

Cluster Analysis of High-dimensional Data

The package FCPS provides a clear and consistent access to state-of-the-art clustering algorithms:

library(FCPS)
data("Leukemia")
Data=Leukemia$Distance
Classification=Leukemia$Cls
ClusterNo=6
CA=ADPclustering(Leukemia$DistanceMatrix,ClusterNo)
Cls=ClusterRenameDescendingSize(CA$Cls)
ClusterPlotMDS(Data,Cls,main =Leukemia’,Plotter3D =plotly’)
ClusterAccuracy(Cls,Classification)
[1] 0.9963899

Generating Typical Challenges for Clustering Algorithms

Several clustering challenge can be generated with an arbitrary sample size:

set.seed(600)
library(FCPS)
DataList=ClusterChallenge("Chainlink", SampleSize = 750,
PlotIt=TRUE)
Data=DataList$Chainlink
Cls=DataList$Cls
> ClusterCount(Cls)
$CountPerCluster
$NumberOfClusters
$ClusterPercentages
[1] 377 373
[1] 2
[1] 50.26667 49.73333

Cluster-Tendency

For many applications, it is crucial to decide if a dataset possesses cluster structures:

library(FCPS)
set.seed(600)
DataList=ClusterChallenge("Chainlink",SampleSize = 750)
Data=DataList$Chainlink
Cls=DataList$Cls
library(ggplot2)
ClusterabilityMDplot(Data)+theme_bw()

Estimation of Number of Clusters

The “FCPS” package provides up to 26 indicators to determine the number of clusters:

library(FCPS)
set.seed(135)
DataList=ClusterChallenge("Chainlink",SampleSize = 900)
Data=DataList$Chainlink
Cls=DataList$Cls
Tree=HierarchicalClustering(Data,0,"SingleL")[[3]]
ClusterDendrogram(Tree,4,main="Single Linkage")
MaximumNumber=7
clsm <- matrix(data = 0, nrow = dim(Data)[1], ncol = MaximumNumber)
for (i in 2:(MaximumNumber+1)) {
clsm[,i-1] <- cutree(Tree,i)
}
out=ClusterNoEstimation(Data, ClsMatrix = clsm,
MaxClusterNo = MaximumNumber, PlotIt = TRUE)

Additional information

Authors website http://www.deepbionics.org/
License GPL-3
Dependencies R (>= 3.5.0)
Bug reports https://github.com/Mthrun/FCPS/issues

References

  1. [Thrun/Stier, 2021] Thrun, M. C., & Stier, Q.: Fundamental Clustering Algorithms Suite SoftwareX, Vol. 13(C), pp. 100642. doi 10.1016/j.softx.2020.100642, 2021.
  2. [Thrun, 2020] Thrun, M. C.: Improving the Sensitivity of Statistical Testing for Clusterability with Mirrored-Density Plot, in Archambault, D., Nabney, I. & Peltonen, J. (eds.), Machine Learning Methods in Visualisation for Big Data, DOI 10.2312/mlvis.20201102, The Eurographics Association, Norrköping , Sweden, May, 2020.
  3. [Thrun/Ultsch, 2020a] Thrun, M. C., & Ultsch, A.: Clustering Benchmark Datasets Exploiting the Fundamental Clustering Problems, Data in Brief,Vol. 30(C), pp. 105501, DOI 10.1016/j.dib.2020.105501 , 2020.
  4. [Thrun/Ultsch, 2021] Thrun, M. C., and Ultsch, A.: Swarm Intelligence for Self-Organized Clustering, Artificial Intelligence, Vol. 290, pp. 103237, \doi{10.1016/j.artint.2020.103237}, 2021.
  5. [Thrun/Ultsch, 2020b] Thrun, M. C., & Ultsch, A. : Using Projection based Clustering to Find Distance and Density based Clusters in High-Dimensional Data, Journal of Classification, \doi{10.1007/s00357-020-09373-2}, Springer, 2020.
You might also like...
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.

Learning Opinion Summarizers by Selecting Informative Reviews This repository contains the codebase and the dataset for the corresponding EMNLP 2021

On Generating Extended Summaries of Long Documents

ExtendedSumm This repository contains the implementation details and datasets used in On Generating Extended Summaries of Long Documents paper at the

Automatic voice-synthetised summaries of latest research papers on arXiv

PaperWhisperer PaperWhisperer is a Python application that keeps you up-to-date with research papers. How? It retrieves the latest articles from arXiv

Code for ACL 21: Generating Query Focused Summaries from Query-Free Resources

marge This repository releases the code for Generating Query Focused Summaries from Query-Free Resources. Please cite the following paper [bib] if you

We evaluate our method on different datasets (including ShapeNet, CUB-200-2011, and Pascal3D+) and achieve state-of-the-art results, outperforming all the other supervised and unsupervised methods and 3D representations, all in terms of performance, accuracy, and training time. This is the unofficial code of  Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes. which achieve state-of-the-art trade-off between accuracy and speed on cityscapes and camvid, without using inference acceleration and extra data Propose a principled and practically effective framework for unsupervised accuracy estimation and error detection tasks with theoretical analysis and state-of-the-art performance.
Propose a principled and practically effective framework for unsupervised accuracy estimation and error detection tasks with theoretical analysis and state-of-the-art performance.

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles This project is for the paper: Detecting Errors and Estimating

This repository is related to an Arabic tutorial, within the tutorial we discuss the common data structure and algorithms and their worst and best case for each, then implement the code using Python.

Data Structure and Algorithms with Python This repository is related to the Arabic tutorial here, within the tutorial we discuss the common data struc

tsai is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series classification, regression and forecasting.
tsai is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series classification, regression and forecasting.

Time series Timeseries Deep Learning Pytorch fastai - State-of-the-art Deep Learning with Time Series and Sequences in Pytorch / fastai

Comments
  • Mo gclustering2 model based clustering

    Mo gclustering2 model based clustering

    IMPORTANT UPDATE: MoGclustering renamed to ModelBasedClustering MoG Clustering is now defined es Mixture of Gaussians based on EM This is a change contrary to the book [Thrun, 2018]! Additionally density based clustering methods added.

    opened by Mthrun 1
  • Missing function

    Missing function

    `

    install.packages("FCPS") Installing package into ‘/home/roc/R/x86_64-pc-linux-gnu-library/4.0’ (as ‘lib’ is unspecified) trying URL 'https://cloud.r-project.org/src/contrib/FCPS_1.2.7.tar.gz' Content type 'application/x-gzip' length 2859121 bytes (2.7 MB) ================================================== downloaded 2.7 MB

    • installing source package ‘FCPS’ ... ** package ‘FCPS’ successfully unpacked and MD5 sums checked ** using staged installation ** R ** data *** moving datasets to lazyload DB ** inst ** byte-compile and prepare package for lazy loading ** help *** installing help indices ** building package indices ** installing vignettes ** testing if installed package can be loaded from temporary location ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path
    • DONE (FCPS)

    The downloaded source packages are in ‘/tmp/Rtmpzb0Mdh/downloaded_packages’

    data('Hepta')

    out=HierarchicalClusterDists(as.matrix(dist(Hepta$Data)),ClusterNo=7) Error in HierarchicalClusterDists(as.matrix(dist(Hepta$Data)), ClusterNo = 7) : could not find function "HierarchicalClusterDists" `

    opened by technocrat 3
Releases(1.2.3)
  • 1.2.3(Jun 23, 2020)

    Many conventional clustering algorithms are provided in this package with consistent input and output, which enables the user to try out algorithms swiftly. Additionally, 26 statistical approaches for the estimation of the number of clusters as well as the the mirrored density plot (MD-plot) of clusterability are implemented. Moreover, the fundamental clustering problems suite (FCPS) offers a variety of clustering challenges any algorithm should handle when facing real world data, see Thrun, M.C., Ultsch A.: "Clustering Benchmark Datasets Exploiting the Fundamental Clustering Problems" (2020), Data in Brief, DOI:10.1016/j.dib.2020.105501.

    Source code(tar.gz)
    Source code(zip)
Owner
Dr. rer. nat. Michael C. Thrun, promovierte 2017 an der Philipps-Universität Marburg am Lehrstuhl für Neuroinformatik unter Prof. Dr. rer. nat. Alfred Ultsch.
Repo for paper "Dynamic Placement of Rapidly Deployable Mobile Sensor Robots Using Machine Learning and Expected Value of Information"

Repo for paper "Dynamic Placement of Rapidly Deployable Mobile Sensor Robots Using Machine Learning and Expected Value of Information" Notes I probabl

Berkeley Expert System Technologies Lab 0 Jul 01, 2021
CBKH: The Cornell Biomedical Knowledge Hub

Cornell Biomedical Knowledge Hub (CBKH) CBKG integrates data from 18 publicly available biomedical databases. The current version of CBKG contains a t

44 Dec 21, 2022
LibMTL: A PyTorch Library for Multi-Task Learning

LibMTL LibMTL is an open-source library built on PyTorch for Multi-Task Learning (MTL). See the latest documentation for detailed introductions and AP

765 Jan 06, 2023
Julia package for contraction of tensor networks, based on the sweep line algorithm outlined in the paper General tensor network decoding of 2D Pauli codes

Julia package for contraction of tensor networks, based on the sweep line algorithm outlined in the paper General tensor network decoding of 2D Pauli codes

Christopher T. Chubb 35 Dec 21, 2022
The code for Expectation-Maximization Attention Networks for Semantic Segmentation (ICCV'2019 Oral)

EMANet News The bug in loading the pretrained model is now fixed. I have updated the .pth. To use it, download it again. EMANet-101 gets 80.99 on the

Xia Li 李夏 663 Nov 30, 2022
PRIN/SPRIN: On Extracting Point-wise Rotation Invariant Features

PRIN/SPRIN: On Extracting Point-wise Rotation Invariant Features Overview This repository is the Pytorch implementation of PRIN/SPRIN: On Extracting P

Yang You 17 Mar 02, 2022
Next-Best-View Estimation based on Deep Reinforcement Learning for Active Object Classification

next_best_view_rl Setup Clone the repository: git clone --recurse-submodules ... In 'third_party/zed-ros-wrapper': git checkout devel Install mujoco `

Christian Korbach 1 Feb 15, 2022
Code for Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights

Piggyback: https://arxiv.org/abs/1801.06519 Pretrained masks and backbones are available here: https://uofi.box.com/s/c5kixsvtrghu9yj51yb1oe853ltdfz4q

Arun Mallya 165 Nov 22, 2022
Pytorch implementation of DeePSiM

Pytorch implementation of DeePSiM

1 Nov 05, 2021
Using modified BiSeNet for face parsing in PyTorch

face-parsing.PyTorch Contents Training Demo References Training Prepare training data: -- download CelebAMask-HQ dataset -- change file path in the pr

zll 1.6k Jan 08, 2023
For storing the complete exploration of Visual Question Answering for our B.Tech Project

Multi-Image vqa @authors: Akhilesh, Janhavi, Harsh Paper summary, Ideas tried and their corresponding results: on wiki Other discussions: on discussio

Harsh Raj 3 Jun 16, 2022
Dense Prediction Transformers

Vision Transformers for Dense Prediction This repository contains code and models for our paper: Vision Transformers for Dense Prediction René Ranftl,

Intel ISL (Intel Intelligent Systems Lab) 1.3k Dec 28, 2022
Official implementation of the article "Unsupervised JPEG Domain Adaptation For Practical Digital Forensics"

Unsupervised JPEG Domain Adaptation for Practical Digital Image Forensics @WIFS2021 (Montpellier, France) Rony Abecidan, Vincent Itier, Jeremie Boulan

Rony Abecidan 6 Jan 06, 2023
This repository contains a toolkit for collecting, labeling and tracking object keypoints

This repository contains a toolkit for collecting, labeling and tracking object keypoints. Object keypoints are semantic points in an object's coordinate frame.

ETHZ ASL 13 Dec 12, 2022
RGB-D Local Implicit Function for Depth Completion of Transparent Objects

RGB-D Local Implicit Function for Depth Completion of Transparent Objects [Project Page] [Paper] Overview This repository maintains the official imple

NVIDIA Research Projects 43 Dec 12, 2022
SAAVN - Sound Adversarial Audio-Visual Navigation,ICLR2022 (In PyTorch)

SAAVN SAAVN Code release for paper "Sound Adversarial Audio-Visual Navigation,IC

YinfengYu 10 Aug 30, 2022
Hand Gesture Volume Control is AIML based project which uses image processing to control the volume of your Computer.

Hand Gesture Volume Control Modules There are basically three modules Handtracking Program Handtracking Module Volume Control Program Handtracking Pro

VITTAL 1 Jan 12, 2022
MBPO (paper: When to trust your model: Model-based policy optimization) in offline RL settings

offline-MBPO This repository contains the code of a version of model-based RL algorithm MBPO, which is modified to perform in offline RL settings Pape

LxzGordon 1 Oct 24, 2021
🚀 An end-to-end ML applications using PyTorch, W&B, FastAPI, Docker, Streamlit and Heroku

🚀 An end-to-end ML applications using PyTorch, W&B, FastAPI, Docker, Streamlit and Heroku

Made With ML 82 Jun 26, 2022
Code used to generate the results appearing in "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

Train longer, generalize better - Big batch training This is a code repository used to generate the results appearing in "Train longer, generalize bet

Elad Hoffer 145 Sep 16, 2022