DenseClus is a Python module for clustering mixed type data using UMAP and HDBSCAN

Last update: Dec 08, 2022

Overview

Amazon DenseClus

DenseClus is a Python module for clustering mixed-type data using UMAP and HDBSCAN. Because it accepts both categorical and numerical data, all features can be incorporated into the clustering.

Installation

python3 -m pip install Amazon-DenseClus
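
To confirm that the install succeeded, one option is to check that the module can be located without importing it. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, and `denseclus` is assumed to be the import name, as used in the Usage section.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the import path."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # "denseclus" is the import name shown under Usage.
    print("denseclus installed:", is_installed("denseclus"))
```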

Usage

DenseClus requires a pandas DataFrame as input, with both numerical and categorical columns. All preprocessing and extraction are done under the hood: call fit, then retrieve the clusters.

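from denseclus import DenseClus

# df is a pandas DataFrame with numerical and categorical columns,
# loaded or built beforehand.
clf = DenseClus(
    umap_combine_method="intersection_union_mapper",
)
clf.fit(df)

# Retrieve the clusters.
print(clf.score())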
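
The `df` above is not defined in the snippet. As a minimal sketch of what a suitable input might look like, the following builds a small synthetic DataFrame with both numerical and categorical columns and checks that both kinds are present. The column names, the data, and the helpers `make_mixed_frame`, `split_column_kinds`, and `fit_clusters` are all illustrative assumptions; `fit_clusters` only repeats the calls shown above and has not been checked against other DenseClus versions.

```python
import numpy as np
import pandas as pd


def make_mixed_frame(n_rows: int = 200, seed: int = 42) -> pd.DataFrame:
    """Build a synthetic mixed-type DataFrame (illustrative data only)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "age": rng.integers(18, 80, size=n_rows),  # numerical
            "income": rng.normal(50_000, 15_000, size=n_rows),  # numerical
            "plan": rng.choice(["basic", "plus", "pro"], size=n_rows),  # categorical
            "region": rng.choice(["north", "south"], size=n_rows),  # categorical
        }
    )


def split_column_kinds(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (numerical column names, non-numerical column names)."""
    numerical = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return numerical, categorical


def fit_clusters(df: pd.DataFrame):
    """Fit DenseClus using only the calls shown in the Usage snippet."""
    from denseclus import DenseClus  # imported lazily; requires Amazon-DenseClus

    clf = DenseClus(umap_combine_method="intersection_union_mapper")
    clf.fit(df)
    return clf.score()


df = make_mixed_frame()
numerical, categorical = split_column_kinds(df)
# The input should contain both kinds of column, so fail early if one is missing.
if not numerical or not categorical:
    raise ValueError("expected both numerical and categorical columns")
```

With the package installed, calling `fit_clusters(df)` would then run the same fit and score steps as the snippet above.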

Examples

A hands-on example with an overview of how to use DenseClus is currently available in the form of a Jupyter Notebook.

References

@article{mcinnes2018umap-software,
  title={UMAP: Uniform Manifold Approximation and Projection},
  author={McInnes, Leland and Healy, John and Saul, Nathaniel and Grossberger, Lukas},
  journal={The Journal of Open Source Software},
  volume={3},
  number={29},
  pages={861},
  year={2018}
}

@article{mcinnes2017hdbscan,
  title={hdbscan: Hierarchical density based clustering},
  author={McInnes, Leland and Healy, John and Astels, Steve},
  journal={The Journal of Open Source Software},
  volume={2},
  number={11},
  pages={205},
  year={2017}
}

Owner

Amazon Web Services - Labs
