Hierarchical Uniform Manifold Approximation and Projection

Overview


HUMAP exploration on Fashion MNIST dataset

HUMAP

Hierarchical Uniform Manifold Approximation and Projection (HUMAP) is a technique based on UMAP for hierarchical non-linear dimensionality reduction. HUMAP allows you to:

  1. Focus on important information while reducing the visual burden when exploring whole datasets;
  2. Drill down the hierarchy according to information demand.

The details of the algorithm can be found in our paper on arXiv.

Installation

HUMAP is written in C++ for performance and exposes an intuitive Python interface. It depends on common machine learning libraries, such as scikit-learn and NumPy. It also requires pybind11, which provides the binding between C++ and Python.

Requirements:

  • Python 3.6 or greater
  • numpy
  • scipy
  • scikit-learn
  • pybind11
  • Eigen (C++)
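
Before running pip, it can help to confirm that the Python-side requirements are importable. The following is a minimal sketch using only the standard library. The helper names missing_modules and python_is_supported are hypothetical (they are not part of HUMAP), and the usual import names are assumed (scikit-learn is imported as sklearn). Eigen is a C++ header library, so it cannot be checked this way.

```python
import importlib.util
import sys

# Import names of the Python requirements listed above
# (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ["numpy", "scipy", "sklearn", "pybind11"]


def missing_modules(names):
    """Return the names in `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3, 6)):
    """Check the running interpreter against the minimum version listed above."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```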

If you have these requirements installed, use PyPI:

pip install humap

For Windows users:

The Eigen library does not have to be installed. Just add the files to C:\Eigen, or use the manual installation to change the Eigen location.

Manual installation:

To install HUMAP manually, download the project and proceed as follows:

python setup.py bdist_wheel
pip install dist/humap*.whl

Usage examples

The HUMAP package follows the same convention as scikit-learn classes: you first fit the data and then transform it.

Fitting the hierarchy

import humap
import numpy as np  # used by the later examples (np.array)
from sklearn.datasets import fetch_openml


X, y = fetch_openml('mnist_784', version=1, return_X_y=True)

hUmap = humap.HUMAP()
hUmap.fit(X, y)

HUMAP embedding of top-level MNIST digits

Currently, you can control six parameters related to the hierarchy construction and the embedding performed by UMAP.

  • levels: Controls the number of hierarchy levels built on top of the first one (the whole dataset), as well as how many data points each level contains. The default is [0.2, 0.2], meaning that HUMAP produces three levels: the first with the whole dataset, the second with 20% of the first level, and the third with 20% of the second level.
  • n_neighbors: Controls the number of neighbors used to approximate the manifold structure. Larger values produce embeddings that preserve more of the global relations. In HUMAP, the recommended and default value is 100.
  • min_dist: Used in the UMAP dimensionality reduction, this parameter controls how closely data points may be packed together. According to the UMAP documentation, larger values yield more evenly distributed embeddings, while smaller values encode local structure better. The default is 0.15.
  • knn_algorithm: Controls which approximate k-nearest-neighbor (kNN) algorithm is used. NNDescent is the default. ANNOY and FLANN are also available if you have their Python packages installed, although they run slower than NNDescent.
  • init: Controls how the low-dimensional representation is initialized. Spectral is the default since it yields better global structure preservation. Random initialization is also available.
  • verbose: Controls the verbosity of the algorithm.
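
To make the effect of the levels parameter concrete, the sketch below works through the arithmetic for MNIST. The helper name level_sizes is hypothetical (it is not part of HUMAP), and rounding to the nearest integer is an assumption, so the sizes the library actually selects may differ slightly.

```python
def level_sizes(n_points, levels):
    """Approximate number of points in each hierarchy level.

    Level 0 holds the whole dataset; each entry in `levels` is the fraction
    of the previous level that is kept. Rounding to the nearest integer is
    an assumption; the library may select a slightly different number.
    """
    sizes = [n_points]
    for fraction in levels:
        sizes.append(round(sizes[-1] * fraction))
    return sizes


# MNIST has 70,000 samples; [0.2, 0.2] is the default described above.
print(level_sizes(70_000, [0.2, 0.2]))  # [70000, 14000, 2800]
```

With the defaults, the top level therefore holds roughly 4% of the original data, which is what keeps the initial view uncluttered.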

Embedding a hierarchical level

After fitting the dataset, you can generate the embedding for a hierarchical level by specifying the level.

embedding_l2 = hUmap.transform(2)
y_l2 = hUmap.labels(2)

Note that the .labels() method only works for levels equal to or greater than one.
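
To look at several levels side by side, the two calls above can be wrapped in a loop. This is a minimal sketch, assuming that transform(level) and labels(level) can be called for every level from 1 up to the top one. The helper name embed_levels is hypothetical and not part of HUMAP.

```python
def embed_levels(model, top_level):
    """Collect (embedding, labels) for levels 1..top_level of a fitted model.

    `model` is expected to be a fitted humap.HUMAP instance. Level 0 is
    skipped because .labels() only works for levels >= 1.
    """
    results = {}
    for level in range(1, top_level + 1):
        embedding = model.transform(level)
        labels = model.labels(level)
        results[level] = (embedding, labels)
    return results
```

With the default levels of [0.2, 0.2], calling embed_levels(hUmap, 2) would return a dictionary keyed by levels 1 and 2.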

Drilling down the hierarchy by embedding a subset of data points based on indices

Embedding data subsets throughout HUMAP hierarchy

When you are interested in a particular set of data samples, HUMAP lets you drill down the hierarchy for those samples.

embedding, y, indices = hUmap.transform(2, indices=indices_of_interest)

This method returns the embedding coordinates, the labels (y), and the data points' indices in the current level. Note that the current level is now level 1, since the drill-down started from hierarchy level 2.
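
The call above leaves indices_of_interest undefined. One plausible way to build it is to filter the labels returned by hUmap.labels(2), assuming the indices are positions of points within the level being drilled (level 2 here). The label array below is a small stand-in, not real HUMAP output.

```python
import numpy as np

# Stand-in for y_l2 = hUmap.labels(2); real labels come from the fitted model.
y_l2 = np.array([4, 1, 9, 4, 7, 9, 0])

# Positions of the level-2 points labeled 4, i.e. the points to drill into.
indices_of_interest = np.flatnonzero(y_l2 == 4)
print(indices_of_interest)  # positions 0 and 3 in this stand-in array
```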

Drilling down the hierarchy by embedding a subset of data points based on labels

You can apply the same concept as above to embed data points based on labels.

embedding, y, indices = hUmap.transform(2, indices=np.array([4, 9]), class_based=True)

C++ UMAP implementation

You can also fit a one-level HUMAP hierarchy, which essentially corresponds to a UMAP projection.

umap_reducer = humap.HUMAP(np.array([]))
umap_reducer.fit(X, y)

embedding = umap_reducer.transform(0)

Citation

Please use the following reference to cite HUMAP in your work:

@misc{marciliojr_humap2021,
  title={HUMAP: Hierarchical Uniform Manifold Approximation and Projection},
  author={Wilson E. Marcílio-Jr and Danilo M. Eler and Fernando V. Paulovich and Rafael M. Martins},
  year={2021},
  eprint={2106.07718},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

License

HUMAP follows the 3-clause BSD license and it uses the open-source NNDescent implementation from EFANNA. It also uses a C++ implementation of UMAP for embedding hierarchy levels; this project would not be possible without UMAP's fantastic technique and package.

E-mail me (wilson_jr at outlook.com) if you would like to contribute.


Releases(v0.2.1)
Owner
Wilson Estécio Marcílio Júnior
PhD Candidate in Computer Science. Interested in ML and Explainability.