Neighbourhood Retrieval with Distance Correlation

Assign pseudo class labels to data points in the latent space.
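The following is a minimal sketch of one way neighbourhoods could be turned into pseudo class labels. It links every pair of points whose similarity is at least a cutoff, then gives each connected group its own label. The rule (connected components of a thresholded similarity graph), the function `pseudo_labels`, and its arguments are hypothetical illustrations. NNDC's own labelling may work differently.

```python
import numpy as np


def pseudo_labels(similarity: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical rule: connected components of the graph that joins every
    pair (i, j) with similarity[i, j] >= threshold. Returns labels 0, 1, 2, ..."""
    n = similarity.shape[0]
    parent = np.arange(n)  # union-find forest, one tree per component

    def find(i: int) -> int:
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    rows, cols = np.nonzero(similarity >= threshold)
    for i, j in zip(rows, cols):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri  # merge the two components

    roots = np.array([find(i) for i in range(n)])
    # Map arbitrary root ids to consecutive labels 0, 1, 2, ...
    _, labels = np.unique(roots, return_inverse=True)
    return labels


sim = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
])
print(pseudo_labels(sim, threshold=0.5))  # prints [0 0 1 1]
```

Raising the cutoff splits groups apart, and lowering it merges them. With a cutoff of 0.05 in this toy matrix, all four points end up with a single label.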

NNDC is a slim wrapper around FAISS.
NNDC transforms the space so that the FAISS inner-product index (IndexFlatIP) computes the distance correlation.
Supports KernelPCA (non-linear PCA) for dimensionality reduction.
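Here is one way an inner-product index can be made to compute distance correlation. The sketch assumes each vector is treated as a sample of `dim` paired scalar values. Each vector is replaced by its double-centred pairwise-distance matrix, flattened and scaled to unit norm. The inner product of two such embeddings then equals the squared sample distance correlation, which ranks neighbours in the same order as dCor itself. The helper names are hypothetical, and this is not necessarily how NNDC implements its transform.

```python
import numpy as np


def double_centred(x) -> np.ndarray:
    """Double-centred matrix of pairwise distances |x_j - x_k| of a 1-D sample."""
    x = np.asarray(x, dtype=np.float64)
    a = np.abs(x[:, None] - x[None, :])
    # Subtract column means and row means, add back the grand mean.
    return a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()


def distance_correlation(x, y) -> float:
    """Sample (V-statistic) distance correlation between two equal-length vectors."""
    A, B = double_centred(x), double_centred(y)
    denom = np.sqrt((A * A).mean() * (B * B).mean())  # sqrt(dVar^2(x) * dVar^2(y))
    if denom == 0.0:  # a constant vector has zero distance variance
        return 0.0
    dcov2 = max((A * B).mean(), 0.0)  # clip tiny negative rounding error
    return float(np.sqrt(dcov2 / denom))


def dc_embedding(x) -> np.ndarray:
    """Flatten the double-centred matrix and scale it to unit norm, so that
    dot(dc_embedding(x), dc_embedding(y)) == distance_correlation(x, y) ** 2."""
    flat = double_centred(x).ravel()
    norm = np.linalg.norm(flat)
    return (flat / norm if norm > 0 else flat).astype(np.float32)


rng = np.random.default_rng(0)
x = rng.random(64)
y = x ** 2 + 0.1 * rng.random(64)
ip = float(np.dot(dc_embedding(x), dc_embedding(y)))
print(ip, distance_correlation(x, y) ** 2)  # equal up to float32 rounding
```

Each embedding has `dim * dim` entries (16384 for 128-dimensional vectors). You could stack the embeddings into a float32 matrix and add it to `faiss.IndexFlatIP(dim * dim)`. FAISS would then rank results by squared distance correlation. That is why a dimensionality-reduction step before indexing is useful.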

Installation

pip install git+https://github.com/The-Learning-Machines/nndc

Usage

import nndc
import numpy as np

dim = 128
n = 20000

index = nndc.DCIndex(
    in_dim=dim, # Dimensionality of the input vectors
    threshold=0.2, # How far from a vector its neighbourhood extends
    out_dim=32, # Dimensionality of the vectors after PCA (only needed if using PCA)
    use_pca=True, # Use KernelPCA
    verbose=True,
    kernel="rbf" # Use Radial Basis Function as the kernel for KernelPCA
)

# Generate random data
np.random.seed(1234)             
xb = np.random.random((n, dim)).astype('float32')
xb[:, 0] += np.arange(n) / 1000.
xq = np.random.random((100, dim)).astype('float32')
xq[:, 0] += np.arange(100) / 1000.

# Fit KernelPCA
index.add_pca_training_data(xb[:1000, :])
index.fit_pca()

# Add vectors to the Index
vector_ids = np.arange(xb.shape[0])
index.add(xb, vector_ids)

# Build a neighbourhood graph
index.build_neighbourhood()

# Query the neighbours of vector with ID=0
neighbour_ids, neighbour_similarity = index[0]
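The `add_pca_training_data` / `fit_pca` steps above fit KernelPCA on a 1000-vector subset, and every vector is then reduced to `out_dim` dimensions. The following plain-NumPy sketch shows what an RBF kernel PCA fit and projection involve. The class `RBFKernelPCA` and its `gamma` parameter are hypothetical illustrations, not part of NNDC, and not necessarily what NNDC does internally.

```python
import numpy as np


class RBFKernelPCA:
    """Minimal RBF kernel PCA: fit on a training subset, project any points."""

    def __init__(self, n_components: int, gamma: float):
        self.n_components = n_components
        self.gamma = gamma

    def _kernel(self, X: np.ndarray, Y: np.ndarray) -> np.ndarray:
        # k(x, y) = exp(-gamma * ||x - y||^2), clipping tiny negative distances
        sq = (X * X).sum(1)[:, None] + (Y * Y).sum(1)[None, :] - 2.0 * X @ Y.T
        return np.exp(-self.gamma * np.maximum(sq, 0.0))

    def fit(self, X: np.ndarray) -> "RBFKernelPCA":
        self.X_fit = np.asarray(X, dtype=np.float64)
        K = self._kernel(self.X_fit, self.X_fit)
        self.col_mean = K.mean(axis=0)  # reused to centre kernels of new points
        self.all_mean = K.mean()
        # Double-centre K, i.e. centre the implicit feature vectors.
        Kc = K - self.col_mean[None, :] - self.col_mean[:, None] + self.all_mean
        eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
        k = self.n_components
        self.eigvals = np.maximum(eigvals[::-1][:k], 1e-12)  # largest k, descending
        # Scale so each column of alphas is a unit-length axis in feature space.
        self.alphas = eigvecs[:, ::-1][:, :k] / np.sqrt(self.eigvals)
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        K = self._kernel(np.asarray(X, dtype=np.float64), self.X_fit)
        # Centre the new kernel rows with the *training* statistics.
        Kc = K - K.mean(axis=1, keepdims=True) - self.col_mean[None, :] + self.all_mean
        return (Kc @ self.alphas).astype(np.float32)
```

For the example above, `RBFKernelPCA(n_components=32, gamma=1.0 / dim).fit(xb[:1000]).transform(xb)` returns a `(20000, 32)` float32 array (the `gamma` value here is an arbitrary choice). Only the 1000 training vectors enter the eigendecomposition, so the fit stays at a 1000 x 1000 matrix. Projecting each new vector still requires its kernel values against all training vectors.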


Owner

The Learning Machines
