A Graph Learning library for Humans

Overview

A Graph Learning library for Humans

These novel algorithms include but are not limited to:

  • A graph construction and graph searching class can be found here (NodeGraph). It was developed and invented as a faster alternative for hierarchical DAG construction and searching.
  • A fast DBSCAN method utilizing my connectivity code as invented during my PhD.
  • A NLP pattern matching algorithm useful for sequence alignment clustering.
  • High dimensional alignment code for aligning models to data.
  • An SVD based variant of the Distance Geometry algorithm. For going from relative to absolute coordinates.

License DOI Downloads

Visit the active code via : https://github.com/richardtjornhammar/graphtastic

Pip installation with :

pip install graphtastic

Version controlled installation of the Graphtastic library

The Graphtastic library

In order to run these code snippets we recommend that you download the nix package manager. Nix package manager links from Februari 2022:

https://nixos.org/download.html

$ curl -L https://nixos.org/nix/install | sh

If you cannot install it using your Wintendo then please consider installing Windows Subsystem for Linux first:

https://docs.microsoft.com/en-us/windows/wsl/install-win10

In order to run the code in this notebook you must enter a sensible working environment. Don't worry! We have created one for you. It's version controlled against python3.9 (and experimental python3.10 support) and you can get the file here:

https://github.com/richardtjornhammar/graphtastic/blob/master/env/env39.nix

Since you have installed Nix as well as WSL, or use a Linux (NixOS) or bsd like system, you should be able to execute the following command in a termnial:

$ nix-shell env39.nix

Now you should be able to start your jupyter notebook locally:

$ jupyter-notebook graphhaxxor.ipynb

and that's it.

EXAMPLE 0

Running

import graphtastic.graphs as gg
import graphtastic.clustering as gl
import graphtastic.fit as gf
import graphtastic.convert as gc

Should work if the install was succesful

Example 1 : Absolute and relative coordinates

In this example, we will use the SVD based distance geometry method to go between absolute coordinates, relative coordinate distances and back to ordered absolute coordinates. Absolute coordinates are float values describing the position of something in space. If you have several of these then the same information can be conveyed via the pairwise distance graph. Going from absolute coordinates to pairwise distances is simple and only requires you to calculate all the pairwise distances between your absolute coordinates. Going back to mutually orthogonal ordered coordinates from the pariwise distances is trickier, but a solved problem. The distance geometry can be obtained with SVD and it is implemented in the graphtastic.fit module under the name distance_matrix_to_absolute_coordinates. We start by defining coordinates afterwhich we can calculate the pair distance matrix and transforming it back by using the code below

import numpy as np

coordinates = np.array([[-23.7100 ,  24.1000 ,  85.4400],
  [-22.5600 ,  23.7600 ,  85.6500],
  [-21.5500 ,  24.6200 ,  85.3800],
  [-22.2600 ,  22.4200 ,  86.1900],
  [-23.2900 ,  21.5300 ,  86.4800],
  [-20.9300 ,  22.0300 ,  86.4300],
  [-20.7100 ,  20.7600 ,  86.9400],
  [-21.7900 ,  19.9300 ,  87.1900],
  [-23.0300 ,  20.3300 ,  86.9600],
  [-24.1300 ,  19.4200 ,  87.2500],
  [-23.7400 ,  18.0500 ,  87.0000],
  [-24.4900 ,  19.4600 ,  88.7500],
  [-23.3700 ,  19.8900 ,  89.5200],
  [-24.8500 ,  18.0000 ,  89.0900],
  [-23.9600 ,  17.4800 ,  90.0800],
  [-24.6600 ,  17.2400 ,  87.7500],
  [-24.0800 ,  15.8500 ,  88.0100],
  [-23.9600 ,  15.1600 ,  86.7600],
  [-23.3400 ,  13.7100 ,  87.1000],
  [-21.9600 ,  13.8700 ,  87.6300],
  [-24.1800 ,  13.0300 ,  88.1100],
  [-23.2900 ,  12.8200 ,  85.7600],
  [-23.1900 ,  11.2800 ,  86.2200],
  [-21.8100 ,  11.0000 ,  86.7000],
  [-24.1500 ,  11.0300 ,  87.3200],
  [-23.5300 ,  10.3200 ,  84.9800],
  [-23.5400 ,   8.9800 ,  85.4800],
  [-23.8600 ,   8.0100 ,  84.3400],
  [-23.9800 ,   6.5760 ,  84.8900],
  [-23.2800 ,   6.4460 ,  86.1300],
  [-23.3000 ,   5.7330 ,  83.7800],
  [-22.7300 ,   4.5360 ,  84.3100],
  [-22.2000 ,   6.7130 ,  83.3000],
  [-22.7900 ,   8.0170 ,  83.3800],
  [-21.8100 ,   6.4120 ,  81.9200],
  [-20.8500 ,   5.5220 ,  81.5200],
  [-20.8300 ,   5.5670 ,  80.1200],
  [-21.7700 ,   6.4720 ,  79.7400],
  [-22.3400 ,   6.9680 ,  80.8000],
  [-20.0100 ,   4.6970 ,  82.1500],
  [-19.1800 ,   3.9390 ,  81.4700] ]);

if __name__=='__main__':

    import graphtastic.fit as gf

    distance_matrix = gf.absolute_coordinates_to_distance_matrix( coordinates )
    ordered_coordinates = gf.distance_matrix_to_absolute_coordinates( distance_matrix , n_dimensions=3 )

    print ( ordered_coordinates )

You will notice that the largest variation is now aligned with the X axis, the second most variation aligned with the Y axis and the third most, aligned with the Z axis while the graph topology remained unchanged.

Example 2 : Deterministic DBSCAN

DBSCAN is a clustering algorithm that can be seen as a way of rejecting points, from any cluster, that are positioned in low dense regions of a point cloud. This introduces holes and may result in a larger segment, that would otherwise be connected via a non dense link to become disconnected and form two segments, or clusters. The rejection criterion is simple. The central concern is to evaluate a distance matrix with an applied cutoff this turns the distances into true or false values depending on if a pair distance between point i and j is within the distance cutoff. This new binary Neighbour matrix tells you wether or not two points are neighbours (including itself). The DBSCAN criterion states that a point is not part of any cluster if it has fewer than minPts neighbors. Once you've calculated the distance matrix you can immediately evaluate the number of neighbors each point has and the rejection criterion, via . If the rejection vector R value of a point is True then all the pairwise distances in the distance matrix of that point is set to a value larger than epsilon. This ensures that a distance matrix search will reject those points as neighbours of any other for the choosen epsilon. By tracing out all points that are neighbors and assessing the connectivity (search for connectivity) you can find all the clusters.

import numpy as np
from graphtastic.clustering import dbscan, reformat_dbscan_results
from graphtastic.fit import absolute_coordinates_to_distance_matrix

N   = 100
N05 = int ( np.floor(0.5*N) )
R   = 0.25*np.random.randn(N).reshape(N05,2) + 1.5
P   = 0.50*np.random.randn(N).reshape(N05,2)

coordinates = np.array([*P,*R])

results = dbscan ( distance_matrix = absolute_coordinates_to_distance_matrix(coordinates,bInvPow=True) , eps=0.45 , minPts=4 )
clusters = reformat_dbscan_results(results)
print ( clusters )

Example 3 : NodeGraph, distance matrix to DAG

Here we demonstrate how to convert the graph coordinates into a hierarchy. The leaf nodes will correspond to the coordinate positions.

import numpy as np

coordinates = np.array([[-23.7100 ,  24.1000 ,  85.4400],
  [-22.5600 ,  23.7600 ,  85.6500],
  [-21.5500 ,  24.6200 ,  85.3800],
  [-22.2600 ,  22.4200 ,  86.1900],
  [-23.2900 ,  21.5300 ,  86.4800],
  [-20.9300 ,  22.0300 ,  86.4300],
  [-20.7100 ,  20.7600 ,  86.9400],
  [-21.7900 ,  19.9300 ,  87.1900],
  [-23.0300 ,  20.3300 ,  86.9600],
  [-24.1300 ,  19.4200 ,  87.2500],
  [-23.7400 ,  18.0500 ,  87.0000],
  [-24.4900 ,  19.4600 ,  88.7500],
  [-23.3700 ,  19.8900 ,  89.5200],
  [-24.8500 ,  18.0000 ,  89.0900],
  [-23.9600 ,  17.4800 ,  90.0800],
  [-24.6600 ,  17.2400 ,  87.7500],
  [-24.0800 ,  15.8500 ,  88.0100],
  [-23.9600 ,  15.1600 ,  86.7600],
  [-23.3400 ,  13.7100 ,  87.1000],
  [-21.9600 ,  13.8700 ,  87.6300],
  [-24.1800 ,  13.0300 ,  88.1100],
  [-23.2900 ,  12.8200 ,  85.7600],
  [-23.1900 ,  11.2800 ,  86.2200],
  [-21.8100 ,  11.0000 ,  86.7000],
  [-24.1500 ,  11.0300 ,  87.3200],
  [-23.5300 ,  10.3200 ,  84.9800],
  [-23.5400 ,   8.9800 ,  85.4800],
  [-23.8600 ,   8.0100 ,  84.3400],
  [-23.9800 ,   6.5760 ,  84.8900],
  [-23.2800 ,   6.4460 ,  86.1300],
  [-23.3000 ,   5.7330 ,  83.7800],
  [-22.7300 ,   4.5360 ,  84.3100],
  [-22.2000 ,   6.7130 ,  83.3000],
  [-22.7900 ,   8.0170 ,  83.3800],
  [-21.8100 ,   6.4120 ,  81.9200],
  [-20.8500 ,   5.5220 ,  81.5200],
  [-20.8300 ,   5.5670 ,  80.1200],
  [-21.7700 ,   6.4720 ,  79.7400],
  [-22.3400 ,   6.9680 ,  80.8000],
  [-20.0100 ,   4.6970 ,  82.1500],
  [-19.1800 ,   3.9390 ,  81.4700] ]);


if __name__=='__main__':

    import graphtastic.graphs as gg
    import graphtastic.fit as gf
    GN = gg.NodeGraph()
    #
    # bInvPow refers to the distance type. If True then R distances are returned
    # instead of R2 (R**2) distances. That is also computing the square root if True
    #
    distm = gf.absolute_coordinates_to_distance_matrix( coordinates , bInvPow=True )
    #
    # Now a Graph DAG is constructed from the pairwise distances
    GN.distance_matrix_to_graph_dag( distm )
    #
    # And write it to a json file so that we may employ JS visualisations
    # such as D3 or other nice packages to view our hierarchy
    GN.write_json( jsonfile='./graph_hierarchy.json' )

Manually updated code backups for this library :

GitLab | https://gitlab.com/richardtjornhammar/graphtastic

CSDN | https://codechina.csdn.net/m0_52121311/graphtastic

You might also like...
Fastest Gephi's ForceAtlas2 graph layout algorithm implemented for Python and NetworkX
Fastest Gephi's ForceAtlas2 graph layout algorithm implemented for Python and NetworkX

ForceAtlas2 for Python A port of Gephi's Force Atlas 2 layout algorithm to Python 2 and Python 3 (with a wrapper for NetworkX and igraph). This is the

🐍PyNode Next allows you to easily create beautiful graph visualisations and animations
🐍PyNode Next allows you to easily create beautiful graph visualisations and animations

PyNode Next A complete rewrite of PyNode for the modern era. Up to five times faster than the original PyNode. PyNode Next allows you to easily create

LabGraph is a a Python-first framework used to build sophisticated research systems with real-time streaming, graph API, and parallelism.
LabGraph is a a Python-first framework used to build sophisticated research systems with real-time streaming, graph API, and parallelism.

LabGraph is a a Python-first framework used to build sophisticated research systems with real-time streaming, graph API, and parallelism.

Automatization of BoxPlot graph usin Python MatPlotLib and Excel

BoxPlotGraphAutomation Automatization of BoxPlot graph usin Python / Excel. This file is an automation of BoxPlot-Graph using python graph library mat

Library for exploring and validating machine learning data

TensorFlow Data Validation TensorFlow Data Validation (TFDV) is a library for exploring and validating machine learning data. It is designed to be hig

Library for exploring and validating machine learning data

TensorFlow Data Validation TensorFlow Data Validation (TFDV) is a library for exploring and validating machine learning data. It is designed to be hig

Declarative statistical visualization library for Python
Declarative statistical visualization library for Python

Altair http://altair-viz.github.io Altair is a declarative statistical visualization library for Python. With Altair, you can spend more time understa

Plotting library for IPython/Jupyter notebooks
Plotting library for IPython/Jupyter notebooks

bqplot 2-D plotting library for Project Jupyter Introduction bqplot is a 2-D visualization system for Jupyter, based on the constructs of the Grammar

Cartopy - a cartographic python library with matplotlib support
Cartopy - a cartographic python library with matplotlib support

Cartopy is a Python package designed to make drawing maps for data analysis and visualisation easy. Table of contents Overview Get in touch License an

Releases(v0.12.0)
Owner
Richard Tjörnhammar
PhD in Biological physics https://richardtjornhammar.github.io
Richard Tjörnhammar
OpenStats is a library built on top of streamlit that extracts data from the Github API and shows the main KPIs

Open Stats Discover and share the KPIs of your OpenSource project. OpenStats is a library built on top of streamlit that extracts data from the Github

Pere Miquel Brull 4 Apr 03, 2022
Machine learning beginner to Kaggle competitor in 30 days. Non-coders welcome. The program starts Monday, August 2, and lasts four weeks. It's designed for people who want to learn machine learning.

30-Days-of-ML-Kaggle 🔥 About the Hands On Program 💻 Machine learning beginner → Kaggle competitor in 30 days. Non-coders welcome The program starts

Roja Achary 145 Jan 01, 2023
Python ts2vg package provides high-performance algorithm implementations to build visibility graphs from time series data.

ts2vg: Time series to visibility graphs The Python ts2vg package provides high-performance algorithm implementations to build visibility graphs from t

Carlos Bergillos 26 Dec 17, 2022
🗾 Streamlit Component for rendering kepler.gl maps

streamlit-keplergl 🗾 Streamlit Component for rendering kepler.gl maps in a streamlit app. 🎈 Live Demo 🎈 Installation pip install streamlit-keplergl

Christoph Rieke 39 Dec 14, 2022
3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK)

PyVista Deployment Build Status Metrics Citation License Community 3D plotting and mesh analysis through a streamlined interface for the Visualization

PyVista 1.6k Jan 08, 2023
Matplotlib colormaps from the yt project !

cmyt Matplotlib colormaps from the yt project ! Colormaps overview The following colormaps, as well as their respective reversed (*_r) versions are av

The yt project 5 Sep 16, 2022
✅ Today I Learn

Today I Learn EDA numpy_100ex numpy_0~10 airline_satisfaction_prediction BERT_naver_movie_classification NLP_prepare NLP_Tweet_Emotion_Recognition tex

Yeonghoo_Ahn 3 Dec 15, 2022
A GUI for Pandas DataFrames

PandasGUI A GUI for analyzing Pandas DataFrames. Demo Installation Install latest release from PyPi: pip install pandasgui Install directly from Githu

Adam 2.8k Jan 03, 2023
Small project to recursively calculate and plot each successive order of the Hilbert Curve

hilbert-curve Small project to recursively calculate and plot each successive order of the Hilbert Curve. After watching 3Blue1Brown's video on Hilber

Stefan Mejlgaard 2 Nov 15, 2021
Python Data. Leaflet.js Maps.

folium Python Data, Leaflet.js Maps folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js

6k Jan 02, 2023
A python wrapper for creating and viewing effects for Matt Parker's christmas tree.

Christmas Tree Visualizer A python wrapper for creating and viewing effects for Matt Parker's christmas tree. Displays py or csv effect files and allo

4 Nov 22, 2022
Draw tree diagrams from indented text input

Draw tree diagrams This repository contains two very different scripts to produce hierarchical tree diagrams like this one: $ ./classtree.py collectio

Luciano Ramalho 8 Dec 14, 2022
Python package to visualize and cluster partial dependence.

partial_dependence A python library for plotting partial dependence patterns of machine learning classifiers. The technique is a black box approach to

NYU Visualization Lab 25 Nov 14, 2022
Attractors is a package for simulation and visualization of strange attractors.

attractors Attractors is a package for simulation and visualization of strange attractors. Installation The simplest way to install the module is via

Vignesh M 45 Jul 31, 2022
PyFlow is a general purpose visual scripting framework for python

PyFlow is a general purpose visual scripting framework for python. State Base structure of program implemented, such things as packages disco

1.8k Jan 07, 2023
daily report of @arkinvest ETF activity + data collection

ark_invest daily weekday report of @arkinvest ETF activity + data collection This script was created to: Extract and save daily csv's from ARKInvest's

T D 27 Jan 02, 2023
This repository contains a streaming Dataflow pipeline written in Python with Apache Beam, reading data from PubSub.

Sample streaming Dataflow pipeline written in Python This repository contains a streaming Dataflow pipeline written in Python with Apache Beam, readin

Israel Herraiz 9 Mar 18, 2022
Fast scatter density plots for Matplotlib

About Plotting millions of points can be slow. Real slow... 😴 So why not use density maps? ⚡ The mpl-scatter-density mini-package provides functional

Thomas Robitaille 473 Dec 12, 2022
Create matplotlib visualizations from the command-line

MatplotCLI Create matplotlib visualizations from the command-line MatplotCLI is a simple utility to quickly create plots from the command-line, levera

Daniel Moura 46 Dec 16, 2022