Make sankey, alluvial and sankey bump plots in ggplot

Overview

ggsankey

The goal of ggsankey is to make beautiful sankey, alluvial and sankey bump plots in ggplot2

Installation

You can install the development version of ggsankey from github with:

# install.packages("devtools")
devtools::install_github("davidsjoberg/ggsankey")

How does it work

Google defines a sankey as:

A sankey diagram is a visualization used to depict a flow from one set of values to another. The things being connected are called nodes and the connections are called links. Sankeys are best used when you want to show a many-to-many mapping between two domains or multiple paths through a set of stages.

To plot a sankey diagram with ggsankey each observation has a stage (called a discrete x-value in ggplot) and be part of a node. Furthermore, each observation needs to have instructions of which node it will belong to in the next stage. See the image below for some clarification.

Hence, to use geom_sankey the aestethics x, next_x, node and next_node are required. The last stage should point to NA. The aestethics fill and color will affect both nodes and flows.

To controll geometries (not changed by data) like fill, color, size, alpha etc for nodes and flows you can either choose to set a global value that affect both, or you can specify which one you want to alter. For example node.color = 'black' will only draw a black line around the nodes, but not the flows (links).

Example

geom_sankey

A basic sankey plot that shows how dimensions are linked.

library(ggsankey)
library(dplyr)
library(ggplot2)

df <- mtcars %>%
  make_long(cyl, vs, am, gear, carb)

ggplot(df, aes(x = x, 
               next_x = next_x, 
               node = node, 
               next_node = next_node,
               fill = factor(node))) +
  geom_sankey()

And by adding a little pimp.

  • Labels with geom_sankey_label which places labels in the center of nodes if given the same aestethics.

  • ggsankey also comes with custom minimalistic themes that can be used. Here I use theme_sankey.

ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node, fill = factor(node), label = node)) +
  geom_sankey(flow.alpha = .6,
              node.color = "gray30") +
  geom_sankey_label(size = 3, color = "white", fill = "gray40") +
  scale_fill_viridis_d() +
  theme_sankey(base_size = 18) +
  labs(x = NULL) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5)) +
  ggtitle("Car features")

geom_alluvial

Alluvial plots are very similiar to sankey plots but have no spaces between nodes and start at y = 0 instead being centered around the x-axis.

ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node, fill = factor(node), label = node)) +
  geom_alluvial(flow.alpha = .6) +
  geom_alluvial_text(size = 3, color = "white") +
  scale_fill_viridis_d() +
  theme_alluvial(base_size = 18) +
  labs(x = NULL) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5)) +
  ggtitle("Car features")

geom_sankey_bump

Sankey bump plots is mix between bump plots and sankey and mostly useful for time series. When a group becomes larger than another it bumps above it.

# install.packages("gapminder")
library(gapminder)

df <- gapminder %>%
  group_by(continent, year) %>%
  summarise(gdp = (sum_(pop * gdpPercap)/1e9) %>% round(0), .groups = "keep") %>%
  ungroup()

ggplot(df, aes(x = year,
               node = continent,
               fill = continent,
               value = gdp)) +
  geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 6) +
  scale_fill_viridis_d(option = "A", alpha = .8) +
  theme_sankey_bump(base_size = 16) +
  labs(x = NULL,
       y = "GDP ($ bn)",
       fill = NULL,
       color = NULL) +
  theme(legend.position = "bottom") +
  labs(title = "GDP development per continent")

Owner
David Sjoberg
Happy R user. Twitter: @davsjob
David Sjoberg
Streamlit dashboard examples - Twitter cashtags, StockTwits, WSB, Charts, SQL Pattern Scanner

streamlit-dashboards Streamlit dashboard examples - Twitter cashtags, StockTwits, WSB, Charts, SQL Pattern Scanner Tutorial Video https://ww

122 Dec 21, 2022
Simple and fast histogramming in Python accelerated with OpenMP.

pygram11 Simple and fast histogramming in Python accelerated with OpenMP with help from pybind11. pygram11 provides functions for very fast histogram

Doug Davis 28 Dec 14, 2022
Type-safe YAML parser and validator.

StrictYAML StrictYAML is a type-safe YAML parser that parses and validates a restricted subset of the YAML specification. Priorities: Beautiful API Re

Colm O'Connor 1.2k Jan 04, 2023
A python wrapper for creating and viewing effects for Matt Parker's christmas tree.

Christmas Tree Visualizer A python wrapper for creating and viewing effects for Matt Parker's christmas tree. Displays py or csv effect files and allo

4 Nov 22, 2022
Fast data visualization and GUI tools for scientific / engineering applications

PyQtGraph A pure-Python graphics library for PyQt5/PyQt6/PySide2/PySide6 Copyright 2020 Luke Campagnola, University of North Carolina at Chapel Hill h

pyqtgraph 3.1k Jan 08, 2023
Python scripts to manage Chia plots and drive space, providing full reports. Also monitors the number of chia coins you have.

Chia Plot, Drive Manager & Coin Monitor (V0.5 - April 20th, 2021) Multi Server Chia Plot and Drive Management Solution Be sure to ⭐ my repo so you can

338 Nov 25, 2022
A D3.js plugin that produces flame graphs from hierarchical data.

d3-flame-graph A D3.js plugin that produces flame graphs from hierarchical data. If you don't know what flame graphs are, check Brendan Gregg's post.

Martin Spier 740 Dec 29, 2022
Small project demonstrating the use of Grafana and InfluxDB for monitoring the speed of an internet connection

Speedtest monitor for Grafana A small project that allows internet speed monitoring using Grafana, InfluxDB 2 and Speedtest. Demo Requirements Docker

Joshua Ghali 3 Aug 06, 2021
Create HTML profiling reports from pandas DataFrame objects

Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great

10k Jan 01, 2023
A python package for animating plots build on matplotlib.

animatplot A python package for making interactive as well as animated plots with matplotlib. Requires Python = 3.5 Matplotlib = 2.2 (because slider

Tyler Makaro 394 Dec 18, 2022
A simple python tool for explore your object detection dataset

A simple tool for explore your object detection dataset. The goal of this library is to provide simple and intuitive visualizations from your dataset and automatically find the best parameters for ge

GRADIANT - Centro Tecnolóxico de Telecomunicacións de Galicia 142 Dec 25, 2022
Example scripts for generating plots of Bohemian matrices

Bohemian Eigenvalue Plotting Examples This repository contains examples of generating plots of Bohemian eigenvalues. The examples in this repository a

Bohemian Matrices 5 Nov 12, 2022
cqMore is a CadQuery plugin based on CadQuery 2.1.

cqMore (under construction) cqMore is a CadQuery plugin based on CadQuery 2.1. Installation Please use conda to install CadQuery and its dependencies

Justin Lin 36 Dec 21, 2022
Backend app for visualizing CANedge log files in Grafana (directly from local disk or S3)

CANedge Grafana Backend - Visualize CAN/LIN Data in Dashboards This project enables easy dashboard visualization of log files from the CANedge CAN/LIN

13 Dec 15, 2022
Standardized plots and visualizations in Python

Standardized plots and visualizations in Python pltviz is a Python package for standardized visualization. Routine and novel plotting approaches are f

Andrew Tavis McAllister 0 Jul 09, 2022
DrawBot lets you draw images taken from the internet on Skribbl.io, Gartic Phone and Paint

DrawBot You don't speak french? No worries, english translation is over here. C'est quoi ? DrawBot est un logiciel codé par V2F qui va prendre possess

V2F 205 Jan 01, 2023
Sci palettes for matplotlib/seaborn

sci palettes for matplotlib/seaborn Installation python3 -m pip install sci-palettes Usage import seaborn as sns import matplotlib.pyplot as plt impor

Qingdong Su 2 Jun 07, 2022
With Holoviews, your data visualizes itself.

HoloViews Stop plotting your data - annotate your data and let it visualize itself. HoloViews is an open-source Python library designed to make data a

HoloViz 2.3k Jan 02, 2023
Alternative layout visualizer for ZSA Moonlander keyboard

General info This is a keyboard layout visualizer for ZSA Moonlander keyboard (because I didn't find their Oryx or their training tool particularly us

10 Jul 19, 2022
1900-2016 Olympic Data Analysis in Python by plotting different graphs

🔥 Olympics Data Analysis 🔥 In Data Science field, there is a big topic before creating a model for future prediction is Data Analysis. We can find o

Sayan Roy 1 Feb 06, 2022