Make sankey, alluvial and sankey bump plots in ggplot

Overview

ggsankey

The goal of ggsankey is to make beautiful sankey, alluvial and sankey bump plots in ggplot2

Installation

You can install the development version of ggsankey from github with:

# install.packages("devtools")
devtools::install_github("davidsjoberg/ggsankey")

How does it work

Google defines a sankey as:

A sankey diagram is a visualization used to depict a flow from one set of values to another. The things being connected are called nodes and the connections are called links. Sankeys are best used when you want to show a many-to-many mapping between two domains or multiple paths through a set of stages.

To plot a sankey diagram with ggsankey each observation has a stage (called a discrete x-value in ggplot) and be part of a node. Furthermore, each observation needs to have instructions of which node it will belong to in the next stage. See the image below for some clarification.

Hence, to use geom_sankey the aestethics x, next_x, node and next_node are required. The last stage should point to NA. The aestethics fill and color will affect both nodes and flows.

To controll geometries (not changed by data) like fill, color, size, alpha etc for nodes and flows you can either choose to set a global value that affect both, or you can specify which one you want to alter. For example node.color = 'black' will only draw a black line around the nodes, but not the flows (links).

Example

geom_sankey

A basic sankey plot that shows how dimensions are linked.

library(ggsankey)
library(dplyr)
library(ggplot2)

df <- mtcars %>%
  make_long(cyl, vs, am, gear, carb)

ggplot(df, aes(x = x, 
               next_x = next_x, 
               node = node, 
               next_node = next_node,
               fill = factor(node))) +
  geom_sankey()

And by adding a little pimp.

  • Labels with geom_sankey_label which places labels in the center of nodes if given the same aestethics.

  • ggsankey also comes with custom minimalistic themes that can be used. Here I use theme_sankey.

ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node, fill = factor(node), label = node)) +
  geom_sankey(flow.alpha = .6,
              node.color = "gray30") +
  geom_sankey_label(size = 3, color = "white", fill = "gray40") +
  scale_fill_viridis_d() +
  theme_sankey(base_size = 18) +
  labs(x = NULL) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5)) +
  ggtitle("Car features")

geom_alluvial

Alluvial plots are very similiar to sankey plots but have no spaces between nodes and start at y = 0 instead being centered around the x-axis.

ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node, fill = factor(node), label = node)) +
  geom_alluvial(flow.alpha = .6) +
  geom_alluvial_text(size = 3, color = "white") +
  scale_fill_viridis_d() +
  theme_alluvial(base_size = 18) +
  labs(x = NULL) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5)) +
  ggtitle("Car features")

geom_sankey_bump

Sankey bump plots is mix between bump plots and sankey and mostly useful for time series. When a group becomes larger than another it bumps above it.

# install.packages("gapminder")
library(gapminder)

df <- gapminder %>%
  group_by(continent, year) %>%
  summarise(gdp = (sum_(pop * gdpPercap)/1e9) %>% round(0), .groups = "keep") %>%
  ungroup()

ggplot(df, aes(x = year,
               node = continent,
               fill = continent,
               value = gdp)) +
  geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 6) +
  scale_fill_viridis_d(option = "A", alpha = .8) +
  theme_sankey_bump(base_size = 16) +
  labs(x = NULL,
       y = "GDP ($ bn)",
       fill = NULL,
       color = NULL) +
  theme(legend.position = "bottom") +
  labs(title = "GDP development per continent")

Owner
David Sjoberg
Happy R user. Twitter: @davsjob
David Sjoberg
Python wrapper for Synoptic Data API. Retrieve data from thousands of mesonet stations and networks. Returns JSON from Synoptic as Pandas DataFrame

☁ Synoptic API for Python (unofficial) The Synoptic Mesonet API (formerly MesoWest) gives you access to real-time and historical surface-based weather

Brian Blaylock 23 Jan 06, 2023
649 Pokémon palettes as CSVs, with a Python lib to turn names/IDs into palettes, or MatPlotLib compatible ListedColormaps.

PokePalette 649 Pokémon, broken down into CSVs of their RGB colour palettes. Complete with a Python library to convert names or Pokédex IDs into eithe

11 Dec 05, 2022
daily report of @arkinvest ETF activity + data collection

ark_invest daily weekday report of @arkinvest ETF activity + data collection This script was created to: Extract and save daily csv's from ARKInvest's

T D 27 Jan 02, 2023
The visual framework is designed on the idea of module and implemented by mixin method

Visual Framework The visual framework is designed on the idea of module and implemented by mixin method. Its biggest feature is the mixins module whic

LEFTeyes 9 Sep 19, 2022
Backend app for visualizing CANedge log files in Grafana (directly from local disk or S3)

CANedge Grafana Backend - Visualize CAN/LIN Data in Dashboards This project enables easy dashboard visualization of log files from the CANedge CAN/LIN

13 Dec 15, 2022
Python scripts for plotting audiograms and related data from Interacoustics Equinox audiometer and Otoaccess software.

audiometry Python scripts for plotting audiograms and related data from Interacoustics Equinox 2.0 audiometer and Otoaccess software. Maybe similar sc

Hamilton Lab at UT Austin 2 Jun 15, 2022
This is a Web scraping project using BeautifulSoup and Python to scrape basic information of all the Test matches played till Jan 2022.

Scraping-test-matches-data This is a Web scraping project using BeautifulSoup and Python to scrape basic information of all the Test matches played ti

Souradeep Banerjee 4 Oct 10, 2022
Simulation du problème de Monty Hall avec Python et matplotlib

Le problème de Monty Hall C'est un jeu télévisé où il y a trois portes sur le plateau de jeu. Seule une de ces portes cache un trésor. Il n'y a rien d

ETCHART YANG 1 Jan 06, 2022
A Python package that provides evaluation and visualization tools for the DexYCB dataset

DexYCB Toolkit DexYCB Toolkit is a Python package that provides evaluation and visualization tools for the DexYCB dataset. The dataset and results wer

NVIDIA Research Projects 107 Dec 26, 2022
Data parsing and validation using Python type hints

pydantic Data validation and settings management using Python type hinting. Fast and extensible, pydantic plays nicely with your linters/IDE/brain. De

Samuel Colvin 12.1k Jan 06, 2023
Peloton Stats to Google Sheets with Data Visualization through Seaborn and Plotly

Peloton Stats to Google Sheets with Data Visualization through Seaborn and Plotly Problem: 2 peloton users were looking for a way to track their metri

9 Jul 22, 2022
DrawBot lets you draw images taken from the internet on Skribbl.io, Gartic Phone and Paint

DrawBot You don't speak french? No worries, english translation is over here. C'est quoi ? DrawBot est un logiciel codé par V2F qui va prendre possess

V2F 205 Jan 01, 2023
Official Matplotlib cheat sheets

Official Matplotlib cheat sheets

Matplotlib Developers 6.7k Jan 09, 2023
By default, networkx has problems with drawing self-loops in graphs.

By default, networkx has problems with drawing self-loops in graphs. It makes it hard to draw a graph with self-loops or to make a nicely looking chord diagram. This repository provides some code to

Vladimir Shitov 5 Jan 06, 2022
Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.

Dash Dash is the most downloaded, trusted Python framework for building ML & data science web apps. Built on top of Plotly.js, React and Flask, Dash t

Plotly 17.9k Dec 31, 2022
Area-weighted venn-diagrams for Python/matplotlib

Venn diagram plotting routines for Python/Matplotlib Routines for plotting area-weighted two- and three-circle venn diagrams. Installation The simples

Konstantin Tretyakov 400 Dec 31, 2022
Mattia Ficarelli 2 Mar 29, 2022
Plotting data from the landroid and a raspberry pi zero to a influx-db

landroid-pi-influx Plotting data from the landroid and a raspberry pi zero to a influx-db Dependancies Hardware: Landroid WR130E Raspberry Pi Zero Wif

2 Oct 22, 2021
Python ts2vg package provides high-performance algorithm implementations to build visibility graphs from time series data.

ts2vg: Time series to visibility graphs The Python ts2vg package provides high-performance algorithm implementations to build visibility graphs from t

Carlos Bergillos 26 Dec 17, 2022
Small U-Net for vehicle detection

Small U-Net for vehicle detection Vivek Yadav, PhD Overview In this repository , we will go over using U-net for detecting vehicles in a video stream

Vivek Yadav 91 Nov 03, 2022