Python & Julia port of codes in excellent R books

Overview

X4DS

This repo is a collection of

Python & Julia port of codes in the following excellent R books:

Python Stack Julia Stack
Language
Version
v3.9 v1.7
Data
Processing
  • Pandas
  • DataFrames
  • Visualization
  • Matplotlib
  • Seaborn
  • MakiE
  • AlgebraOfGraphics
  • Machine
    Learning
  • Scikit-Learn
  • MLJ
  • Probablistic
    Programming
  • PyMC
  • Turing
  • Code Styles

    2.1. Basics

    • prefer enumerate() over range(len())
    xs = range(3)
    
    # good
    for ind, x in enumerate(xs):
      print(f'{ind}: {x}')
    
    # bad
    for i in range(len(xs)):
      print(f'{i}: {xs[i]}')

    2.2. Matplotlib

    including seaborn

    • prefer Axes object over Figure object
    • use constrained_layout=True when draw subplots
    # good
    _, axes = plt.subplots(1, 2, constrained_layout=True)
    axes[0].plot(x1, y1)
    axes[1].hist(x2, y2)
    
    # bad
    plt.subplot(121)
    plt.plot(x1, y1)
    plt.subplot(122)
    plt.hist(x2, y2)
    • prefer axes.flatten() over plt.subplot() in cases where subplots' data is iterable
    • prefer zip() or enumerate() over range() for iterable objects
    # good
    _, ax = plt.subplots(2, 2, figsize=[12,8],constrained_layout=True)
    
    for ax, x, y in zip(axes.flatten(), xs, ys):
      ax.plot(x, y)
    
    # bad
    for i in range(4):
      ax = plt.subplot(2, 2, i+1)
      ax.plot(x[i], y[i])
    • prefer set() method over set_*() method
    # good
    ax.set(xlabel='x', ylabel='y')
    
    # bad
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    • Prefer despine() over ax.spines[*].set_visible()
    # good
    sns.despine()
    
    # bad
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)

    2.3. Pandas

    • prefer df['col'] over df.col
    # good
    movies['duration']
    
    # bad
    movies.duration
    • prefer df.query over df[] or df.loc[] in simple-selection
    # good
    movies.query('duration >= 200')
    
    # bad
    movies[movies['duration'] >= 200]
    movies.loc[movies['duration'] >= 200, :]
    • prefer df.loc and df.iloc over df[] in multiple-selection
    # good
    movies.loc[movies['duration'] >= 200, 'genre']
    movies.iloc[0:2, :]
    
    # bad
    movies[movies['duration'] >= 200].genre
    movies[0:2]

    LaTeX Styles

    Multiple lines

    Reduce the use of begin{array}...end{array}

    • equations: begin{aligned}...end{aligned}
    $$
    \begin{aligned}
    y_1 = x^2 + 2*x \\
    y_2 = x^3 + x
    \end{aligned}
    $$
    • equations with conditions: begin{cases}...end{cases}
    $$
    \begin{cases}
    y = x^2 + 2*x & x > 0 \\
    y = x^3 + x & x ≤ 0
    \end{cases}
    $$
    • matrix: begin{matrix}...end{matrix}
    $$
    \begin{vmatrix}
      a + a^′ & b + b^′ \\ c & d
      \end{vmatrix}= \begin{vmatrix}
      a & b \\ c & d
      \end{vmatrix} + \begin{vmatrix}
      a^′ & b^′ \\ c & d
    \end{vmatrix}
    $$

    Brackets

    • prefer \Bigg...\Bigg over \left...\right
    $$
    A\Bigg[v_1\ v_2\ \ v_r\Bigg]
    $$
    • prefer \underset{}{} over \underset{}
    $$
    \underset{θ}{\mathrm{argmax}}\ p(x_i|θ)
    $$

    Expressions

    • prefer ^{\top} over ^T for transpose

    $$ 𝐀^⊤ $$

    $$
    𝐀^{\top}
    $$
    • prefer \to over \rightarrow for limit

    $$ \lim_{n → ∞} $$

    $$
    \lim_{n\to \infty}
    $$
    • prefer underset{}{} over \limits_

    $$ \underset{w}{\rm argmin}\ (wx +b) $$

    $$
    \underset{w}{\rm argmin}\ (wx +b)
    $$

    Fonts

    • prefer \mathrm over \mathop or \operatorname
    $$
    θ_{\mathrm{MLE}}=\underset{θ}{\mathrm{argmax}}\ ∑_{i = 1}^{N}\log p(x_i|θ)
    $$

    ISLR

    References

    style <style> table { border-collapse: collapse; text-align: center; } </style>
    Owner
    Gitony
    Gitony
    Declarative statistical visualization library for Python

    Altair http://altair-viz.github.io Altair is a declarative statistical visualization library for Python. With Altair, you can spend more time understa

    Altair 8k Jan 05, 2023
    Geospatial Data Visualization using PyGMT

    Example script to visualize topographic data, earthquake data, and tomographic data on a map

    Utpal Kumar 2 Jul 30, 2022
    An adaptable Snakemake workflow which uses GATKs best practice recommendations to perform germline mutation calling starting with BAM files

    Germline Mutation Calling This Snakemake workflow follows the GATK best-practice recommandations to call small germline variants. The pipeline require

    12 Dec 24, 2022
    Visualization ideas for data science

    Nuance I use Nuance to curate varied visualization thoughts during my data scientist career. It is not yet a package but a list of small ideas. Welcom

    Li Jiangchun 16 Nov 03, 2022
    Simple and lightweight Spotify Overlay written in Python.

    Simple Spotify Overlay This is a simple yet powerful Spotify Overlay. About I have been looking for something like this ever since I got Spotify. I th

    27 Sep 03, 2022
    Create SVG drawings from vector geodata files (SHP, geojson, etc).

    SVGIS Create SVG drawings from vector geodata files (SHP, geojson, etc). SVGIS is great for: creating small multiples, combining lots of datasets in a

    Neil Freeman 78 Dec 09, 2022
    Lumen provides a framework for visual analytics, which allows users to build data-driven dashboards from a simple yaml specification

    Lumen project provides a framework for visual analytics, which allows users to build data-driven dashboards from a simple yaml specification

    HoloViz 120 Jan 04, 2023
    A data visualization curriculum of interactive notebooks.

    A data visualization curriculum of interactive notebooks, using Vega-Lite and Altair. This repository contains a series of Python-based Jupyter notebooks.

    UW Interactive Data Lab 1.2k Dec 30, 2022
    The windML framework provides an easy-to-use access to wind data sources within the Python world, building upon numpy, scipy, sklearn, and matplotlib. Renewable Wind Energy, Forecasting, Prediction

    windml Build status : The importance of wind in smart grids with a large number of renewable energy resources is increasing. With the growing infrastr

    Computational Intelligence Group 125 Dec 24, 2022
    A declarative (epi)genomics visualization library for Python

    gos is a declarative (epi)genomics visualization library for Python. It is built on top of the Gosling JSON specification, providing a simplified interface for authoring interactive genomic visualiza

    Gosling 107 Dec 14, 2022
    Homework 2: Matplotlib and Data Visualization

    Homework 2: Matplotlib and Data Visualization Overview These data visualizations were created for my introductory computer science course using Python

    Sophia Huang 12 Oct 20, 2022
    Smoking Simulation is an app to simulate the spreading of smokers and non-smokers, their interactions and population during certain amount of time.

    Smoking Simulation is an app to simulate the spreading of smokers and non-smokers, their interactions and population during certain

    Bohdan Ruban 5 Nov 08, 2022
    :art: Diagram as Code for prototyping cloud system architectures

    Diagrams Diagram as Code. Diagrams lets you draw the cloud system architecture in Python code. It was born for prototyping a new system architecture d

    MinJae Kwon 27.5k Dec 30, 2022
    This project is created to visualize the system statistics such as memory usage, CPU usage, memory accessible by process and much more using Kibana Dashboard with Elasticsearch.

    System Stats Visualizer This project is created to visualize the system statistics such as memory usage, CPU usage, memory accessible by process and m

    Vishal Teotia 5 Feb 06, 2022
    Political elections, appointment, analysis and visualization in Python

    Political elections, appointment, analysis and visualization in Python poli-sci-kit is a Python package for political science appointment and election

    Andrew Tavis McAllister 9 Dec 01, 2022
    OpenStats is a library built on top of streamlit that extracts data from the Github API and shows the main KPIs

    Open Stats Discover and share the KPIs of your OpenSource project. OpenStats is a library built on top of streamlit that extracts data from the Github

    Pere Miquel Brull 4 Apr 03, 2022
    IPython/Jupyter notebook module for Vega and Vega-Lite

    IPython Vega IPython/Jupyter notebook module for Vega 5, and Vega-Lite 4. Notebooks with embedded visualizations can be viewed on GitHub and nbviewer.

    Vega 335 Nov 29, 2022
    Python package for the analysis and visualisation of finite-difference fields.

    discretisedfield Marijan Beg1,2, Martin Lang2, Samuel Holt3, Ryan A. Pepper4, Hans Fangohr2,5,6 1 Department of Earth Science and Engineering, Imperia

    ubermag 12 Dec 14, 2022
    Wikipedia WordCloud App generate Wikipedia word cloud art created using python's streamlit, matplotlib, wikipedia and wordcloud packages

    Wikipedia WordCloud App Wikipedia WordCloud App generate Wikipedia word cloud art created using python's streamlit, matplotlib, wikipedia and wordclou

    Siva Prakash 5 Jan 02, 2022
    This GitHub Repository contains Data Analysis projects that I have completed so far! While most of th project are focused on Data Analysis, some of them are also put here to show off other skills that I have learned.

    Welcome to my Data Analysis projects page! This GitHub Repository contains Data Analysis projects that I have completed so far! While most of th proje

    Kyle Dini 1 Jan 31, 2022