Statistical Rethinking: A Bayesian Course Using CmdStanPy and Plotnine

Last update: Nov 08, 2022

Intro

This repo contains the Python/Stan version of the Statistical Rethinking course that Professor Richard McElreath taught at the Max Planck Institute for Evolutionary Anthropology in Leipzig during the winter of 2019/2020. The original repo for the course, from which this repo is forked, can be found here. The course consists of 20 lectures spread over 10 weeks, with a series of assignments for each week. It is an excellent introduction to Bayesian modelling in general and to Professor McElreath's wonderful book Statistical Rethinking.

How to use this repo

There are ten Jupyter notebooks, one for each week of the course. At the beginning of each notebook there are links to the YouTube videos of the lectures, the slides used, and the original homework questions and answers in R.

I would suggest using this repo as follows:

Go to the notebook of the week.
Watch the two lecture videos for that week. Their URLs are at the very top of each notebook.
Read the original problems presented to the students and try to solve them on your own.
Follow the exercise solutions in the notebook, which combine my code with explanations by Professor McElreath.

Installing `CmdStanPy`

The Stan code is executed with CmdStanPy, a lightweight pure-Python interface to CmdStan that provides access to the Stan compiler and all inference algorithms. It provides the function install_cmdstan(), which downloads CmdStan from GitHub and builds the CmdStan utilities. The function can be called from within Python or from the command line.

import cmdstanpy
cmdstanpy.install_cmdstan()

You can find more information about the installation process here.
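Once CmdStan is built, a quick way to confirm that the toolchain works is to compile and sample a tiny model. The sketch below is not taken from the course notebooks: the Bernoulli model, the file name bernoulli.stan and the helpers write_model and fit_bernoulli are hypothetical names chosen for illustration, and the Stan program assumes a CmdStan release recent enough to accept the array[N] syntax.

```python
from pathlib import Path

# A minimal Stan program (illustrative, not part of the course material).
BERNOULLI_STAN = """
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1);
  y ~ bernoulli(theta);
}
"""


def write_model(directory):
    """Write the Stan program to <directory>/bernoulli.stan and return its path."""
    path = Path(directory) / "bernoulli.stan"
    path.write_text(BERNOULLI_STAN)
    return path


def fit_bernoulli(stan_file, y):
    """Compile the model and draw posterior samples for the 0/1 outcomes in y."""
    # Imported here so that write_model works even before CmdStanPy is installed.
    from cmdstanpy import CmdStanModel

    model = CmdStanModel(stan_file=str(stan_file))  # compiles on first use
    fit = model.sample(data={"N": len(y), "y": list(y)}, chains=4, seed=1)
    return fit.summary()  # pandas DataFrame of posterior summaries
```

Calling fit_bernoulli(write_model("."), [0, 1, 0, 0, 1, 1, 0, 1]) should compile the model and return a summary table that includes a row for theta. If compilation fails, the CmdStan installation is the first thing to check.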

Other useful resources

There are many useful resources for Bayesian statistical modelling out there. Among those centered specifically on Professor McElreath's work, I would mention:

Original repo for the course.
Original rethinking package repo

Copyright

The present work is a derivative work of Statistical Rethinking: A Bayesian Course Using python and pymc3 by Gabriel Bosque Chacon and Statistical Rethinking: A Bayesian Course Using Python and NumPyro by Andrés Suárez. I wrote the Stan code, made the plotnine figures, and made slight modifications to his comments.

Owner

Andrés Suárez
