Statistical Rethinking course winter 2022

Overview

Statistical Rethinking (2022 Edition)

Instructor: Richard McElreath

Lectures: Pre-recorded and uploaded to <Playlist>, two per week

Discussion: Online, Fridays 3pm-4pm Central European Time

Purpose

This course teaches data analysis, but it focuses on scientific models first. The unfortunate truth about data is that nothing much can be done with it until we say what caused it. We will prioritize conceptual, causal models and precise questions about those models. We will use Bayesian data analysis to connect scientific models to evidence. And we will learn powerful computational tools for coping with high-dimensional, imperfect data of the kind that biologists and social scientists face.

Format

Online, flipped instruction. The lectures are pre-recorded. We'll meet online once a week for an hour to work through the solutions to the assigned problems.

We'll use the 2nd edition of my book, <Statistical Rethinking>. I'll provide a PDF of the book to enrolled students.

Registration: Please sign up via <[COURSE IS FULL SORRY]>. I've also set aside 100 audit tickets at the same link, for people who want to participate, but who don't need graded work and course credit.

Calendar & Topical Outline

There are 10 weeks of instruction. Links to lecture recordings will appear in this table. Weekly problem sets are assigned on Fridays and due the next Friday, when we discuss the solutions in the weekly online meeting.

Lecture playlist on YouTube: <Statistical Rethinking 2022>

Week | Meeting date | Reading | Lectures
Week 01 | 07 January | Chapters 1, 2 and 3 | [1] <The Golem of Prague> <(Slides)>; [2] <Bayesian Inference> <(Slides)>
Week 02 | 14 January | Chapters 4 and 5 | [3] <Basic Regression> <(Slides)>; [4] <Categories & Curves> <(Slides)>
Week 03 | 21 January | Chapters 5 and 6 | [5] <Elemental Confounds> <(Slides)>; [6] <Good & Bad Controls> <(Slides)>
Week 04 | 28 January | Chapters 7 and 8 | [7] Overfitting; [8] Interactions
Week 05 | 04 February | Chapters 9, 10 and 11 | [9] Markov chain Monte Carlo; [10] Binomial GLMs
Week 06 | 11 February | Chapters 11 and 12 | [11] Poisson GLMs; [12] Ordered Categories
Week 07 | 18 February | Chapter 13 | [13] Multilevel Models; [14] Multi-Multilevel Models
Week 08 | 25 February | Chapter 14 | [15] Varying Slopes; [16] Gaussian Processes
Week 09 | 04 March | Chapter 15 | [17] Measurement Error; [18] Missing Data
Week 10 | 11 March | Chapters 16 and 17 | [19] Beyond GLMs: State-space Models, ODEs; [20] Horoscopes

Coding

This course involves a lot of scripting. Students can engage with the material using either the original R code examples or one of several conversions to other computing environments. The conversions are not always exact, but they are rather complete. Each option is listed below.

Original R Flavor

To use the original R code examples from the print book, you need to install the rethinking R package. The code is all on GitHub (https://github.com/rmcelreath/rethinking/), where there are additional details about the package, including information about using the more up-to-date cmdstanr instead of rstan as the underlying MCMC engine.
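As a rough sketch of what that installation can look like, the commands below install cmdstanr first and then the rethinking package from GitHub. The helper package list and the Stan repository URL are assumptions based on the usual setup and may change, so treat the instructions in the rethinking repository as authoritative.

```r
# Sketch only: package names and the repository URL are assumptions;
# check the rethinking repository for the current instructions.

# 1. Install cmdstanr (served from the Stan R package repository, not CRAN),
#    then let it download and build CmdStan itself.
install.packages(
  "cmdstanr",
  repos = c("https://mc-stan.org/r-packages/", getOption("repos"))
)
cmdstanr::install_cmdstan()

# 2. Install helper packages from CRAN.
install.packages(c("coda", "mvtnorm", "devtools", "loo", "dagitty", "shape"))

# 3. Install rethinking from GitHub.
devtools::install_github("rmcelreath/rethinking")

# 4. Quick check that the package loads.
library(rethinking)
```

Building CmdStan needs a working C++ toolchain, so the second command may fail on a fresh machine until one is installed.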

R + Tidyverse + ggplot2 + brms

The <Tidyverse/brms> conversion is very high quality and complete through Chapter 14.

Python and PyMC3

The <Python/PyMC3> conversion is quite complete.

Julia and Turing

The <Julia/Turing> conversion is not as complete, but is growing fast and presents the Rethinking examples in multiple Julia engines, including the great <TuringLang>.

Other

There are several other conversions. See the full list at https://xcelab.net/rm/statistical-rethinking/.

Homework and solutions

I will also post problem sets and solutions. Check the folders at the top of the repository.
