Tools for calculating and visualizing Elo-like ratings of MLB teams using Retrosheet data

Overview

This project uses historical baseball game data to calculate an Elo-like rating for each MLB team based on regular-season matchups. The Elo rating system was originally developed for ranking chess players but can also be applied to other individual and team sports. It suits baseball particularly well because each team plays a large number of games per season.

Wikipedia has more technical details on the rating system: https://en.m.wikipedia.org/wiki/Elo_rating_system
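As a rough illustration of the rating system described above, here is a minimal sketch of the standard Elo expected-score and update formulas. It is not taken from Elo.py: the function names and the K-factor of 4 are assumptions chosen for the example, and the project's "Elo-like" variant may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that team A beats team B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 4.0):
    """Return the new (rating_a, rating_b) after one game.

    k is a hypothetical K-factor; the value this project uses is not stated here.
    """
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    # Whatever one team gains, the other loses, so the rating total is unchanged.
    return rating_a + delta, rating_b - delta


# Two evenly rated teams: the winner gains k / 2 points.
new_a, new_b = update_ratings(1500.0, 1500.0, a_won=True)  # (1502.0, 1498.0)
```

A small K-factor keeps any single game from moving a rating much, which fits a sport with a long season.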

A similar analysis was done by FiveThirtyEight (538): https://projects.fivethirtyeight.com/complete-history-of-mlb/

This repo consists of three primary pieces of code:

  • Parser.py imports and cleans the game-level data. It performs basic calculations, such as determining game winners.
  • Elo.py calculates Elo ratings for a given set of game data.
  • Visualization.R plots the output in a variety of ways.

Data

Source: https://www.retrosheet.org/gamelogs/index.html

This data was compiled by Retrosheet.org. The only stipulation for use of this data is prominent display of this statement:

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

The formats of the data can be found in the Formats.py module. The Parser.py script imports the annual files, applies some cleaning and formatting, and stores the consolidated data in a SQLite database for later use.
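The import step described above might look roughly like the sketch below. It is not the actual Parser.py: the column positions are assumed from Retrosheet's published game log layout (Formats.py is the authority for this project), the table and function names are hypothetical, and the two sample rows are made up and truncated to the first eleven fields.

```python
import csv
import io
import sqlite3

# Zero-based column positions, assumed from Retrosheet's game log field list.
DATE, VISITOR, HOME, VISITOR_RUNS, HOME_RUNS = 0, 3, 6, 9, 10


def parse_games(lines):
    """Yield (date, visitor, home, visitor_runs, home_runs, winner) per game."""
    for row in csv.reader(lines):
        visitor_runs = int(row[VISITOR_RUNS])
        home_runs = int(row[HOME_RUNS])
        if visitor_runs > home_runs:
            winner = row[VISITOR]
        elif home_runs > visitor_runs:
            winner = row[HOME]
        else:
            winner = None  # tied game: no winner recorded
        yield row[DATE], row[VISITOR], row[HOME], visitor_runs, home_runs, winner


def store_games(conn, games):
    """Insert parsed games into a hypothetical 'games' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS games ("
        "game_date TEXT, visitor TEXT, home TEXT, "
        "visitor_runs INTEGER, home_runs INTEGER, winner TEXT)"
    )
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?)", games)
    conn.commit()


# Made-up rows with placeholder team codes, for illustration only.
SAMPLE = (
    '"20190401","0","Mon","AAA","AL",1,"BBB","AL",1,3,5\n'
    '"20190402","0","Tue","BBB","AL",2,"AAA","AL",2,1,4\n'
)

conn = sqlite3.connect(":memory:")  # a file path would persist the data
store_games(conn, parse_games(io.StringIO(SAMPLE)))
```

Storing the winner alongside the raw scores means the rating step can read one table without repeating the comparison.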

Examples

This graph shows the distribution of each team's Elo rating over the course of the decade 2010-2019. The ratings are weighted by in-season days.

Distribution of Elo Ratings

This graph shows the progress of the 5 teams in the AL West over the 5-year period starting in 2014.

American League West Ratings 2014 - 2019

Each team was also ranked by its Elo rating relative to the rest of the league on each day. The graph below shows the time spent at each rank for the 5 teams in the AL West. The gradient indicates the years in which those ranks occurred.

American League West Ranks 2010 - 2019
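The daily ranking described above can be sketched as follows. This is an assumption about the approach rather than the project's code: the function name, team codes, and ratings are placeholders, and ties are broken by input order here because the source does not say how they are handled.

```python
def rank_teams(ratings):
    """Map each team to its rank for one day, where 1 is the highest rating."""
    # sorted() is stable, so teams with equal ratings keep their input order.
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return {team: position for position, team in enumerate(ordered, start=1)}


# Placeholder ratings for a single day.
daily_ratings = {"AAA": 1532.0, "BBB": 1498.5, "CCC": 1510.2}
ranks = rank_teams(daily_ratings)  # {"AAA": 1, "CCC": 2, "BBB": 3}
```

Applying this to every day of a season and counting how often each team holds each rank gives the time-at-rank data that the graph summarizes.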

Future improvements

  • Finish tuning parameters for best model performance
  • Expand analysis to earlier time periods (requires handling of teams entering/leaving league)
Owner
Lukas Owens