Pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.

Last update: Dec 31, 2022

weightedcalcs

weightedcalcs is a pandas-based Python library for calculating weighted means, medians, standard deviations, and more.

Features

Plays well with pandas.
Support for weighted means, medians, quantiles, standard deviations, and distributions.
Support for grouped calculations, using DataFrameGroupBy objects.
Raises an error when your data contains null values.
Full test coverage.

Installation

pip install weightedcalcs

Usage

Getting started

Every weighted calculation in weightedcalcs begins with an instance of the weightedcalcs.Calculator class. Calculator takes one argument: the name of your weighting variable. So if you're analyzing a survey where the weighting variable is called "resp_weight", you'd do this:

import weightedcalcs as wc
calc = wc.Calculator("resp_weight")

Types of calculations

Currently, weightedcalcs.Calculator supports the following calculations:

calc.mean(my_data, value_var): The weighted arithmetic average of value_var.
calc.quantile(my_data, value_var, q): The weighted quantile of value_var, where q is between 0 and 1.
calc.median(my_data, value_var): The weighted median of value_var, equivalent to .quantile(...) where q=0.5.
calc.std(my_data, value_var): The weighted standard deviation of value_var.
calc.distribution(my_data, value_var): The weighted proportions of value_var, interpreting value_var as categories.
calc.count(my_data): The weighted count of all observations, i.e., the total weight.
calc.sum(my_data, value_var): The weighted sum of value_var.

The my_data parameter above should be one of the following:

A pandas DataFrame object
A pandas DataFrameGroupBy object, i.e., the result of calling DataFrame.groupby(...)
A plain Python dictionary where the keys are column names and the values are equal-length lists.
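
To make the definitions above concrete, the sketch below works a few of them out by hand with plain pandas and NumPy on a tiny made-up dataset. It illustrates the arithmetic only and is not the library's implementation: the column names (`income`, `status`), the data, and the helper `weighted_quantile` are hypothetical. The quantile rule shown (the smallest value whose cumulative weight reaches `q` times the total weight) is one common convention, and weightedcalcs may handle ties or interpolation differently.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the "plain dictionary" form:
# column names mapped to equal-length lists.
raw = {
    "income": [10.0, 20.0, 30.0],
    "status": ["a", "b", "a"],
    "resp_weight": [1.0, 2.0, 1.0],
}
df = pd.DataFrame(raw)
weights = df["resp_weight"]

# Weighted count: the total weight.
weighted_count = weights.sum()

# Weighted sum: each value multiplied by its weight, then added up.
weighted_sum = (df["income"] * weights).sum()

# Weighted mean: the weighted sum divided by the total weight.
weighted_mean = weighted_sum / weighted_count

# Weighted distribution: each category's share of the total weight.
weighted_distribution = df.groupby("status")["resp_weight"].sum() / weighted_count


def weighted_quantile(data, value_var, weight_var, q):
    """Smallest value whose cumulative weight reaches q * total weight.

    Assumed convention for illustration; not taken from weightedcalcs.
    """
    ordered = data.sort_values(value_var)
    cumulative = ordered[weight_var].cumsum().to_numpy()
    threshold = q * cumulative[-1]
    # First position where the cumulative weight is >= the threshold.
    position = int(np.searchsorted(cumulative, threshold, side="left"))
    position = min(position, len(ordered) - 1)  # guard against float rounding
    return ordered[value_var].iloc[position]


# The weighted median is the weighted quantile at q = 0.5.
weighted_median = weighted_quantile(df, "income", "resp_weight", 0.5)
```

Here the middle observation carries half of the total weight, so both the weighted mean and the weighted median land on its value.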

Basic example

Below is a basic example of using weightedcalcs to find what percentage of Wyoming residents are married, divorced, et cetera:

import pandas as pd
import weightedcalcs as wc

# Load the 2015 American Community Survey person-level responses for Wyoming
responses = pd.read_csv("examples/data/acs-2015-pums-wy-simple.csv")

# `PWGTP` is the weighting variable used in the ACS's person-level data
calc = wc.Calculator("PWGTP")

# Get the distribution of marriage-status responses
calc.distribution(responses, "marriage_status").round(3).sort_values(ascending=False)

# -- Output --
# marriage_status
# Married                                0.425
# Never married or under 15 years old    0.421
# Divorced                               0.097
# Widowed                                0.046
# Separated                              0.012
# Name: PWGTP, dtype: float64

More examples

See this notebook for examples of other calculations, including grouped calculations.

Max Ghenis has created a version of the example notebook that can be run directly in your browser, via Google Colab.
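
As a rough illustration of what a grouped weighted calculation produces, the sketch below computes a weighted mean per group using plain pandas. It is not how weightedcalcs implements grouping, and the `state` and `age` columns and all values are made up for the example.

```python
import pandas as pd

# Hypothetical data: two groups, each with its own values and weights.
df = pd.DataFrame({
    "state": ["WY", "WY", "CO", "CO"],
    "age": [30.0, 50.0, 20.0, 40.0],
    "PWGTP": [1.0, 3.0, 2.0, 2.0],
})

# Multiply each value by its weight, then total both columns within each group.
weighted = df.assign(weighted_age=df["age"] * df["PWGTP"])
totals = weighted.groupby("state")[["weighted_age", "PWGTP"]].sum()

# Per-group weighted mean: group weighted sum divided by group total weight.
grouped_mean = totals["weighted_age"] / totals["PWGTP"]
```

With weightedcalcs itself, the equivalent would be to pass a grouped object such as `df.groupby("state")` as the first argument to `calc.mean`, since the calculator accepts DataFrameGroupBy objects.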

Owner

Jeremy Singer-Vine
