Jupyter notebook and datasets from the pandas Q&A video series

Overview

Python pandas Q&A video series

Read about the series, and view all of the videos on one page: Easier data analysis in Python with pandas.

Jupyter Notebooks

Videos (playlist)

  1. What is pandas? (Introduction to the Q&A series) (6:24)
  2. How do I read a tabular data file into pandas? (8:54)
  3. How do I select a pandas Series from a DataFrame? (11:10)
  4. Why do some pandas commands end with parentheses (and others don't)? (8:45)
  5. How do I rename columns in a pandas DataFrame? (9:36)
  6. How do I remove columns from a pandas DataFrame? (6:35)
  7. How do I sort a pandas DataFrame or a Series? (8:56)
  8. How do I filter rows of a pandas DataFrame by column value? (13:44)
  9. How do I apply multiple filter criteria to a pandas DataFrame? (9:51)
  10. Your pandas questions answered! (9:06)
  11. How do I use the "axis" parameter in pandas? (8:33)
  12. How do I use string methods in pandas? (6:16)
  13. How do I change the data type of a pandas Series? (7:28)
  14. When should I use a "groupby" in pandas? (8:24)
  15. How do I explore a pandas Series? (9:50)
  16. How do I handle missing values in pandas? (14:27)
  17. What do I need to know about the pandas index? (Part 1) (13:36)
  18. What do I need to know about the pandas index? (Part 2) (10:38)
  19. How do I select multiple rows and columns from a pandas DataFrame? (21:46)
  20. When should I use the "inplace" parameter in pandas? (10:18)
  21. How do I make my pandas DataFrame smaller and faster? (19:05)
  22. How do I use pandas with scikit-learn to create Kaggle submissions? (13:25)
  23. More of your pandas questions answered! (19:23)
  24. How do I create dummy variables in pandas? (13:13)
  25. How do I work with dates and times in pandas? (10:20)
  26. How do I find and remove duplicate rows in pandas? (9:47)
  27. How do I avoid a SettingWithCopyWarning in pandas? (13:29)
  28. How do I change display options in pandas? (14:55)
  29. How do I create a pandas DataFrame from another object? (14:25)
  30. How do I apply a function to a pandas Series or DataFrame? (17:57)
  31. Bonus: How do I use the MultiIndex in pandas? (25:00)
  32. Bonus: How do I merge DataFrames in pandas? (21:48)
  33. Bonus: 4 new time-saving tricks in pandas (14:50)
  34. Bonus: 5 new changes in pandas you need to know about (20:54)
  35. Bonus: My top 25 pandas tricks (27:37)
  36. Bonus: Data Science Best Practices with pandas (PyCon 2019) (1:44:16)
  37. Bonus: Your pandas questions answered! (webcast) (1:56:01)

Datasets

| Filename | Description | Raw File | Original Source | Other |
|---|---|---|---|---|
| chipotle.tsv | Online orders from the Chipotle restaurant chain | bit.ly/chiporders | The Upshot | Upshot article |
| drinks.csv | Alcohol consumption by country | bit.ly/drinksbycountry | FiveThirtyEight | FiveThirtyEight article |
| imdb_1000.csv | Top rated movies from IMDb | bit.ly/imdbratings | IMDb | Web scraping script |
| stocks.csv | Small dataset of stock prices | bit.ly/smallstocks | DataCamp | |
| titanic_test.csv | Testing set from Kaggle's Titanic competition | bit.ly/kaggletest | Kaggle | Data dictionary |
| titanic_train.csv | Training set from Kaggle's Titanic competition | bit.ly/kaggletrain | Kaggle | Data dictionary |
| u.data | Movie ratings by MovieLens users | bit.ly/movielensdata | GroupLens | Data dictionary |
| u.item | Movie information from MovieLens | bit.ly/movieitems | GroupLens | Data dictionary |
| u.user | Demographic information about MovieLens users | bit.ly/movieusers | GroupLens | Data dictionary |
| ufo.csv | Reports of UFO sightings from 1930-2000 | bit.ly/uforeports | National UFO Reporting Center | Web scraping script |
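
As a rough illustration of how the datasets listed above might be loaded, here is a minimal sketch using `pandas.read_csv`. The helper name `load_table`, the in-memory sample, and its column names are assumptions made for this example; they are not taken from the notebook or from the actual files.

```python
import io

import pandas as pd

# Hypothetical stand-in for a tab-separated file such as chipotle.tsv.
# The column names and rows are made up for illustration only.
sample_tsv = "order_id\titem_name\tquantity\n1\tChips\t2\n2\tBurrito\t1\n"


def load_table(source, sep=","):
    """Read a delimited text source (path, URL, or file-like object) into a DataFrame."""
    return pd.read_csv(source, sep=sep)


# Tab-separated data needs sep="\t"; comma-separated files can use the default.
orders = load_table(io.StringIO(sample_tsv), sep="\t")
print(orders.head())
```

Because `read_csv` also accepts a URL string, passing one of the raw file links from the list above (with `http://` prepended, and assuming the short link still resolves) in place of the `StringIO` object should work the same way.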