Python for Data Analysis, 2nd Edition

Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media

1st Edition Readers

If you are reading the 1st Edition (published in 2012), please find the reorganized book materials on the 1st-edition branch.

Translations

Chinese by Xu Liang
Polish by Michal Biesiada

IPython Notebooks:

Chapter 2: Python Language Basics, IPython, and Jupyter Notebooks
Chapter 3: Built-in Data Structures, Functions, and Files
Chapter 4: NumPy Basics: Arrays and Vectorized Computation
Chapter 5: Getting Started with pandas
Chapter 6: Data Loading, Storage, and File Formats
Chapter 7: Data Cleaning and Preparation
Chapter 8: Data Wrangling: Join, Combine, and Reshape
Chapter 9: Plotting and Visualization
Chapter 10: Data Aggregation and Group Operations
Chapter 11: Time Series
Chapter 12: Advanced pandas
Chapter 13: Introduction to Modeling Libraries in Python
Chapter 14: Data Analysis Examples
Appendix A: Advanced NumPy

License

Code

The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.

Owner

Wes McKinney

Toolchest provides APIs for scientific and bioinformatic data analysis.

A utility for functional piping in Python that allows you to access any function in any scope as a partial.

University Challenge 2021 With Python

Basis Set Format Converter

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Improving your data science workflows with

Pipetools enables function composition similar to using Unix pipes.

Elementary is an open-source data reliability framework for modern data teams. The first module of the framework is data lineage.

Nobel Data Analysis

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Conduits - A Declarative Pipelining Tool For Pandas

Pipeline for migrating Lichess data into PostgreSQL

In this project, an ETL pipeline is built on a data warehouse hosted on AWS Redshift.

High Dimensional Portfolio Selection with Cardinality Constraints

A pipeline that creates consensus sequences from Nanopore reads.

My solution to the book A Collection of Data Science Take-Home Challenges

4CAT: Capture and Analysis Toolkit

Python implementation of Principal Component Analysis

Performance analysis of predictive (alpha) stock factors

📊 Python Flask game that consolidates data from Nasdaq, allowing the user to practice buying and selling stocks.