This is the code repository for Interpretable Machine Learning with Python, published by Packt.

Overview

Interpretable Machine Learning with Python

Interpretable Machine Learning with Pythone

This is the code repository for Interpretable Machine Learning with Python, published by Packt.

Learn to build interpretable high-performance models with hands-on real-world examples

What is this book about?

Do you want to understand your models and mitigate the risks associated with poor predictions using practical machine learning (ML) interpretation? Interpretable Machine Learning with Python can help you overcome these challenges, using interpretation methods to build fairer and safer ML models.

This book covers the following exciting features:

  • Recognize the importance of interpretability in business
  • Study models that are intrinsically interpretable such as linear models, decision trees, and Naïve Bayes
  • Become well-versed in interpreting models with model-agnostic methods
  • Visualize how an image classifier works and what it learns
  • Understand how to mitigate the influence of bias in datasets

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders. For example, Chapter02.

The code will look like the following:

base_classifier = KerasClassifier(model=base_model,\
                                  clip_values=(min_, max_))
y_test_mdsample_prob = np.max(y_test_prob[sampl_md_idxs],\
                                                       axis=1)
y_test_smsample_prob = np.max(y_test_prob[sampl_sm_idxs],\
                                                       axis=1)

Following is what you need for this book: This book is for data scientists, machine learning developers, and data stewards who have an increasingly critical responsibility to explain how the AI systems they develop work, their impact on decision making, and how they identify and manage bias. Working knowledge of machine learning and the Python programming language is expected.

With the following software and hardware list you can run all code files present in the book (Chapter 1-14).

Software and Hardware List

You can install the software required in any operating system by first installing Jupyter Notebook or Jupyter Lab with the most recent version of Python, or install Anaconda which can install everything at once. While hardware requirements for Jupyter are relatively modest, we recommend a machine with at least 4 cores of 2Ghz and 8Gb of RAM.

Alternatively, to installing the software locally, you can run the code in the cloud using Google Colab or another cloud notebook service.

Either way, the following packages are required to run the code in all the chapters (Google Colab has all the packages denoted with a ^):

Chapter Software required OS required
1 - 13 ^ Python 3.6+ Windows, Mac OS X, and Linux (Any)
1 - 13 ^ matplotlib 3.2.2+ Windows, Mac OS X, and Linux (Any)
1 - 13 ^ scikit-learn 0.22.2+ Windows, Mac OS X, and Linux (Any)
1 - 12 ^ pandas 1.1.5+ Windows, Mac OS X, and Linux (Any)
2 - 13 machine-learning-datasets 0.01.16+ Windows, Mac OS X, and Linux (Any)
2 - 13 ^ numpy 1.19.5+ Windows, Mac OS X, and Linux (Any)
3 - 13 ^ seaborn 0.11.1+ Windows, Mac OS X, and Linux (Any)
3 - 13 ^ tensorflow 2.4.1+ Windows, Mac OS X, and Linux (Any)
5 - 12 shap 0.38.1+ Windows, Mac OS X, and Linux (Any)
1, 5, 10, 12 ^ scipy 1.4.1+ Windows, Mac OS X, and Linux (Any)
5, 10-12 ^ xgboost 0.90+ Windows, Mac OS X, and Linux (Any)
6, 11, 12 ^ lightgbm 2.2.3+ Windows, Mac OS X, and Linux (Any)
7 - 9 alibi 0.5.5+ Windows, Mac OS X, and Linux (Any)
10 - 13 ^ tqdm 4.41.1+ Windows, Mac OS X, and Linux (Any)
2, 9 ^ statsmodels 0.10.2+ Windows, Mac OS X, and Linux (Any)
3, 5 rulefit 0.3.1+ Windows, Mac OS X, and Linux (Any)
6, 8 lime 0.2.0.1+ Windows, Mac OS X, and Linux (Any)
7, 12 catboost 0.24.4+ Windows, Mac OS X, and Linux (Any)
8, 9 ^ Keras 2.4.3+ Windows, Mac OS X, and Linux (Any)
11, 12 ^ pydot 1.3.0+ Windows, Mac OS X, and Linux (Any)
11, 12 xai 0.0.4+ Windows, Mac OS X, and Linux (Any)
1 ^ beautifulsoup4 4.6.3+ Windows, Mac OS X, and Linux (Any)
1 ^ requests 2.23.0+ Windows, Mac OS X, and Linux (Any)
3 cvae 0.0.3+ Windows, Mac OS X, and Linux (Any)
3 interpret 0.2.2+ Windows, Mac OS X, and Linux (Any)
3 ^ six 1.15.0+ Windows, Mac OS X, and Linux (Any)
3 skope-rules 1.0.1+ Windows, Mac OS X, and Linux (Any)
4 PDPbox 0.2.0+ Windows, Mac OS X, and Linux (Any)
4 pycebox 0.0.1+ Windows, Mac OS X, and Linux (Any)
5 alepython 0.1+ Windows, Mac OS X, and Linux (Any)
5 tensorflow-docs 0.0.02+ Windows, Mac OS X, and Linux (Any)
6 ^ nltk 3.2.5+ Windows, Mac OS X, and Linux (Any)
7 witwidget 1.7.0+ Windows, Mac OS X, and Linux (Any)
8 ^ opencv-python 4.1.2.30+ Windows, Mac OS X, and Linux (Any)
8 ^ scikit-image 0.16.2+ Windows, Mac OS X, and Linux (Any)
8 tf-explain 0.2.1+ Windows, Mac OS X, and Linux (Any)
8 tf-keras-vis 0.5.5+ Windows, Mac OS X, and Linux (Any)
9 SALib 1.3.12+ Windows, Mac OS X, and Linux (Any)
9 distython 0.0.3+ Windows, Mac OS X, and Linux (Any)
10 ^ mlxtend 0.14.0+ Windows, Mac OS X, and Linux (Any)
10 sklearn-genetic 0.3.0+ Windows, Mac OS X, and Linux (Any)
11 aif360==0.3.0 Windows, Mac OS X, and Linux (Any)
11 BlackBoxAuditing==0.1.54 Windows, Mac OS X, and Linux (Any)
11 dowhy 0.5.1+ Windows, Mac OS X, and Linux (Any)
11 econml 0.9.0+ Windows, Mac OS X, and Linux (Any)
11 ^ networkx 2.5+ Windows, Mac OS X, and Linux (Any)
12 bayesian-optimization 1.2.0+ Windows, Mac OS X, and Linux (Any)
12 ^ graphviz 0.10.1+ Windows, Mac OS X, and Linux (Any)
12 tensorflow-lattice 2.0.7+ Windows, Mac OS X, and Linux (Any)
13 adversarial-robustness-toolbox 1.5.0+ Windows, Mac OS X, and Linux (Any)

NOTE: the library machine-learning-datasets is the official name of what in the book is referred to as mldatasets. Due to naming conflicts, it had to be changed.

The exact versions of each library, as tested, can be found in the requirements.txt file and installed like this should you have a dedicated environment for them:

> pip install -r requirements.txt

You might get some conflicts specifically with libraries cvae, alepython, pdpbox and xai. If this is the case, try:

> pip install --no-deps -r requirements.txt

Alternatively, you can install libraries one chapter at a time inside of a local Jupyter environment using cells with !pip install or run all the code in Google Colab with the following links:

Remember to make sure you click on the menu item "File > Save a copy in Drive" as soon you open each link to ensure that your notebook is saved as you run it. Also, notebooks denoted with plus sign (+) are relatively compute-intensive, and will take an extremely long time to run on Google Colab but if you must go to "Runtime > Change runtime type" and select "High-RAM" for runtime shape. Otherwise, a better cloud enviornment or local environment is preferable.

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Summary

The book does much more than explain technical topics, but here's a summary of the chapters:

Chapters topics

Related products

Get to Know the Authors

Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. Currently, he's a Climate and Agronomic Data Scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a startup, incubated by Harvard Innovation Labs, that combined the power of cloud computing and machine learning with principles in decision-making science to expose users to new places and events. Whether it pertains to leisure activities, plant diseases, or customer lifetime value, Serg is passionate about providing the often-missing link between data and decision-making — and machine learning interpretation helps bridge this gap more robustly.

Owner
Packt
Providing books, eBooks, video tutorials, and articles for IT developers, administrators, and users.
Packt
A Python library for choreographing your machine learning research.

A Python library for choreographing your machine learning research.

AI2 270 Jan 06, 2023
A high performance and generic framework for distributed DNN training

BytePS BytePS is a high performance and general distributed training framework. It supports TensorFlow, Keras, PyTorch, and MXNet, and can run on eith

Bytedance Inc. 3.3k Dec 28, 2022
Lightning ⚡️ fast forecasting with statistical and econometric models.

Nixtla Statistical ⚡️ Forecast Lightning fast forecasting with statistical and econometric models StatsForecast offers a collection of widely used uni

Nixtla 2.1k Dec 29, 2022
A high-performance topological machine learning toolbox in Python

giotto-tda is a high-performance topological machine learning toolbox in Python built on top of scikit-learn and is distributed under the G

giotto.ai 632 Dec 29, 2022
Decision tree is the most powerful and popular tool for classification and prediction

Diabetes Prediction Using Decision Tree Introduction Decision tree is the most powerful and popular tool for classification and prediction. A Decision

Arjun U 1 Jan 23, 2022
Mesh TensorFlow: Model Parallelism Made Easier

Mesh TensorFlow - Model Parallelism Made Easier Introduction Mesh TensorFlow (mtf) is a language for distributed deep learning, capable of specifying

1.3k Dec 26, 2022
Simple structured learning framework for python

PyStruct PyStruct aims at being an easy-to-use structured learning and prediction library. Currently it implements only max-margin methods and a perce

pystruct 666 Jan 03, 2023
Bayesian Modeling and Computation in Python

Bayesian Modeling and Computation in Python Open access and Code This repository contains the open access version of the text and the code examples in

Bayesian Modeling and Computation in Python 339 Jan 02, 2023
🔬 A curated list of awesome machine learning strategies & tools in financial market.

🔬 A curated list of awesome machine learning strategies & tools in financial market.

GeorgeZou 1.6k Dec 30, 2022
[DEPRECATED] Tensorflow wrapper for DataFrames on Apache Spark

TensorFrames (Deprecated) Note: TensorFrames is deprecated. You can use pandas UDF instead. Experimental TensorFlow binding for Scala and Apache Spark

Databricks 757 Dec 31, 2022
Can a machine learning project be implemented to estimate the salaries of baseball players whose salary information and career statistics for 1986 are shared?

END TO END MACHINE LEARNING PROJECT ON HITTERS DATASET Can a machine learning project be implemented to estimate the salaries of baseball players whos

Pinar Oner 7 Dec 18, 2021
Falken provides developers with a service that allows them to train AI that can play their games

Falken provides developers with a service that allows them to train AI that can play their games. Unlike traditional RL frameworks that learn through rewards or batches of offline training, Falken is

Google Research 223 Jan 03, 2023
Hypernets: A General Automated Machine Learning framework to simplify the development of End-to-end AutoML toolkits in specific domains.

A General Automated Machine Learning framework to simplify the development of End-to-end AutoML toolkits in specific domains.

DataCanvas 216 Dec 23, 2022
A Python toolbox to churn out organic alkalinity calculations with minimal brain engagement.

Organic Alkalinity Sausage Machine A Python toolbox to churn out organic alkalinity calculations with minimal brain engagement. Getting started To mak

Charles Turner 1 Feb 01, 2022
ml4ir: Machine Learning for Information Retrieval

ml4ir: Machine Learning for Information Retrieval | changelog Quickstart → ml4ir Read the Docs | ml4ir pypi | python ReadMe ml4ir is an open source li

Salesforce 77 Jan 06, 2023
An AutoML survey focusing on practical systems.

This project is a community effort in constructing and maintaining an up-to-date beginner-friendly introduction to AutoML, focusing on practical systems. AutoML is a big field, and continues to grow

AutoGOAL 16 Aug 14, 2022
nn-Meter is a novel and efficient system to accurately predict the inference latency of DNN models on diverse edge devices

A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.

Microsoft 241 Dec 26, 2022
A machine learning toolkit dedicated to time-series data

tslearn The machine learning toolkit for time series analysis in Python Section Description Installation Installing the dependencies and tslearn Getti

2.3k Dec 29, 2022
ML Kaggle Titanic Problem using LogisticRegrission

-ML-Kaggle-Titanic-Problem-using-LogisticRegrission here you will find the solution for the titanic problem on kaggle with comments and step by step c

Mahmoud Nasser Abdulhamed 3 Oct 23, 2022
A Software Framework for Neuromorphic Computing

A Software Framework for Neuromorphic Computing

Lava 338 Dec 26, 2022