Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining

Overview

logo


**Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining.**



Sections



[Download a PDF version] of this flowchart.






Introduction to Machine Learning and Pattern Classification

[back to top]

  • Predictive modeling, supervised machine learning, and pattern classification - the big picture [Markdown]

  • Entry Point: Data - Using Python's sci-packages to prepare data for Machine Learning tasks and other data analyses [IPython nb]

  • An Introduction to simple linear supervised classification using scikit-learn [IPython nb]






Pre-processing

[back to top]

  • Feature Extraction

    • Tips and Tricks for Encoding Categorical Features in Classification Tasks [IPython nb]
  • Scaling and Normalization

    • About Feature Scaling: Standardization and Min-Max-Scaling (Normalization) [IPython nb]
  • Feature Selection

    • Sequential Feature Selection Algorithms [IPython nb]
  • Dimensionality Reduction

    • Principal Component Analysis (PCA) [IPython nb]
    • The effect of scaling and mean centering of variables prior to a PCA [PDF] [HTML]
    • PCA based on the covariance vs. correlation matrix [IPython nb]
    • Linear Discriminant Analysis (LDA) [IPython nb]
      • Kernel tricks and nonlinear dimensionality reduction via PCA [IPython nb]
  • Representing Text

    • Tf-idf Walkthrough for scikit-learn [IPython nb]



Model Evaluation

[back to top]

  • An Overview of General Performance Metrics of Binary Classifier Systems [PDF]
  • Cross-validation
    • Streamline your cross-validation workflow - scikit-learn's Pipeline in action [IPython nb]
  • Model evaluation, model selection, and algorithm selection in machine learning - Part I [Markdown]
  • Model evaluation, model selection, and algorithm selection in machine learning - Part II [Markdown]



Parameter Estimation

[back to top]

  • Parametric Techniques

    • Introduction to the Maximum Likelihood Estimate (MLE) [IPython nb]
    • How to calculate Maximum Likelihood Estimates (MLE) for different distributions [IPython nb]
  • Non-Parametric Techniques

    • Kernel density estimation via the Parzen-window technique [IPython nb]
    • The K-Nearest Neighbor (KNN) technique
  • Regression Analysis

    • Linear Regression

    • Non-Linear Regression




Machine Learning Algorithms

[back to top]

Bayes Classification

  • Naive Bayes and Text Classification I - Introduction and Theory [PDF]

Logistic Regression

  • Out-of-core Learning and Model Persistence using scikit-learn [IPython nb]

Neural Networks

  • Artificial Neurons and Single-Layer Neural Networks - How Machine Learning Algorithms Work Part 1 [IPython nb]

  • Activation Function Cheatsheet [IPython nb]

Ensemble Methods

  • Implementing a Weighted Majority Rule Ensemble Classifier in scikit-learn [IPython nb]

Decision Trees

  • Cheatsheet for Decision Tree Classification [IPython nb]



Clustering

[back to top]

  • Protoype-based clustering
  • Hierarchical clustering
    • Complete-Linkage Clustering and Heatmaps in Python [IPython nb]
  • Density-based clustering
  • Graph-based clustering
  • Probabilistic-based clustering



Collecting Data

[back to top]

  • Collecting Fantasy Soccer Data with Python and Beautiful Soup [IPython nb]

  • Download Your Twitter Timeline and Turn into a Word Cloud Using Python [IPython nb]

  • Reading MNIST into NumPy arrays [IPython nb]




Data Visualization

[back to top]

  • Exploratory Analysis of the Star Wars API [IPython nb]

  • Matplotlib examples -Exploratory data analysis of the Iris dataset [IPython nb]

  • Artificial Intelligence publications per country

[IPython nb] [PDF]




Statistical Pattern Classification Examples

[back to top]

  • Supervised Learning

    • Parametric Techniques

      • Univariate Normal Density

        • Ex1: 2-classes, equal variances, equal priors [IPython nb]
        • Ex2: 2-classes, different variances, equal priors [IPython nb]
        • Ex3: 2-classes, equal variances, different priors [IPython nb]
        • Ex4: 2-classes, different variances, different priors, loss function [IPython nb]
        • Ex5: 2-classes, different variances, equal priors, loss function, cauchy distr. [IPython nb]
      • Multivariate Normal Density

        • Ex5: 2-classes, different variances, equal priors, loss function [IPython nb]
        • Ex7: 2-classes, equal variances, equal priors [IPython nb]
    • Non-Parametric Techniques




Books

[back to top]

Python Machine Learning




Talks

[back to top]

An Introduction to Supervised Machine Learning and Pattern Classification: The Big Picture

[View on SlideShare]

[Download PDF]



MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song Lyrics

[View on SlideShare]

[Download PDF]




Applications

[back to top]

MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song Lyrics

This project is about building a music recommendation system for users who want to listen to happy songs. Such a system can not only be used to brighten up one's mood on a rainy weekend; especially in hospitals, other medical clinics, or public locations such as restaurants, the MusicMood classifier could be used to spread positive mood among people.

[musicmood GitHub Repository]


mlxtend - A library of extension and helper modules for Python's data analysis and machine learning libraries.

[mlxtend GitHub Repository]




Resources

[back to top]

  • Copy-and-paste ready LaTex equations [Markdown]

  • Open-source datasets [Markdown]

  • Free Machine Learning eBooks [Markdown]

  • Terms in data science defined in less than 50 words [Markdown]

  • Useful libraries for data science in Python [Markdown]

  • General Tips and Advices [Markdown]

  • A matrix cheatsheat for Python, R, Julia, and MATLAB [HTML]

Owner
Sebastian Raschka
Machine Learning researcher & passionate open source contributor. Author of the "Python Machine Learning" book.
Sebastian Raschka
This is the material used in my free Persian course: Machine Learning with Python

This is the material used in my free Persian course: Machine Learning with Python

Yara Mohamadi 4 Aug 07, 2022
GroundSeg Clustering Optimized Kdtree

ground seg and clustering based on kitti velodyne data, and a additional optimized kdtree for knn and radius nn search

2 Dec 02, 2021
SIMD-accelerated bitwise hamming distance Python module for hexidecimal strings

hexhamming What does it do? This module performs a fast bitwise hamming distance of two hexadecimal strings. This looks like: DEADBEEF = 1101111010101

Michael Recachinas 12 Oct 14, 2022
Python Automated Machine Learning library for tabular data.

Simple but powerful Automated Machine Learning library for tabular data. It uses efficient in-memory SAP HANA algorithms to automate routine Data Scie

Daniel Khromov 47 Dec 17, 2022
Microsoft 5.6k Jan 07, 2023
100 Days of Machine and Deep Learning Code

💯 Days of Machine Learning and Deep Learning Code MACHINE LEARNING TOPICS COVERED - FROM SCRATCH Linear Regression Logistic Regression K Means Cluste

Tanishq Gautam 66 Nov 02, 2022
Dieses Projekt ermöglicht es den Smartmeter der EVN (Netz Niederösterreich) über die Kundenschnittstelle auszulesen.

SmartMeterEVN Dieses Projekt ermöglicht es den Smartmeter der EVN (Netz Niederösterreich) über die Kundenschnittstelle auszulesen. Smart Meter werden

greenMike 43 Dec 04, 2022
WAGMA-SGD is a decentralized asynchronous SGD for distributed deep learning training based on model averaging.

WAGMA-SGD is a decentralized asynchronous SGD based on wait-avoiding group model averaging. The synchronization is relaxed by making the collectives externally-triggerable, namely, a collective can b

Shigang Li 6 Jun 18, 2022
Binary Classification Problem with Machine Learning

Binary Classification Problem with Machine Learning Solving Approach: 1) Ultimate Goal of the Assignment: This assignment is about solving a binary cl

Dinesh Mali 0 Jan 20, 2022
nn-Meter is a novel and efficient system to accurately predict the inference latency of DNN models on diverse edge devices

A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.

Microsoft 241 Dec 26, 2022
NumPy-based implementation of a multilayer perceptron (MLP)

My own NumPy-based implementation of a multilayer perceptron (MLP). Several of its components can be tuned and played with, such as layer depth and size, hidden and output layer activation functions,

1 Feb 10, 2022
Azure Cloud Advocates at Microsoft are pleased to offer a 12-week, 24-lesson curriculum all about Machine Learning

Azure Cloud Advocates at Microsoft are pleased to offer a 12-week, 24-lesson curriculum all about Machine Learning

Microsoft 43.4k Jan 04, 2023
Course files for "Ocean/Atmosphere Time Series Analysis"

time-series This package contains all necessary files for the course Ocean/Atmosphere Time Series Analysis, an introduction to data and time series an

Jonathan Lilly 107 Nov 29, 2022
Greykite: A flexible, intuitive and fast forecasting library

The Greykite library provides flexible, intuitive and fast forecasts through its flagship algorithm, Silverkite.

LinkedIn 1.4k Jan 15, 2022
Databricks Certified Associate Spark Developer preparation toolkit to setup single node Standalone Spark Cluster along with material in the form of Jupyter Notebooks.

Databricks Certification Spark Databricks Certified Associate Spark Developer preparation toolkit to setup single node Standalone Spark Cluster along

19 Dec 13, 2022
cuML - RAPIDS Machine Learning Library

cuML - GPU Machine Learning Algorithms cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions t

RAPIDS 3.1k Dec 28, 2022
Responsible AI Workshop: a series of tutorials & walkthroughs to illustrate how put responsible AI into practice

Responsible AI Workshop Responsible innovation is top of mind. As such, the tech industry as well as a growing number of organizations of all kinds in

Microsoft 9 Sep 14, 2022
scikit-fem is a lightweight Python 3.7+ library for performing finite element assembly.

scikit-fem is a lightweight Python 3.7+ library for performing finite element assembly. Its main purpose is the transformation of bilinear forms into sparse matrices and linear forms into vectors.

Tom Gustafsson 297 Dec 13, 2022
#30DaysOfStreamlit is a 30-day social challenge for you to build and deploy Streamlit apps.

30 Days Of Streamlit 🎈 This is the official repo of #30DaysOfStreamlit — a 30-day social challenge for you to learn, build and deploy Streamlit apps.

Streamlit 53 Jan 02, 2023
TIANCHI Purchase Redemption Forecast Challenge

TIANCHI Purchase Redemption Forecast Challenge

Haorui HE 4 Aug 26, 2022