Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining

Last update: Dec 30, 2022

Overview

**Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining.**

Sections

Introduction to Machine Learning and Pattern Classification
Pre-processing
Model Evaluation
Parameter Estimation
Machine Learning Algorithms
Clustering
Collecting Data
Data Visualization
Statistical Pattern Classification Examples
Books
Talks
Applications
Resources

[Download a PDF version] of this flowchart.

Introduction to Machine Learning and Pattern Classification

[back to top]

Predictive modeling, supervised machine learning, and pattern classification - the big picture [Markdown]
Entry Point: Data - Using Python's sci-packages to prepare data for Machine Learning tasks and other data analyses [IPython nb]
An Introduction to simple linear supervised classification using scikit-learn [IPython nb]

Pre-processing

[back to top]

Feature Extraction
- Tips and Tricks for Encoding Categorical Features in Classification Tasks [IPython nb]
Scaling and Normalization
- About Feature Scaling: Standardization and Min-Max-Scaling (Normalization) [IPython nb]
Feature Selection
- Sequential Feature Selection Algorithms [IPython nb]
Dimensionality Reduction
- Principal Component Analysis (PCA) [IPython nb]
- The effect of scaling and mean centering of variables prior to a PCA [PDF] [HTML]
- PCA based on the covariance vs. correlation matrix [IPython nb]
- Linear Discriminant Analysis (LDA) [IPython nb]
- Kernel tricks and nonlinear dimensionality reduction via PCA [IPython nb]
Representing Text
- Tf-idf Walkthrough for scikit-learn [IPython nb]

Model Evaluation

[back to top]

An Overview of General Performance Metrics of Binary Classifier Systems [PDF]
Cross-validation
- Streamline your cross-validation workflow - scikit-learn's Pipeline in action [IPython nb]
Model evaluation, model selection, and algorithm selection in machine learning - Part I [Markdown]
Model evaluation, model selection, and algorithm selection in machine learning - Part II [Markdown]

Parameter Estimation

[back to top]

Parametric Techniques
- Introduction to the Maximum Likelihood Estimate (MLE) [IPython nb]
- How to calculate Maximum Likelihood Estimates (MLE) for different distributions [IPython nb]
Non-Parametric Techniques
- Kernel density estimation via the Parzen-window technique [IPython nb]
- The K-Nearest Neighbor (KNN) technique
Regression Analysis
- Linear Regression
  - Least-Squares fit [IPython nb]
- Non-Linear Regression

Machine Learning Algorithms

[back to top]

Bayes Classification

Naive Bayes and Text Classification I - Introduction and Theory [PDF]

Logistic Regression

Out-of-core Learning and Model Persistence using scikit-learn [IPython nb]

Neural Networks

Artificial Neurons and Single-Layer Neural Networks - How Machine Learning Algorithms Work Part 1 [IPython nb]
Activation Function Cheatsheet [IPython nb]

Ensemble Methods

Implementing a Weighted Majority Rule Ensemble Classifier in scikit-learn [IPython nb]

Decision Trees

Cheatsheet for Decision Tree Classification [IPython nb]

Clustering

[back to top]

Prototype-based clustering
Hierarchical clustering
- Complete-Linkage Clustering and Heatmaps in Python [IPython nb]
Density-based clustering
Graph-based clustering
Probabilistic-based clustering

Collecting Data

[back to top]

Collecting Fantasy Soccer Data with Python and Beautiful Soup [IPython nb]
Download Your Twitter Timeline and Turn into a Word Cloud Using Python [IPython nb]
Reading MNIST into NumPy arrays [IPython nb]

Data Visualization

[back to top]

Exploratory Analysis of the Star Wars API [IPython nb]

Matplotlib examples - Exploratory data analysis of the Iris dataset [IPython nb]

Artificial Intelligence publications per country [IPython nb] [PDF]

Statistical Pattern Classification Examples

[back to top]

Supervised Learning
- Parametric Techniques
  - Univariate Normal Density
    - Ex1: 2-classes, equal variances, equal priors [IPython nb]
    - Ex2: 2-classes, different variances, equal priors [IPython nb]
    - Ex3: 2-classes, equal variances, different priors [IPython nb]
    - Ex4: 2-classes, different variances, different priors, loss function [IPython nb]
    - Ex5: 2-classes, different variances, equal priors, loss function, Cauchy distribution [IPython nb]
  - Multivariate Normal Density
    - Ex5: 2-classes, different variances, equal priors, loss function [IPython nb]
    - Ex7: 2-classes, equal variances, equal priors [IPython nb]
- Non-Parametric Techniques

Books

[back to top]

Python Machine Learning

Talks

[back to top]

An Introduction to Supervised Machine Learning and Pattern Classification: The Big Picture

[View on SlideShare]

[Download PDF]

MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song Lyrics

[View on SlideShare]

[Download PDF]

Applications

[back to top]

MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song Lyrics

This project builds a music recommendation system for users who want to listen to happy songs. Such a system could brighten one's mood on a rainy weekend, and the MusicMood classifier could also be used to spread a positive mood in hospitals, other medical clinics, or public places such as restaurants.

[musicmood GitHub Repository]

mlxtend - A library of extension and helper modules for Python's data analysis and machine learning libraries.

[mlxtend GitHub Repository]

Resources

[back to top]

Copy-and-paste ready LaTeX equations [Markdown]
Open-source datasets [Markdown]
Free Machine Learning eBooks [Markdown]
Terms in data science defined in less than 50 words [Markdown]
Useful libraries for data science in Python [Markdown]
General Tips and Advice [Markdown]
A matrix cheatsheet for Python, R, Julia, and MATLAB [HTML]

Owner

Sebastian Raschka
