Machine learning, in numpy

Overview

numpy-ml

Ever wish you had an inefficient but somewhat legible collection of machine learning algorithms implemented exclusively in NumPy? No?

Installation

For rapid experimentation

To use this code as a starting point for ML prototyping / experimentation, just clone the repository, create a new virtualenv, and start hacking:

$ git clone https://github.com/ddbourgin/numpy-ml.git
$ cd numpy-ml && virtualenv npml && source npml/bin/activate
$ pip3 install -r requirements-dev.txt

As a package

If you don't plan to modify the source, you can also install numpy-ml as a Python package: pip3 install -u numpy_ml.

The reinforcement learning agents train on environments defined in the OpenAI gym. To install these alongside numpy-ml, you can use pip3 install -u 'numpy_ml[rl]'.

Documentation

For more details on the available models, see the project documentation.

Available models

  1. Gaussian mixture model

    • EM training
  2. Hidden Markov model

    • Viterbi decoding
    • Likelihood computation
    • MLE parameter estimation via Baum-Welch/forward-backward algorithm
  3. Latent Dirichlet allocation (topic model)

    • Standard model with MLE parameter estimation via variational EM
    • Smoothed model with MAP parameter estimation via MCMC
  4. Neural networks

    • Layers / Layer-wise ops
      • Add
      • Flatten
      • Multiply
      • Softmax
      • Fully-connected/Dense
      • Sparse evolutionary connections
      • LSTM
      • Elman-style RNN
      • Max + average pooling
      • Dot-product attention
      • Embedding layer
      • Restricted Boltzmann machine (w. CD-n training)
      • 2D deconvolution (w. padding and stride)
      • 2D convolution (w. padding, dilation, and stride)
      • 1D convolution (w. padding, dilation, stride, and causality)
    • Modules
      • Bidirectional LSTM
      • ResNet-style residual blocks (identity and convolution)
      • WaveNet-style residual blocks with dilated causal convolutions
      • Transformer-style multi-headed scaled dot product attention
    • Regularizers
      • Dropout
    • Normalization
      • Batch normalization (spatial and temporal)
      • Layer normalization (spatial and temporal)
    • Optimizers
      • SGD w/ momentum
      • AdaGrad
      • RMSProp
      • Adam
    • Learning Rate Schedulers
      • Constant
      • Exponential
      • Noam/Transformer
      • Dlib scheduler
    • Weight Initializers
      • Glorot/Xavier uniform and normal
      • He/Kaiming uniform and normal
      • Standard and truncated normal
    • Losses
      • Cross entropy
      • Squared error
      • Bernoulli VAE loss
      • Wasserstein loss with gradient penalty
      • Noise contrastive estimation loss
    • Activations
      • ReLU
      • Tanh
      • Affine
      • Sigmoid
      • Leaky ReLU
      • ELU
      • SELU
      • Exponential
      • Hard Sigmoid
      • Softplus
    • Models
      • Bernoulli variational autoencoder
      • Wasserstein GAN with gradient penalty
      • word2vec encoder with skip-gram and CBOW architectures
    • Utilities
      • col2im (MATLAB port)
      • im2col (MATLAB port)
      • conv1D
      • conv2D
      • deconv2D
      • minibatch
  5. Tree-based models

    • Decision trees (CART)
    • [Bagging] Random forests
    • [Boosting] Gradient-boosted decision trees
  6. Linear models

    • Ridge regression
    • Logistic regression
    • Ordinary least squares
    • Bayesian linear regression w/ conjugate priors
      • Unknown mean, known variance (Gaussian prior)
      • Unknown mean, unknown variance (Normal-Gamma / Normal-Inverse-Wishart prior)
  7. n-Gram sequence models

    • Maximum likelihood scores
    • Additive/Lidstone smoothing
    • Simple Good-Turing smoothing
  8. Multi-armed bandit models

    • UCB1
    • LinUCB
    • Epsilon-greedy
    • Thompson sampling w/ conjugate priors
      • Beta-Bernoulli sampler
    • LinUCB
  9. Reinforcement learning models

    • Cross-entropy method agent
    • First visit on-policy Monte Carlo agent
    • Weighted incremental importance sampling Monte Carlo agent
    • Expected SARSA agent
    • TD-0 Q-learning agent
    • Dyna-Q / Dyna-Q+ with prioritized sweeping
  10. Nonparameteric models

    • Nadaraya-Watson kernel regression
    • k-Nearest neighbors classification and regression
    • Gaussian process regression
  11. Matrix factorization

    • Regularized alternating least-squares
    • Non-negative matrix factorization
  12. Preprocessing

    • Discrete Fourier transform (1D signals)
    • Discrete cosine transform (type-II) (1D signals)
    • Bilinear interpolation (2D signals)
    • Nearest neighbor interpolation (1D and 2D signals)
    • Autocorrelation (1D signals)
    • Signal windowing
    • Text tokenization
    • Feature hashing
    • Feature standardization
    • One-hot encoding / decoding
    • Huffman coding / decoding
    • Term frequency-inverse document frequency (TF-IDF) encoding
    • MFCC encoding
  13. Utilities

    • Similarity kernels
    • Distance metrics
    • Priority queue
    • Ball tree
    • Discrete sampler
    • Graph processing and generators

Contributing

Am I missing your favorite model? Is there something that could be cleaner / less confusing? Did I mess something up? Submit a PR! The only requirement is that your models are written with just the Python standard library and NumPy. The SciPy library is also permitted under special circumstances ;)

See full contributing guidelines here.

Official respository for "Modeling Defocus-Disparity in Dual-Pixel Sensors", ICCP 2020

Official respository for "Modeling Defocus-Disparity in Dual-Pixel Sensors", ICCP 2020 BibTeX @INPROCEEDINGS{punnappurath2020modeling, author={Abhi

Abhijith Punnappurath 22 Oct 01, 2022
The FIRST GANs-based omics-to-omics translation framework

OmiTrans Please also have a look at our multi-omics multi-task DL freamwork 👀 : OmiEmbed The FIRST GANs-based omics-to-omics translation framework Xi

Xiaoyu Zhang 6 Dec 14, 2022
Graph InfoClust: Leveraging cluster-level node information for unsupervised graph representation learning

Graph-InfoClust-GIC [PAKDD 2021] PAKDD'21 version Graph InfoClust: Maximizing Coarse-Grain Mutual Information in Graphs Preprint version Graph InfoClu

Costas Mavromatis 21 Dec 03, 2022
BMN: Boundary-Matching Network

BMN: Boundary-Matching Network A pytorch-version implementation codes of paper: "BMN: Boundary-Matching Network for Temporal Action Proposal Generatio

qinxin 260 Dec 06, 2022
My published benchmark for a Kaggle Simulations Competition

Lux AI Working Title Bot Please refer to the Kaggle notebook for the comment section. The comment section contains my explanation on my code structure

Tong Hui Kang 29 Aug 22, 2022
Impelmentation for paper Feature Generation and Hypothesis Verification for Reliable Face Anti-Spoofing

FGHV Impelmentation for paper Feature Generation and Hypothesis Verification for Reliable Face Anti-Spoofing Requirements Python 3.6 Pytorch 1.5.0 Cud

5 Jun 02, 2022
A web application that provides real time temperature and humidity readings of a house.

About A web application which provides real time temperature and humidity readings of a house. If you're interested in the data collected so far click

Ben Thompson 3 Jan 28, 2022
Deep Learning applied to Integral data analysis

DeepIntegralCompton Deep Learning applied to Integral data analysis Module installation Move to the root directory of the project and execute : pip in

Thomas Vuillaume 1 Dec 10, 2021
Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.

Unsupervised Phone and Word Segmentation using Vector-Quantized Neural Networks Overview Unsupervised phone and word segmentation on speech data is pe

Herman Kamper 13 Dec 11, 2022
Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 130+ Indicators

Pandas TA - A Technical Analysis Library in Python 3 Pandas Technical Analysis (Pandas TA) is an easy to use library that leverages the Pandas package

Kevin Johnson 3.2k Jan 09, 2023
Gapmm2: gapped alignment using minimap2 (align transcripts to genome)

gapmm2: gapped alignment using minimap2 This tool is a wrapper for minimap2 to r

Jon Palmer 2 Jan 27, 2022
OrienMask: Real-time Instance Segmentation with Discriminative Orientation Maps

OrienMask This repository implements the framework OrienMask for real-time instance segmentation. It achieves 34.8 mask AP on COCO test-dev at the spe

45 Dec 13, 2022
Code for the paper "How Attentive are Graph Attention Networks?"

How Attentive are Graph Attention Networks? This repository is the official implementation of How Attentive are Graph Attention Networks?. The PyTorch

175 Dec 29, 2022
JORLDY an open-source Reinforcement Learning (RL) framework provided by KakaoEnterprise

Repository for Open Source Reinforcement Learning Framework JORLDY

Kakao Enterprise Corp. 330 Dec 30, 2022
Official implementation of "Generating 3D Molecules for Target Protein Binding"

Generating 3D Molecules for Target Protein Binding This is the official implementation of the GraphBP method proposed in the following paper. Meng Liu

DIVE Lab, Texas A&M University 74 Dec 07, 2022
PyBrain - Another Python Machine Learning Library.

PyBrain -- the Python Machine Learning Library =============================================== INSTALLATION ------------ Quick answer: make sure you

2.8k Dec 31, 2022
RAMA: Rapid algorithm for multicut problem

RAMA: Rapid algorithm for multicut problem Solves multicut (correlation clustering) problems orders of magnitude faster than CPU based solvers without

Paul Swoboda 60 Dec 13, 2022
Latent Network Models to Account for Noisy, Multiply-Reported Social Network Data

VIMuRe Latent Network Models to Account for Noisy, Multiply-Reported Social Network Data. If you use this code please cite this article (preprint). De

6 Dec 15, 2022
Source-to-Source Debuggable Derivatives in Pure Python

Tangent Tangent is a new, free, and open-source Python library for automatic differentiation. Existing libraries implement automatic differentiation b

Google 2.2k Jan 01, 2023
Fully Convlutional Neural Networks for state-of-the-art time series classification

Deep Learning for Time Series Classification As the simplest type of time series data, univariate time series provides a reasonably good starting poin

Stephen 572 Dec 23, 2022