A non-linear, non-parametric Machine Learning method capable of modeling complex datasets

Overview

Fast Symbolic Regression

Symbolic Regression is a non-linear, non-parametric Machine Learning method capable of modeling complex data sets. fastsr aims at providing the most simple, powerful models possible by optimizing not only for error but also for model complexity. fastsr is built on top of fastgp, a numpy implementation of genetic programming built on top of deap. All estimators adhere to the sklearn estimator interface and can thus be used in pipelines.

fastsr was designed and developed by the Morphology, Evolution & Cognition Laboratory at the University of Vermont. It extends research code which can be found here.

Installation

fastsr is compatible with Python 2.7+.

pip install fastsr

Example Usage

Symbolic Regression is really good at fitting nonlinear functions. Let's try to fit the third order polynomial x^3 + x^2 + x. This is the "regression" example from the examples folder.

import matplotlib.pyplot as plt

import numpy as np

from fastsr.estimators.symbolic_regression import SymbolicRegression

from fastgp.algorithms.fast_evaluate import fast_numpy_evaluate
from fastgp.parametrized.simple_parametrized_terminals import get_node_semantics
def target(x):
    return x**3 + x**2 + x

Now we'll generate some data on the domain [-10, 10].

X = np.linspace(-10, 10, 100, endpoint=True)
y = target(X)

Finally we'll create and fit the Symbolic Regression estimator and check the score.

sr = SymbolicRegression(seed=72066)
sr.fit(X, y)
score = sr.score(X, y)
Score: 0.0

Whoa! That's not much error. Don't get too used to scores like that though, real data sets aren't usually as simple as a third order polynomial.

fastsr uses Genetic Programming to fit the data. That means equations are evolving to fit the data better and better each generation. Let's have a look at the best individuals and their respective scores.

print('Best Individuals:')
sr.print_best_individuals()
Best Individuals:
0.0 : add(add(square(X0), cube(X0)), X0)
34.006734006733936 : add(square(X0), cube(X0))
2081.346746380927 : add(cube(X0), X0)
2115.3534803876605 : cube(X0)
137605.24466869785 : add(add(X0, add(X0, X0)), add(X0, X0))
141529.89102341252 : add(add(X0, X0), add(X0, X0))
145522.55084614072 : add(add(X0, X0), X0)
149583.22413688237 : add(X0, X0)
151203.96034032793 : numpy_protected_sqrt(cube(numpy_protected_log_abs(exp(X0))))
151203.96034032793 : cube(numpy_protected_sqrt(X0))
153711.91089563753 : numpy_protected_log_abs(exp(X0))
153711.91089563753 : X0
155827.26437602515 : square(X0)
156037.81673350732 : add(numpy_protected_sqrt(X0), cbrt(X0))
157192.02956807753 : numpy_protected_sqrt(exp(cbrt(X0)))

At the top we find our best individual, which is exactly the third order polynomial we defined our target function to be. You might be confused as to why we consider all these other individuals, some with very large errors be be "best". We can look through the history object to see some of the equations that led up to our winning model by ordering by error.

history = sr.history_
population = list(filter(lambda x: hasattr(x, 'error'), list(sr.history_.genealogy_history.values())))
population.sort(key=lambda x: x.error, reverse=True)

Let's get a sample of the unique solutions. There are quite a few so the print statements have been omitted.

X = X.reshape((len(X), 1))
i = 1
previous_errror = population[0]
unique_individuals = []
while i < len(population):
    ind = population[i]
    if ind.error != previous_errror:
        print(str(i) + ' | ' + str(ind.error) + ' | ' + str(ind))
        unique_individuals.append(ind)
    previous_errror = ind.error
    i += 1

Now we can plot the equations over the target functions.

def plot(index):
    plt.plot(X, y, 'r')
    plt.axis([-10, 10, -1000, 1000])
    y_hat = fast_numpy_evaluate(unique_individuals[index], sr.pset_.context, X, get_node_semantics)
    plt.plot(X, y_hat, 'g')
    plt.savefig(str(i) + 'ind.png')
    plt.gcf().clear()

i = 0
while i < len(unique_individuals):
    plot(i)
    i += 10
i = len(unique_individuals) - 1
plot(i)

Stitched together into a gif we get a view into the evolutionary process.

Convergence Gif

Fitness Age Size Complexity Pareto Optimization

In addition to minimizing the error when creating an interpretable model it's often useful to minimize the size of the equations and their complexity (as defined by the order of an approximating polynomial[1]). In Multi-Objective optimization we keep all individuals that are not dominated by any other individuals and call this group the Pareto Front. These are the individuals printed in the Example Usage above. The age component helps prevent the population of equations from falling into a local optimum and was introduced in AFPO [2] but is out of the scope of this readme.

The result of this optimization technique is that a range of solutions are considered "best" individuals. Although in practice you will probably be interested in the top or several top individuals, be aware that the population as a whole was pressured into keeping individual equations as simple as possible in addition to keeping error as low as possible.

Literature Cited

  1. Ekaterina J Vladislavleva, Guido F Smits, and Dick Den Hertog. 2009. Order of nonlinearity as a complexity measure for models generated by symbolic regression via pareto genetic programming. IEEE Transactions on Evolutionary Computation 13, 2 (2009), 333โ€“349.
  2. Michael Schmidt and Hod Lipson. 2011. Age-fitness pareto optimization. In Genetic Programming Theory and Practice VIII. Springer, 129โ€“146.
Owner
VAMSHI CHOWDARY
๐ƒ๐€๐“๐€ ๐’๐‚๐ˆ๐„๐๐‚๐„ ๐„๐๐“๐‡๐”๐’๐ˆ๐€๐’๐“
VAMSHI CHOWDARY
OpenMMLab Text Detection, Recognition and Understanding Toolbox

Introduction English | ็ฎ€ไฝ“ไธญๆ–‡ MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the correspondi

OpenMMLab 3k Jan 07, 2023
ADB-IP-ROTATION - Use your mobile phone to gain a temporary IP address using ADB and data tethering

ADB IP ROTATE This an Python script based on Android Debug Bridge (adb) shell sc

Dor Bismuth 2 Jul 12, 2022
BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond

BasicVSR BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond Ported from https://github.com/xinntao/BasicSR Dependencie

Holy Wu 8 Jun 07, 2022
Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification (AAAI 2022) Prerequisite PyTorch = 1.2.0 P

16 Dec 14, 2022
LightNet++: Boosted Light-weighted Networks for Real-time Semantic Segmentation

LightNet++ !!!New Repo.!!! โ‡’ EfficientNet.PyTorch: Concise, Modular, Human-friendly PyTorch implementation of EfficientNet with Pre-trained Weights !!

linksense 237 Jan 05, 2023
Cooperative multi-agent reinforcement learning for high-dimensional nonequilibrium control

Cooperative multi-agent reinforcement learning for high-dimensional nonequilibrium control Official implementation of: Cooperative multi-agent reinfor

0 Nov 16, 2021
A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis

WaveGlow A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis Quick Start: Install requirements: pip install

Yuchao Zhang 204 Jul 14, 2022
ไบคไบ’ๅผๆ ‡ๆณจ่ฝฏไปถ๏ผŒๆš‚ๅฎšๅ iann

iann ไบคไบ’ๅผๆ ‡ๆณจ่ฝฏไปถ๏ผŒๆš‚ๅฎšๅiannใ€‚ ๅฎ‰่ฃ… ๆŒ‰็…งๅฎ˜็ฝ‘ไป‹็ปๅฎ‰่ฃ…paddleใ€‚ ๅฎ‰่ฃ…ๅ…ถไป–ไพ่ต– pip install -r requirements.txt ่ฟ่กŒ git clone https://github.com/PaddleCV-SIG/iann/ cd iann python iann

294 Dec 30, 2022
Deep learning with TensorFlow and earth observation data.

Deep Learning with TensorFlow and EO Data Complete file set for Jupyter Book Autor: Development Seed Date: 04 October 2021 ISBN: (to come) Notebook tu

Development Seed 20 Nov 16, 2022
Data Engineering ZoomCamp

Data Engineering ZoomCamp I'm partaking in a Data Engineering Bootcamp / Zoomcamp and will be tracking my progress here. I can't promise these notes w

Aaron 61 Jan 06, 2023
Instance-wise Occlusion and Depth Orders in Natural Scenes (CVPR 2022)

Instance-wise Occlusion and Depth Orders in Natural Scenes Official source code. Appears at CVPR 2022 This repository provides a new dataset, named In

27 Dec 27, 2022
Pytorch Implementation of paper "Noisy Natural Gradient as Variational Inference"

Noisy Natural Gradient as Variational Inference PyTorch implementation of Noisy Natural Gradient as Variational Inference. Requirements Python 3 Pytor

Tony JiHyun Kim 119 Dec 02, 2022
An attempt at the implementation of Glom, Geoffrey Hinton's new idea that integrates neural fields, predictive coding, top-down-bottom-up, and attention (consensus between columns)

GLOM - Pytorch (wip) An attempt at the implementation of Glom, Geoffrey Hinton's new idea that integrates neural fields, predictive coding,

Phil Wang 173 Dec 14, 2022
Code repository for Self-supervised Structure-sensitive Learning, CVPR'17

Self-supervised Structure-sensitive Learning (SSL) Ke Gong, Xiaodan Liang, Xiaohui Shen, Liang Lin, "Look into Person: Self-supervised Structure-sensi

Clay Gong 219 Dec 29, 2022
Code for the Higgs Boson Machine Learning Challenge organised by CERN & EPFL

A method to solve the Higgs boson challenge using Least Squares - Novae This project is the Project 1 of EPFL CS-433 Machine Learning. The project is

Giacomo Orsi 1 Nov 09, 2021
Pytorch implementation of ICASSP 2022 paper Attention Probe: Vision Transformer Distillation in the Wild

Attention Probe: Vision Transformer Distillation in the Wild Jiahao Wang, Mingdeng Cao, Shuwei Shi, Baoyuan Wu, Yujiu Yang In ICASSP 2022 This code is

IIGROUP 6 Sep 21, 2022
A real-time motion capture system that estimates poses and global translations using only 6 inertial measurement units

TransPose Code for our SIGGRAPH 2021 paper "TransPose: Real-time 3D Human Translation and Pose Estimation with Six Inertial Sensors". This repository

Xinyu Yi 261 Dec 31, 2022
WHENet - ONNX, OpenVINO, TFLite, TensorRT, EdgeTPU, CoreML, TFJS, YOLOv4/YOLOv4-tiny-3L

HeadPoseEstimation-WHENet-yolov4-onnx-openvino ONNX, OpenVINO, TFLite, TensorRT, EdgeTPU, CoreML, TFJS, YOLOv4/YOLOv4-tiny-3L 1. Usage $ git clone htt

Katsuya Hyodo 49 Sep 21, 2022
Gray Zone Assessment

Gray Zone Assessment Get started Clone github repository git clone https://github.com/andreanne-lemay/gray_zone_assessment.git Build docker image dock

1 Jan 08, 2022
Codes for โ€œA Deeply Supervised Attention Metric-Based Network and an Open Aerial Image Dataset for Remote Sensing Change Detectionโ€

DSAMNet The pytorch implementation for "A Deeply-supervised Attention Metric-based Network and an Open Aerial Image Dataset for Remote Sensing Change

Mengxi Liu 41 Dec 14, 2022