Scalable machine learning based time series forecasting

Last update: Dec 24, 2022

Overview

mlforecast


Install

PyPI

pip install mlforecast

Optional dependencies

If you want more functionality you can instead use pip install mlforecast[extra1,extra2,...]. The current extra dependencies are:

aws: adds the functionality to use S3 as the storage in the CLI.
cli: includes the validations necessary to use the CLI.
distributed: installs dask to perform distributed training. Note that you'll also need to install either LightGBM or XGBoost.

For example, if you want to perform distributed training through the CLI using S3 as your storage you'll need all three extras, which you can get using: pip install mlforecast[aws,cli,distributed].

conda-forge

conda install -c conda-forge mlforecast

Note that this installation comes with the required dependencies for the local interface. If you want to:

Use S3 as storage: conda install -c conda-forge s3path
Perform distributed training: conda install -c conda-forge dask and either LightGBM or XGBoost.

How to use

The following is a basic overview; for a more detailed description, see the documentation.

Programmatic API

Store your time series in a pandas dataframe with an index named unique_id that identifies each time series, a column ds that contains the datestamps, and a column y with the values.

from mlforecast.utils import generate_daily_series

series = generate_daily_series(20)
print(series.head())

unique_id	ds	y
id_00	2000-01-01 00:00:00	0.264447
id_00	2000-01-02 00:00:00	1.28402
id_00	2000-01-03 00:00:00	2.4628
id_00	2000-01-04 00:00:00	3.03552
id_00	2000-01-05 00:00:00	4.04356

Then create a TimeSeries object with the features that you want to use. These include lags, transformations on the lags, and date features. The lag transformations are defined as numba-jitted functions that transform an array. If a transformation takes additional arguments, supply a tuple (transform_func, arg1, arg2, ...).

from mlforecast.core import TimeSeries
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean

ts = TimeSeries(
    lags=[7, 14],
    lag_transforms={
        1: [expanding_mean],
        7: [(rolling_mean, 7), (rolling_mean, 14)]
    },
    date_features=['dayofweek', 'month']
)
ts

TimeSeries(freq=<Day>, transforms=['lag-7', 'lag-14', 'expanding_mean_lag-1', 'rolling_mean_lag-7_window_size-7', 'rolling_mean_lag-7_window_size-14'], date_features=['dayofweek', 'month'], num_threads=8)
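To make the feature names in the output above more concrete, here is a minimal pure-Python sketch of what a lag and a lag transformation compute. The helper functions and the toy series are illustrative assumptions; they are not the numba-jitted functions from window_ops and do not claim to match their edge-case behavior.

```python
import math

def lag(values, k):
    """Shift the series k steps; the first k positions have no history (NaN)."""
    return ([math.nan] * k + list(values))[:len(values)]

def rolling_mean(values, window_size):
    """Mean over the trailing window; NaN until the window holds only valid values."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - window_size + 1): i + 1]
        if len(window) < window_size or any(math.isnan(v) for v in window):
            out.append(math.nan)
        else:
            out.append(sum(window) / window_size)
    return out

def expanding_mean(values):
    """Mean of all valid values seen so far; NaN where the input is NaN."""
    out, total, count = [], 0.0, 0
    for v in values:
        if math.isnan(v):
            out.append(math.nan)
            continue
        total += v
        count += 1
        out.append(total / count)
    return out

y = [float(v) for v in range(1, 11)]  # toy series 1.0 .. 10.0

# Transformations are applied to the lagged series, never to the raw target,
# so a feature at time t only uses values observed before t.
features = {
    "lag-2": lag(y, 2),
    "expanding_mean_lag-1": expanding_mean(lag(y, 1)),
    "rolling_mean_lag-2_window_size-3": rolling_mean(lag(y, 2), 3),
}
```

The dictionary keys follow the naming pattern shown in the TimeSeries representation: the transformation name, the lag it is applied to, and any extra arguments.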

Next define a model. If you want to use the local interface this can be any regressor that follows the scikit-learn API. For distributed training there are LGBMForecast and XGBForecast.

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(random_state=0)

Now instantiate your forecast object with the model and the time series. There are two types of forecasters: Forecast, which is local, and DistributedForecast, which performs the whole process in a distributed way.

from mlforecast.forecast import Forecast

fcst = Forecast(model, ts)

To compute the features and train the model on them, call .fit on your Forecast object.

fcst.fit(series)

Forecast(model=RandomForestRegressor(random_state=0), ts=TimeSeries(freq=<Day>, transforms=['lag-7', 'lag-14', 'expanding_mean_lag-1', 'rolling_mean_lag-7_window_size-7', 'rolling_mean_lag-7_window_size-14'], date_features=['dayofweek', 'month'], num_threads=8))

To get the forecasts for the next 14 days, call .predict(14) on the forecaster. This updates the target with each prediction and recomputes the features to get the next one.

predictions = fcst.predict(14)

print(predictions.head())

unique_id	ds	y_pred
id_00	2000-08-10 00:00:00	5.24484
id_00	2000-08-11 00:00:00	6.25861
id_00	2000-08-12 00:00:00	0.225484
id_00	2000-08-13 00:00:00	1.22896
id_00	2000-08-14 00:00:00	2.30246
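The recursive behaviour of predict described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: recursive_forecast and seasonal_naive are hypothetical names, and the "model" simply returns the value observed seven steps earlier.

```python
def recursive_forecast(history, horizon, predict_one):
    """Forecast `horizon` steps, feeding each prediction back into the series."""
    extended = list(history)
    forecasts = []
    for _ in range(horizon):
        # Features are recomputed from the series that already contains
        # the previous predictions.
        next_value = predict_one(extended)
        forecasts.append(next_value)
        extended.append(next_value)  # update the target with the prediction
    return forecasts

def seasonal_naive(series, season_length=7):
    """Hypothetical stand-in model: predict the value one season ago (lag-7)."""
    return series[-season_length]

history = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0] * 3  # three weeks of a weekly pattern
forecasts = recursive_forecast(history, 14, seasonal_naive)
```

From the eighth step onward the lag-7 value is itself a prediction, which is why errors can accumulate over long horizons with this strategy.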

CLI

If you want to compute quick baselines, avoid some boilerplate, or simply prefer working from the command line, you can use the mlforecast binary with a configuration file like the following:

!cat sample_configs/local.yaml

data:
  prefix: data
  input: train
  output: outputs
  format: parquet
features:
  freq: D
  lags: [7, 14]
  lag_transforms:
    1: 
    - expanding_mean
    7: 
    - rolling_mean:
        window_size: 7
    - rolling_mean:
        window_size: 14
  date_features: ["dayofweek", "month", "year"]
  num_threads: 2
backtest:
  n_windows: 2
  window_size: 7
forecast:
  horizon: 7
local:
  model:
    name: sklearn.ensemble.RandomForestRegressor
    params:
      n_estimators: 10
      max_depth: 7

The configuration is validated using FlowConfig.

This configuration trains on the data in data.prefix/data.input and writes the results to data.prefix/data.output, both in the format given by data.format.
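As a rough sketch of how those locations follow from the data section of the sample configuration, the snippet below joins the keys with pathlib. The resolve_data_paths helper is hypothetical and only mirrors the rule stated above; it is not part of the mlforecast CLI.

```python
from pathlib import Path

# Mirrors the `data` section of the sample configuration.
config = {
    "data": {
        "prefix": "data",
        "input": "train",
        "output": "outputs",
        "format": "parquet",
    }
}

def resolve_data_paths(data_cfg):
    """Return (input_path, output_path) as prefix/input and prefix/output."""
    prefix = Path(data_cfg["prefix"])
    return prefix / data_cfg["input"], prefix / data_cfg["output"]

input_path, output_path = resolve_data_paths(config["data"])
```

With the sample values, the training data is expected under data/train and the results are written under data/outputs.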

from pathlib import Path

data_path = Path('data')
data_path.mkdir(exist_ok=True)  # don't fail if the directory already exists
series.to_parquet(data_path/'train')

!mlforecast sample_configs/local.yaml

Split 1 MSE: 0.0251
Split 2 MSE: 0.0180

list((data_path/'outputs').iterdir())

[PosixPath('data/outputs/valid_1.parquet'),
 PosixPath('data/outputs/valid_0.parquet'),
 PosixPath('data/outputs/forecast.parquet')]

Comments

mlforecast for multivariate time series analysis
Hello,

I want to use "mlforecast" library for my Multivariate Time Series problem and I want to know how could I add new features, like holidays or temperature, to the dataset besides 'lags' and 'date_features'. Below is flow configuration:

`fcst = Forecast( models=model, freq='W-MON', lags=[1,2,3,4,5,6,7,8], date_features=['month', 'week'] ) `

Is there a way to add exogenous variables to the training process? I could not find relevant information to be able to do this.

Thank you!
opened by MariaBocsa 5
What's the purpose of using scale_factor?

I noticed in the docs under the "Custom predictions" section it references using a scale_factor - I'm just wondering what the purpose of this would be?

Is it the same purpose as the alpha here?: https://www.kaggle.com/code/lemuz90/m5-mlforecast/notebook

I'm assuming that it's some kind of post prediction adjustment to improve accuracy but I'm keen to hear the thought process behind it.

opened by TPreece101 5
[FEAT] Add step size argument to cross validation method
Description

This PR adds the step_size argument to the cross validation method. The argument controls the size between each cross validation window.
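As an illustration of what a step size between windows could mean, here is a small sketch that lays out validation windows over a series of n_obs points. The cv_windows function and its conventions (windows counted back from the end, step_size defaulting to window_size) are assumptions made for this example, not the code from this PR.

```python
def cv_windows(n_obs, n_windows, window_size, step_size=None):
    """Return (train_end, valid_end) index pairs, earliest window first.

    Assumption: the last window ends at the final observation and each
    earlier window is shifted back by `step_size`. When `step_size` equals
    `window_size` the validation windows do not overlap.
    """
    if step_size is None:
        step_size = window_size
    windows = []
    for i in range(n_windows):
        offset = (n_windows - 1 - i) * step_size
        valid_end = n_obs - offset
        train_end = valid_end - window_size
        if train_end <= 0:
            raise ValueError("series is too short for the requested windows")
        windows.append((train_end, valid_end))
    return windows

# Training uses indices [0, train_end); validation uses [train_end, valid_end).
non_overlapping = cv_windows(n_obs=100, n_windows=2, window_size=7)
overlapping = cv_windows(n_obs=100, n_windows=3, window_size=7, step_size=2)
```

A step size smaller than the window size yields overlapping validation windows, which gives more evaluation points from the same amount of data.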

Checklist:

[x] This PR has a meaningful title and a clear description.

[x] The tests pass.

[x] All linting tasks pass.

[x] The notebooks are clean.

feature
opened by FedericoGarza 4
[FIX] delete cla.yml
Description

CLA agreement will now be handled by https://cla-assistant.io/

Checklist:

[ ] This PR has a meaningful title and a clear description.

[ ] The tests pass.

[ ] All linting tasks pass.

[ ] The notebooks are clean.
opened by FedericoGarza 3
add MLForecast.from_cv
Description

Removes the fit_on_all argument from LightGBMCV and introduces a constructor, MLForecast.from_cv, that builds the forecast object from a trained cv using the best iteration, features and parameters from the cv object. Also makes some small changes to keep the structure of the input dataframe, which are:

If the id is not the index, the predict method of all forecasts returns it as a column (previously it was always the index).

The cv_preds_ argument of LightGBMCV had id and time as a multiindex; now they have the same structure as the input df.

Checklist:

[x] This PR has a meaningful title and a clear description.

[x] The tests pass.

[x] All linting tasks pass.

[x] The notebooks are clean.

breaking
opened by jmoralez 2
remove dashes from feature names
Description

Removes dashes from feature names, e.g. lag-7 becomes lag7.

Checklist:

[ ] This PR has a meaningful title and a clear description.

[ ] The tests pass.

[ ] All linting tasks pass.

[ ] The notebooks are clean.

breaking
opened by jmoralez 2
Unable to import Forecast from mlforecast
Description

Unable to import Forecast from mlforecast

Reproducible example

# code goes here
from mlforecast import Forecast

ImportError: cannot import name 'Forecast' from 'mlforecast' (/home//mambaforge/envs/dev/lib/python3.7/site-packages/mlforecast/__init__.py)

# Stacktrace

Environment info

python=3.7, pip installation of mlforecast

Package version: mlforecast=0.2.0

Additional information
opened by iki77 2
nb Forecast doesn't run in latest PyPI version

This nb doesn't work with latest pypi mlforecast version (installing via pip install mlforecast, version 0.2.0) https://github.com/Nixtla/mlforecast/blob/6ac01ec16e1da2d04ca8ea9e4d4a2ed173f7c534/nbs/forecast.ipynb

To make it work, I had to specifically pass the same package as in github: pip install git+https://github.com/Nixtla/mlforecast.git#egg=mlforecast

opened by Gabrielcidral1 2
sort only ds and y columns on fit
Description

Since the input for the transformations has to be sorted we used to sort the whole dataframe, however this can be very inefficient when there are many dynamic columns. This PR sorts using only the ds and y columns before constructing the GroupedArray thus keeping the peak memory usage constant with respect to the number of dynamic features.

Checklist:

[x] This PR has a meaningful title and a clear description.

[x] The tests pass.

[ ] There isn't a decrease in the tests coverage.

[x] All linting tasks pass.

[x] The notebooks are clean.

[x] If this modifies the docs, you've made sure that they were updated correctly.
opened by jmoralez 2

Bug: When using Forecast.backtest on a series with freq='W', y_pred contains null values

Code to reproduce:

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from mlforecast.core import TimeSeries
from mlforecast.forecast import Forecast

#Generate weekly data
#https://towardsdatascience.com/forecasting-with-machine-learning-models-95a6b6579090

rng = np.random.RandomState(90)
serie_length = 52 * 4  #4 years' weekly data
dates = pd.date_range('2000-01-01', freq='W', periods=serie_length, name='ds')
y = dates.dayofweek + rng.randint(-1, 2, size=dates.size)
data = pd.DataFrame({'y': y.astype(np.float64)}, index=dates)
#data.plot(marker='.', figsize=(20, 6));

train_mlfcst = data.reset_index()[['ds', 'y']]
train_mlfcst.index = pd.Index(np.repeat(0, data.shape[0]), name='unique_id')

backtest_fcst = Forecast(
    LinearRegression(fit_intercept=False), TimeSeries(lags=[4, 8])
)
backtest_results = backtest_fcst.backtest(train_mlfcst, n_windows=2, window_size=52)

result1 = next(backtest_results)
result1

	ds	y	y_pred
unique_id			
0	2001-12-30	6.0	5.105716
0	2002-01-06	5.0	5.026820
0	2002-01-13	7.0	4.640784
0	2002-01-20	5.0	6.145316
0	2002-01-27	6.0	4.746834
0	2002-02-03	6.0	4.635672
0	2002-02-10	7.0	4.271653
0	2002-02-17	7.0	NaN
0	2002-02-24	7.0	NaN
0	2002-03-03	5.0	NaN
0	2002-03-10	5.0	NaN
0	2002-03-17	7.0	NaN
0	2002-03-24	7.0	NaN
0	2002-03-31	5.0	NaN
0	2002-04-07	7.0	NaN
0	2002-04-14	5.0	NaN
0	2002-04-21	6.0	NaN
0	2002-04-28	5.0	NaN
0	2002-05-05	7.0	NaN
0	2002-05-12	7.0	NaN
0	2002-05-19	5.0	NaN
0	2002-05-26	6.0	NaN
0	2002-06-02	5.0	NaN
0	2002-06-09	6.0	NaN
0	2002-06-16	5.0	NaN
0	2002-06-23	6.0	NaN
0	2002-06-30	6.0	NaN
0	2002-07-07	6.0	NaN
0	2002-07-14	7.0	NaN
0	2002-07-21	5.0	NaN
0	2002-07-28	6.0	NaN
0	2002-08-04	6.0	NaN
0	2002-08-11	5.0	NaN
0	2002-08-18	7.0	NaN
0	2002-08-25	7.0	NaN
0	2002-09-01	6.0	NaN
0	2002-09-08	5.0	NaN
0	2002-09-15	6.0	NaN
0	2002-09-22	5.0	NaN
0	2002-09-29	5.0	NaN
0	2002-10-06	6.0	NaN
0	2002-10-13	5.0	NaN
0	2002-10-20	6.0	NaN
0	2002-10-27	5.0	NaN
0	2002-11-03	6.0	NaN
0	2002-11-10	5.0	NaN
0	2002-11-17	7.0	NaN
0	2002-11-24	7.0	NaN
0	2002-12-01	6.0	NaN
0	2002-12-08	5.0	NaN
0	2002-12-15	5.0	NaN
0	2002-12-22	6.0	NaN

opened by AMKiller 2

Fix parquet writes for distributed in cli
Description

A recent change in dask created an error when trying to write a dask dataframe built from futures to parquet. This solves that issue.

Checklist:

[x] This PR has a meaningful title and a clear description.

[x] The tests pass.

[x] There isn't a decrease in the tests coverage.

[x] All linting tasks pass.

[x] The notebooks are clean.

[x] If this modifies the docs, you've made sure that they were updated correctly.
opened by jmoralez 2
Support one model per horizon approach
Description

We currently support only the recursive strategy where the same model is used to predict over the complete horizon and the model's predictions are used to update the target and recompute the features.

This adds a max_horizon argument to MLForecast.fit to indicate that it should train that many models and use each to predict its corresponding horizon when calling MLForecast.predict.
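To illustrate the difference from the recursive strategy, the sketch below builds one training set per horizon step, so that each model would learn to predict h steps ahead directly from the same lag features. The direct_training_sets function is hypothetical and does not reflect how max_horizon is implemented in this PR.

```python
def direct_training_sets(y, lags, max_horizon):
    """Build {h: [(features, target), ...]} for h = 1..max_horizon.

    `origin` is the index of the last value known when the forecast is made,
    so lag-1 is y[origin] and the target for step h is y[origin + h].
    """
    max_lag = max(lags)
    datasets = {}
    for h in range(1, max_horizon + 1):
        rows = []
        for origin in range(max_lag - 1, len(y) - h):
            features = [y[origin + 1 - lag] for lag in lags]
            target = y[origin + h]
            rows.append((features, target))
        datasets[h] = rows
    return datasets

y = list(range(10))  # toy series 0, 1, ..., 9
datasets = direct_training_sets(y, lags=[1, 2], max_horizon=3)
```

Each horizon keeps the same features but a target that lies further ahead, so no prediction has to be fed back into the series; the cost is training max_horizon models and losing a few rows for the longer horizons.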

Checklist:

[x] This PR has a meaningful title and a clear description.

[x] The tests pass.

[x] All linting tasks pass.

[x] The notebooks are clean.

feature
opened by jmoralez 1

Releases(v0.4.0)

v0.4.0(Nov 25, 2022)
What's Changed

rename Forecast to MLForecast by @jmoralez in https://github.com/Nixtla/mlforecast/pull/63

Full Changelog: https://github.com/Nixtla/mlforecast/compare/v0.3.1...v0.4.0
v0.3.1(Nov 9, 2022)
What's Changed

fix unused arguments by @jmoralez in https://github.com/Nixtla/mlforecast/pull/61

Full Changelog: https://github.com/Nixtla/mlforecast/compare/v0.3.0...v0.3.1
v0.3.0(Nov 1, 2022)
What's Changed

raise error when serie is too short for backtest by @jmoralez in https://github.com/Nixtla/mlforecast/pull/32

allow models list by @jmoralez (#34, #36)

[FEAT] Allow used by GitHub section hardcoding lib name by @FedericoGarza in https://github.com/Nixtla/mlforecast/pull/37

[FIX] Add black as a development dependency by @FedericoGarza in https://github.com/Nixtla/mlforecast/pull/38

rename backtest to cross_validation and return single dataframe by @jmoralez in https://github.com/Nixtla/mlforecast/pull/41

Remove TimeSeries from Forecast constructor by @jmoralez in https://github.com/Nixtla/mlforecast/pull/44

allow passing column names as arguments. allow ds to be int by @jmoralez in https://github.com/Nixtla/mlforecast/pull/45

add LightGBMCV by @jmoralez in https://github.com/Nixtla/mlforecast/pull/48

support applying differences to series by @jmoralez in https://github.com/Nixtla/mlforecast/pull/52

allow functions as date features by @jmoralez in https://github.com/Nixtla/mlforecast/pull/57

Improve docs by @jmoralez in https://github.com/Nixtla/mlforecast/pull/59

New Contributors

@FedericoGarza made their first contribution in https://github.com/Nixtla/mlforecast/pull/37

Full Changelog: https://github.com/Nixtla/mlforecast/compare/v0.2.0...v0.3.0

Owner

Nixtla

Open Source Time Series Forecasting

GitHub Repository https://nixtla.github.io/mlforecast/
