mlinsights.timeseries#

Datasets#

mlinsights.timeseries.datasets.artificial_data(dt1, dt2, minutes=1)[source]#

Generates artificial data, one observation every minute by default.

Parameters:
  • dt1 – first date

  • dt2 – second date

  • minutes – interval in minutes between two observations

Returns:

dataframe

<<<

import datetime
from mlinsights.timeseries.datasets import artificial_data

now = datetime.datetime.now()
data = artificial_data(now - datetime.timedelta(40), now)
print(data.head())

>>>

                            time         y
    0 2023-12-04 14:04:53.186595  1.580145
    1 2023-12-04 14:05:53.186595  1.821150
    2 2023-12-04 14:06:53.186595  1.659127
    3 2023-12-04 14:07:53.186595  1.806003
    4 2023-12-04 14:08:53.186595  1.941861

Experimentation#

mlinsights.timeseries.patterns.find_ts_group_pattern(ttime, values, names, name_subset=None, per='week', unit='half-hour', agg='sum', estimator=None, verbose=0)[source]#

Clusters time series to find similar patterns.

Parameters:
  • ttime – time column

  • values – features to use to cluster

  • names – column which holds group name

  • name_subset – subset of groups to study, None for all

  • per – aggregation period, 'week' by default

  • unit – time unit of the aggregation, 'half-hour' by default

  • agg – aggregation function

  • estimator – estimator used to find patterns, None for sklearn.cluster.KMeans with 10 groups

  • verbose – verbosity

Returns:

found clusters, distances
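As a rough illustration of what per='week' and unit='half-hour' could mean, the sketch below turns the observations of each group into a weekly profile of 7 × 48 half-hour slots. Vectors of this kind are what a clustering estimator such as sklearn.cluster.KMeans could then be fitted on. The helper weekly_profiles is hypothetical and not part of mlinsights; the way the library actually builds its features may differ.

```python
import datetime
from collections import defaultdict

SLOTS_PER_WEEK = 7 * 48  # 7 days, 48 half-hours per day


def weekly_profiles(ttime, values, names):
    """Sums values per (group, half-hour slot of the week).

    Hypothetical helper, not the mlinsights implementation.
    """
    profiles = defaultdict(lambda: [0.0] * SLOTS_PER_WEEK)
    for t, v, name in zip(ttime, values, names):
        # weekday() is 0 for Monday; each day holds 48 half-hour slots
        slot = t.weekday() * 48 + t.hour * 2 + t.minute // 30
        profiles[name][slot] += v
    return dict(profiles)


start = datetime.datetime(2023, 12, 4)  # a Monday, at midnight
ttime = [start + datetime.timedelta(minutes=15 * i) for i in range(8)]
values = [1.0] * 8
names = ["a", "a", "a", "a", "b", "b", "b", "b"]
profiles = weekly_profiles(ttime, values, names)
print(profiles["a"][:4])  # [2.0, 2.0, 0.0, 0.0]
print(profiles["b"][:4])  # [0.0, 0.0, 2.0, 2.0]
```

Each group ends up with a vector of the same length (336 values), which makes the groups directly comparable whatever the number of raw observations.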

Manipulation#

mlinsights.timeseries.agg.aggregate_timeseries(df, index='time', values='y', unit='half-hour', agg='sum', per=None)[source]#

Aggregates timeseries assuming the data is in a dataframe.

Parameters:
  • df – dataframe

  • index – time column

  • values – value or values column

  • unit – aggregate over a specific period

  • agg – kind of aggregation

  • per – second aggregation, per week…

Returns:

aggregated values
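The aggregation over unit='half-hour' with agg='sum' can be pictured with the standard library only: every timestamp is floored to the start of its half-hour and the values falling in the same bin are summed. The helper aggregate_half_hour below is a hypothetical sketch under that assumption, not the function of the library, which works on a dataframe.

```python
import datetime


def aggregate_half_hour(times, values):
    """Sums values over half-hour bins (hypothetical helper)."""
    bins = {}
    for t, v in zip(times, values):
        # floor the timestamp to the start of its half-hour
        key = t.replace(minute=(t.minute // 30) * 30, second=0, microsecond=0)
        bins[key] = bins.get(key, 0.0) + v
    return bins


start = datetime.datetime(2023, 12, 4, 14, 0)
times = [start + datetime.timedelta(minutes=10 * i) for i in range(6)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
agg = aggregate_half_hour(times, values)
for key, total in agg.items():
    print(key, total)
# 2023-12-04 14:00:00 6.0
# 2023-12-04 14:30:00 15.0
```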

Plotting#

mlinsights.timeseries.plotting.plot_week_timeseries(time, value, normalise=True, label=None, h=0.85, value2=None, label2=None, daynames=None, xfmt='%1.0f', ax=None)[source]#

Shows a timeseries dispatched by days as bars.

Parameters:
  • time – dates

  • value – values to display as bars.

  • normalise – normalise data before showing it

  • label – label of the series

  • h – scale factor

  • value2 – second series to show as a line

  • label2 – label of the second series

  • daynames – names to use for week day names (default is English)

  • xfmt – format number of the X axis

  • ax – existing axis

Returns:

axis


../_images/timeseries-1.png

Prediction#

BaseReciprocalTimeSeriesTransformer#

The following function builds a regular dataset from a timeseries so that it can be used by machine learning models.

class mlinsights.timeseries.base.BaseReciprocalTimeSeriesTransformer(context_length=0)[source]#

Base for all timeseries preprocessing automatically applied within a predictor.

fit(X, y, sample_weight=None)[source]#

Stores the first values.

get_fct_inv()[source]#

Returns the reverse transform.

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BaseReciprocalTimeSeriesTransformer#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

set_transform_request(*, context: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → BaseReciprocalTimeSeriesTransformer#

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

context : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for context parameter in transform.

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in transform.

Returns#

self : object

The updated object.

transform(X, y, sample_weight=None, context=None)[source]#

Transforms both X and y. Returns X and y, and sample_weight as well if it is not None. The context is used when the y series stored in the predictor is not related to the y series given to the transform method.
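To make the idea of a reciprocal transform concrete, here is a minimal sketch of a first-order difference which keeps the first value so that the transform can be undone. The class DifferenceSketch and its method names are hypothetical; it does not follow the exact interface of BaseReciprocalTimeSeriesTransformer (no X, no sample_weight, no context) and only illustrates why fit has to store the first values.

```python
import numpy


class DifferenceSketch:
    """Hypothetical reciprocal transform: first-order difference."""

    def fit(self, y):
        # keep the first value, it is needed to undo the difference
        self.first_ = y[0]
        return self

    def transform(self, y):
        return numpy.diff(y)

    def inverse(self, dy):
        # cumulative sum of the differences, shifted by the stored first value
        return numpy.concatenate([[self.first_], self.first_ + numpy.cumsum(dy)])


y = numpy.array([3.0, 5.0, 4.0, 8.0])
tr = DifferenceSketch().fit(y)
dy = tr.transform(y)
print(dy)              # [ 2. -1.  4.]
print(tr.inverse(dy))  # [3. 5. 4. 8.]
```

A predictor trained on the differences can then apply the inverse to its predictions to get back to the original scale.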

build_ts_X_y#

mlinsights.timeseries.utils.build_ts_X_y(model, X, y, weights=None, same_rows=False)[source]#

Builds a standard X, y dataset based on the given ones.

Parameters:
  • model – a timeseries model (BaseTimeSeries)

  • X – time series, used as features, [n_obs, n_features], X may be empty (None)

  • y – timeseries (one single vector), [n_obs]

  • weights – weights None or array [n_obs]

  • same_rows – keeps the same number of rows as the original datasets, use nan when no value is available

Returns:

(X, y, weights): X is an array of features [nrows, n_features + past] where nrows = n_obs - model.delay2 - model.past + 2 (3 rows in the example below, with n_obs=5, past=2 and delay2=2), y is an array of targets [nrows], weights is None or an array [nrows]

<<<

import numpy
from mlinsights.timeseries import build_ts_X_y
from mlinsights.timeseries.base import BaseTimeSeries

X = numpy.arange(10).reshape(5, 2)
y = numpy.arange(5) * 100
weights = numpy.arange(5) * 1000
bs = BaseTimeSeries(past=2)
nx, ny, nw = build_ts_X_y(bs, X, y, weights)
print("X=", X)
print("y=", y)
print("nx=", nx)
print("ny=", ny)

>>>

    X= [[0 1]
     [2 3]
     [4 5]
     [6 7]
     [8 9]]
    y= [  0 100 200 300 400]
    nx= [[  2   3   0 100]
     [  4   5 100 200]
     [  6   7 200 300]]
    ny= [[200]
     [300]
     [400]]

With use_all_past=True:

<<<

import numpy
from mlinsights.timeseries.base import BaseTimeSeries
from mlinsights.timeseries import build_ts_X_y

X = numpy.arange(10).reshape(5, 2)
y = numpy.arange(5) * 100
weights = numpy.arange(5) * 1000
bs = BaseTimeSeries(past=2, use_all_past=True)
nx, ny, nw = build_ts_X_y(bs, X, y, weights)
print("X=", X)
print("y=", y)
print("nx=", nx)
print("ny=", ny)

>>>

    X= [[0 1]
     [2 3]
     [4 5]
     [6 7]
     [8 9]]
    y= [  0 100 200 300 400]
    nx= [[  0   1   2   3   0 100]
     [  2   3   4   5 100 200]
     [  4   5   6   7 200 300]]
    ny= [[200]
     [300]
     [400]]
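The two outputs above can be reproduced with a few lines of numpy. The sketch below is not the code of build_ts_X_y: the helper lagged_design is hypothetical, it ignores weights and same_rows, and its row count, n_obs - delay2 - past + 2, is chosen to match the outputs shown above.

```python
import numpy


def lagged_design(X, y, past=1, delay1=1, delay2=2, use_all_past=False):
    """Builds features and targets from a series (hypothetical helper)."""
    n_obs = y.shape[0]
    nrows = n_obs - delay2 - past + 2
    rows_x, rows_y = [], []
    for i in range(nrows):
        last = i + past - 1  # index of the most recent known observation
        # either all past rows of X (flattened) or only the most recent one
        feats = X[i:last + 1].ravel() if use_all_past else X[last]
        rows_x.append(numpy.concatenate([feats, y[i:last + 1]]))
        # targets are the values between delay1 and delay2 (excluded) ahead
        rows_y.append(y[last + delay1:last + delay2])
    return numpy.array(rows_x), numpy.array(rows_y)


X = numpy.arange(10).reshape(5, 2)
y = numpy.arange(5) * 100
nx, ny = lagged_design(X, y, past=2)
nx_all, ny_all = lagged_design(X, y, past=2, use_all_past=True)
print("nx=", nx)
print("ny=", ny)
print("nx_all=", nx_all)
```

With past=2, every row holds the most recent features of X followed by the last two values of y, and the target is the next value of y.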

BaseTimeSeries#

The first class defines the template for all timeseries estimators. It deals with a timeseries in one dimension and additional features.

class mlinsights.timeseries.base.BaseTimeSeries(past=1, delay1=1, delay2=2, use_all_past=False, preprocessing=None)[source]#

Base class to build a predictor on timeseries. The class computes one or several predictions at each time, between delay1 and delay2. It computes: \hat{Y}_{t+d} = f(Y_{t-1}, ..., Y_{t-p}) with d \in [delay1, delay2[ and 1 \leqslant p \leqslant past.

Parameters:
  • past – number of past values used to predict

  • delay1 – the model computes the first prediction for time=t + delay1

  • delay2 – the model computes the last prediction for time=t + delay2 excluded

  • use_all_past – use all past features, not only the timeseries

  • preprocessing – preprocessing to apply before predicting, applied only to the timeseries itself; it can be a difference, and it must be of type BaseReciprocalTimeSeriesTransformer

has_preprocessing()[source]#

Tells whether a preprocessing is defined.

DummyTimeSeriesRegressor#

The first predictor is a dummy one: it uses the current value to predict the future.

class mlinsights.timeseries.dummies.DummyTimeSeriesRegressor(estimator='dummy', past=1, delay1=1, delay2=2, use_all_past=False, preprocessing=None)[source]#

Dummy regressor for time series. Uses past values as predictions.

Parameters:
  • estimator – estimator to use for regression, sklearn.linear_model.LinearRegression implements a linear auto-regressor, 'dummy' uses past values as predictions

  • past – number of past values used to predict

  • delay1 – the model computes the first prediction for time=t + delay1

  • delay2 – the model computes the last prediction for time=t + delay2 excluded

  • use_all_past – use all past features, not only the timeseries

  • preprocessing – preprocessing to apply before predicting, applied only to the timeseries itself; it can be a difference, and it must be of type BaseReciprocalTimeSeriesTransformer

fit(X, y, sample_weight=None)[source]#

Trains the model.

Parameters:
  • X – additional features, X may be empty (None)

  • y – timeseries (one single vector), array [n_obs]

  • sample_weight – weights None or array [n_obs]

Returns:

self

predict(X, y)[source]#

Returns the prediction.
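The behaviour of a dummy predictor, using the current value as the prediction of a future one, fits in a few lines. The helper persistence_forecast below is a hypothetical sketch and not the code of DummyTimeSeriesRegressor; it assumes a single prediction at a delay of at least one step.

```python
import numpy


def persistence_forecast(y, delay=1):
    """Predicts y[t + delay] with y[t], delay >= 1 (hypothetical helper)."""
    y = numpy.asarray(y, dtype=float)
    # no prediction is available for the first `delay` observations
    pred = numpy.full(y.shape, numpy.nan)
    pred[delay:] = y[:-delay]
    return pred


y = numpy.array([0.0, 100.0, 200.0, 300.0, 400.0])
pred = persistence_forecast(y)
print(pred[1:])  # [  0. 100. 200. 300.]
```

Such a predictor needs no training and gives the baseline any real model should beat.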

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → DummyTimeSeriesRegressor#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → DummyTimeSeriesRegressor#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

ARTimeSeriesRegressor#

The first regressor is an auto-regressor. It can be estimated with any regressor implemented in scikit-learn.

class mlinsights.timeseries.ar.ARTimeSeriesRegressor(estimator='dummy', past=1, delay1=1, delay2=2, use_all_past=False, preprocessing=None)[source]#

Base class to build a regressor on timeseries. The class computes one or several predictions at each time, between delay1 and delay2. It computes: \hat{Y}_{t+d} = f(Y_{t-1}, ..., Y_{t-p}) with d \in [delay1, delay2[ and 1 \leqslant p \leqslant past.

Parameters:
  • estimator – estimator to use for regression, sklearn.linear_model.LinearRegression implements a linear auto-regressor, 'dummy' uses past values as predictions

  • past – number of past values used to predict

  • delay1 – the model computes the first prediction for time=t + delay1

  • delay2 – the model computes the last prediction for time=t + delay2 excluded

  • use_all_past – use all past features, not only the timeseries

  • preprocessing – preprocessing to apply before predicting, applied only to the timeseries itself; it can be a difference, and it must be of type BaseReciprocalTimeSeriesTransformer

fit(X, y, sample_weight=None)[source]#

Trains the model.

Parameters:
  • X – additional features, X may be empty (None)

  • y – timeseries (one single vector), array [n_obs]

  • sample_weight – weights None or array [n_obs]

Returns:

self

predict(X, y)[source]#

Returns the prediction.
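A linear auto-regressor boils down to a least-squares fit of each value on the values preceding it. The sketch below illustrates that idea with numpy.linalg.lstsq instead of a scikit-learn estimator; the helpers fit_ar and predict_next are hypothetical, handle a single one-step-ahead prediction only, and are not the code of ARTimeSeriesRegressor.

```python
import numpy


def fit_ar(y, past=2):
    """Least-squares fit of y[t] on an intercept and the `past` previous values."""
    y = numpy.asarray(y, dtype=float)
    n = y.shape[0]
    # each row holds `past` consecutive values, oldest first
    lags = numpy.column_stack([y[i:n - past + i] for i in range(past)])
    design = numpy.column_stack([numpy.ones(n - past), lags])
    target = y[past:]
    coef, *_ = numpy.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_next(coef, y, past=2):
    """One-step-ahead prediction from the last `past` values."""
    return coef[0] + numpy.dot(coef[1:], numpy.asarray(y, dtype=float)[-past:])


# y[t] = 0.5 * y[t-1] + 0.25 * y[t-2] + 1 holds exactly, so the fit recovers it
y = [1.0, 2.0]
for _ in range(20):
    y.append(0.5 * y[-1] + 0.25 * y[-2] + 1.0)
coef = fit_ar(y, past=2)
print(numpy.round(coef, 6))  # intercept, coefficient of y[t-2], coefficient of y[t-1]
print(predict_next(coef, y))
```

Swapping the least-squares solver for another regressor changes the function f but not the way the lagged features are built.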

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ARTimeSeriesRegressor#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns#

self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ARTimeSeriesRegressor#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

ts_mape#

The library implements one scoring function which compares the prediction to what a dummy predictor would do by using the previous day as a prediction.

mlinsights.timeseries.metrics.ts_mape(expected_y, predicted_y, sample_weight=None)[source]#

Computes \frac{\sum_t | \hat{Y}_t - Y_t |}{\sum_t | Y_t - Y_{t-1} |}. It compares the prediction to what a dummy predictor would do by using the previous day as a prediction.

Parameters:
  • expected_y – expected values

  • predicted_y – predictions

  • sample_weight – sample weight

Returns:

metrics
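The ratio can be checked by hand on a short series. The helper ts_mape_sketch below is a hypothetical pure Python sketch, not the code of ts_mape: it ignores sample_weight and assumes the first observation is skipped in both sums because it has no predecessor.

```python
def ts_mape_sketch(expected_y, predicted_y):
    """Ratio of the model's absolute error to that of a 'previous value' predictor.

    Hypothetical helper; sample weights are ignored and the first
    observation is skipped because it has no predecessor (an assumption).
    """
    num = sum(abs(p - e) for p, e in zip(predicted_y[1:], expected_y[1:]))
    den = sum(abs(e - prev) for e, prev in zip(expected_y[1:], expected_y[:-1]))
    return num / den


expected = [1.0, 2.0, 4.0, 7.0]
predicted = [1.0, 2.5, 3.5, 6.0]
score = ts_mape_sketch(expected, predicted)
print(score)  # (0.5 + 0.5 + 1) / (1 + 2 + 3), about 0.333
```

A value below 1 means the predictions are closer to the expected values than the previous value would be, a value above 1 means the model does worse than the dummy predictor.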