convst.transformers.R_DST

class convst.transformers.R_DST(transform_type='auto', phase_invariance=False, alpha=0.5, normalize_output=False, n_samples=None, n_shapelets=10000, shapelet_lengths=[11], shapelet_lengths_bounds=None, lengths_bounds_reduction=0.5, prime_dilations=False, proba_norm=0.8, percentiles=[5, 10], random_state=None, max_channels=None, min_len=None, n_jobs=1)

Bases: BaseEstimator, TransformerMixin

Base class for the RDST transformer. Depending on the parameters and on the type of data (e.g. multivariate, variable length) passed to the fit method, it calls the appropriate transformer.

For more information on the transformer and on how the different parameters affect the transformation and shapelet extraction process, please refer to [1] and [2].

Parameters:
transform_type : str, optional

Type of transformer to use. Different transformer classes are used depending on the characteristics of the input time series; for example, the transformer for univariate series differs from the one for multivariate series for run-time optimization reasons. The default is ‘auto’, which automatically selects the transformer based on the data passed to the fit method.

phase_invariance : bool, optional

Whether to use phase invariance for shapelet sampling and distance computation. The default is False.

alpha : float, optional

The alpha similarity parameter. The higher the value, the lower the allowed number of common indexes with previously sampled shapelets when sampling a new one with similar parameters. This can cause the number of sampled shapelets to be lower than n_shapelets if the whole search space has been covered. The default is 0.5.

normalize_output : bool, optional

Whether to normalize the argmin and shapelet occurrence features by the length of the series from which they were extracted. This is mostly useful for variable length time series. The default is False.

n_samples : float, optional

Proportion (in ]0,1]) of samples to consider for the shapelet extraction. The default is None, meaning that all samples are used.

n_shapelets : int, optional

The maximum number of shapelets to be sampled. The default is 10_000.

shapelet_lengths : array, optional

The set of possible lengths for shapelets. Each value can be an integer, to specify an absolute length, or a float, to specify a length relative to the input time series length. The default is [11].

shapelet_lengths_bounds : array, optional

A 1D array with two elements containing the minimum and maximum possible lengths for shapelet candidates; values can be int or float. The default is None, meaning that the shapelet_lengths parameter is used.

lengths_bounds_reduction : float, optional

A float in ]0,1], quantifying the proportion of lengths to explore between the min and max bounds of shapelet_lengths_bounds. The default is 0.5. For example, with bounds as [4,10], and a reduction of 0.5, only [4,6,8,10] will be considered as possible lengths.

prime_dilations : bool, optional

If True, only dilations with prime values are considered for shapelet candidates. This can greatly speed up the algorithm for long time series and/or short shapelet lengths, possibly at the cost of some accuracy. The default is False.

proba_norm : float, optional

The proportion of shapelets that will use a normalized distance function, which induces scale invariance. The default is 0.8.

percentiles : array, optional

The two percentiles used to select the lambda threshold used to compute the Shapelet Occurrence feature. The default is [5,10].

n_jobs : int, optional

The number of threads used to sample and compute the distance vectors. The default is 1; -1 means all available cores.

random_state : object, optional

The seed for the random state. The default is None.

max_channels : int, optional

The maximum number of features (channels) that a multivariate shapelet can consider. The default is None, meaning max_channels=n_features.

min_len : int, optional

The minimum length of an input time series for variable length input. The default is None, meaning min_len=min(n_timestamps) on the training data. This can cause an error if a shorter series is present in the test set.
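
The length and dilation parameters above jointly define the search space for shapelet candidates. The sketch below illustrates this in plain Python; it is not the library's implementation. The helper names (`to_absolute`, `resolve_lengths`, `lengths_from_bounds`, `candidate_dilations`) are hypothetical. The truncation of relative lengths and the stride rule are assumptions chosen to reproduce the documented example (bounds [4,10] with a reduction of 0.5 giving [4,6,8,10]). The maximum dilation formula and the choice to keep a dilation of 1 when only primes are allowed are assumptions as well.

```python
import math


def to_absolute(length, n_timestamps):
    """Convert one length to an absolute number of points."""
    if isinstance(length, float):
        # Assumption: a float is a fraction of the series length, truncated.
        return int(length * n_timestamps)
    return int(length)


def resolve_lengths(shapelet_lengths, n_timestamps):
    """Resolve a mix of absolute (int) and relative (float) lengths."""
    return sorted({to_absolute(l, n_timestamps) for l in shapelet_lengths})


def lengths_from_bounds(bounds, reduction, n_timestamps):
    """Pick a subset of the lengths between the min and max bounds."""
    low = to_absolute(bounds[0], n_timestamps)
    high = to_absolute(bounds[1], n_timestamps)
    # Assumption: keeping a proportion `reduction` of the lengths is done
    # by walking the range with a stride of about 1 / reduction.
    step = max(1, round(1 / reduction))
    return list(range(low, high + 1, step))


def is_prime(n):
    """Trial-division primality check, sufficient for small dilations."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def candidate_dilations(length, n_timestamps, prime_only=False):
    """List dilations for which the dilated shapelet fits in the series."""
    # Assumption: a dilated shapelet spans (length - 1) * d + 1 points.
    max_dilation = max(1, (n_timestamps - 1) // max(1, length - 1))
    dilations = range(1, max_dilation + 1)
    if prime_only:
        # Assumption: 1 (no dilation) stays available alongside the primes.
        return [d for d in dilations if d == 1 or is_prime(d)]
    return list(dilations)


lengths = lengths_from_bounds([4, 10], 0.5, n_timestamps=100)
dilations = candidate_dilations(11, n_timestamps=100, prime_only=True)
```

With a series of 100 points and a shapelet of length 11, restricting to primes shrinks the nine candidate dilations (1 to 9) to five, which gives a sense of where the speed-up could come from.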

Attributes:
transformer_ : object

The transformer that has been selected based on the parameters and the type of data. This is the object used to transform the input data.

[1] Antoine Guillaume et al., “Random Dilated Shapelet Transform: A new approach of time series shapelets” (2022)
[2] Antoine Guillaume, “Time series classification with shapelets: Application to predictive maintenance on event logs” (2023)

Methods

__init__([transform_type, phase_invariance, ...])

fit(X, y)

Fit method.

fit_transform(X[, y])

Fit to data, then transform it.

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform the input time series using previously fitted shapelets.

fit(X, y)

Fit method. Random shapelets are generated using the parameters supplied during initialisation. Then, the class attributes are filled with the result of this random initialisation.

Parameters:
X : array, shape=(n_samples, n_features, n_timestamps)

Input time series.

y : array, shape=(n_samples)

Class of the input time series.
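
As a rough illustration of how the percentiles parameter could feed the random initialisation performed during fit, the sketch below draws a lambda threshold between two percentiles of a distance vector. It is plain Python under stated assumptions: `percentile` and `sample_threshold` are hypothetical helpers, linear interpolation between ranks is assumed for the percentile computation, and the uniform draw between the two percentile values is an assumption about the sampling rule, not a description of the library code.

```python
import random


def percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def sample_threshold(distance_vector, percentiles=(5, 10), rng=None):
    """Draw a threshold between the two percentiles of a distance vector."""
    rng = rng or random.Random()
    low = percentile(distance_vector, min(percentiles))
    high = percentile(distance_vector, max(percentiles))
    # Assumption: the threshold is drawn uniformly between the two values.
    return rng.uniform(low, high)


# Toy distance vector with values 0.0, 1.0, ..., 100.0.
distances = [float(i) for i in range(101)]
threshold = sample_threshold(distances, rng=random.Random(0))
```

Low percentiles such as the default [5, 10] keep the threshold close to the smallest distances, so only near matches would count as occurrences of the shapelet.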

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray array of shape (n_samples, n_features_new)

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See the scikit-learn example “Introducing the set_output API” for an example of how to use the API.

Parameters:
transform : {“default”, “pandas”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • None: Transform configuration is unchanged

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(X)

Transform the input time series using previously fitted shapelets. A distance vector is computed between each shapelet and each time series, from which the min, argmin, and shapelet occurrence features are extracted based on the lambda threshold of each shapelet.

Parameters:
X : array, shape=(n_samples, n_features, n_timestamps)

Input time series.

Returns:
X : array, shape=(n_samples, 3*n_shapelets)

Transformed input time series.
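
The three features produced per shapelet can be sketched in plain Python for a single univariate series and a single shapelet. This is a simplified illustration under stated assumptions, not the library's implementation: `distance_vector` and `shapelet_features` are hypothetical helpers, the distance is taken as a sum of absolute differences, an occurrence is counted when a distance is strictly below the threshold, and normalization divides by the series length.

```python
def distance_vector(series, shapelet, dilation=1):
    """Distance between a dilated shapelet and every position it fits at."""
    span = (len(shapelet) - 1) * dilation + 1
    distances = []
    for start in range(len(series) - span + 1):
        # Shapelet point j is compared with the series point located
        # j * dilation steps after the start of the window.
        distances.append(
            sum(
                abs(series[start + j * dilation] - shapelet[j])
                for j in range(len(shapelet))
            )
        )
    return distances


def shapelet_features(series, shapelet, threshold, dilation=1,
                      normalize_output=False):
    """Return the (min, argmin, shapelet occurrence) features."""
    distances = distance_vector(series, shapelet, dilation)
    minimum = min(distances)
    argmin = distances.index(minimum)
    # Assumption: an occurrence is a distance strictly below the threshold.
    occurrence = sum(1 for d in distances if d < threshold)
    if normalize_output:
        # Assumption: both features are divided by the series length.
        argmin /= len(series)
        occurrence /= len(series)
    return minimum, argmin, occurrence


series = [0, 0, 1, 2, 1, 0, 0, 1, 2, 1]
shapelet = [1, 2, 1]
features = shapelet_features(series, shapelet, threshold=1.0)
normalized = shapelet_features(series, shapelet, threshold=1.0,
                               normalize_output=True)
```

Here the pattern [1, 2, 1] appears twice in the series, so the minimum distance is 0, the first best match is at index 2, and the occurrence count is 2. Repeating this for every fitted shapelet and concatenating the results would give the 3*n_shapelets columns of the output.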