convst.transformers.R_DST
- class convst.transformers.R_DST(transform_type='auto', phase_invariance=False, alpha=0.5, normalize_output=False, n_samples=None, n_shapelets=10000, shapelet_lengths=[11], shapelet_lengths_bounds=None, lengths_bounds_reduction=0.5, prime_dilations=False, proba_norm=0.8, percentiles=[5, 10], random_state=None, max_channels=None, min_len=None, n_jobs=1)
Bases: BaseEstimator, TransformerMixin
Base class for the RDST transformer. Depending on the parameters and on the type of data (e.g. multivariate, variable length) passed to the fit method, it calls the appropriate transformer.
For more information on the transformer and on the effect of the different parameters on the transformation and shapelet extraction process, please refer to [1] and [2].
- Parameters:
- transform_type : str, optional
Type of transformer to use. Different classes of transformer must be used depending on the characteristics of the input time series; for example, the transformer for univariate series is not the same as the one for multivariate series, for run-time optimization reasons. The default is ‘auto’, which automatically selects the transformer based on the data passed to the fit method.
- phase_invariance : bool, optional
Whether to use phase invariance for shapelet sampling and distance computation. The default is False.
- alpha : float, optional
The alpha similarity parameter. The higher the value, the lower the allowed number of indexes that a newly sampled shapelet may share with previously sampled shapelets that have similar parameters. This can cause the number of sampled shapelets to be lower than n_shapelets if the whole search space has been covered. The default is 0.5.
- normalize_output : bool, optional
Whether to normalize the argmin and shapelet occurrence features by the length of the series from which they were extracted. This is mostly useful for variable-length time series. The default is False.
- n_samples : float, optional
Proportion (in ]0,1]) of samples to consider for the shapelet extraction. The default is None, meaning that all samples are used.
- n_shapelets : int, optional
The maximum number of shapelets to sample. The default is 10_000.
- shapelet_lengths : array, optional
The set of possible lengths for shapelets. Each value can be an integer, to specify an absolute length, or a float, to specify a length relative to the input time series length. The default is [11].
- shapelet_lengths_bounds : array, optional
A 1D array with two elements containing the minimum and maximum possible lengths for shapelet candidates; the values can be int or float. The default is None, meaning that the shapelet_lengths parameter is used.
- lengths_bounds_reduction : float, optional
A float in ]0,1], quantifying the proportion of lengths to explore between the min and max bounds of shapelet_lengths_bounds. The default is 0.5. For example, with bounds as [4,10], and a reduction of 0.5, only [4,6,8,10] will be considered as possible lengths.
- prime_dilations : bool, optional
If True, only dilations with prime values are considered for shapelet candidates. This greatly speeds up the algorithm for long time series and/or short shapelet lengths, possibly at the cost of some accuracy. The default is False.
- proba_norm : float, optional
The proportion of shapelets that use a normalized distance function, which induces scale invariance. The default is 0.8.
- percentiles : array, optional
The two percentiles used to select the lambda threshold from which the shapelet occurrence feature is computed. The default is [5, 10].
- n_jobs : int, optional
The number of threads used to sample shapelets and compute the distance vectors. The default is 1; -1 means all available cores.
- random_state : object, optional
The seed for the random state. The default is None.
- max_channels : int, optional
The maximum number of features (channels) that a multivariate shapelet can consider. The default is None, meaning max_channels=n_features.
- min_len : int, optional
The minimum length of an input time series for variable-length input. The default is None, meaning min_len=min(n_timestamps) on the training data. This can cause an error if a shorter series is present in the test set.
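The lengths_bounds_reduction example above ([4,10] with a reduction of 0.5 giving [4,6,8,10]) can be reproduced with a small sketch. The helper name `candidate_lengths` and the rule "keep every (1/reduction)-th integer length" are assumptions made for illustration; the selection rule actually used by convst may differ.

```python
def candidate_lengths(bounds, reduction):
    """Thin out the integer lengths between two inclusive bounds.

    Assumption (not taken from convst): a reduction r keeps roughly
    every (1 / r)-th length, starting from the lower bound.
    """
    if not 0 < reduction <= 1:
        raise ValueError("reduction must be in ]0, 1]")
    low, high = bounds
    step = max(1, round(1 / reduction))
    return list(range(low, high + 1, step))


print(candidate_lengths([4, 10], 0.5))  # [4, 6, 8, 10]
print(candidate_lengths([4, 10], 1.0))  # [4, 5, 6, 7, 8, 9, 10]
```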
- Attributes:
- transformer_ : object
The transformer that has been selected based on the parameters and the type of data. This is the object used to transform the input data.
- [1] Antoine Guillaume et al., “Random Dilated Shapelet Transform: A new approach of time series shapelets” (2022)
- [2] Antoine Guillaume, “Time series classification with shapelets: Application to predictive maintenance on event logs” (2023)
Methods
__init__([transform_type, phase_invariance, ...])
fit(X, y): Fit method.
fit_transform(X[, y]): Fit to data, then transform it.
get_params([deep]): Get parameters for this estimator.
set_output(*[, transform]): Set output container.
set_params(**params): Set the parameters of this estimator.
transform(X): Transform the input time series using previously fitted shapelets.
- fit(X, y)
Fit method. Random shapelets are generated using the parameters supplied during initialisation. Then, the class attributes are filled with the result of this random initialisation.
- Parameters:
- X : array, shape=(n_samples, n_features, n_timestamps)
Input time series.
- y : array, shape=(n_samples,)
Class of the input time series.
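A minimal usage sketch for fit followed by transform, assuming that convst and numpy are installed. The helper names `make_toy_dataset` and `rdst_features` and the toy sine data are hypothetical; only the import path, the constructor parameters, and the fit/transform calls come from this page. The input is shaped (n_samples, n_features, n_timestamps), with a single channel for univariate series.

```python
import math
import random


def make_toy_dataset(n_samples=20, n_timestamps=64, seed=0):
    """Build two-class univariate series shaped (n_samples, 1, n_timestamps)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        freq = 2 if label == 0 else 5  # each class has its own frequency
        series = [
            math.sin(2 * math.pi * freq * t / n_timestamps) + rng.gauss(0, 0.1)
            for t in range(n_timestamps)
        ]
        X.append([series])  # one channel, i.e. n_features = 1
        y.append(label)
    return X, y


def rdst_features(X, y, n_shapelets=100, random_state=0):
    """Fit R_DST on (X, y) and return the transformed training data.

    Requires numpy and convst; they are imported here so that the
    data helper above stays usable without them.
    """
    import numpy as np
    from convst.transformers import R_DST

    X_arr = np.asarray(X, dtype=float)
    y_arr = np.asarray(y)
    rdst = R_DST(n_shapelets=n_shapelets, random_state=random_state)
    rdst.fit(X_arr, y_arr)
    return rdst.transform(X_arr)


X, y = make_toy_dataset()
print(len(X), len(X[0]), len(X[0][0]))  # 20 1 64
```

Calling `rdst_features(X, y)` then returns one row of shapelet features per input series.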
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See “Introducing the set_output API” for an example of how to use the API.
- Parameters:
- transform : {“default”, “pandas”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
None: Transform configuration is unchanged
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
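The <component>__<parameter> naming convention can be illustrated with a short sketch that splits such a name at the first double underscore. The helper `split_nested_param` and the step name `rdst` are hypothetical; this only illustrates the convention and is not scikit-learn's own code.

```python
def split_nested_param(name):
    """Split '<component>__<parameter>' at the first double underscore."""
    component, sep, parameter = name.partition("__")
    if not sep:
        # No separator: the name targets the estimator itself.
        return None, name
    return component, parameter


print(split_nested_param("rdst__n_shapelets"))  # ('rdst', 'n_shapelets')
print(split_nested_param("n_shapelets"))        # (None, 'n_shapelets')
```

With a pipeline whose step is named `rdst`, the first form would address that step's n_shapelets parameter, while the second form addresses the estimator on which set_params is called.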
- transform(X)
Transform the input time series using previously fitted shapelets. A distance vector is computed between each shapelet and each time series, and the min, argmin, and shapelet occurrence features are extracted from it based on the lambda threshold of each shapelet.
- Parameters:
- X : array, shape=(n_samples, n_features, n_timestamps)
Input time series.
- Returns:
- X : array, shape=(n_samples, 3*n_shapelets)
Transformed input time series.
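The three features extracted per shapelet can be sketched in plain Python for a single univariate series. The function names, the use of a sum of absolute differences as the distance, and the strict `<` comparison against the threshold are assumptions for illustration; convst's own distance functions and normalization are not reproduced here.

```python
def distance_vector(series, shapelet, dilation=1):
    """Distance between the shapelet and every valid dilated subsequence."""
    span = (len(shapelet) - 1) * dilation
    n_positions = len(series) - span
    return [
        sum(abs(series[start + j * dilation] - v) for j, v in enumerate(shapelet))
        for start in range(n_positions)
    ]


def shapelet_features(series, shapelet, threshold, dilation=1):
    """Return (min, argmin, shapelet occurrence) for one shapelet."""
    dists = distance_vector(series, shapelet, dilation)
    d_min = min(dists)
    d_argmin = dists.index(d_min)  # first position reaching the minimum
    occurrence = sum(1 for d in dists if d < threshold)  # matches under lambda
    return d_min, d_argmin, occurrence


series = [0, 0, 1, 2, 1, 0, 0, 1, 2, 1]
shapelet = [1, 2, 1]
print(shapelet_features(series, shapelet, threshold=0.5))  # (0, 2, 2)
```

Repeating this for every shapelet and concatenating the results gives three values per shapelet, which is consistent with the 3*n_shapelets output width stated above.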