StumpyMatrixProfiler¶
This class detects anomalies/novelties in time series data using the STUMPY matrix profile algorithm and plots the results. The matrix profile records, for each subsequence of the input time series, the distance to its nearest-neighbor subsequence elsewhere in the series. The algorithm slides a fixed-size window across the time series to extract the subsequences and compute these distances. A subsequence corresponding to an anomaly/novelty (a discord) has no close match elsewhere in the series, so its matrix profile value is larger than that of subsequences belonging to repeated patterns.
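To make the idea concrete, the following is a minimal brute-force sketch of a matrix profile in pure Python. It is not the component's implementation and not STUMPY's algorithm, which computes the same quantity far more efficiently. The helper names (`znorm`, `naive_matrix_profile`), the exclusion zone of `ceil(m / 4)`, and the toy series are assumptions made for illustration.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)
    return [(x - mean) / std for x in seq]


def naive_matrix_profile(series, m):
    """Distance from each length-m subsequence to its nearest non-trivial neighbor."""
    n_sub = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    excl = math.ceil(m / 4)  # assumed exclusion zone to skip trivial self-matches
    profile = []
    for i in range(n_sub):
        best = math.inf
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue  # too close to i, would be a trivial match
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            best = min(best, dist)
        profile.append(best)
    return profile


# Toy series: a repeating pattern with a single distorted point at index 18.
series = [0.0, 1.0, 0.0, -1.0] * 8
series[18] = 3.0

profile = naive_matrix_profile(series, m=4)
# The start index with the largest profile value is the top discord.
discord = max(range(len(profile)), key=lambda i: profile[i])
print(f"Top discord starts at index {discord}")
```

Windows that cover only the repeated pattern have an exact match elsewhere, so their profile value is zero. Only the windows that overlap the distorted point stand out.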
Configuration¶
Required Configuration¶
The Stumpy matrix profiler requires the following configuration:
- local_dir: Location of a local directory to output files generated by this component.
- model_target: The column in the source data that contains the label that the model will try to predict.
Optional Configuration¶
The Stumpy matrix profiler has no optional configuration.
Default Configuration¶
The Stumpy matrix profiler uses the following default configuration:
- stumpy_analysis_image_name: The file name of the generated image. Defaults to STUMPY_DISCORD_ANALYSIS.PNG.
- stumpy_window_size: A list of window sizes. Each entry will generate a new plot. Defaults to [30].
- stumpy_num_discords: The total number of discords to include in the analysis. These will be the subsequences most likely to be identified as anomalies. Defaults to 3.
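As a rough illustration of what a setting like stumpy_num_discords controls, the sketch below picks the top discords from an already computed matrix profile. How the component itself selects discords is not documented here, so the `top_discords` helper, its masking of overlapping windows, and the sample profile values are all assumptions.

```python
import math


def top_discords(profile, m, num_discords):
    """Return up to num_discords start indices with the largest profile values.

    After each pick, windows that overlap the chosen one are masked out so the
    same anomaly is not reported more than once (an assumed policy).
    """
    remaining = list(profile)
    discords = []
    if not remaining:
        return discords
    for _ in range(num_discords):
        idx = max(range(len(remaining)), key=lambda i: remaining[i])
        if remaining[idx] == -math.inf:
            break  # every candidate has been masked already
        discords.append(idx)
        lo = max(0, idx - m + 1)
        hi = min(len(remaining), idx + m)
        for j in range(lo, hi):
            remaining[j] = -math.inf
    return discords


# Hypothetical matrix profile values for a window size of 3.
profile = [0.1, 0.2, 2.5, 2.4, 0.3, 0.1, 1.8, 0.2, 0.1, 0.9]
discords = top_discords(profile, m=3, num_discords=3)
print(discords)
```

With a list of window sizes, a selection like this would be repeated once per window size, which matches the one-plot-per-entry behavior described for stumpy_window_size.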
Methods¶
check_data¶
def check_data(self, data, *args, **kwargs):
Arguments:
- data (pandas.DataFrame): Input time series data.
Returns:
- data_report (None): Stumpy does not generate a report, so this returns None in order to stay consistent with the method signature.
- file_path (string): Path to the exported data check report.
- checks_status (string): The status of the checks. Currently this always returns "Pass".
Examples:
import pandas as pd
from lolpop.component import StumpyMatrixProfiler, StdOutLogger
# define input data
my_data = pd.DataFrame({'ds': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                        'y': [4.0, 5.0, 6.0, 1.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0]})
# create component configuration
kwargs = {
    "conf": {
        "config": {
            "local_dir": "/tmp/artifacts",
            "stumpy_window_size": [3],
            "stumpy_num_discords": 1,
            "model_target": "y",
        },
    },
    "component": {
        "logger": StdOutLogger(),
    }
}
# instantiate the data checker class
data_checker = StumpyMatrixProfiler(**kwargs)
# run the checks
_, file_path, checks_status = data_checker.check_data(data=my_data)
# print report path and checks status
print(f"Data check report saved at {file_path}. Checks status: {checks_status}")
__plot_mp¶
def __plot_mp(self, axs, m, h, mp, discords, i):
Arguments:
- axs (list of matplotlib axes): List of matplotlib axes on which to plot the matrix profile.
- m (int): Subsequence length (window size) used for the STUMPY matrix profile.
- h (int): Height of the box used to mark the anomalies/novelties.
- mp (numpy array): STUMPY matrix profile to be plotted.
- discords (list of int): Indices of the discovered anomalies/novelties in the time series data.
- i (int): Index in the axs list where the plot will be made.
Returns:
- None
Examples: