StumpyMatrixProfiler

This class detects anomalies/novelties in time series data using the STUMPY matrix profiling algorithm and plots the results. The matrix profile records, for each subsequence of the input time series, the distance to its nearest neighbor among the other subsequences. It is computed by sliding a window across the time series and comparing each windowed subsequence against all the others. Subsequences with the largest matrix profile values have no close match anywhere else in the series; these are the discords, which the component reports as anomalies/novelties.
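
To make the idea concrete, here is a minimal pure-Python sketch of a brute-force matrix profile. It is an illustration only and not how STUMPY or this component computes the profile: the function names `znorm` and `naive_matrix_profile` are made up for this example, and the exclusion zone of `ceil(m / 4)` around each window is an assumption chosen to skip trivial matches with overlapping windows.

```python
import math


def znorm(seq):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)
    return [(x - mean) / std for x in seq]


def naive_matrix_profile(series, m):
    """For each length-m window, return the z-normalized Euclidean
    distance to its nearest neighbor outside the exclusion zone."""
    n_subs = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n_subs)]
    excl = math.ceil(m / 4)  # assumed exclusion zone, see lead-in
    profile = []
    for i in range(n_subs):
        best = math.inf
        for j in range(n_subs):
            if abs(i - j) <= excl:
                continue  # skip the window itself and near-overlaps
            best = min(best, math.dist(subs[i], subs[j]))
        profile.append(best)
    return profile


# A repeating pattern with a single spike injected at index 14.
series = [0.0, 1.0, 2.0, 1.0] * 8
series[14] = 10.0
profile = naive_matrix_profile(series, m=4)
discord_start = profile.index(max(profile))
```

In this toy series every window that does not touch the spike has an exact repeat elsewhere, so its profile value is 0, while `discord_start` lands on one of the windows that cover index 14. The brute-force loop is quadratic in the number of windows; STUMPY exists precisely to compute the same kind of profile far more efficiently.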

Configuration

Required Configuration

The Stumpy matrix profiler requires the following configuration:

  • local_dir: Location of a local directory to output files generated by this component.

  • model_target: The column in the source data that contains the label that the model will try to predict.

Optional Configuration

The Stumpy matrix profiler has no optional configuration.

Default Configuration

The Stumpy matrix profiler uses the following default configuration:

  • stumpy_analysis_image_name: The file name of the generated image. Defaults to STUMPY_DISCORD_ANALYSIS.PNG.

  • stumpy_window_size: A list of window sizes. Each entry will generate a new plot. Defaults to [30].

  • stumpy_num_discords: The total number of discords to include in the analysis. These are the subsequences most likely to be identified as anomalies. Defaults to 3.
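
How the component selects its discords internally is not shown on this page. As a rough sketch of what "the top N discords" can mean, the hypothetical helper below (`top_discords` is not part of lolpop or STUMPY) takes the largest matrix profile values while skipping indices that sit within an assumed exclusion distance of a discord already chosen, so that one anomaly is not counted several times.

```python
def top_discords(profile, k, excl):
    """Return up to k indices with the largest profile values,
    ignoring any index within `excl` positions of a chosen one."""
    # Candidate indices ordered from largest to smallest profile value.
    order = sorted(range(len(profile)), key=lambda i: profile[i], reverse=True)
    chosen = []
    for idx in order:
        if len(chosen) >= k:
            break
        if all(abs(idx - c) > excl for c in chosen):
            chosen.append(idx)
    return chosen


# Toy profile: indices 2 and 3 belong to the same anomaly.
toy_profile = [0.1, 0.2, 5.0, 4.9, 0.3, 0.1, 3.0, 0.2]
picked = top_discords(toy_profile, k=2, excl=1)
```

With `excl=1`, index 3 is skipped because it neighbors index 2, so `picked` holds indices 2 and 6. In this sketch `k` plays the role of stumpy_num_discords.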

Methods

check_data

def check_data(self, data, *args, **kwargs):
Method for detecting anomalies/novelties in time series data using the STUMPY matrix profiling algorithm and plotting the results.

Arguments:

  • data (pandas.DataFrame): Input time series data.

Returns:

  • data_report (None): Stumpy does not generate a report, so this returns None in order to stay consistent with the method signature.

  • file_path (string): Path to the exported data check report.

  • checks_status (string): The status of the checks. Currently this always returns "Pass".

Examples:

import pandas as pd
from lolpop.component import StumpyMatrixProfiler, StdOutLogger

# define input data
my_data = pd.DataFrame({'ds': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 
                        'y': [4.0, 5.0, 6.0, 1.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0]})

#create component configuration
kwargs = {
    "conf" : {
        "config": {
            "local_dir": "/tmp/artifacts",
            "stumpy_window_size": [3], 
            "stumpy_num_discords": 1,
            "model_target": "y",
        },
    },
    "component": {
        "logger": StdOutLogger(),
    }
}

# instantiate the data checker class
data_checker = StumpyMatrixProfiler(**kwargs)

# run the checks
_, file_path, checks_status = data_checker.check_data(data=my_data)

# print report path and checks status
print(f"Data check report saved at {file_path}. Checks status: {checks_status}")

__plot_mp

def __plot_mp(self, axs, m, h, mp, discords, i): 
Private method for plotting the STUMPY matrix profile and marking the discovered anomalies/novelties.

Arguments:

  • axs (list of matplotlib axes): List of matplotlib axes on which to plot the matrix profile.
  • m (int): Subsequence length (window size) used to compute the matrix profile.
  • h (int): Height of the box used to mark the anomalies/novelties.
  • mp (numpy array): Matrix profile to be plotted.
  • discords (list of int): Indices of the discovered anomalies/novelties in the time series data.
  • i (int): Index in axs list where the plot will be made.

Returns:

  • None

Examples: