DeepchecksDataChecker

DeepchecksDataChecker is a subclass of BaseDataChecker. It generates a data check report with Deepchecks by running the data integrity suite on the provided dataset.

Configuration

Required Configuration

The Deepchecks data checker requires the following configuration:

  • local_dir: Location of a local directory to output files generated by this component.

Optional Configuration

The Deepchecks data checker uses the following optional configuration:

  • model_target: The column name of the model target, or label.
  • model_index: The column name of the model index, if it exists. The index uniquely identifies a row.
  • model_time_index: The column name of the model time_index, if it exists. The time index records when the observation occurred.
  • model_cat_features: A list of columns that correspond to categorical features for the model.
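
The optional keys above line up naturally with the keyword arguments of the Deepchecks tabular Dataset class (label, index_name, datetime_name, cat_features). The sketch below shows one way that translation could look. The mapping table, the helper names dataset_kwargs and run_data_integrity, and the example column names are all assumptions made for illustration; they are not taken from the lolpop source, which may wire these values differently.

```python
from typing import Any, Dict

# Assumed mapping from the config keys listed above to the keyword
# arguments of deepchecks.tabular.Dataset. Hypothetical, for illustration.
CONFIG_TO_DATASET_ARG = {
    "model_target": "label",
    "model_index": "index_name",
    "model_time_index": "datetime_name",
    "model_cat_features": "cat_features",
}


def dataset_kwargs(config: Dict[str, Any]) -> Dict[str, Any]:
    """Translate whichever optional keys are set into Dataset keyword arguments."""
    return {
        arg: config[key]
        for key, arg in CONFIG_TO_DATASET_ARG.items()
        if config.get(key) is not None
    }


def run_data_integrity(data, config: Dict[str, Any]):
    """Run the Deepchecks data integrity suite on a pandas DataFrame."""
    # Imported lazily so the mapping helper works without Deepchecks installed.
    from deepchecks.tabular import Dataset
    from deepchecks.tabular.suites import data_integrity

    dataset = Dataset(data, **dataset_kwargs(config))
    return data_integrity().run(dataset)


example_config = {
    "local_dir": "/tmp/artifacts",
    "model_target": "C",          # hypothetical column names
    "model_cat_features": ["A"],
}
print(dataset_kwargs(example_config))
```

Keys that are absent or set to None are simply left out, so Deepchecks falls back to its own defaults for them.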

Default Configuration

The Deepchecks data checker uses the following default configuration:

  • deepchecks_report_name: The file name of the generated report. Defaults to DEEPCHECKS_DATA_REPORT.HTML.
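
As a rough illustration of how the default interacts with local_dir, the sketch below resolves a report location from a configuration dictionary. It assumes the report is written directly inside local_dir under deepchecks_report_name; the helper name report_path is hypothetical and the actual component may build the path differently.

```python
import os
from typing import Any, Dict

# Default file name as documented above.
DEFAULT_REPORT_NAME = "DEEPCHECKS_DATA_REPORT.HTML"


def report_path(config: Dict[str, Any]) -> str:
    """Join local_dir with the configured (or default) report file name."""
    name = config.get("deepchecks_report_name", DEFAULT_REPORT_NAME)
    return os.path.join(config["local_dir"], name)


# Default name is used when deepchecks_report_name is not set.
print(report_path({"local_dir": "/tmp/artifacts"}))

# An explicit value overrides the default.
print(report_path({"local_dir": "/tmp/artifacts",
                   "deepchecks_report_name": "my_report.html"}))
```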

Methods

The following are the methods available in the DeepchecksDataChecker class:

  • check_data

check_data

def check_data(self, data, *args, **kwargs):
Generates a data check report using Deepchecks.

Arguments:

  • data (pd.DataFrame): A dataframe of the data to check

Returns:

  • data_report (object): Python object of the data report.
  • file_path (string): Path to the exported report.
  • checks_status (string): Status of the checks ("PASS"/"WARN"/"ERROR", etc.)
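
A caller will typically branch on checks_status before moving on to later pipeline steps. The helper below is a minimal sketch of that decision. The name should_continue and the fail_on_warn flag are hypothetical, and treating any unrecognized status as a failure is a conservative assumption rather than documented behavior.

```python
def should_continue(checks_status: str, fail_on_warn: bool = False) -> bool:
    """Decide whether a pipeline should proceed given the data check status."""
    status = checks_status.upper()
    if status == "PASS":
        return True
    if status == "WARN":
        # Warnings are tolerated unless the caller asks for strict behavior.
        return not fail_on_warn
    # "ERROR" and any status not recognized here stop the pipeline.
    return False


for status in ("PASS", "WARN", "ERROR"):
    print(status, should_continue(status))
```

Passing fail_on_warn=True makes the check strict, which can be useful for production runs where a warning should block downstream training.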

Example:

from lolpop.component import DeepchecksDataChecker, StdOutLogger
import pandas as pd

# define input data
my_data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# create component configuration
kwargs = {
    "conf" : {
        "config": {
            "local_dir": "/tmp/artifacts",
        },
    },
    "component": {
        "logger": StdOutLogger(),
    }
}

# instantiate the data checker class
data_checker = DeepchecksDataChecker(**kwargs)

# check data and save the report
data_report, file_path, checks_status = data_checker.check_data(data=my_data)

print(f"The generated report is saved in {file_path} with status {checks_status}.")