DeepchecksDataChecker
The DeepchecksDataChecker class is a child class of the BaseDataChecker class. It generates a data check report with Deepchecks by running the Deepchecks data integrity suite on the provided dataset.
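To make the parent/child relationship concrete, here is a minimal sketch. It assumes, for illustration only, that `BaseDataChecker` is an abstract base class whose required method is `check_data`. Both classes below are hypothetical stand-ins built with Python's `abc` module: `StubDataChecker` does not call Deepchecks, and lolpop's real base class may define more than this.

```python
from abc import ABC, abstractmethod
from typing import Any, Tuple


class BaseDataChecker(ABC):
    """Hypothetical stand-in for lolpop's BaseDataChecker (assumed interface)."""

    @abstractmethod
    def check_data(self, data: Any, *args: Any, **kwargs: Any) -> Tuple[Any, str, str]:
        """Return (data_report, file_path, checks_status)."""


class StubDataChecker(BaseDataChecker):
    """Illustrative subclass: a real Deepchecks checker would run the
    data integrity suite here instead of returning placeholders."""

    def __init__(self, local_dir: str) -> None:
        self.local_dir = local_dir

    def check_data(self, data, *args, **kwargs):
        report = {"n_rows": len(data)}  # placeholder "report" object
        file_path = f"{self.local_dir}/DEEPCHECKS_DATA_REPORT.HTML"
        return report, file_path, "PASS"
```

Because `check_data` is abstract, the base class cannot be instantiated directly; each concrete checker supplies its own implementation with the same three-value return.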
Configuration
Required Configuration
The Deepchecks data checker requires the following configuration:
local_dir: Path to a local directory where this component writes the files it generates.
Optional Configuration
The Deepchecks data checker uses the following optional configuration:
model_target: The column name of the model target, or label.
model_index: The column name of the model index, if it exists. The index uniquely identifies a row.
model_time_index: The column name of the model time index, if it exists. The time index records when the observation occurred.
model_cat_features: A list of the columns that correspond to categorical features for the model.
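This page does not say how these optional keys are consumed. One plausible reading, sketched below purely as an assumption, is that they map onto the `label`, `index_name`, `datetime_name`, and `cat_features` parameters of Deepchecks' `deepchecks.tabular.Dataset`. The mapping table and the helper `build_dataset_kwargs` are hypothetical; the helper only builds keyword arguments and does not import Deepchecks.

```python
from typing import Any, Dict

# Assumed mapping from lolpop config keys to deepchecks.tabular.Dataset
# keyword arguments (illustrative, not taken from lolpop's source).
CONFIG_TO_DATASET_ARG = {
    "model_target": "label",
    "model_index": "index_name",
    "model_time_index": "datetime_name",
    "model_cat_features": "cat_features",
}


def build_dataset_kwargs(config: Dict[str, Any]) -> Dict[str, Any]:
    """Translate optional config keys into Dataset kwargs, skipping unset ones."""
    kwargs = {}
    for conf_key, dataset_arg in CONFIG_TO_DATASET_ARG.items():
        value = config.get(conf_key)
        if value is not None:
            kwargs[dataset_arg] = value
    return kwargs


config = {
    "local_dir": "/tmp/artifacts",
    "model_target": "C",
    "model_cat_features": ["A"],
}
print(build_dataset_kwargs(config))  # {'label': 'C', 'cat_features': ['A']}
```

Skipping unset keys lets Deepchecks fall back to its own defaults (for example, inferring categorical features) when a key is not configured.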
Default Configuration
The Deepchecks data checker uses the following configuration, which falls back to a default value when not set:
deepchecks_report_name: The file name of the generated report. Defaults to DEEPCHECKS_DATA_REPORT.HTML.
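The split between required, optional, and defaulted keys can be illustrated with a small resolver. The function `resolve_config`, and the idea that the report ends up at `local_dir` joined with `deepchecks_report_name`, are assumptions made for this sketch; lolpop's source is the authority on how the component actually builds its output path.

```python
import os
from typing import Any, Dict

REQUIRED_KEYS = ("local_dir",)
DEFAULTS = {"deepchecks_report_name": "DEEPCHECKS_DATA_REPORT.HTML"}


def resolve_config(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user config over defaults and fail fast on missing required keys."""
    missing = [key for key in REQUIRED_KEYS if key not in user_config]
    if missing:
        raise ValueError(f"Missing required configuration: {missing}")
    resolved = {**DEFAULTS, **user_config}  # user values override defaults
    # Assumed (hypothetical) location of the exported report.
    resolved["report_path"] = os.path.join(
        resolved["local_dir"], resolved["deepchecks_report_name"]
    )
    return resolved


print(resolve_config({"local_dir": "/tmp/artifacts"})["report_path"])
```

Overriding `deepchecks_report_name` in the user config replaces the default, while omitting `local_dir` raises an error before any checks run.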
Methods
The following methods are available in the DeepchecksDataChecker class:
check_data
check_data
def check_data(self, data, *args, **kwargs):
Arguments:
data (pd.DataFrame): A dataframe of the data to check.
Returns:
data_report (object): Python object of the data report.
file_path (string): Path to the exported report.
checks_status (string): Status of the checks ("PASS"/"WARN"/"ERROR", etc.).
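Because `check_data` returns a status string alongside the report, a pipeline can use it as a quality gate. The sketch below is one possible policy, not lolpop behaviour: the function name `gate_on_status` and the decision to treat any status other than "PASS" or "WARN" as blocking are assumptions, since the list of status values above is open-ended ("etc.").

```python
import logging

logger = logging.getLogger("data_checks")


def gate_on_status(checks_status: str, file_path: str) -> bool:
    """Return True if the pipeline may continue, False if it should stop.

    Assumed policy: "PASS" continues silently, "WARN" continues with a
    logged warning, and any other status (e.g. "ERROR") blocks.
    """
    status = checks_status.upper()
    if status == "PASS":
        return True
    if status == "WARN":
        logger.warning("Data checks raised warnings; see %s", file_path)
        return True
    logger.error("Data checks returned %s; see %s", checks_status, file_path)
    return False
```

In a pipeline, this would be called right after `check_data`, stopping the run when it returns `False` and pointing the user to the exported report.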
Example:
from lolpop.component import DeepchecksDataChecker, StdOutLogger
import pandas as pd

# define input data
my_data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# create component configuration
kwargs = {
    "conf": {
        "config": {
            "local_dir": "/tmp/artifacts",
        },
    },
    "component": {
        "logger": StdOutLogger(),
    },
}

# instantiate the data checker class
data_checker = DeepchecksDataChecker(**kwargs)

# check data and save the report
data_report, file_path, checks_status = data_checker.check_data(data=my_data)
print(f"The generated report is saved in {file_path} with status {checks_status}.")