EvidentlyAIDataProfiler¶
EvidentlyAIDataProfiler is a Python class that profiles and compares data sets using the EvidentlyAI library. It is a subclass of BaseDataProfiler.
Configuration¶
Required Configuration¶
The EvidentlyAI data profiler requires the following configuration:
local_dir: Location of a local directory to output files generated by this component.
Optional Configuration¶
The EvidentlyAI data profiler has no optional configuration.
Default Configuration¶
The EvidentlyAI data profiler uses the following default configuration:
evidentlyai_profile_report_name: The file name of the generated profile report. Defaults to EVIDENTLYAI_DATA_PROFILE_REPORT.HTML.
evidentlyai_comparison_report_name: The file name of the generated comparison report. Defaults to EVIDENTLYAI_DATA_COMPARISON_REPORT.HTML.
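The required and default keys above could be combined along these lines. This is only a minimal sketch: the `resolve_report_paths` helper is hypothetical, and the assumption that reports are written directly under `local_dir` is not confirmed by the profiler's implementation.

```python
import os

# Default file names as documented above.
DEFAULTS = {
    "evidentlyai_profile_report_name": "EVIDENTLYAI_DATA_PROFILE_REPORT.HTML",
    "evidentlyai_comparison_report_name": "EVIDENTLYAI_DATA_COMPARISON_REPORT.HTML",
}


def resolve_report_paths(config):
    """Hypothetical helper: merge user config with defaults and build file paths.

    Assumes (not confirmed by the source) that reports land directly in local_dir.
    """
    if "local_dir" not in config:
        raise ValueError("local_dir is required configuration")
    merged = {**DEFAULTS, **config}  # user-supplied values override defaults
    local_dir = merged["local_dir"]
    return {
        "profile": os.path.join(local_dir, merged["evidentlyai_profile_report_name"]),
        "comparison": os.path.join(local_dir, merged["evidentlyai_comparison_report_name"]),
    }


paths = resolve_report_paths({"local_dir": "reports"})
print(paths["profile"])
print(paths["comparison"])
```

Overriding either report name in the dictionary passed to the helper replaces the corresponding default, while omitting `local_dir` fails early instead of writing to an unexpected location.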
Methods¶
profile_data¶
Profiles data using EvidentlyAI.
profile_data(data, *args, **kwargs)
Parameters
data (pd.DataFrame): A pandas DataFrame containing the data to be profiled.
Returns
data_report (object): A Python object representing the generated report.
file_path (string): The file path of the exported report.
compare_data¶
Produces a data drift report between two data sets using EvidentlyAI.
compare_data(data, prev_data, *args, **kwargs)
Parameters
data (pd.DataFrame): A pandas DataFrame containing the "current" data.
prev_data (pd.DataFrame): A pandas DataFrame containing the "historical" data.
Returns
data_report (object): A Python object representing the generated report.
file_path (string): The file path of the exported report.
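Both methods return a `(data_report, file_path)` pair. For code that only depends on that contract (for example, pipeline tests that should not invoke EvidentlyAI), a test double along the following lines may be useful. Everything here is an assumption for illustration: `ProfilerContract` and `StubDataProfiler` are hypothetical names, the placeholder file names are invented, and the stub does not reproduce the real class's internals.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class ProfilerContract(ABC):
    """Hypothetical stand-in for the base class; only the method shapes matter."""

    @abstractmethod
    def profile_data(self, data, *args, **kwargs):
        ...

    @abstractmethod
    def compare_data(self, data, prev_data, *args, **kwargs):
        ...


class StubDataProfiler(ProfilerContract):
    """Test double: writes placeholder HTML instead of calling EvidentlyAI."""

    def __init__(self, local_dir):
        self.local_dir = local_dir

    def _export(self, report, file_name):
        # Write a placeholder file and return the documented (report, path) pair.
        os.makedirs(self.local_dir, exist_ok=True)
        file_path = os.path.join(self.local_dir, file_name)
        with open(file_path, "w", encoding="utf-8") as fh:
            fh.write(f"<html><body><pre>{report}</pre></body></html>")
        return report, file_path

    def profile_data(self, data, *args, **kwargs):
        report = {"kind": "profile", "rows": len(data)}
        return self._export(report, "profile.html")

    def compare_data(self, data, prev_data, *args, **kwargs):
        report = {"kind": "comparison", "rows": len(data), "prev_rows": len(prev_data)}
        return self._export(report, "comparison.html")


# Any sized object works for the stub; a pandas DataFrame also supports len().
profiler = StubDataProfiler(tempfile.mkdtemp())
report, path = profiler.compare_data([1, 2, 3], [1, 2])
print(report["kind"], os.path.exists(path))
```

Callers can unpack the result of the stub exactly as they would unpack the result of the real profiler, which keeps downstream code independent of the reporting library.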
Usage¶
Here is an example of how to use the EvidentlyAIDataProfiler class:
import pandas as pd
from EvidentlyAIDataProfiler import EvidentlyAIDataProfiler
# Create an instance of the EvidentlyAIDataProfiler class.
# Note: local_dir (see Required Configuration) must be configured for reports to be written.
profiler = EvidentlyAIDataProfiler()
# Assuming you have data stored in a pandas DataFrame
data = pd.DataFrame(...)
# Profile the data using EvidentlyAI
data_report, file_path = profiler.profile_data(data)
# Compare the data to a previous version; separate names keep the profile results available
prev_data = pd.DataFrame(...)
comparison_report, comparison_file_path = profiler.compare_data(data, prev_data)