SweetvizDataProfiler

The SweetvizDataProfiler class is used for generating data profiling reports and data comparison reports using the Sweetviz library.

Configuration

Required Configuration

The Sweetviz data profiler requires the following configuration:

  • local_dir: Location of a local directory to output files generated by this component.

Optional Configuration

The Sweetviz data profiler uses the following optional configuration:

  • model_target: The target feature in the data.

Default Configuration

The Sweetviz data profiler uses the following default configuration, which applies unless overridden:

  • SWEETVIZ_PROFILE_REPORT_NAME: The name of the profile report file. Default is "SWEETVIZ_DATA_PROFILE_REPORT.HTML".
  • SWEETVIZ_COMPARISON_REPORT_NAME: The name of the comparison report file. Default is "SWEETVIZ_DATA_COMPARISON_REPORT.HTML".
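
The settings above could be gathered into a plain Python dictionary, as in the sketch below. It is illustrative only: `DEFAULTS` and `resolve_config` are hypothetical names, not part of lolpop, and the source does not say whether lolpop expects these keys flat or nested, so the flat layout here is an assumption. Joining `local_dir` with the report name is likewise a guess at how the output path is formed.

```python
import os

# Default report names, copied from the documentation above.
DEFAULTS = {
    "SWEETVIZ_PROFILE_REPORT_NAME": "SWEETVIZ_DATA_PROFILE_REPORT.HTML",
    "SWEETVIZ_COMPARISON_REPORT_NAME": "SWEETVIZ_DATA_COMPARISON_REPORT.HTML",
}


def resolve_config(user_config):
    """Hypothetical helper: check required keys, then apply defaults."""
    if "local_dir" not in user_config:
        raise ValueError("local_dir is required")
    # User-supplied values take precedence over the defaults.
    return {**DEFAULTS, **user_config}


config = resolve_config({
    "local_dir": "/tmp/artifacts",  # required: where report files are written
    "model_target": "label",        # optional: target feature in the data
})

# Assumed location of the exported profile report.
profile_path = os.path.join(
    config["local_dir"], config["SWEETVIZ_PROFILE_REPORT_NAME"]
)
```

Passing a value for `SWEETVIZ_PROFILE_REPORT_NAME` in the input dictionary would replace the default name in this sketch.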

Methods

profile_data

Profiles data using Sweetviz.

profile_data(data, *args, **kwargs)

Arguments

  • data (pd.DataFrame): A dataframe of the data to profile.

Returns

  • data_report (object): Python object of the report.
  • file_path (string): File path of the exported report.

Example

import pandas as pd

from lolpop.component import SweetvizDataProfiler

config = {
  # insert component configuration here
}

profiler = SweetvizDataProfiler(conf=config)

data = pd.read_csv("data.csv")
report, path = profiler.profile_data(data)
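
For readers curious about what a call like this might do underneath, here is a minimal sketch that uses Sweetviz directly. It is not lolpop's implementation: `profile_with_sweetviz` and its parameters are hypothetical, and the assumption is that the component wraps `sweetviz.analyze` and writes the HTML report into `local_dir`.

```python
import os


def profile_with_sweetviz(data, local_dir, report_name, target=None):
    """Illustrative stand-in for profile_data; not lolpop's actual code.

    `data` is expected to be a pandas DataFrame.
    """
    # Imported lazily so this module loads even without Sweetviz installed.
    import sweetviz as sv

    # Build the profile; target_feat is optional in Sweetviz.
    report = sv.analyze(data, target_feat=target)

    # Write the HTML report to disk without opening a browser window.
    file_path = os.path.join(local_dir, report_name)
    report.show_html(filepath=file_path, open_browser=False)
    return report, file_path
```

The function returns the report object together with the output path, mirroring the two return values documented for `profile_data`.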

compare_data

Produces a data drift report between two data sets using Sweetviz.

compare_data(data, prev_data, *args, **kwargs)

Arguments

  • data (pd.DataFrame): A dataframe of the "current" data.
  • prev_data (pd.DataFrame): A dataframe of the "historical" data.

Returns

  • data_report (object): Python object of the report.
  • file_path (string): File path of the exported report.

Example

import pandas as pd

from lolpop.component import SweetvizDataProfiler

config = {
  # insert component configuration here
}

profiler = SweetvizDataProfiler(conf=config)

current_data = pd.read_csv("current_data.csv")
previous_data = pd.read_csv("previous_data.csv")

report, path = profiler.compare_data(current_data, previous_data)
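
As a rough picture of the underlying comparison, the sketch below calls Sweetviz directly. It is an assumption that the component wraps `sweetviz.compare`. The helper name `compare_with_sweetviz`, its parameters, and the "Current" and "Previous" labels are hypothetical and chosen only for illustration.

```python
import os


def compare_with_sweetviz(data, prev_data, local_dir, report_name, target=None):
    """Illustrative stand-in for compare_data; not lolpop's actual code.

    `data` and `prev_data` are expected to be pandas DataFrames.
    """
    # Imported lazily so this module loads even without Sweetviz installed.
    import sweetviz as sv

    # Sweetviz accepts [dataframe, label] pairs to name each side of the report.
    report = sv.compare(
        [data, "Current"],
        [prev_data, "Previous"],
        target_feat=target,
    )

    # Write the HTML report to disk without opening a browser window.
    file_path = os.path.join(local_dir, report_name)
    report.show_html(filepath=file_path, open_browser=False)
    return report, file_path
```

The labels appear in the rendered report, so descriptive names make it easier to tell the current data from the historical data.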