
YDataProfilingDataProfiler

The YDataProfilingDataProfiler class inherits from the BaseDataProfiler class. It profiles data using YData Profiling and generates reports that compare two data sets.

Note that YData Profiling supports both time-series and non-time-series data.

Configuration

Required Configuration

The YData Profiling data profiler requires the following configuration:

  • local_dir: Location of a local directory to output files generated by this component.

Optional Configuration

The Ydata profiling data profiler uses no optional configuration.

Default Configuration

The YData Profiling data profiler uses the following default configuration:

  • YDATA_PROFILE_REPORT_NAME: The name of the profile report file. Default is "YDATA_DATA_PROFILE_REPORT.HTML".
  • YDATA_COMPARISON_REPORT_NAME: The name of the comparison report file. Default is "YDATA_DATA_COMPARISON_REPORT.HTML".
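
As a rough illustration of how these settings might fit together, the sketch below joins local_dir with a report name and falls back to the documented defaults when no name is given. The helper resolve_report_path and the flat dictionary layout are assumptions made for this example; they are not part of lolpop, and the component's real path handling may differ.

```python
import os

# Defaults copied from the documented default configuration above.
DEFAULTS = {
    "YDATA_PROFILE_REPORT_NAME": "YDATA_DATA_PROFILE_REPORT.HTML",
    "YDATA_COMPARISON_REPORT_NAME": "YDATA_DATA_COMPARISON_REPORT.HTML",
}


def resolve_report_path(config, key):
    """Hypothetical helper: join local_dir with the configured or default name."""
    if "local_dir" not in config:
        raise KeyError("local_dir is required")
    name = config.get(key, DEFAULTS[key])
    return os.path.join(config["local_dir"], name)


# Only the required setting is supplied, so the default name is used.
config = {"local_dir": "/tmp/artifacts"}
profile_path = resolve_report_path(config, "YDATA_PROFILE_REPORT_NAME")
```

Overriding YDATA_PROFILE_REPORT_NAME in the dictionary would change only the file name, not the output directory.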

Methods

profile_data

Profiles data using YData Profiling.

profile_data(self, data, *args, **kwargs)
Arguments:

  • data (pd.DataFrame): A Pandas DataFrame of the data to profile.

Returns:

  • data_report (object): The report object generated by YData Profiling.
  • file_path (string): File path of the exported report.

Example:

import pandas as pd
from lolpop.component import YDataProfilingDataProfiler

config = {
    # insert component config here
}

profiler = YDataProfilingDataProfiler(conf=config)
data = pd.read_csv('data.csv')
# returns the report object and the path of the exported file
report, file = profiler.profile_data(data)
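
For readers who want a sense of what a method like this wraps, here is a minimal sketch that calls the ydata-profiling library directly. It is not lolpop's implementation: the helper profile_to_html, its parameters, and the file name profile_report.html are hypothetical. ProfileReport and to_file are the library's own API, and extra keyword arguments such as tsmode=True and sortby can be passed through for time-series data.

```python
import os


def profile_to_html(df, out_dir, name="profile_report.html", **report_kwargs):
    """Hypothetical helper: build a profile report for df and export it as HTML."""
    # Imported lazily so this module loads even without ydata-profiling installed.
    from ydata_profiling import ProfileReport

    # report_kwargs are forwarded to ProfileReport, e.g. tsmode=True and
    # sortby="date" for time-series data.
    report = ProfileReport(df, title="Data Profile", **report_kwargs)

    os.makedirs(out_dir, exist_ok=True)
    file_path = os.path.join(out_dir, name)
    report.to_file(file_path)

    # Mirrors the documented return shape: report object, then file path.
    return report, file_path
```

Calling profile_to_html(data, "reports") would write the report under a reports directory, assuming ydata-profiling and pandas are installed.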

compare_data

Produces a comparison report for two data sets using YData Profiling, highlighting the differences between them.

compare_data(self, data, prev_data, *args, **kwargs)
Arguments:

  • data (pd.DataFrame): A Pandas DataFrame of the "current" data.
  • prev_data (pd.DataFrame): A Pandas DataFrame of the "historical" data.

Returns:

  • comparison (object): The report object generated by comparing the two data sets.
  • file_path (string): File path of the exported report.

Example:

import pandas as pd
from lolpop.component import YDataProfilingDataProfiler

config = {
    # insert component config here
}

profiler = YDataProfilingDataProfiler(conf=config)

data_1 = pd.read_csv('data1.csv')  # "current" data
data_2 = pd.read_csv('data2.csv')  # "historical" data
# returns the comparison report object and the path of the exported file
comparison, file = profiler.compare_data(data_1, data_2)
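
A comparison of this kind can be sketched with ydata-profiling directly by building one report per data set and calling the library's compare method on them. The helper compare_to_html, its parameters, and the file name comparison_report.html are hypothetical, and lolpop's own implementation may be organised differently.

```python
import os


def compare_to_html(current_df, previous_df, out_dir, name="comparison_report.html"):
    """Hypothetical helper: compare two DataFrames and export the result as HTML."""
    # Imported lazily so this module loads even without ydata-profiling installed.
    from ydata_profiling import ProfileReport

    # One report per data set; the titles label each side in the comparison.
    current = ProfileReport(current_df, title="Current")
    previous = ProfileReport(previous_df, title="Previous")

    # compare() returns a new report that places both profiles side by side.
    comparison = previous.compare(current)

    os.makedirs(out_dir, exist_ok=True)
    file_path = os.path.join(out_dir, name)
    comparison.to_file(file_path)
    return comparison, file_path
```

The two DataFrames should share the same columns for the side-by-side view to be meaningful.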