YDataProfilingDataProfiler¶
The YDataProfilingDataProfiler class inherits from the BaseDataProfiler class. It profiles data using YData Profiling and generates reports that compare two data sets.
Note that YData Profiling supports both time-series and non-time-series data.
Configuration¶
Required Configuration¶
The YData Profiling data profiler requires the following configuration:
local_dir: Location of a local directory to output files generated by this component.
Optional Configuration¶
The YData Profiling data profiler uses no optional configuration.
Default Configuration¶
The YData Profiling data profiler uses the following default configuration:
YDATA_PROFILE_REPORT_NAME: The name of the profile report file. Default is "YDATA_DATA_PROFILE_REPORT.HTML".
YDATA_COMPARISON_REPORT_NAME: The name of the comparison report file. Default is "YDATA_DATA_COMPARISON_REPORT.HTML".
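To make the relationship between local_dir and the two report names concrete, here is a minimal sketch of how output paths could be derived from them. The helper `resolve_report_paths` and its `overrides` parameter are hypothetical, and joining each name onto local_dir is an assumption about how the component places its files, not a description of its code.

```python
import os

# Default report names as listed in the configuration section.
YDATA_PROFILE_REPORT_NAME = "YDATA_DATA_PROFILE_REPORT.HTML"
YDATA_COMPARISON_REPORT_NAME = "YDATA_DATA_COMPARISON_REPORT.HTML"


def resolve_report_paths(local_dir, overrides=None):
    """Hypothetical helper: map each report setting to a path under local_dir."""
    names = {
        "YDATA_PROFILE_REPORT_NAME": YDATA_PROFILE_REPORT_NAME,
        "YDATA_COMPARISON_REPORT_NAME": YDATA_COMPARISON_REPORT_NAME,
    }
    # Assumption: a user-supplied value replaces the default of the same name.
    names.update(overrides or {})
    return {key: os.path.join(local_dir, name) for key, name in names.items()}


paths = resolve_report_paths("reports")
print(paths["YDATA_PROFILE_REPORT_NAME"])
print(paths["YDATA_COMPARISON_REPORT_NAME"])
```

Keeping the file names separate from the directory means a single local_dir setting is enough to relocate both reports.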
Methods¶
profile_data¶
Profiles data using YData Profiling.
profile_data(self, data, *args, **kwargs)
Arguments:
data (pd.DataFrame): A Pandas DataFrame of the data to profile.
Returns:
data_report (object): Python object of the report generated by YData Profiling.
file_path (string): File path of the exported report.
Example:
import pandas as pd
from lolpop.component import YDataProfilingDataProfiler

config = {
    # insert component config here
}
profiler = YDataProfilingDataProfiler(conf=config)
data = pd.read_csv('data.csv')
# Returns the report object and the path of the exported report file.
report, file = profiler.profile_data(data)
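For readers who want to see roughly what a profiling step like this involves, the sketch below calls the ydata-profiling library directly. It is not the component's implementation. The helpers `profile_kwargs` and `profile_to_file`, the `time_column` parameter, and the lowercase file name `profile_report.html` are hypothetical. `ProfileReport`, its `tsmode` and `sortby` options for time-series data, and `to_file` belong to ydata-profiling.

```python
import os


def profile_kwargs(title, time_column=None):
    """Hypothetical helper: build the keyword arguments for ProfileReport."""
    kwargs = {"title": title}
    if time_column is not None:
        # Time-series mode: order rows by the time column before profiling.
        kwargs.update(tsmode=True, sortby=time_column)
    return kwargs


def profile_to_file(data, local_dir, report_name="profile_report.html",
                    time_column=None):
    """Profile a DataFrame and export the report as HTML under local_dir."""
    # Imported here because ydata-profiling is slow to import.
    from ydata_profiling import ProfileReport

    os.makedirs(local_dir, exist_ok=True)
    file_path = os.path.join(local_dir, report_name)
    report = ProfileReport(data, **profile_kwargs("Data Profile", time_column))
    report.to_file(file_path)
    # Same shape as the documented return value: (report object, file path).
    return report, file_path
```

Calling `profile_to_file(data, "reports")` would yield a report object and a file path, mirroring the pair returned by profile_data. Passing `time_column="date"` (a hypothetical column name) would switch the report to time-series mode.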
compare_data¶
Produces a comparison report for two data sets using YData Profiling, highlighting the differences between them.
compare_data(self, data, prev_data, *args, **kwargs)
Arguments:
data (pd.DataFrame): A Pandas DataFrame of the "current" data.
prev_data (pd.DataFrame): A Pandas DataFrame of the "historical" data.
Returns:
comparison (object): Python object of the report generated by comparing two data sets.
file_path (string): File path of the exported report.
Example:
import pandas as pd
from lolpop.component import YDataProfilingDataProfiler

config = {
    # insert component config here
}
profiler = YDataProfilingDataProfiler(conf=config)
data_1 = pd.read_csv('data1.csv')  # passed as data, the "current" data
data_2 = pd.read_csv('data2.csv')  # passed as prev_data, the "historical" data
comparison, file = profiler.compare_data(data_1, data_2)
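As a rough illustration of what a comparison like this involves, the sketch below builds two ydata-profiling reports and compares them directly. It is an assumption about the general approach, not the component's implementation. The helper `compare_to_file`, the report titles, and the lowercase file name `comparison_report.html` are hypothetical. `ProfileReport.compare` and `to_file` belong to ydata-profiling.

```python
import os


def compare_to_file(data, prev_data, local_dir,
                    report_name="comparison_report.html"):
    """Compare two DataFrames and export the comparison as HTML."""
    # Imported here because ydata-profiling is slow to import.
    from ydata_profiling import ProfileReport

    os.makedirs(local_dir, exist_ok=True)
    file_path = os.path.join(local_dir, report_name)

    # Profile each data set on its own; the titles label the two sides.
    current_report = ProfileReport(data, title="Current")
    previous_report = ProfileReport(prev_data, title="Historical")

    # compare() returns a new report that places both profiles side by side.
    comparison = current_report.compare(previous_report)
    comparison.to_file(file_path)
    # Same shape as the documented return value: (comparison, file path).
    return comparison, file_path
```

A comparison of this kind is generally most informative when both DataFrames share the same columns, since each column is matched against its counterpart in the other report.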