OfflineProcess¶
The OfflineProcess class is a subclass of the BaseProcess class and is used for offline data processing. It provides methods for transforming, tracking, profiling, checking, and comparing data.
Configuration¶
Required Configuration¶
The OfflineProcess requires the following components:
data_transformermetadata_trackerresource_version_controldata_checkerdata_profiler
Methods¶
transform_data¶
Transforms source data by calling the data_transformer class method.
def transform_data(source_data_name, *args, **kwargs)
Arguments:
source_data_name(str): A string containing the name of the source data.
Returns:
data_out: The transformed data.
track_data¶
Tracks the data by creating a dataset version and registering version control metadata.
def track_data(data, id, *args, **kwargs)
Arguments:
data(object): The data to be tracked.id(str): The id of the dataset version.
Returns:
dataset_version: The dataset version of the registered version control metadata.
profile_data¶
Profiles the data by logging the data profile to the metadata tracker.
def profile_data(data, dataset_version, *args, **kwargs)
Arguments:
data(object): The data to be profiled.dataset_version(object): The dataset version of the registered version control metadata.
Returns:
- None
check_data¶
Checks the data by logging a data report to the metadata tracker and sending a notification if checks_status is ERROR or WARN.
def check_data(data, dataset_version, *args, **kwargs)
Arguments:
data(object): The data to be checked.dataset_version(object): The dataset version of the registered version control metadata.
Returns:
- None
compare_data¶
Compares a dataset version with the previous version and logs a comparison report to the metadata tracker.
compare_data(data, dataset_version, *args, **kwargs)
Arguments:
data(object): The data to be compared.dataset_version(object): The dataset version of the registered version control metadata.
Returns:
- None