BaseDataSplitter
Overview¶
A data_splitter
is a component that takes a dataset and splits it into training, validation, and test datasets. This is intended to properly split data for use in machine learning workflows, dependent upon the problem_type
of the workflow.
Attributes¶
BaseDataSplitter
contains no default attributes.
Configuration¶
BaseDataSplitter
contains no default or required configuration.
Interface¶
The following methods are part of BaseDataSplitter
and should be implemented in any class that inherits from this base class:
spit_data¶
Performs a data split on the given data.
def split_data(self, data, *args, **kwargs) -> dict[str, Any]
Arguments:
data
(object): The data to split.
Returns:
data_out
(dict): A dictionary containing the split datasets (train, validation, test)
get_train_test_dfs¶
Performs a data comparison between two data profiles.
def get_train_test_dfs(self, data,*args, **kwargs) -> tuple[Any, Any]
Arguments:
data
(dict): The dictionary of split datasets
Returns:
data_out
(tuple): A (train
,test
) tuple containing the combined training and test datasets.