BaseDataSplitter

Overview

A data_splitter is a component that takes a dataset and splits it into training, validation, and test datasets. It is intended to split data correctly for machine learning workflows; how the data is split depends on the problem_type of the workflow.

Attributes

BaseDataSplitter contains no default attributes.

Configuration

BaseDataSplitter contains no default or required configuration.

Interface

The following methods are part of BaseDataSplitter and should be implemented in any class that inherits from this base class:

split_data

Performs a data split on the given data.

def split_data(self, data, *args, **kwargs) -> dict[str, Any]

Arguments:

  • data (object): The data to split.

Returns:

  • data_out (dict): A dictionary containing the split datasets (train, validation, test)
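
As an illustration of the split_data contract, here is a minimal sketch of a subclass. It is not the library's implementation. The BaseDataSplitter class below is a local stand-in (the real import path is not shown on this page), RandomListSplitter is a hypothetical name, and the 60/20/20 ratio and the key names "train", "validation", and "test" are assumptions based on the return description above.

```python
import random
from typing import Any


class BaseDataSplitter:
    """Local stand-in for the real base class (import path not shown here)."""

    def split_data(self, data, *args, **kwargs) -> dict[str, Any]:
        raise NotImplementedError


class RandomListSplitter(BaseDataSplitter):
    """Hypothetical splitter: shuffles the rows and cuts them 60/20/20."""

    def __init__(self, seed: int = 0):
        self.seed = seed

    def split_data(self, data, *args, **kwargs) -> dict[str, Any]:
        rows = list(data)
        # A seeded generator keeps the split reproducible across runs.
        random.Random(self.seed).shuffle(rows)
        # Integer arithmetic avoids floating-point rounding in the cut points.
        n_train = len(rows) * 6 // 10
        n_val = len(rows) * 2 // 10
        # Key names are assumed; the page only lists train, validation, test.
        return {
            "train": rows[:n_train],
            "validation": rows[n_train:n_train + n_val],
            "test": rows[n_train + n_val:],
        }


splits = RandomListSplitter(seed=42).split_data(range(10))
print({name: len(rows) for name, rows in splits.items()})
```

A real splitter would likely choose its strategy from the workflow's problem_type, for example a stratified split for classification; that logic is omitted here.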

get_train_test_dfs

Returns the combined training and test datasets from a dictionary of split datasets.

def get_train_test_dfs(self, data, *args, **kwargs) -> tuple[Any, Any]

Arguments:

  • data (dict): The dictionary of split datasets

Returns:

  • data_out (tuple): A (train, test) tuple containing the combined training and test datasets.
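
The page does not say what "combined" means here. One plausible reading, sketched below as an assumption, is that the validation split is folded back into the training split. BaseDataSplitter is again a local stand-in, FoldValidationSplitter is a hypothetical name, the dictionary keys are assumed, and plain lists stand in for DataFrames.

```python
from typing import Any


class BaseDataSplitter:
    """Local stand-in for the real base class (import path not shown here)."""

    def get_train_test_dfs(self, data, *args, **kwargs) -> tuple[Any, Any]:
        raise NotImplementedError


class FoldValidationSplitter(BaseDataSplitter):
    """Hypothetical subclass: merges validation rows into the training set."""

    def get_train_test_dfs(self, data, *args, **kwargs) -> tuple[Any, Any]:
        # Assumed interpretation of "combined": train plus validation.
        # A missing "validation" key is treated as an empty split.
        train = list(data["train"]) + list(data.get("validation", []))
        test = list(data["test"])
        return train, test


split = {"train": [1, 2, 3], "validation": [4], "test": [5, 6]}
train, test = FoldValidationSplitter().get_train_test_dfs(split)
print(train, test)  # [1, 2, 3, 4] [5, 6]
```

With pandas DataFrames, the list concatenation would be replaced by pandas.concat; the shape of the method stays the same.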