BaseDataSynthesizer
Overview¶
A data_synthesizer
is a component that synthesizes a dataset from a sample dataset. The intention is to allow datasets of arbitrarily large size to be built from a representative sample.
Attributes¶
BaseDataSynthesizer
contains no default attributes.
Configuration¶
BaseDataSynthesizer
contains no default or required configuration.
Interface¶
The following methods are part of BaseDataSynthesizer
and should be implemented in any class that inherits from this base class:
load_data¶
Loads the sample data from the given source file.
def load_data(self, source_file, *args, **kwargs) -> tuple[Any, Any]
Arguments:
source_file
(object): File location of the source data to use as a sample for synthesis.
Returns:
data
(object): An object representing the loaded data. Very likely a pandas.DataFrame.
metadata
(Any): A Python object containing metadata about the loaded data.
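As an illustration of the load_data contract described above, here is a minimal sketch that reads a CSV file with the standard library instead of pandas. The class name CsvSynthesizer and the shape of the metadata dictionary are assumptions made for this example; a real implementation would inherit from BaseDataSynthesizer and might return a pandas.DataFrame.

```python
import csv
from typing import Any


class CsvSynthesizer:
    """Hypothetical stand-in; a real class would inherit from BaseDataSynthesizer."""

    def load_data(self, source_file, *args, **kwargs) -> tuple[Any, Any]:
        # Read every row of the sample file into a list of dicts keyed by column.
        with open(source_file, newline="") as handle:
            reader = csv.DictReader(handle)
            data = list(reader)
            columns = list(reader.fieldnames or [])
        # The metadata layout (column names and row count) is an assumption
        # of this sketch, not something the base class prescribes.
        metadata = {"columns": columns, "num_rows": len(data)}
        return data, metadata
```

Note that csv.DictReader yields every value as a string, so a fuller implementation would also need to infer or record column types.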
model_data¶
Creates a model for synthesizing data given a sample.
def model_data(self, data, *args, **kwargs) -> Any
Arguments:
data
(object): Data to model.
Returns:
model
(Any): A model that can synthesize new data.
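The base class leaves the kind of model entirely open. As one very simple possibility, the sketch below treats the "model" as the list of observed values per column, so that new rows can later be drawn by resampling each column independently. The class name ColumnResamplingSynthesizer and the list-of-dicts data format are assumptions for illustration only; this technique ignores correlations between columns.

```python
from typing import Any


class ColumnResamplingSynthesizer:
    """Hypothetical stand-in; a real class would inherit from BaseDataSynthesizer."""

    def model_data(self, data, *args, **kwargs) -> Any:
        # Assumes `data` is a list of dicts (one dict per row).
        # The "model" is a mapping from column name to all observed values.
        model = {}
        for row in data:
            for column, value in row.items():
                model.setdefault(column, []).append(value)
        return model
```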
sample_data¶
Generates new sample data given a synthetic model.
def sample_data(self, model, num_rows, *args, **kwargs) -> Any
Arguments:
model
(object): The synthetic model to use.
num_rows
(int): The number of rows to generate.
Returns:
data
(object): The generated data. Most likely a pandas.DataFrame, or similar.
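A sample_data implementation might look like the following sketch, which assumes the model is a dictionary mapping each column name to a list of observed values and draws each cell independently. The class name ColumnResamplingSynthesizer and the keyword-only seed parameter are hypothetical additions for this example; the output is a list of dicts rather than a pandas.DataFrame.

```python
import random
from typing import Any


class ColumnResamplingSynthesizer:
    """Hypothetical stand-in; a real class would inherit from BaseDataSynthesizer."""

    def sample_data(self, model, num_rows, *args, seed=None, **kwargs) -> Any:
        # `seed` is an assumption of this sketch, passed through **kwargs-style
        # as a keyword-only argument so results can be reproduced.
        rng = random.Random(seed)
        # Build each row by picking one observed value per column at random.
        return [
            {column: rng.choice(values) for column, values in model.items()}
            for _ in range(num_rows)
        ]
```

Because num_rows is independent of the sample size, the same small model can produce an arbitrarily large dataset.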
evaluate_data¶
Evaluates synthetic data given a sample of real data.
def evaluate_data(self, real_data, synthetic_data, *args, **kwargs) -> list[Any]
Arguments:
real_data
(object): Real data from the original dataset.
synthetic_data
(object): A sample of synthetic data, likely generated from sample_data.
Returns:
reports
(list): A list of reports generated to evaluate the model.
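What counts as a "report" is left to the implementation. The sketch below assumes both datasets are lists of dicts and produces one report per numeric column comparing the real and synthetic means. The class name ColumnResamplingSynthesizer and the report keys are hypothetical; a production evaluator would typically compare full distributions rather than a single statistic.

```python
from statistics import mean
from typing import Any


class ColumnResamplingSynthesizer:
    """Hypothetical stand-in; a real class would inherit from BaseDataSynthesizer."""

    def evaluate_data(self, real_data, synthetic_data, *args, **kwargs) -> list[Any]:
        # Nothing to compare if either dataset is empty.
        if not real_data or not synthetic_data:
            return []
        reports = []
        for column in real_data[0]:
            real_values = [row[column] for row in real_data]
            # Skip columns that are not entirely numeric in the real data.
            if not all(isinstance(v, (int, float)) for v in real_values):
                continue
            synthetic_values = [row[column] for row in synthetic_data]
            real_mean = mean(real_values)
            synthetic_mean = mean(synthetic_values)
            # Report layout is an assumption of this sketch.
            reports.append({
                "column": column,
                "real_mean": real_mean,
                "synthetic_mean": synthetic_mean,
                "abs_mean_diff": abs(real_mean - synthetic_mean),
            })
        return reports
```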