BaseModelTrainer

Overview

A model_trainer is a component that wraps a library used to train a machine learning model. Any model trainer class is expected to fit a model to data and to predict values for new data.

Attributes

BaseModelTrainer contains the following default attributes:

  • model: The trained model object. This should get set in the fit function.
  • feature_transformer: The feature transformer used to transform data before passing it to the model. This is optional. The feature_transformer can be set individually for each ModelTrainer class used in a workflow, or at the pipeline level, in which case it is the default for every ModelTrainer class that does not override it.
  • mlflow_module: The name of the MLFlow submodule which contains the proper log_model method for this trainer. This is only needed if you intend to use MLFlow as your model repository.
  • params: The training parameters for the trained model.

Configuration

Required Configuration

BaseModelTrainer contains the following required components:

  • metadata_tracker
  • metrics_tracker
  • resource_version_control

and the following required configuration:

  • metrics: A list of metrics to compute.
  • perf_metric: The metric to use for determining which experiment contains the best model.
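
As a rough illustration of how perf_metric could be used to pick the best experiment, the sketch below compares per-experiment metric dictionaries. The `config` layout, the `best_experiment` helper, and the `LOWER_IS_BETTER` set are hypothetical and are not part of BaseModelTrainer; the only assumption about the metrics themselves is that error-style metrics are minimized and score-style metrics are maximized.

```python
# Hypothetical illustration; not the library's actual selection logic.
# Error-style metrics where a smaller value means a better model.
LOWER_IS_BETTER = {"mse", "rmse", "mae", "mape", "mdae", "smape", "msle", "rmsle"}


def best_experiment(results, perf_metric):
    """Return the id of the experiment with the best perf_metric value.

    results maps an experiment id to its dictionary of computed metrics.
    """
    # Flip the sign for error metrics so that max() always picks the best one.
    sign = -1.0 if perf_metric in LOWER_IS_BETTER else 1.0
    return max(results, key=lambda exp_id: sign * results[exp_id][perf_metric])


# Assumed configuration layout, shown as a plain dictionary.
config = {"metrics": ["accuracy", "f1"], "perf_metric": "accuracy"}

results = {
    "exp_a": {"accuracy": 0.81, "f1": 0.78},
    "exp_b": {"accuracy": 0.86, "f1": 0.74},
}
best = best_experiment(results, config["perf_metric"])
```

Note that perf_metric only makes sense if it is also listed in metrics, since otherwise there would be no computed value to compare.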

Interface

The following methods are part of BaseModelTrainer and should be implemented in any class that inherits from this base class:

fit

Fits a model to the provided data.

def fit(self, data, *args, **kwargs) -> Any

Arguments:

  • data (object): The data dictionary containing train/test/validation data.

Returns:

  • model (object): The fitted model

predict

Uses the fitted model to make predictions on data.

def predict(self, data, *args, **kwargs) -> Any

Arguments:

  • data (object): Data dictionary containing test/train/validation data.

Returns:

  • A data dictionary containing predictions for all classes.

predict_df

Makes predictions on a single pandas DataFrame.

def predict_df(self, df) -> Any

Arguments:

  • df (object): pandas.DataFrame object to use to predict data.

Returns:

  • pandas.DataFrame containing the predictions.

predict_proba_df

Makes class probability predictions on a single pandas DataFrame. Only applicable to classification problems.

def predict_proba_df(self, df) -> Any

Arguments:

  • df (object): pandas.DataFrame object to use to predict data.

Returns:

  • pandas.DataFrame containing the predictions.

get_artifacts

Returns artifacts created during model training. Note that not all frameworks create artifacts, but this is an entry point to access them if they do.

def get_artifacts(self, id) -> dict[str, Any]

Arguments:

  • id (str): Experiment id

Returns:

  • dictionary containing all relevant artifacts
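
To make the interface more concrete, here is a minimal sketch of a trainer that implements fit and predict. It does not import the real base class: `TrainerInterface` is a hypothetical stand-in, the `{"train": {"X": ..., "y": ...}}` data layout is an assumption, and the DataFrame-based methods and get_artifacts are left out to keep the example dependency-free.

```python
from abc import ABC, abstractmethod
from statistics import mean
from typing import Any


class TrainerInterface(ABC):
    """Hypothetical stand-in for BaseModelTrainer; only fit/predict are modelled."""

    def __init__(self) -> None:
        self.model: Any = None

    @abstractmethod
    def fit(self, data, *args, **kwargs) -> Any:
        ...

    @abstractmethod
    def predict(self, data, *args, **kwargs) -> Any:
        ...


class MeanTrainer(TrainerInterface):
    """Toy trainer whose 'model' is simply the mean of the training labels."""

    def fit(self, data, *args, **kwargs) -> Any:
        # Assumed layout: data["train"]["y"] holds the training labels.
        self.model = mean(data["train"]["y"])
        return self.model

    def predict(self, data, *args, **kwargs) -> Any:
        # One constant prediction per row, for every split in the dictionary.
        return {split: [self.model] * len(part["X"]) for split, part in data.items()}


data = {
    "train": {"X": [[1.0], [2.0], [3.0]], "y": [2.0, 4.0, 6.0]},
    "test": {"X": [[4.0], [5.0]], "y": [8.0, 10.0]},
}
trainer = MeanTrainer()
model = trainer.fit(data)
predictions = trainer.predict(data)
```

A real trainer would delegate fit and predict to the wrapped library and set self.model inside fit, as described under Attributes.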

Default Methods

The following methods are implemented in the base class. You may need to override them as you implement your own model trainers.

save

Saves the trained model into resource_version_control and logs information into the metadata_tracker.

def save(self, experiment, *args, **kwargs)

Arguments:

  • experiment (str): The experiment to save the model into.

_set_model

Setter method for self.model.

def _set_model(self, model)

Arguments:

  • model (object): Model object to assign to self.model

_get_model

Getter method for self.model.

def _get_model(self) -> Any

Returns:

  • self.model

calculate_metrics

Calculates performance metrics on the trained model.

def calculate_metrics(self, data, predictions, metrics, **kwargs) -> dict[str, float]

Arguments:

  • data (dict): dictionary of train/test/validation data
  • predictions (object): the predictions to evaluate against the data
  • metrics (list): list of metrics to calculate. Supported metrics include:
    • accuracy
    • f1
    • rocauc
    • prauc
    • precision
    • recall
    • mse
    • rmse
    • mae
    • mape
    • mdae
    • smape
    • r2
    • msle
    • rmsle

Returns:

  • metrics_out: dictionary of computed metrics
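
The sketch below shows one way a few of the listed metrics could be computed and returned as a dictionary keyed by metric name. It is not the library's implementation: `compute_metrics` and `METRIC_FUNCTIONS` are hypothetical names, only four metrics are covered, and plain lists of true and predicted values are assumed in place of the data dictionary and predictions object.

```python
import math


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true label exactly.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    # Mean of the squared errors.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    # Square root of the mean squared error.
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    # Mean of the absolute errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical lookup from metric name to the function that computes it.
METRIC_FUNCTIONS = {"accuracy": accuracy, "mse": mse, "rmse": rmse, "mae": mae}


def compute_metrics(y_true, y_pred, metrics):
    """Return {metric name: value} for each requested metric."""
    unknown = [name for name in metrics if name not in METRIC_FUNCTIONS]
    if unknown:
        raise ValueError(f"Unsupported metrics: {unknown}")
    return {name: float(METRIC_FUNCTIONS[name](y_true, y_pred)) for name in metrics}


scores = compute_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0], ["mse", "rmse", "mae"])
labels = compute_metrics([1, 0, 1, 1], [1, 0, 0, 1], ["accuracy"])
```

Rejecting unknown metric names up front is a design choice of this sketch; the source does not say how unsupported names are handled.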

build_model

Simple workflow to build and log a model. This is typically called from within a hyperparameter_tuner, or on its own if hyperparameter tuning is not used.

def build_model(self, data, model_version, *args, **kwargs) -> tuple[Any, Any]

Arguments:

  • data (object): dictionary of training/test/validation data.
  • model_version (object): model version object

Returns:

  • model: the trained model
  • exp: experiment where the model was trained

rebuild_model

Rebuilds a model using all available data.

def rebuild_model(self, data, model_version, *args, **kwargs) -> tuple[Any, Any]

Arguments:

  • data (object): dictionary of training/test/validation data.
  • model_version (object): model version object

Returns:

  • model: the trained model
  • exp: experiment where the model was trained

transform_and_fit

Transforms data using the feature transformer and then fits the model to the transformed data.

def transform_and_fit(self, data_dict, *args, **kwargs)

Arguments:

  • data_dict (object): dictionary of training/test/validation data.

Returns:

  • model: the trained model

transform_and_predict

Transforms data using the feature transformer and then creates predictions from the transformed data.

def transform_and_predict(self, data, *args, **kwargs)

Arguments:

  • data (object): dictionary of training/test/validation data.

Returns:

  • predictions: the predictions
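
The relationship between the transformer and the model in transform_and_fit and transform_and_predict can be sketched as follows. Everything here is a hypothetical toy: `CenteringTransformer` and `SlopeTrainer` are invented for illustration, the data layout is assumed, and fitting the transformer on the train split inside transform_and_fit is an assumption of this sketch rather than documented behavior.

```python
class CenteringTransformer:
    """Hypothetical feature transformer: subtracts the mean seen during fit."""

    def fit(self, X, y=None):
        self.offset = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.offset for x in X]


class SlopeTrainer:
    """Toy trainer that fits y = slope * x through the origin."""

    def __init__(self, feature_transformer):
        self.feature_transformer = feature_transformer
        self.model = None

    def fit(self, data):
        X, y = data["train"]["X"], data["train"]["y"]
        # Least-squares slope for a line through the origin.
        self.model = sum(a * b for a, b in zip(X, y)) / sum(a * a for a in X)
        return self.model

    def predict(self, data):
        return {split: [self.model * x for x in part["X"]] for split, part in data.items()}

    def _transform_dict(self, data):
        # Apply the fitted transformer to the features of every split.
        return {
            split: {"X": self.feature_transformer.transform(part["X"]), "y": part["y"]}
            for split, part in data.items()
        }

    def transform_and_fit(self, data):
        # Assumption: the transformer is fitted on the train split here.
        self.feature_transformer.fit(data["train"]["X"], data["train"]["y"])
        return self.fit(self._transform_dict(data))

    def transform_and_predict(self, data):
        # The same fitted transformer is reused at prediction time.
        return self.predict(self._transform_dict(data))


data = {
    "train": {"X": [1.0, 2.0, 3.0], "y": [-2.0, 0.0, 2.0]},
    "test": {"X": [4.0], "y": [4.0]},
}
trainer = SlopeTrainer(CenteringTransformer())
model = trainer.transform_and_fit(data)
predictions = trainer.transform_and_predict(data)
```

The point of the sketch is that prediction reuses the transformer state learned during fitting, so new data passes through exactly the same transformation as the training data.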

transform_and_predict_df

Transforms a single dataframe using the feature transformer and then creates predictions from the transformed dataframe.

def transform_and_predict_df(self, data, *args, **kwargs)

Arguments:

  • data (object): dataframe

Returns:

  • predictions: the predictions

transform_and_predict_proba_df

Transforms a single dataframe using the feature transformer and then creates class probability predictions from the transformed dataframe.

def transform_and_predict_proba_df(self, data, *args, **kwargs)

Arguments:

  • data (object): dataframe

Returns:

  • predictions: the predictions

fit_transform_data

Fits the feature transformer to data and then transforms that data using the fitted transformer.

def fit_transform_data(self, X_data, y_data, *args, **kwargs)

Arguments:

  • X_data (object): Feature data to fit & transform
  • y_data (object): Label data.

Returns:

  • transformed_data: the transformed data

fit_data

Fits the feature transformer to data.

def fit_data(self, X_data, y_data, *args, **kwargs)

Arguments:

  • X_data (object): Feature data to fit
  • y_data (object): Label data.

Returns:

  • feature_transformer: the fitted feature transformer

transform_data

Transforms a single dataframe using the feature transformer.

def transform_data(self, data, *args, **kwargs)

Arguments:

  • data (object): dataframe

Returns:

  • transformed_data: the transformed_data
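
The three transformer helpers fit_data, transform_data, and fit_transform_data relate to each other roughly as in the sketch below. `MinMaxScaler1D` and `TransformerHelpers` are hypothetical names, plain lists stand in for dataframes, and the transformer is assumed to expose scikit-learn-style fit and transform methods, which the source does not confirm.

```python
class MinMaxScaler1D:
    """Hypothetical transformer that rescales values to the range [0, 1]."""

    def fit(self, X, y=None):
        self.lo, self.hi = min(X), max(X)
        return self

    def transform(self, X):
        span = (self.hi - self.lo) or 1.0  # avoid dividing by zero on constant data
        return [(x - self.lo) / span for x in X]


class TransformerHelpers:
    """Stand-in for the transformer-related methods of a model trainer."""

    def __init__(self, feature_transformer):
        self.feature_transformer = feature_transformer

    def fit_data(self, X_data, y_data, *args, **kwargs):
        # Learn the transformer's state and return the fitted transformer.
        self.feature_transformer.fit(X_data, y_data)
        return self.feature_transformer

    def transform_data(self, data, *args, **kwargs):
        # Apply the already fitted transformer.
        return self.feature_transformer.transform(data)

    def fit_transform_data(self, X_data, y_data, *args, **kwargs):
        # Fit first, then transform the same data with the fitted state.
        self.fit_data(X_data, y_data)
        return self.transform_data(X_data)


helpers = TransformerHelpers(MinMaxScaler1D())
scaled_train = helpers.fit_transform_data([10.0, 20.0, 30.0], [0, 1, 1])
scaled_new = helpers.transform_data([25.0])
```

Here fit_transform_data is written as fit_data followed by transform_data, which matches the descriptions above; a real transformer may provide a faster combined implementation.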

_transform_dict

Transforms a dictionary of train/test/validation data sets.

def _transform_dict(self, data_dict, *args, **kwargs)

Arguments:

  • data_dict (dictionary): dictionary of train/test/validation data sets.

Returns:

  • transformed_data_dict: returns the same dictionary, now with transformed data

_get_transformer

Returns the model's feature transformer.

def _get_transformer(self)

Returns:

  • self.feature_transformer: the model's feature transformer

_set_transformer

Sets the model's feature transformer.

def _set_transformer(self, transformer)

Arguments:

  • transformer (object): The feature transformer to set for the model trainer.

Returns:

  • None