BaseModelTrainer
Overview¶
A model_trainer is a component that wraps a library used to train a machine learning model. The class is expected to fit a model to data and to predict values for new data.
Attributes¶
BaseModelTrainer
contains the following default attributes:
model
: The trained model object. This should be set in the fit function.
feature_transformer
: The feature transformer used to transform data before passing it to the model. This is optional. The feature_transformer can be set specifically for each ModelTrainer class used in a workflow, or at the pipeline level, in which case it is used as the default for all ModelTrainer classes unless overridden.
mlflow_module
: The name of the MLflow submodule that contains the proper log_model method for this trainer. This is only needed if you intend to use MLflow as your model repository.
params
: The training parameters for the trained model.
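The override rule for feature_transformer can be sketched as follows. This is only an illustration of the described behavior, not the library's implementation: resolve_transformer is a hypothetical helper, and the strings stand in for real transformer objects.

```python
from typing import Any, Optional


def resolve_transformer(trainer_level: Optional[Any], pipeline_level: Optional[Any]) -> Optional[Any]:
    """Pick the transformer a trainer would use (hypothetical helper).

    A transformer set on the trainer wins; otherwise the pipeline-level
    default applies; if neither is set, no transformer is used.
    """
    if trainer_level is not None:
        return trainer_level
    return pipeline_level


# Placeholder strings stand in for real transformer objects.
print(resolve_transformer("trainer_scaler", "pipeline_scaler"))  # trainer_scaler
print(resolve_transformer(None, "pipeline_scaler"))              # pipeline_scaler
print(resolve_transformer(None, None))                           # None
```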
Configuration¶
Required Configuration¶
BaseModelTrainer
contains the following required components:
metadata_tracker
metrics_tracker
resource_version_control
and the following required configuration:
metrics
: A list of metrics to compute.
perf_metric
: The metric to use for determining which experiment contains the best model.
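As a rough sketch of how these two settings might be used together, the example below picks the best experiment by perf_metric. The dictionary layout, the best_experiment helper, and the rule that error-style metrics are better when smaller are all assumptions made for illustration.

```python
# Hypothetical configuration fragment; the key names come from this page,
# but the surrounding dictionary layout is an assumption.
config = {
    "metrics": ["rmse", "mae", "r2"],
    "perf_metric": "rmse",
}

# Assumption: error-style metrics are better when smaller.
LOWER_IS_BETTER = {"mse", "rmse", "mae", "mape", "mdae", "smape", "msle", "rmsle"}


def best_experiment(results: dict[str, dict[str, float]], perf_metric: str) -> str:
    """Return the id of the experiment with the best perf_metric value."""
    pick = min if perf_metric in LOWER_IS_BETTER else max
    return pick(results, key=lambda exp_id: results[exp_id][perf_metric])


results = {
    "exp_a": {"rmse": 2.4, "mae": 1.9, "r2": 0.71},
    "exp_b": {"rmse": 1.8, "mae": 1.5, "r2": 0.80},
}
print(best_experiment(results, config["perf_metric"]))  # exp_b
```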
Interface¶
The following methods are part of BaseModelTrainer
and should be implemented in any class that inherits from this base class:
fit¶
Fits a model to the provided data.
def fit(self, data, *args, **kwargs) -> Any
Arguments:
data
(object): The data dictionary containing train/test/validation data.
Returns:
model
(object): The fitted model
predict¶
Uses the fitted model to make predictions on data.
def predict(self, data, *args, **kwargs) -> Any
Arguments:
data
(object): Data dictionary containing test/train/validation data.
Returns:
- A data dictionary containing predictions for all classes.
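A minimal sketch of a trainer that implements fit and predict is shown below. The BaseModelTrainer class here is a stand-in stub so the example runs on its own, MeanTrainer is a hypothetical subclass, and the data dictionary layout (split name mapped to "X" and "y" entries) is an assumption rather than the documented format.

```python
from statistics import mean
from typing import Any


class BaseModelTrainer:
    """Stand-in stub for illustration only; not the real base class."""

    def __init__(self) -> None:
        self.model: Any = None


class MeanTrainer(BaseModelTrainer):
    """Hypothetical trainer whose 'model' is the mean of the training labels."""

    def fit(self, data: dict, *args: Any, **kwargs: Any) -> Any:
        # Assumed layout: data[split] = {"X": rows, "y": labels}.
        self.model = mean(data["train"]["y"])
        return self.model

    def predict(self, data: dict, *args: Any, **kwargs: Any) -> dict:
        # Return one list of predictions per split in the data dictionary.
        return {split: [self.model] * len(part["X"]) for split, part in data.items()}


data = {
    "train": {"X": [[1], [2], [3]], "y": [10.0, 20.0, 30.0]},
    "test": {"X": [[4], [5]], "y": [40.0, 50.0]},
}
trainer = MeanTrainer()
trainer.fit(data)
predictions = trainer.predict(data)
print(predictions["test"])  # [20.0, 20.0]
```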
predict_df¶
Makes predictions on a single pandas DataFrame.
def predict_df(self, df) -> Any:
Arguments:
df
(object): pandas.DataFrame object to use to predict data.
Returns:
- pandas.DataFrame containing the predictions.
predict_proba_df¶
Makes class probability predictions on a single pandas DataFrame. Only applicable to classification problems.
def predict_proba_df(self, df) -> Any:
Arguments:
df
(object): pandas.DataFrame object to use to predict data.
Returns:
- pandas.DataFrame containing the predictions.
get_artifacts¶
Returns artifacts created during model training. Note that not all frameworks create artifacts, but this is an entry point to access them if they do.
def get_artifacts(self, id) -> dict[str,Any]:
Arguments:
id
(str): Experiment id
Returns:
- dictionary containing all relevant artifacts
Default Methods¶
The following methods are implemented in the base class. You may need to override them as you implement your own model trainers.
save¶
Saves the trained model into resource_version_control and logs information into the metadata_tracker.
def save(self, experiment, *args, **kwargs)
Arguments:
experiment
(str): The experiment in which to save the model.
_set_model¶
Setter method for self.model.
def _set_model(self, model):
Arguments:
model
(object): Model object to assign to self.model
_get_model¶
Getter method for self.model
def _get_model(self) -> Any
Returns:
self.model
calculate_metrics¶
Calculates performance metrics on the trained model.
def calculate_metrics(self, data, predictions, metrics, **kwargs) -> dict[str, float]
Arguments:
data
(dict): dictionary of train/test/validation data
predictions
(object): the model's predictions to evaluate
metrics
(list): list of metrics to calculate. Supported metrics include:
accuracy
f1
rocauc
prauc
precision
recall
mse
rmse
mae
mape
mdae
smape
r2
msle
rmsle
Returns:
metrics_out
: dictionary of computed metrics
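The mapping from metric names to computed values can be sketched as below. This is not the base class implementation: calculate_metrics_sketch and METRIC_FUNCTIONS are hypothetical names, only four of the supported metrics are covered, and the function takes plain label and prediction lists instead of the data dictionary.

```python
import math
from typing import Callable, Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    # Fraction of predictions that match the true labels exactly.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Mean of squared errors.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Square root of the mean squared error.
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Mean of absolute errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical registry from metric name to implementation.
METRIC_FUNCTIONS: dict[str, Callable[[Sequence, Sequence], float]] = {
    "accuracy": accuracy,
    "mse": mse,
    "rmse": rmse,
    "mae": mae,
}


def calculate_metrics_sketch(y_true: Sequence, y_pred: Sequence, metrics: list[str]) -> dict[str, float]:
    """Compute each requested metric and return them keyed by name."""
    return {name: METRIC_FUNCTIONS[name](y_true, y_pred) for name in metrics}


scores = calculate_metrics_sketch([3.0, 5.0, 7.0], [2.0, 5.0, 9.0], ["mse", "rmse", "mae"])
print(scores)
```

A name-to-function registry keeps the metrics list in the configuration decoupled from the code that computes each value.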
build_model¶
Simple workflow to build and log a model. This is typically called from within a hyperparameter_tuner, or on its own if hyperparameter tuning is not used.
def build_model(self, data, model_version, *args, **kwargs) -> tuple[Any, Any]
Arguments:
data
(object): dictionary of training/test/validation data.
model_version
(object): model version object
Returns:
model
: the trained model
exp
: experiment where the model was trained
rebuild_model¶
Rebuilds a model using all available data.
def rebuild_model(self, data, model_version, *args, **kwargs) -> tuple[Any, Any]
Arguments:
data
(object): dictionary of training/test/validation data.
model_version
(object): model version object
Returns:
model
: the trained model
exp
: experiment where the model was trained
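One way to read "all available data" is that the train, validation, and test splits are combined before refitting. The sketch below shows that idea only; merge_splits is a hypothetical helper, the split layout is an assumption, and the real rebuild_model may combine data differently.

```python
def merge_splits(data: dict[str, dict[str, list]]) -> dict[str, list]:
    """Concatenate every split into one X/y pair (hypothetical helper).

    Assumed layout: data[split] = {"X": rows, "y": labels}.
    """
    merged: dict[str, list] = {"X": [], "y": []}
    for part in data.values():
        merged["X"].extend(part["X"])
        merged["y"].extend(part["y"])
    return merged


data = {
    "train": {"X": [[1], [2]], "y": [0, 1]},
    "validation": {"X": [[3]], "y": [1]},
    "test": {"X": [[4]], "y": [0]},
}
all_data = merge_splits(data)
print(len(all_data["X"]))  # 4
# A trainer could then be refit on the merged data, e.g. {"train": all_data}.
```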
transform_and_fit¶
Transforms data using a feature transform and then fits the model to the transformed data.
def transform_and_fit(self, data_dict, *args, **kwargs)
Arguments:
data_dict
(object): dictionary of training/test/validation data.
Returns:
model
: the trained model
transform_and_predict¶
Transforms data using a feature transform and then creates predictions from the transformed data.
def transform_and_predict(self, data, *args, **kwargs)
Arguments:
data
(object): dictionary of training/test/validation data.
Returns:
predictions
: the predictions
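The transform-then-fit and transform-then-predict pattern can be sketched as below. SlopeTrainer and center are hypothetical, and for brevity the methods take plain lists rather than the data dictionary used by the real methods.

```python
from typing import Callable, Optional

Transformer = Callable[[list[float]], list[float]]


class SlopeTrainer:
    """Hypothetical trainer: fits y = w * x on transformed features."""

    def __init__(self, feature_transformer: Transformer) -> None:
        self.feature_transformer = feature_transformer
        self.model: Optional[float] = None

    def fit(self, x: list[float], y: list[float]) -> float:
        # Least-squares slope for a line through the origin.
        self.model = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        return self.model

    def predict(self, x: list[float]) -> list[float]:
        return [self.model * a for a in x]

    def transform_and_fit(self, x: list[float], y: list[float]) -> float:
        # Transform the features first, then fit on the transformed values.
        return self.fit(self.feature_transformer(x), y)

    def transform_and_predict(self, x: list[float]) -> list[float]:
        # Apply the same transform before predicting.
        return self.predict(self.feature_transformer(x))


def center(x: list[float]) -> list[float]:
    """Hypothetical feature transform: subtract a fixed offset."""
    return [a - 10.0 for a in x]


trainer = SlopeTrainer(feature_transformer=center)
trainer.transform_and_fit([11.0, 12.0, 13.0], [2.0, 4.0, 6.0])
print(trainer.transform_and_predict([14.0]))  # [8.0]
```

Routing both training and prediction through the same transformer keeps the features seen at prediction time consistent with those seen during fitting.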
transform_and_predict_df¶
Transforms a single dataframe using a feature transform and then creates predictions from the transformed dataframe.
def transform_and_predict_df(self, data, *args, **kwargs)
Arguments:
data
(object): dataframe
Returns:
predictions
: the predictions
transform_and_predict_proba_df¶
Transforms a single dataframe using a feature transform and then creates class predictions from the transformed dataframe.
def transform_and_predict_proba_df(self, data, *args, **kwargs)
Arguments:
data
(object): dataframe
Returns:
predictions
: the predictions
fit_transform_data¶
Fits feature transformer to data and then transforms that data using the fitted transformer.
def fit_transform_data(self, X_data, y_data, *args, **kwargs)
Arguments:
X_data
(object): Feature data to fit & transform
y_data
(object): Label data.
Returns:
transformed_data
: the transformed data
fit_data¶
Fits the feature transformer to data.
def fit_data(self, X_data, y_data, *args, **kwargs)
Arguments:
X_data
(object): Feature data to fit
y_data
(object): Label data.
Returns:
feature_transformer
: the fitted feature transformer
transform_data¶
Transforms a single dataframe using a feature transform.
def transform_data(self, data, *args, **kwargs)
Arguments:
data
(object): dataframe
Returns:
transformed_data
: the transformed data
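The relationship between fitting, transforming, and fit-then-transform can be illustrated with a small stand-in transformer. MinMaxTransformer is hypothetical and works on plain lists; real feature transformers used with the trainer may expose a different interface.

```python
class MinMaxTransformer:
    """Hypothetical feature transformer that rescales values to [0, 1]."""

    def fit(self, values, labels=None):
        # Learn the range from the data; labels are accepted but unused here.
        self.low, self.high = min(values), max(values)
        return self

    def transform(self, values):
        # Reuse the fitted range; guard against a zero-width range.
        span = (self.high - self.low) or 1.0
        return [(v - self.low) / span for v in values]

    def fit_transform(self, values, labels=None):
        # Equivalent to calling fit and then transform on the same data.
        return self.fit(values, labels).transform(values)


transformer = MinMaxTransformer()
print(transformer.fit_transform([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
print(transformer.transform([5.0]))                # [0.75]
```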
_transform_dict¶
Transforms a dictionary of train/test/validation data sets.
def _transform_dict(self, data_dict, *args, **kwargs)
Arguments:
data_dict
(dictionary): dictionary of train/test/validation data sets.
Returns:
transformed_data_dict
: returns the same dictionary, now with transformed data
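A minimal sketch of transforming every split in place follows. The transform_dict function is a hypothetical standalone version, and it assumes each split maps directly to a list of values, which is simpler than the real data dictionary.

```python
from typing import Callable


def transform_dict(
    data_dict: dict[str, list[float]],
    transform: Callable[[list[float]], list[float]],
) -> dict[str, list[float]]:
    """Apply transform to every split and return the same dictionary object."""
    # Iterate over a snapshot of the keys while replacing the values.
    for split in list(data_dict):
        data_dict[split] = transform(data_dict[split])
    return data_dict


data_dict = {"train": [1.0, 2.0], "test": [3.0]}
result = transform_dict(data_dict, lambda values: [v * 2 for v in values])
print(result)  # {'train': [2.0, 4.0], 'test': [6.0]}
```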
_get_transformer¶
Returns the model's feature transformer.
def _get_transformer(self)
Returns:
self.feature_transformer
: the model's feature transformer
_set_transformer¶
Sets the model's feature transformer.
def _set_transformer(self, transformer)
Arguments:
transformer
(object): The feature transformer to set for the model trainer.
Returns:
- None