MetaflowClassificationRunner¶

The MetaflowClassificationRunner class is a subclass of BaseRunner. It provides methods to run various pipelines and retrieve artifacts from those pipelines for classification tasks in Metaflow.

Configuration¶

Required Configuration¶

The ClassificationRunner class requires the following pipelines:

process
train
deploy
predict

and the following components:

metadata_tracker
metrics_tracker
resource_version_control

and the following configuration:

train_data: The training data set. This could be a path to the data or a table name, etc.
eval_data: The evaluation data set. This could be a path to the data or a table name, etc.
prediction_data: The prediction data set. This could be a path to the data or a table name, etc.
model_target: The column name that corresponds to the model target.
drop_columns: A list of columns to remove from data before training a model or running inference.

Methods¶

process_data¶

Runs the Metaflow process pipeline, retrieves artifacts from the pipeline, and returns the processed data.

def process_data(self, source="train")

Arguments:

source (str, optional): The source of data to process. Defaults to "train".

Returns:

data (dataframe): Dataframe with the processed data.
dataset_version (object): The dataset version object.

train_model¶

Runs the Metaflow train pipeline, retrieves artifacts from the pipeline, and returns the trained model artifacts.

train_model(self, data, dataset_version=None)

Arguments:

data (dataframe): Dataframe with input features and target variable.
dataset_version (object, optional): The dataset version object. Defaults to None.

Returns:

model_version (dict): Model version information.
model (sklearn.model): Trained model object.
is_new_model_better (bool): Boolean indicating whether the new model is better than the previous model.

deploy_model¶

Runs the Metaflow deploy pipeline, retrieves artifacts from the pipeline, and returns the deployment artifacts.

def deploy_model(self, model_version, model)

Arguments:

model_version (object): Model version object.
model (object): Trained model object.

Returns:

predict_data¶

Runs the Metaflow predict pipeline, retrieves artifacts from the pipeline, and returns the predicted data.

def predict_data(self, model_version, model, data, dataset_version)

Arguments:

model_version (object): Model version object.
model (object, optional): Trained model object. Defaults to None.
data (dataframe): Dataframe with input features.
dataset_version (object): The dataset version object.

Returns:

data (dataframe): Dataframe with predicted data.
prediction_job (object): The prediction job object.

evaluate_ground_truth¶

Evaluates the ground truth and prediction data, calculates metrics, and logs them.

def evaluate_ground_truth(self, prediction_job=None)

Arguments:

prediction_job (object, optional): The prediction job object. Defaults to None.

stop¶

Stops the metadata tracker.

def stop(self)

build_all¶

Runs the process, train, deploy & evaluation pipelines, retrieves artifacts from the pipelines, and logs the metrics.

def build_all(self)