MetaflowClassificationRunner

The MetaflowClassificationRunner class is a subclass of BaseRunner. It provides methods to run various pipelines and retrieve artifacts from those pipelines for classification tasks in Metaflow.

Configuration

Required Configuration

The MetaflowClassificationRunner class requires the following pipelines:

  • process
  • train
  • deploy
  • predict

and the following components:

  • metadata_tracker
  • metrics_tracker
  • resource_version_control

and the following configuration:

  • train_data: The training data set. This could be a path to the data or a table name, etc.
  • eval_data: The evaluation data set. This could be a path to the data or a table name, etc.
  • prediction_data: The prediction data set. This could be a path to the data or a table name, etc.
  • model_target: The column name that corresponds to the model target.
  • drop_columns: A list of columns to remove from data before training a model or running inference.
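As a rough illustration, the required settings could be collected in a mapping like the one below. The dictionary layout, the placeholder values, and the `missing_keys` helper are assumptions made for this sketch; this reference does not show how the runner actually receives its configuration.

```python
# Hypothetical configuration mapping. The keys come from the list above;
# the values are placeholders, not real paths or column names.
REQUIRED_KEYS = (
    "train_data",
    "eval_data",
    "prediction_data",
    "model_target",
    "drop_columns",
)

config = {
    "train_data": "data/train.csv",            # path or table name
    "eval_data": "data/eval.csv",              # path or table name
    "prediction_data": "data/prediction.csv",  # path or table name
    "model_target": "label",                   # name of the target column
    "drop_columns": ["id", "timestamp"],       # removed before training/inference
}


def missing_keys(cfg):
    """Return the required keys absent from cfg, in declaration order."""
    return [key for key in REQUIRED_KEYS if key not in cfg]
```

Checking for missing keys before starting a run surfaces configuration mistakes early, before any pipeline is launched.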

Methods

process_data

Runs the Metaflow process pipeline, retrieves artifacts from the pipeline, and returns the processed data.

def process_data(self, source="train")

Arguments:

  • source (str, optional): The source of data to process. Defaults to "train".

Returns:

  • data (dataframe): Dataframe with the processed data.
  • dataset_version (object): The dataset version object.

train_model

Runs the Metaflow train pipeline, retrieves artifacts from the pipeline, and returns the trained model artifacts.

def train_model(self, data, dataset_version=None)

Arguments:

  • data (dataframe): Dataframe with input features and target variable.
  • dataset_version (object, optional): The dataset version object. Defaults to None.

Returns:

  • model_version (dict): Model version information.
  • model (sklearn.model): Trained model object.
  • is_new_model_better (bool): Boolean indicating whether the new model is better than the previous model.
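One plausible use of `is_new_model_better` is to gate deployment on it. The sketch below assumes that `train_model` returns its three values as a tuple in the order listed above, and it uses a stand-in `StubRunner` (hypothetical, not the real class) so that it runs on its own. The `train_and_maybe_deploy` helper is likewise an illustrative name.

```python
class StubRunner:
    """Stand-in that mimics the documented signatures; not the real runner."""

    def __init__(self, is_better):
        self.is_better = is_better
        self.deployed = []  # records every model_version passed to deploy_model

    def train_model(self, data, dataset_version=None):
        # Assumed return order: model_version, model, is_new_model_better.
        return {"version": 1}, object(), self.is_better

    def deploy_model(self, model_version, model):
        self.deployed.append(model_version)


def train_and_maybe_deploy(runner, data, dataset_version=None):
    """Train a model and deploy it only if it beats the previous one."""
    model_version, model, is_new_model_better = runner.train_model(
        data, dataset_version=dataset_version
    )
    if is_new_model_better:
        runner.deploy_model(model_version, model)
    return model_version, is_new_model_better
```

With a real runner instance in place of `StubRunner`, the same helper would leave the previously deployed model untouched whenever the new model does not improve on it.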

deploy_model

Runs the Metaflow deploy pipeline, retrieves artifacts from the pipeline, and returns the deployment artifacts.

def deploy_model(self, model_version, model)

Arguments:

  • model_version (object): Model version object.
  • model (object): Trained model object.

Returns:

predict_data

Runs the Metaflow predict pipeline, retrieves artifacts from the pipeline, and returns the predicted data.

def predict_data(self, model_version, model, data, dataset_version)

Arguments:

  • model_version (object): Model version object.
  • model (object, optional): Trained model object. Defaults to None.
  • data (dataframe): Dataframe with input features.
  • dataset_version (object): The dataset version object.

Returns:

  • data (dataframe): Dataframe with predicted data.
  • prediction_job (object): The prediction job object.

evaluate_ground_truth

Compares the ground truth with the prediction data, calculates metrics, and logs them.

def evaluate_ground_truth(self, prediction_job=None)

Arguments:

  • prediction_job (object, optional): The prediction job object. Defaults to None.

stop

Stops the metadata tracker.

def stop(self)
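The prediction, evaluation, and shutdown methods can be chained as in the sketch below. It assumes that `predict_data` returns `(data, prediction_job)` as a tuple in the documented order. Calling `stop` in a `finally` block is a design choice of this sketch, not a documented requirement. `RecordingRunner` and `predict_and_evaluate` are hypothetical names; the stand-in class only records calls so the example is self-contained.

```python
class RecordingRunner:
    """Stand-in that mimics the documented signatures and records each call."""

    def __init__(self):
        self.calls = []

    def predict_data(self, model_version, model, data, dataset_version):
        self.calls.append("predict_data")
        # Assumed return order: data, prediction_job.
        return list(data), {"job_id": 1}

    def evaluate_ground_truth(self, prediction_job=None):
        self.calls.append("evaluate_ground_truth")

    def stop(self):
        self.calls.append("stop")


def predict_and_evaluate(runner, model_version, model, data, dataset_version):
    """Run prediction, evaluate against ground truth, and always stop the tracker."""
    try:
        predictions, prediction_job = runner.predict_data(
            model_version, model, data, dataset_version
        )
        runner.evaluate_ground_truth(prediction_job=prediction_job)
        return predictions
    finally:
        # Runs even if prediction or evaluation raises.
        runner.stop()
```

Putting `stop` in `finally` means the metadata tracker is shut down even when a pipeline fails partway through.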

build_all

Runs the process, train, deploy, and evaluation pipelines, retrieves artifacts from the pipelines, and logs the metrics.

def build_all(self)