BaseMetadataTracker
Overview¶
A metadata_tracker is a component that logs artifacts and metadata related to a machine learning workflow run. A run can be viewed as a collection of actions that defines a full workflow, such as training a model or making predictions. A metadata_tracker is expected to have an internal concept of a run, which users can leverage to log the information needed for tracking purposes.
The BaseMetadataTracker class attempts to provide a consistent interface that can be implemented by any specific type of metadata tracker.
Attributes¶
BaseMetadataTracker contains the following default attributes:
- url: The URL of the metadata tracker.
Configuration¶
Required Configuration¶
BaseMetadataTracker has no required configuration.
Interface¶
The following methods are part of BaseMetadataTracker and should be implemented in any class that inherits from this base class:
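As a rough illustration of what implementing part of this interface might look like, the sketch below defines a hypothetical in-memory tracker. It deliberately does not import or subclass the real BaseMetadataTracker (a real implementation would). The class name InMemoryTracker, the dict-based resource, and the url value are all assumptions made for illustration.

```python
from typing import Any


class InMemoryTracker:
    """Hypothetical stand-in; a real tracker would subclass BaseMetadataTracker."""

    def __init__(self, url="memory://local"):
        self.url = url  # assumed placeholder for the `url` attribute
        self._resources = {}  # resource id -> resource dict

    def create_resource(self, id, type=None, *args, **kwargs) -> Any:
        # A resource is a plain dict here; real trackers return their own objects.
        resource = {"id": id, "type": type, "tags": {}}
        self._resources[id] = resource
        return resource

    def get_resource(self, id, type=None, *args, **kwargs) -> Any:
        return self._resources[id]

    def log_tag(self, resource, key, value, *args, **kwargs):
        resource["tags"][key] = value

    def get_tag(self, resource, key, *args, **kwargs) -> Any:
        return resource["tags"][key]


tracker = InMemoryTracker()
run = tracker.create_resource("run-1", type="experiment")
tracker.log_tag(run, "stage", "training")
stage = tracker.get_tag(tracker.get_resource("run-1", "experiment"), "stage")
```

Keeping the resource opaque to callers (they only pass it back into tracker methods) is what lets different tracker backends share one interface.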
log_artifact¶
Logs an artifact into a resource in the metadata tracker.
def log_artifact(self, resource, id, path, *args, **kwargs):
Arguments:
- resource (object): The resource in the metadata tracker to log the artifact to.
- id (str): A string representing the artifact ID.
- path (str): The path to the artifact that needs to be saved.
Returns:
- None
get_artifact¶
Retrieves an artifact from the metadata tracker.
def get_artifact(self, resource, id, *args, **kwargs) -> Any:
Arguments:
- resource (object): The resource in the metadata tracker.
- id (str): A string representing the artifact ID.
Returns:
- Any: The artifact stored under the specified id.
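The following is a minimal sketch of how log_artifact and get_artifact could pair up, assuming artifacts are files copied into a local directory. FileArtifactTracker, its root argument, the directory layout, and the use of a plain string as the resource are hypothetical choices, not part of the documented interface.

```python
import shutil
import tempfile
from pathlib import Path


class FileArtifactTracker:
    """Hypothetical tracker that stores artifacts under a local directory."""

    def __init__(self, root):
        self.root = Path(root)

    def log_artifact(self, resource, id, path, *args, **kwargs):
        # Copy the file into <root>/<resource>/<id>, keeping its file name.
        target_dir = self.root / str(resource) / id
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(path, target_dir / Path(path).name)

    def get_artifact(self, resource, id, *args, **kwargs):
        # Return the path of the first file found under this artifact ID.
        target_dir = self.root / str(resource) / id
        return next(target_dir.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "model.txt"
    source.write_text("weights")
    tracker = FileArtifactTracker(Path(tmp) / "store")
    tracker.log_artifact("run-1", "model", str(source))
    restored_text = tracker.get_artifact("run-1", "model").read_text()
```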
log_tag¶
Logs a key-value tag to the metadata tracker resource.
def log_tag(self, resource, key, value, *args, **kwargs):
Arguments:
- resource (object): The resource in the metadata tracker to log the tag into.
- key (str): The tag key.
- value (str): The tag value.
Returns:
- None
get_tag¶
Retrieves a tag value from the metadata tracker.
def get_tag(self, resource, key, *args, **kwargs) -> Any:
Arguments:
- resource (object): The metadata tracker resource.
- key (str): The tag key.
Returns:
- Any: The value of the tag with the specified key for the MLflow run associated with the given resource.
log_metadata¶
Logs metadata into the metadata tracker resource.
def log_metadata(self, resource, id, data, *args, **kwargs):
Arguments:
- resource (object): The metadata tracker resource.
- id (str): A string representing the metadata ID.
- data (any): The metadata to store.
Returns:
- None
get_metadata¶
Retrieves metadata from the metadata tracker resource.
def get_metadata(self, resource, id, *args, **kwargs) -> Any:
Arguments:
- resource (object): The metadata tracker resource.
- id (str): A string representing the metadata ID.
Returns:
- Any: The value of the metadata with the specified id for the MLflow run associated with the given resource.
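Since data may be of any type, an implementation has to decide how to serialize it. The sketch below assumes JSON serialization and an in-memory store. JsonMetadataTracker and its internals are hypothetical and only illustrate the log_metadata / get_metadata round trip.

```python
import json
from typing import Any


class JsonMetadataTracker:
    """Hypothetical tracker keeping metadata as JSON strings per resource."""

    def __init__(self):
        self._store = {}  # (resource, metadata id) -> JSON string

    def log_metadata(self, resource, id, data, *args, **kwargs):
        # Serializing up front surfaces non-serializable data at log time.
        self._store[(resource, id)] = json.dumps(data)

    def get_metadata(self, resource, id, *args, **kwargs) -> Any:
        return json.loads(self._store[(resource, id)])


tracker = JsonMetadataTracker()
tracker.log_metadata("run-1", "metrics", {"rmse": 0.25, "folds": 5})
metrics = tracker.get_metadata("run-1", "metrics")
```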
create_resource¶
Creates a new resource in the metadata tracker.
def create_resource(self, id, type=None, *args, **kwargs) -> Any:
Arguments:
- id (str): A string ID of the resource.
- type (str, optional): Either the string "experiment" or None.
Returns:
- object: An object from the metadata tracker representing the resource.
get_resource¶
Retrieves a resource from the metadata tracker.
def get_resource(self, id, type, *args, **kwargs) -> Any:
Arguments:
- id (str): A string ID of the resource to retrieve.
- type (str): Either the string "experiment" or None.
Returns:
- object: The resource from the metadata tracker.
update_resource¶
Updates a resource in the metadata tracker.
def update_resource(self, resource, updates, *args, **kwargs) -> Any:
Arguments:
- resource (object): A tuple containing the resource id and the MLflow run object.
- updates (dict): A dictionary containing the updated information.
Returns:
- The updated resource.
clean_resource¶
Cleans up a resource. Its main use is by experiments, to ensure that any resources created by the experiment have been properly dealt with.
def clean_resource(self, resource, type, *args, **kwargs):
Arguments:
- resource (object): The resource to clean.
- type (str): The type of resource.
Returns:
- None
get_prev_resource_version¶
Returns the previous version of a resource.
def get_prev_resource_version(self, resource, *args, **kwargs) -> Any:
Arguments:
- resource (object): A resource version.
Returns:
- object: The previous version of the resource.
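One way to think about get_prev_resource_version is as a lookup into an ordered list of versions. The sketch below assumes versions are numbered from 1 in creation order and that the first version has no predecessor (returning None in that case is an assumption, as the source does not say what happens). VersionedTracker and create_version are hypothetical helpers.

```python
class VersionedTracker:
    """Hypothetical tracker that keeps resource versions in creation order."""

    def __init__(self):
        self._versions = {}  # resource name -> list of version dicts

    def create_version(self, name):
        # Hypothetical helper, not part of the documented interface.
        versions = self._versions.setdefault(name, [])
        version = {"name": name, "number": len(versions) + 1}
        versions.append(version)
        return version

    def get_prev_resource_version(self, resource, *args, **kwargs):
        # Versions are numbered from 1, so version n sits at index n - 1
        # and its predecessor at index n - 2.
        index = resource["number"] - 2
        if index < 0:
            return None  # assumed behavior: the first version has no predecessor
        return self._versions[resource["name"]][index]


tracker = VersionedTracker()
v1 = tracker.create_version("model-a")
v2 = tracker.create_version("model-a")
previous = tracker.get_prev_resource_version(v2)
```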
get_currently_deployed_model_version¶
Returns the currently deployed model version.
def get_currently_deployed_model_version(self, model_version, *args, **kwargs) -> Any:
Arguments:
- model_version (object): The model version resource.
Returns:
- object: The currently deployed model version
get_prediction_job_model_version¶
Returns the model version used in a prediction job.
def get_prediction_job_model_version(self, prediction_job, *args, **kwargs) -> Any:
Arguments:
- prediction_job (object): The prediction job resource.
Returns:
- object: The model version resource.
get_latest_model_resource¶
Returns the latest resource from a model resource. Its most common use is likely to return the latest prediction job created from a model resource.
def get_latest_model_resource(self, model, type, *args, **kwargs) -> Any:
Arguments:
- model (object): The model resource.
- type (str): The type of resource to return the latest version of.
Returns:
- object: The latest version of the type specified.
get_winning_experiment¶
Returns the winning experiment from a model version.
def get_winning_experiment(self, model_version, *args, **kwargs) -> Any:
Arguments:
- model_version (object): The model version object.
Returns:
- object: The winning experiment.
build_model_lineage¶
Logs dataset versions used to create the provided model version.
def build_model_lineage(self, model_version, dataset_versions, *args, **kwargs):
Arguments:
- model_version (object): The model version object.
- dataset_versions (list[object]): A list of dataset version objects.
Returns:
- None
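A lineage record can be as simple as the list of dataset version IDs stored against the model version. The sketch below builds build_model_lineage on top of log_metadata and get_resource_id under that assumption. LineageTracker, the dict-based resources, and the "dataset_versions" metadata ID are hypothetical.

```python
class LineageTracker:
    """Hypothetical tracker that records lineage as a list of dataset IDs."""

    def __init__(self):
        self._metadata = {}  # (resource id, metadata id) -> data

    def get_resource_id(self, resource, *args, **kwargs) -> str:
        return resource["id"]

    def log_metadata(self, resource, id, data, *args, **kwargs):
        self._metadata[(self.get_resource_id(resource), id)] = data

    def get_metadata(self, resource, id, *args, **kwargs):
        return self._metadata[(self.get_resource_id(resource), id)]

    def build_model_lineage(self, model_version, dataset_versions, *args, **kwargs):
        # Store only the IDs so the lineage stays small and serializable.
        dataset_ids = [self.get_resource_id(dv) for dv in dataset_versions]
        self.log_metadata(model_version, "dataset_versions", dataset_ids)


tracker = LineageTracker()
model_v1 = {"id": "model-a.v1"}
datasets = [{"id": "train.v3"}, {"id": "holdout.v1"}]
tracker.build_model_lineage(model_v1, datasets)
lineage = tracker.get_metadata(model_v1, "dataset_versions")
```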
get_resource_id¶
Returns the id of the resource.
def get_resource_id(self, resource, *args, **kwargs) -> str:
Arguments:
- resource (object): The resource object.
Returns:
- str: The ID of the given resource.
get_parent_id¶
Returns the id of the parent resource, if any.
def get_parent_id(self, resource, type=None, *args, **kwargs) -> str:
Arguments:
- resource (object): The resource object.
- type (str, optional): The resource type.
Returns:
- str: The ID of the parent resource, or None if the resource has no parent.
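To make the relationship between get_resource_id and get_parent_id concrete, here is a small sketch in which each resource carries an optional reference to its parent. The Resource dataclass, its fields, and HierarchyTracker are hypothetical. Real trackers define their own resource objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Hypothetical resource record used only for this illustration."""
    id: str
    type: str
    parent_id: Optional[str] = None


class HierarchyTracker:
    def get_resource_id(self, resource, *args, **kwargs) -> str:
        return resource.id

    def get_parent_id(self, resource, type=None, *args, **kwargs) -> Optional[str]:
        # `type` is accepted to match the interface; this sketch ignores it.
        return resource.parent_id


tracker = HierarchyTracker()
model = Resource(id="model-a", type="model")
version = Resource(id="model-a.v1", type="model_version", parent_id=model.id)
parent_of_version = tracker.get_parent_id(version)
parent_of_model = tracker.get_parent_id(model)
```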
register_vc_resource¶
The register_vc_resource method registers information received from a version control component into the metadata tracker. It saves information related to the version control system, such as commit information, to the specified resource.
def register_vc_resource(self, resource, vc_info, key=None, additional_metadata={}, *args, **kwargs):
Arguments:
- resource (object): The resource to log vc_info into.
- vc_info (dict): A dictionary of information returned by the version control component.
- key (str, optional): The key to append to values to be logged.
- additional_metadata (dict, optional): Additional metadata to log.
Returns:
- None
get_vc_info¶
The get_vc_info method returns the resource version control information that was previously logged. It retrieves version control information, such as commit details, associated with the specified resource.
def get_vc_info(self, resource, key="hexsha", *args, **kwargs) -> dict[str, Any]:
Arguments:
- resource (object): The resource to fetch vc_info from.
- key (str, optional): The key used when saving vc_info.
Returns:
- dict: Returns the information needed to retrieve an object from the version control system. This typically includes details like a git hexsha.
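The sketch below shows one possible reading of how register_vc_resource and get_vc_info fit together: each entry in vc_info is stored as a tag, and get_vc_info looks up the tag named by key. How key is combined with tag names is an assumption here (it is appended as a suffix when given), and VCTracker is hypothetical. The sketch also uses None rather than {} as the default for additional_metadata, a common Python practice to avoid sharing one dict across calls.

```python
from typing import Any


class VCTracker:
    """Hypothetical tracker storing version control info as tags."""

    def __init__(self):
        self._tags = {}  # (resource, tag name) -> value

    def register_vc_resource(self, resource, vc_info, key=None,
                             additional_metadata=None, *args, **kwargs):
        entries = dict(vc_info)
        entries.update(additional_metadata or {})
        for name, value in entries.items():
            # Assumed convention: append `key` to the tag name when provided.
            tag = f"{name}_{key}" if key else name
            self._tags[(resource, tag)] = value

    def get_vc_info(self, resource, key="hexsha", *args, **kwargs) -> dict[str, Any]:
        return {key: self._tags[(resource, key)]}


tracker = VCTracker()
tracker.register_vc_resource(
    "dataset.v1",
    {"hexsha": "abc123"},
    additional_metadata={"branch": "main"},
)
info = tracker.get_vc_info("dataset.v1")
```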
log_data_profile¶
The log_data_profile method logs a data profile to the specified resource.
def log_data_profile(self, resource, file_path, *args, **kwargs):
Arguments:
- resource (object): The resource to log the data profile into.
- file_path (str): The file path of the data profile.
Returns:
- None
log_checks¶
The log_checks method logs a data check into the specified resource.
def log_checks(self, resource, file_path, *args, **kwargs):
Arguments:
- resource (object): The resource to log the data check into.
- file_path (str): The file path of the data check.
Returns:
- None
log_data_comparison¶
The log_data_comparison method logs a data comparison into the specified resource.
def log_data_comparison(self, resource, file_path, *args, **kwargs):
Arguments:
- resource (object): The resource to log the data comparison into.
- file_path (str): The file path of the data comparison.
Returns:
- None
stop¶
The stop method stops the active run.
def stop(self, *args, **kwargs):
load_model¶
The load_model method loads a model trainer object from the metadata tracker.
def load_model(self, model_obj, model_version, ref_model, *args, **kwargs) -> Any:
Arguments:
- model_obj (object): A fitted model.
- model_version (str, run): The model version to use to retrieve the model trainer.
- ref_model (object): A model trainer object to use as a reference, i.e. one that has a similar configuration.
Returns:
- model: The model trainer object.