dvcVersionControl¶
The dvcVersionControl class is a subclass of BaseResourceVersionControl and provides methods for versioning datasets and models using Data Version Control (DVC). It is used to manage the versioning of resources in a machine learning or data science project.
Configuration¶
Required Configuration¶
dvcVersionControl contains no required configuration.
Optional Configuration¶
dvcVersionControl contains no optional configuration.
Default Configuration¶
dvcVersionControl uses the following default configuration:
- dvc_remote: the dvc remote to use. Defaults to local.
- dvc_dir: the directory corresponding to the dvc directory. Defaults to dvc/.
- local_dir: a local directory.
- disable_git_commit: disables all git commits in the workflow. This is meant to be used only in development/test settings. Defaults to False.
- disable_git_push: disables all git pushes in the workflow. This is meant to be used only in development/test settings. Defaults to False.
- git_path_to_dvc_dir: the path to the dvc directory relative to the git repository. This is needed if you execute dvc init --subdir such that the dvc repo and the git repo are not at the same level. This is mainly intended for testing purposes. Defaults to None.
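As a rough illustration, the options above could be collected in a Python dictionary for a development run. The values are examples only, /tmp/lolpop_artifacts is a hypothetical path, and how this dictionary is nested inside the conf passed to the component is an assumption left out of the sketch.

```python
# Option names are taken from the list above; every value is illustrative.
dvc_options = {
    "dvc_remote": "local",                  # documented default
    "dvc_dir": "dvc/",                      # documented default
    "local_dir": "/tmp/lolpop_artifacts",   # hypothetical path
    "disable_git_commit": True,             # development/test only (default: False)
    "disable_git_push": True,               # development/test only (default: False)
    "git_path_to_dvc_dir": None,            # documented default
}

# Defaults as stated in the documentation (local_dir has no stated default).
documented_defaults = {
    "dvc_remote": "local",
    "dvc_dir": "dvc/",
    "disable_git_commit": False,
    "disable_git_push": False,
    "git_path_to_dvc_dir": None,
}

# Collect only the options that differ from the documented defaults.
overrides = {
    name: value
    for name, value in dvc_options.items()
    if name in documented_defaults and documented_defaults[name] != value
}
print(overrides)
```

Listing the overrides this way makes it easy to spot that the two git switches are turned on, which should only be the case in development or test settings.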
Note
The dvcVersionControl component assumes that the lolpop workflow is running from a git repository and that a subfolder is used to hold the dvc artifacts. Users should configure which folder to use for the dvc_dir and which dvc_remote to use. If no dvc_remote is configured, lolpop will attempt to use dvc in local mode, which saves objects to a directory in your local file system. By default this will be configured to be local_dir/dvc.
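To make the git and dvc interplay more concrete, here is a minimal sketch of the command sequence a typical DVC versioning step runs. It is not lolpop's implementation: plan_dvc_commands is a hypothetical helper, the commit message is made up, and the way the two disable flags map onto git commands is an assumption based on their descriptions. The helper only builds the command lists and executes nothing.

```python
def plan_dvc_commands(data_path, remote="local",
                      disable_git_commit=False, disable_git_push=False):
    """Return the CLI commands for a typical DVC versioning step.

    Hypothetical helper for illustration; it builds command lists only.
    """
    # `dvc add` tracks the file and writes a small <data_path>.dvc pointer file.
    commands = [["dvc", "add", data_path]]
    if not disable_git_commit:
        # The pointer file is what git versions, not the data itself.
        commands.append(["git", "add", data_path + ".dvc"])
        commands.append(["git", "commit", "-m", f"Version {data_path}"])
    # Upload the tracked data to the named dvc remote.
    commands.append(["dvc", "push", "-r", remote])
    if not disable_git_push:
        commands.append(["git", "push"])
    return commands


for command in plan_dvc_commands("dvc/dataset.csv", disable_git_push=True):
    print(" ".join(command))
```

With both flags disabled, only the dvc commands remain, which matches the intent of using those flags in development and test settings.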
Methods¶
version_data¶
This method versions the input dataset using DVC and outputs information about the versioned dataset, including the versioning ID and URI.
def version_data(self, dataset_version, data, key=None, *args, **kwargs)
Arguments:
- dataset_version: (object), dataset version object to version
- data: (DataFrame), the dataset to version
- key: (str), optional, default: None, a unique identifier for the dataset version
Returns:
A dictionary containing dataset versioning information including the URI and hexsha.
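The returned dictionary can be inspected like any other Python dict. The sketch below assumes the keys are named "uri" and "hexsha", which the description suggests but does not confirm; summarize_vc_info is a hypothetical helper and the example values are made up.

```python
def summarize_vc_info(vc_info):
    """Format versioning info for logging.

    Assumes the keys are named "uri" and "hexsha"; adjust the names if the
    dictionary returned by version_data uses different ones.
    """
    uri = vc_info.get("uri", "<unknown uri>")
    hexsha = vc_info.get("hexsha", "<unknown hexsha>")
    # Show a short prefix of the hash, as git does in abbreviated logs.
    return f"{uri} @ {hexsha[:8]}"


# Made-up values standing in for a real version_data result.
example = {"uri": "dvc/dataset_v1.csv", "hexsha": "0123456789abcdef"}
print(summarize_vc_info(example))
```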
get_data¶
This method retrieves the versioned dataset from DVC using information about the dataset version.
def get_data(self, dataset_version, vc_info=None, key=None, *args, **kwargs)
Arguments:
- dataset_version: (object), dataset version object
- vc_info: (dictionary), optional, containing versioning information for the dataset
- key: (str), optional, default: None, a unique identifier for the dataset version
Returns:
A DataFrame containing the versioned dataset.
version_model¶
This method versions the input model using DVC and outputs information about the versioned model, including the versioning ID and URI.
def version_model(self, experiment, model, algo=None, key=None, *args, **kwargs)
Arguments:
- experiment: (object), experiment object being versioned
- model: (object), the model to version
- algo: (str), the algorithm used to train the model
- key: (str), optional, default: None, a unique identifier for the model version
Returns:
A dictionary containing model versioning information including the URI and hexsha.
get_model¶
This method retrieves the versioned model using versioning information about the model.
def get_model(self, experiment, key=None, *args, **kwargs)
Arguments:
- experiment: (object), experiment whose model we are retrieving
- key: (str), optional, default: None, a unique identifier for the model version
Returns:
The versioned model.
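The pairing of version_model and get_model can be sketched as a round trip. To keep the example runnable without lolpop, git, or dvc, it uses InMemoryVersionControl, a hypothetical stand-in that only mimics the documented method signatures. The string used as the experiment, the memory:// URI, and the returned key names are all assumptions for illustration.

```python
class InMemoryVersionControl:
    """Hypothetical stand-in that mimics the documented method signatures.

    It keeps models in a dict instead of DVC, so no external tools are needed.
    """

    def __init__(self):
        self._store = {}

    def version_model(self, experiment, model, algo=None, key=None, *args, **kwargs):
        self._store[(experiment, key)] = model
        # Made-up values; a real component would report dvc/git details.
        return {"uri": f"memory://{experiment}/{key}", "hexsha": None}

    def get_model(self, experiment, key=None, *args, **kwargs):
        return self._store[(experiment, key)]


def roundtrip_model(rvc, experiment, model, algo=None, key=None):
    """Version a model, then read it back with the same experiment and key."""
    vc_info = rvc.version_model(experiment, model, algo=algo, key=key)
    restored = rvc.get_model(experiment, key=key)
    return vc_info, restored


# A plain string stands in for the experiment object here.
vc_info, restored = roundtrip_model(
    InMemoryVersionControl(), "experiment-1", {"weights": [0.1, 0.2]}, algo="demo"
)
print(vc_info["uri"], restored)
```

The same experiment and key must be passed to both calls, since together they identify which versioned model is retrieved.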
version_feature_transformer¶
Version the input feature transformer using dvc (Data Version Control) and output information about the versioned transformer including versioning id and URI.
def version_feature_transformer(self, experiment, transformer, transformer_class=None, key=None, *args, **kwargs)
Arguments:
- experiment (object): The experiment to version into.
- transformer (object): The feature transformer to version.
- transformer_class (str): The name of the feature class. Defaults to None.
- key (str): String to append to the experiment id which is used for the file name of the transformer object. Defaults to None.
Returns:
A dictionary containing feature transformer versioning information including the URI and hexsha.
get_feature_transformer¶
Get the versioned feature transformer using versioning information.
def get_feature_transformer(self, experiment, key=None, *args, **kwargs) -> Any
Arguments:
- experiment (object): The experiment to retrieve the feature_transformer from.
- key (str): String to append to the experiment id which is used for the file name of the transformer object. Defaults to None.
Returns:
feature_transformer: The feature transformer object from the experiment.
Usage¶
from lolpop.component import dvcVersionControl
... # create data and dataset_version
config = {
    # insert component config
}
rvc = dvcVersionControl(conf=config)
# version the dataset, then read it back using the returned versioning info
vc_info = rvc.version_data(dataset_version, data)
df = rvc.get_data(dataset_version, vc_info=vc_info)