Integration Frameworks
One of the key tenets of lolpop is that workflows can be designed and built using integrations that slot into specific layers of abstraction. Users can collaborate on and build workflows based on their knowledge of the system and their expertise with different tools. For example, low-level users might build and implement integrations with model training frameworks, metadata trackers, and prediction monitoring tools. Higher-level users might then use those same integrations to develop workflows that take various data sets and solve specific use cases.
lolpop also believes strongly in extensibility: users can build extensions to accomplish whatever they need and easily slot them into existing or new workflows. We believe everything should be extensible, from the integrations and tests used to the CLI experience itself. This also applies to the layers of abstraction within the workflow itself, which we call the integration framework.
In this section we'll discuss lolpop's integration framework and illustrate how to customize it for users who want additional flexibility in their workflows.
Note
Using custom integration frameworks is considered experimental and should be done cautiously. Let us know if you experiment with this and what your impressions are.
Default Integration Framework
lolpop ships with a default integration framework that is particularly well suited to machine learning workflows. Its layers of abstraction are:
- Component: Components are low-level building blocks that typically define integration with an external library, such as `sklearn`, `mlflow`, `evidentlyai`, etc. Components do most of the heavy lifting in understanding how to leverage these libraries to execute atomic functionality (e.g., training a model artifact, logging a metric, or comparing the performance of two models). Components implement a common set of APIs according to their type.
- Pipeline: Pipelines use any number of components to execute a task that is more complex than any single component can handle. Training a model is one example. Features need to be created (via a feature transformer), a model artifact needs to be created (via a model trainer), and the model needs to be versioned (via a metadata tracker). Pipelines execute work via component APIs, so components can be swapped in and out of workflows without rewriting code.
- Runner: Runners execute end-to-end workflows and are meant to map onto external orchestration tools. Runners typically work with one or more pipelines to execute work. Refreshing a model is one example: grabbing the latest data, creating features, training a model, and comparing its performance metrics to those of the currently deployed model to determine whether a promotion is required. Runners typically leverage pipeline APIs so that workflows can be executed regardless of the pipelines used to execute them.
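The key property of this layering is that each layer only talks to the API of the layer below it. The minimal Python sketch below illustrates that property. All class names (`MetadataTracker`, `InMemoryTracker`, `PrefixTracker`, `TrainingPipeline`, `RefreshRunner`) are hypothetical and are not lolpop's actual classes. No external library is involved, and model training itself is elided. The point is only that either tracker can be swapped in without touching the pipeline or runner code.

```python
from abc import ABC, abstractmethod


class MetadataTracker(ABC):
    """Hypothetical component API shared by every metadata tracker."""

    @abstractmethod
    def log_version(self, name: str, version: int) -> str:
        ...


class InMemoryTracker(MetadataTracker):
    """Hypothetical component that records versions in a dict."""

    def __init__(self) -> None:
        self.versions: dict[str, int] = {}

    def log_version(self, name: str, version: int) -> str:
        self.versions[name] = version
        return f"memory:{name}@{version}"


class PrefixTracker(MetadataTracker):
    """A second hypothetical component implementing the same API."""

    def log_version(self, name: str, version: int) -> str:
        return f"prefix:{name}@{version}"


class TrainingPipeline:
    """Hypothetical pipeline that depends only on the component API."""

    def __init__(self, tracker: MetadataTracker) -> None:
        self.tracker = tracker

    def train(self, name: str) -> str:
        # Feature creation and model fitting are elided; only versioning is shown.
        return self.tracker.log_version(name, version=1)


class RefreshRunner:
    """Hypothetical runner that depends only on the pipeline API."""

    def __init__(self, pipeline: TrainingPipeline) -> None:
        self.pipeline = pipeline

    def refresh(self) -> str:
        return self.pipeline.train("churn_model")


# Swapping the component requires no change to the pipeline or the runner.
print(RefreshRunner(TrainingPipeline(InMemoryTracker())).refresh())  # memory:churn_model@1
print(RefreshRunner(TrainingPipeline(PrefixTracker())).refresh())    # prefix:churn_model@1
```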
lolpop's default integration framework also comes with a hierarchy, in which:
- Pipelines and Components are children of Runners.
- Components can also be children of Pipelines.
This can be represented visually as:
```yaml
runner:
  component:
  pipeline:
    component:
```
Users don't need to do anything to use this integration framework. It is lolpop's default behavior and is used whenever no other framework is specified.
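One way to see that the YAML nesting encodes the parent/child hierarchy is to load it and list the parent/child pairs. The sketch below is illustrative and is not how lolpop reads the framework internally. It writes the framework as the nested dict that PyYAML's `yaml.safe_load` would return for the YAML above, where a key with no value becomes `None`. `parent_child_pairs` is a hypothetical helper.

```python
# The default framework as PyYAML's safe_load would parse the YAML above:
# a key with no value (a leaf layer) becomes None.
DEFAULT_FRAMEWORK = {
    "runner": {
        "component": None,
        "pipeline": {
            "component": None,
        },
    }
}


def parent_child_pairs(tree, parent=None):
    """Return (parent, child) layer pairs in the order they are written."""
    pairs = []
    for name, children in (tree or {}).items():
        if parent is not None:
            pairs.append((parent, name))
        pairs.extend(parent_child_pairs(children, name))
    return pairs


print(parent_child_pairs(DEFAULT_FRAMEWORK))
# [('runner', 'component'), ('runner', 'pipeline'), ('pipeline', 'component')]
```

The three pairs correspond to the two hierarchy rules listed above.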
Defining Custom Integration Frameworks
lolpop allows users to specify their own integration framework and will process it accordingly. To specify a custom integration framework, add an `integration_framework` section containing the desired framework to your workflow YAML file:
```yaml
integration_framework:
  <insert framework here...>
<normal configuration...>
```
For example, the default integration framework would look like this:
```yaml
integration_framework:
  runner:
    component:
    pipeline:
      component:
```
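The behavior described above is simple: use the framework given under `integration_framework` if there is one, and otherwise use the default. That fallback can be sketched as follows. This is illustrative only. `resolve_framework` is a hypothetical helper, not a lolpop function, and the workflow file is assumed to have already been parsed into a dict (for example with PyYAML).

```python
import copy

# Default framework, in the shape yaml.safe_load returns for the YAML above.
DEFAULT_FRAMEWORK = {"runner": {"component": None, "pipeline": {"component": None}}}


def resolve_framework(workflow: dict) -> tuple[dict, dict]:
    """Split a parsed workflow into (framework, remaining configuration)."""
    remaining = dict(workflow)  # shallow copy so the caller's dict is untouched
    framework = remaining.pop("integration_framework", None)
    if framework is None:
        # No custom framework was given: fall back to the default.
        framework = copy.deepcopy(DEFAULT_FRAMEWORK)
    return framework, remaining


workflow = {
    "integration_framework": {"runner": {"component": {"widget": None}}},
    "component": {"metadata_tracker": "MLFlowMetadataTracker"},
}
framework, rest = resolve_framework(workflow)
print(framework)  # {'runner': {'component': {'widget': None}}}
print(rest)       # {'component': {'metadata_tracker': 'MLFlowMetadataTracker'}}
print(resolve_framework({"config": {}})[0] == DEFAULT_FRAMEWORK)  # True
```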
Let's imagine a world where our components are very complex. They are so complex that we need to abstract away functionality into "widgets". If we wished to build an integration framework that includes widgets, we would need to specify that in our integration_framework as follows:
```yaml
integration_framework:
  runner:
    component:
      widget:
    pipeline:
      component:
        widget:
```
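As frameworks get deeper, it can help to flatten them into dotted layer paths and see which layers are leaves. The helper below is hypothetical and not part of lolpop. It walks the nested dict that `yaml.safe_load` would produce for the widget framework above, in the order the layers are written, and flags leaf layers.

```python
WIDGET_FRAMEWORK = {
    "runner": {
        "component": {"widget": None},
        "pipeline": {"component": {"widget": None}},
    }
}


def layer_paths(tree, prefix=""):
    """Yield (dotted_path, is_leaf) for every layer, in written order."""
    for name, children in (tree or {}).items():
        path = f"{prefix}.{name}" if prefix else name
        yield path, not children
        yield from layer_paths(children, path)


for path, is_leaf in layer_paths(WIDGET_FRAMEWORK):
    print(path, "(leaf)" if is_leaf else "")
```

In this framework `widget` is the only leaf layer, and it appears in two places. Components are no longer leaves.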
Behavior of the Integration Framework
lolpop's integration framework supports any (YAML-compatible) name for its abstraction layers. However, lolpop makes a few assumptions that you should be aware of if you travel down this path:
- The framework must have a single root node. In our example, the root is `runner`. lolpop does not currently support multiple root nodes; contact us if you have an example where multiple roots are needed.
- The names used in your integration framework correspond to section names in your YAML definition. For example, using the default framework, we might have a YAML file that looks like this:
  ```yaml
  pipeline:
    process: MyProcessPipeline
    train: MyTrainingPipeline
    predict: MyPredictionPipeline
  component:
    metadata_tracker: MLFlowMetadataTracker
  config:
    train_data: /path/to/train.csv
    eval_data: /path/to/test.csv
  ```
  If you include the default integration framework explicitly, you can see that the framework names match the YAML section names exactly:
  ```yaml
  integration_framework:
    runner:
      component:
      pipeline:
        component:
  pipeline:
    process: MyProcessPipeline
    train: MyTrainingPipeline
    predict: MyPredictionPipeline
  component:
    metadata_tracker: MLFlowMetadataTracker
  config:
    train_data: /path/to/train.csv
    eval_data: /path/to/test.csv
  ```
  Custom integration frameworks work exactly the same way. You can build an integration framework using an abstraction layer `integration_type`, and then define it in the configuration by adding an `integration_type` section as well.
- Classes defined in your custom sections are found using standard lolpop search paths. For example, assume you have an integration layer `widget` with a class `MyWidget`:
  ```yaml
  integration_framework:
    runner:
      widget:
      component:
        widget:
  ...
  widget:
    some_cool_widget: MyWidget
  ...
  ```
  lolpop will search for `MyWidget` along the paths `lolpop.widget` and `lolpop.extension.widget`. `MyWidget` would then be initialized and accessible under the `some_cool_widget` attribute of its parent (in this case, the `runner` object). For most use cases, we expect users to create a custom extension for non-default abstraction layers in the integration framework.
  In the future we may support custom search paths for each layer of the integration framework. (Technically, this is already possible, just not properly documented or tested at the moment.) Contact us if this is something that interests you.
- The order of sibling nodes in the integration framework matters. The default behavior is as follows:
  a. Siblings are processed in the order written.
  b. After a node is processed, it is passed into the sibling nodes processed after it. This means it will be processed as a dependent integration and will be accessible to the children of those sibling nodes (via standard attribute assignment). For example, the default setup is:

  Default Integration Framework
  ```yaml
  integration_framework:
    runner:
      component:
      pipeline:
        component:
  ```
  In this scenario, runner components are processed first, and all runner components are passed into each pipeline. This effectively scopes the runner components as "global", since all pipelines can access them. Pipelines may additionally have pipeline-specific components. Reversing the order creates a different situation:

  A (likely) Bad Integration Framework
  ```yaml
  integration_framework:
    runner:
      pipeline:
        component:
      component:
  ```
  This scenario would process `pipeline` before `component`. Pipelines would then be passed into each runner component. This might be desirable in some use cases, but it is not the intention behind the default integration framework.

  Note that you can prevent any layer of the integration framework from passing itself down to its siblings via the configuration item `pass_integration_to_siblings`. This defaults to `True`, but can be turned off by setting it to `False` in the configuration for that layer.
- By default, integrations within the same leaf layer are aware of each other. For example, in the default framework, a `component` has access to all other components, but a `pipeline` does not know about other pipelines. This is controlled by the configuration value `update_peer_integrations`. By default, this behavior is enabled only for leaf nodes in your integration framework, but it can be overridden at each layer as needed. As an example, consider the following configuration:
  ```yaml
  pipeline:
    process: MyProcessPipeline
    train: MyTrainingPipeline
    predict: MyPredictionPipeline
  component:
    metadata_tracker: MLFlowMetadataTracker
    metrics_tracker: MLFlowMetricsTracker
  ```
  In the above, we've defined three pipelines and two global components. The components are leaf nodes in the default integration framework, so by default they are passed into each other. As a result, the attributes `runner.metadata_tracker.metrics_tracker` and `runner.metrics_tracker.metadata_tracker` will both exist. However, pipelines are not leaf nodes, so by default they will not know about each other; there is no `runner.process.train` attribute.
- All layers of the integration framework support a `config` section. This section is accessible to all integrations via the standard `_get_config` method.
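The sibling-ordering and peer-awareness rules above are easiest to see in a small simulation. The sketch below is hypothetical and is not lolpop's implementation. `Integration` and `build_runner` are made-up names, and integrations are plain objects that only record a class name; no class lookup along `lolpop.<layer>` or `lolpop.extension.<layer>` happens. The `pass_integration_to_siblings` and `update_peer_integrations` flags are passed as a Python dict rather than read from YAML. Only the root's direct child layers are built, so pipeline-specific components are not modeled. With the example configuration above, it reproduces the attributes just described.

```python
class Integration:
    """Hypothetical stand-in for an instantiated integration."""

    def __init__(self, layer, class_name):
        self.layer = layer            # framework layer, e.g. "component"
        self.class_name = class_name  # class named in the YAML section


def build_runner(framework, config, options=None):
    """Simulate the documented sibling and peer rules for the root's child layers.

    framework: nested dict such as {"runner": {"component": None, ...}}.
    config:    parsed workflow sections, e.g. {"component": {...}}.
    options:   hypothetical per-layer flags, e.g.
               {"component": {"pass_integration_to_siblings": False}}.
    """
    options = options or {}
    if len(framework) != 1:
        raise ValueError("integration framework must have exactly one root node")
    ((root_name, layers),) = framework.items()
    runner = Integration(root_name, "Runner")

    earlier = []  # (attribute name, integration) from already-processed siblings
    for layer, children in (layers or {}).items():  # siblings in written order
        opts = options.get(layer, {})
        is_leaf = not children
        built = {name: Integration(layer, cls)
                 for name, cls in config.get(layer, {}).items()}

        for name, integration in built.items():
            setattr(runner, name, integration)  # e.g. runner.metadata_tracker
            for other_name, other in earlier:
                # Integrations from earlier siblings are passed in.
                setattr(integration, other_name, other)

        # Peer awareness defaults to on for leaf layers only.
        if opts.get("update_peer_integrations", is_leaf):
            for integration in built.values():
                for peer_name, peer in built.items():
                    if peer is not integration:
                        setattr(integration, peer_name, peer)

        if opts.get("pass_integration_to_siblings", True):
            earlier.extend(built.items())
    return runner


framework = {"runner": {"component": None, "pipeline": {"component": None}}}
config = {
    "pipeline": {"process": "MyProcessPipeline",
                 "train": "MyTrainingPipeline",
                 "predict": "MyPredictionPipeline"},
    "component": {"metadata_tracker": "MLFlowMetadataTracker",
                  "metrics_tracker": "MLFlowMetricsTracker"},
}
runner = build_runner(framework, config)
print(runner.metadata_tracker.metrics_tracker.class_name)  # MLFlowMetricsTracker
print(runner.process.metadata_tracker.class_name)          # MLFlowMetadataTracker
print(hasattr(runner.process, "train"))                    # False
```

Swapping the order of `pipeline` and `component` in `framework` flips the direction of passing: each component then gains `process`, `train`, and `predict` attributes, and the pipelines no longer see the trackers.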
Building Custom Integrations
To leverage custom integration frameworks, you'll likely need to build some custom integrations. This should be as simple as building lolpop extensions that inherit from the `BaseIntegration` class. Please see the section on creating extensions for more information.