PutMLflow
Description
Records metadata in MLflow. Metadata for an MLflow run is typically reported by several PutMLflow processors placed throughout the pipeline. An initial PutMLflow processor can be configured at the start of the pipeline with 'Create New Run' set to 'true' and 'Attribute Type' set to 'Parameters'; a map of input parameters can be recorded in this processor using dynamic attributes. This processor sets the 'mlflow.run.id' FlowFile attribute, which can be used to send further metadata later in the pipeline. Additional PutMLflow processors can then be added with 'Create New Run' set to 'false' and 'Run ID' set to '${mlflow.run.id}' to send more parameters, metrics, or tags to the same run. Multiple metadata values can be specified by setting dynamic attributes.
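As a rough illustration of how the 'mlflow.run.id' attribute links the processors, the Python sketch below mimics the substitution of '${mlflow.run.id}' from FlowFile attributes. It is a simplified stand-in, not NiFi's Expression Language engine: the `resolve` helper and the sample run ID are hypothetical, and only plain `${name}` references are handled.

```python
import re

# Matches plain ${attribute.name} references only; NiFi Expression Language
# also supports functions, which this sketch ignores.
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(expression: str, attributes: dict) -> str:
    """Replace each ${name} with the matching FlowFile attribute value.

    Unknown attributes resolve to an empty string (an assumption of this sketch).
    """
    return _REFERENCE.sub(lambda m: attributes.get(m.group(1), ""), expression)


# Attributes as they might look after the first PutMLflow processor ran.
# The run ID value is made up for illustration.
flowfile_attributes = {"mlflow.run.id": "0123456789abcdef"}

# A later processor with 'Run ID' set to '${mlflow.run.id}' would then
# target the run created earlier in the pipeline.
run_id = resolve("${mlflow.run.id}", flowfile_attributes)
print(run_id)
```

Because the run ID travels with the FlowFile, downstream processors need no shared state to report to the same run.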
Tags
MLflow, ai, databricks, learning, machine
Properties
In the list below, required properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Tracking URI * | Tracking URI | | | URI to an MLflow Tracking service. For Databricks' Managed MLflow, the URI will be https://<workspace_id>.cloud.databricks.com |
Authentication Method * | Authentication Method | None | | Authentication method to use to authenticate to the tracking URI. |
Username * | Username | | | Username for basic authentication. This property is only considered if: |
Password * | Password | | | Password for basic authentication. This property is only considered if: |
Databricks Token * | Databricks Token | | | Databricks' Managed MLflow token. Obtain an access token from your Databricks account. This is the required authentication method when using Databricks' Managed MLflow. This property is only considered if: |
Create New Run | Create New Run | false | | Whether to create a new run or use an existing run. |
Experiment Reference * | Experiment Reference | Name | | Method to reference an experiment. Set to ID to create a run under an existing experiment. Set to Name to either use an existing experiment with a matching name or create a new named experiment. This property is only considered if: |
Experiment ID * | Experiment ID | | | Experiment ID to create the new run in. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Experiment Name * | Experiment Name | | | Experiment name to create the new run in. If the experiment name does not exist, a new experiment is created. On Databricks Managed MLflow, an experiment name must be an absolute path within the Databricks workspace, e.g. '/Users/<some-username>/my-experiment'. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Run Name | Run Name | | | If set, creates the run with the given name. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Run ID * | Run ID | ${mlflow.run.id} | | Run ID of an existing run in which to record metadata. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Attribute Type * | Attribute Type | Parameters | | Whether to record Metrics, Parameters, or Tags. Metrics measure an output of the flow. Parameters track inputs to the flow. Tags are key/value labels that can be attached to runs. Use dynamic attributes to record metadata. |
Run Status | Run Status | | | Mark the run as terminated with the corresponding run status. Leave unset if the run is not completed. |
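The authentication properties above map naturally onto HTTP request headers. The Python sketch below shows one plausible mapping, assuming the tracking service accepts standard HTTP Basic credentials and that a Databricks token is sent as a Bearer token. The `auth_headers` helper and the method labels "none", "basic", and "token" are hypothetical names chosen for this sketch, not the processor's allowable values or its actual implementation.

```python
import base64


def auth_headers(method: str, username: str = "", password: str = "", token: str = "") -> dict:
    """Build HTTP headers for a tracking-service request.

    The method labels are hypothetical names used only in this sketch.
    """
    if method == "none":
        return {}
    if method == "basic":
        # HTTP Basic authentication: base64 of "username:password".
        credentials = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}
    if method == "token":
        # Assumption: the Databricks access token is sent as a Bearer token.
        return {"Authorization": f"Bearer {token}"}
    raise ValueError(f"unknown authentication method: {method}")


# Example with placeholder credentials.
print(auth_headers("basic", username="user", password="secret"))
```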
Dynamic Properties
This component does not support dynamic properties.
Relationships
Name | Description |
---|---|
failure | MLflow failure relationship |
success | MLflow success relationship |
Reads Attributes
This processor does not read attributes.
Writes Attributes
Name | Description |
---|---|
mlflow.run.id | The run ID; set if a new run was created. |
State Management
This component does not store state.
Restricted
This component is not restricted.
Input Requirement
Input requirements are not specified for this component.
Example Use Cases Involving Other Components
Multiprocessor Use Case 1
Record parameters and metrics for a single MLflow run in a Databricks-managed MLflow workspace. First, create a new experiment in the Databricks workspace and note its name. Recording both in the same run allows pipeline metrics to be linked to pipeline tuning parameters.
Components Involved
- PutMLflow
  - Set 'Tracking URI' to https://<workspace_id>.cloud.databricks.com
  - Configure your authentication method
  - Set 'Create New Run' to true
  - Set 'Experiment Reference' to Name
  - Set 'Experiment Name' to /Users/<databricks_username>/<experiment name>
  - Set 'Attribute Type' to Parameters
  - Set a list of dynamic attributes to record how the pipeline is configured
  - Set 'Run Status' to Running
- PutMLflow
  - Set 'Tracking URI' to https://<workspace_id>.cloud.databricks.com
  - Configure your authentication method
  - Set 'Create New Run' to false
  - Set 'Run ID' to ${mlflow.run.id}
  - Set 'Attribute Type' to Metrics
  - Set a list of dynamic attributes to record any output metrics from the pipeline
  - Set 'Run Status' to Finished
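To make the two configurations above more concrete, the Python sketch below builds the MLflow REST API request bodies that such a pair of processors might send once the run exists. Whether PutMLflow uses these exact endpoints is an assumption; the `build_calls` helper, the run ID, and the attribute names ("batch.size", "records.processed") are hypothetical. Run creation and experiment lookup are omitted, and nothing is sent over the network.

```python
import time

API_PREFIX = "/api/2.0/mlflow"  # MLflow REST API base path


def build_calls(run_id, attribute_type, attributes, run_status=None, now_ms=None):
    """Return (path, body) pairs for recording attributes on an existing run."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    batch = {"run_id": run_id}
    if attribute_type == "Metrics":
        # Metric values are numeric and carry a millisecond timestamp.
        batch["metrics"] = [
            {"key": k, "value": float(v), "timestamp": now_ms, "step": 0}
            for k, v in attributes.items()
        ]
    elif attribute_type == "Parameters":
        batch["params"] = [{"key": k, "value": str(v)} for k, v in attributes.items()]
    elif attribute_type == "Tags":
        batch["tags"] = [{"key": k, "value": str(v)} for k, v in attributes.items()]
    else:
        raise ValueError(f"unknown attribute type: {attribute_type}")

    calls = [(f"{API_PREFIX}/runs/log-batch", batch)]
    if run_status is not None:
        status = run_status.upper()
        update = {"run_id": run_id, "status": status}
        if status != "RUNNING":
            # A terminal status also records when the run ended.
            update["end_time"] = now_ms
        calls.append((f"{API_PREFIX}/runs/update", update))
    return calls


run_id = "0123456789abcdef"  # made-up value standing in for ${mlflow.run.id}

# First processor: input parameters, run still in progress.
first = build_calls(run_id, "Parameters", {"batch.size": "500"}, run_status="Running")
# Second processor: output metrics, run marked as finished.
second = build_calls(run_id, "Metrics", {"records.processed": "1200"}, run_status="Finished")

for path, body in first + second:
    print(path, body)
```

Both stages reference the same run ID, which is what ties the metrics to the parameters recorded earlier.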
System Resource Considerations
This component does not specify system resource considerations.