PutMLflow

Description

Records metadata in MLflow. Metadata for an MLflow run is typically reported by multiple PutMLflow processors placed throughout the pipeline. An initial PutMLflow processor can be configured at the start of the pipeline with 'Create New Run' set to 'true' and 'Attribute Type' set to 'Parameters'. A map of input parameters can be recorded in this processor using dynamic attributes. The processor sets the 'mlflow.run.id' FlowFile attribute, which can be used to send additional metadata later in the pipeline. Additional PutMLflow processors can be added with 'Create New Run' set to 'false' and 'Run ID' set to '${mlflow.run.id}'. These processors can send further parameters, metrics, or tags to the same run. Multiple metadata values can be specified by setting dynamic attributes.
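The run ID handoff described above can be illustrated with a minimal Python sketch. It is a simplified stand-in for NiFi Expression Language, not the processor's implementation: only plain `${attribute}` references are handled, unknown attributes are assumed to resolve to an empty string, and the function name `resolve_expression` and the run ID value are hypothetical.

```python
import re


def resolve_expression(template, attributes):
    """Replace each ${name} reference with the matching FlowFile attribute.

    Simplified stand-in for NiFi Expression Language: only plain attribute
    references are supported, and unknown attributes become an empty string
    (an assumption made for this sketch).
    """
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda match: attributes.get(match.group(1), ""),
        template,
    )


# Attributes carried by a FlowFile after the first PutMLflow created a run.
# The run ID value is a made-up placeholder.
flowfile_attributes = {"mlflow.run.id": "example-run-id"}

# A later PutMLflow with 'Run ID' set to '${mlflow.run.id}' would resolve it
# against the FlowFile's attributes.
run_id = resolve_expression("${mlflow.run.id}", flowfile_attributes)
```

Because the run ID travels with the FlowFile, every downstream processor that evaluates the same expression reports to the run created at the start of the pipeline.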

Tags

MLflow, ai, databricks, learning, machine

Properties

In the list below, required properties are marked with an asterisk (*). All other properties are optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Tracking URI * | Tracking URI | | | URI of an MLflow Tracking service. For Databricks' Managed MLflow, the URI will be https://<workspace_id>.cloud.databricks.com
Authentication Method * | Authentication Method | None | None, Basic, Databricks Token | Authentication method to use to authenticate to the tracking URI.
Username * | Username | | | Username for basic authentication. This property is only considered if the property Authentication Method has a value of BASIC.
Password * | Password | | | Password for basic authentication. This property is only considered if the property Authentication Method has a value of BASIC.
Databricks Token * | Databricks Token | | | Databricks' Managed MLflow token. Obtain an access token from your Databricks account. This authentication method is required when using Databricks' Managed MLflow. This property is only considered if the property Authentication Method has a value of DATABRICKS_TOKEN.
Create New Run | Create New Run | false | true, false | Whether to create a new run or use an existing run.
Experiment Reference * | Experiment Reference | Name | ID, Name | Method to reference an experiment. Set to ID to create a run under an existing experiment. Set to Name to either use an existing experiment with a matching name or create a new named experiment. This property is only considered if the property Create New Run has a value of true.
Experiment ID * | Experiment ID | | | Experiment ID to create the new run in. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if the property Experiment Reference has a value of ID.
Experiment Name * | Experiment Name | | | Experiment name to create the new run in. If the experiment name does not exist, a new experiment is created. On Databricks Managed MLflow, an experiment name must be an absolute path within the Databricks workspace, e.g. '/Users/<some-username>/my-experiment'. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if the property Experiment Reference has a value of Name.
Run Name | Run Name | | | If set, creates the run with the given name. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if the property Create New Run has a value of true.
Run ID * | Run ID | ${mlflow.run.id} | | Run ID of the existing run in which to record metadata. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if the property Create New Run has a value of false.
Attribute Type * | Attribute Type | Parameters | Metrics, Parameters, Tag | Whether to record Metrics, Parameters, or Tags. Metrics measure an output of the flow. Parameters track inputs to the flow. Tags are key/value labels that can be attached to runs. Use dynamic attributes to record metadata.
Run Status | Run Status | | Running, Scheduled, Finished, Failed, Killed | Mark the run as terminated with the corresponding run status. Leave unset if the run is not completed.
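
As an illustration of the three authentication methods, the following Python sketch builds the HTTP `Authorization` header a client would typically send to the tracking service. It assumes standard HTTP Basic authentication and Databricks personal access tokens sent as Bearer tokens; the function name `auth_headers` is hypothetical, the value `NONE` is assumed by analogy with the `BASIC` and `DATABRICKS_TOKEN` values named in the table, and the processor's actual implementation may differ.

```python
import base64


def auth_headers(method, username=None, password=None, token=None):
    """Return HTTP headers for the chosen Authentication Method.

    'NONE' is an assumed internal value; 'BASIC' and 'DATABRICKS_TOKEN'
    are the values named in the property dependencies above.
    """
    if method == "NONE":
        return {}
    if method == "BASIC":
        # HTTP Basic: base64 of "username:password".
        credentials = f"{username}:{password}".encode("utf-8")
        encoded = base64.b64encode(credentials).decode("ascii")
        return {"Authorization": "Basic " + encoded}
    if method == "DATABRICKS_TOKEN":
        # Databricks access tokens are passed as Bearer tokens.
        return {"Authorization": "Bearer " + token}
    raise ValueError(f"Unknown authentication method: {method}")


basic = auth_headers("BASIC", username="user", password="pass")
bearer = auth_headers("DATABRICKS_TOKEN", token="example-token")
```

Only the properties relevant to the selected method are used, which mirrors the "only considered if" dependencies in the table.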

Dynamic Properties

This component does not support dynamic properties.

Relationships

Name | Description
failure | MLflow failure relationship
success | MLflow success relationship

Reads Attributes

This processor does not read attributes.

Writes Attributes

Name | Description
mlflow.run.id | The run ID is set if a new run was created.

State Management

This component does not store state.

Restricted

This component is not restricted.

Input Requirement

Input requirements are not specified for this component.

Example Use Cases Involving Other Components

Multiprocessor Use Case 1

Record parameters and metrics for a single MLflow run in a Databricks managed MLflow workspace, so that pipeline metrics can be linked to pipeline tuning parameters. In the Databricks workspace, create a new experiment and record its name.

Components Involved

  • PutMLflow
    1. Set 'Tracking URI' to https://<workspace_id>.cloud.databricks.com
    2. Configure your authentication method
    3. Set 'Create New Run' to true
    4. Set 'Experiment Reference' to Name
    5. Set 'Experiment Name' to /Users/<databricks_username>/<experiment name>
    6. Set 'Attribute Type' to Parameters
    7. Set a list of dynamic attributes to record how the pipeline is configured
    8. Set 'Run Status' to Running
  • PutMLflow
    1. Set 'Tracking URI' to https://<workspace_id>.cloud.databricks.com
    2. Configure your authentication method
    3. Set 'Create New Run' to false
    4. Set 'Run ID' to ${mlflow.run.id}
    5. Set 'Attribute Type' to Metrics
    6. Set a list of dynamic attributes to record any output metrics from the pipeline
    7. Set 'Run Status' to Finished
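
The two configurations above roughly correspond to the following sequence of MLflow REST API requests, sketched here in Python without sending anything over the network. This is an approximation under stated assumptions, not the processor's actual behavior: the helper names, the experiment ID `123`, the run ID, and the attribute names `batch_size` and `records_out` are hypothetical, and the experiment name is assumed to have already been resolved to an ID.

```python
API_PREFIX = "/api/2.0/mlflow"  # MLflow REST API base path


def create_run(experiment_id, start_time_ms, run_name=None):
    """Build the request that starts a new run in an existing experiment."""
    body = {"experiment_id": experiment_id, "start_time": start_time_ms}
    if run_name:
        body["run_name"] = run_name
    return ("POST", API_PREFIX + "/runs/create", body)


def log_batch(run_id, attribute_type, values, timestamp_ms):
    """Build the request that records dynamic attributes on a run."""
    body = {"run_id": run_id}
    if attribute_type == "Parameters":
        body["params"] = [{"key": k, "value": str(v)} for k, v in values.items()]
    elif attribute_type == "Metrics":
        # Metric values are numeric and carry a timestamp and a step.
        body["metrics"] = [
            {"key": k, "value": float(v), "timestamp": timestamp_ms, "step": 0}
            for k, v in values.items()
        ]
    elif attribute_type == "Tag":
        body["tags"] = [{"key": k, "value": str(v)} for k, v in values.items()]
    else:
        raise ValueError(f"Unknown attribute type: {attribute_type}")
    return ("POST", API_PREFIX + "/runs/log-batch", body)


def update_run(run_id, run_status, end_time_ms=None):
    """Build the request that sets the run status."""
    body = {"run_id": run_id, "status": run_status.upper()}
    if end_time_ms is not None:
        body["end_time"] = end_time_ms
    return ("POST", API_PREFIX + "/runs/update", body)


# In a real flow the run ID comes from the runs/create response and is then
# carried in the 'mlflow.run.id' FlowFile attribute; here it is a placeholder.
run_id = "example-run-id"

requests_to_send = [
    # First PutMLflow: create the run, record parameters, mark it Running.
    create_run("123", 1_700_000_000_000),
    log_batch(run_id, "Parameters", {"batch_size": "500"}, 1_700_000_000_000),
    update_run(run_id, "Running"),
    # Second PutMLflow: record metrics on the same run, mark it Finished.
    log_batch(run_id, "Metrics", {"records_out": "1234"}, 1_700_000_060_000),
    update_run(run_id, "Finished", end_time_ms=1_700_000_060_000),
]
```

Keeping parameters and metrics on the same run ID is what lets the MLflow UI show pipeline outputs next to the settings that produced them.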

System Resource Considerations

This component does not specify system resource considerations.

See Also