Monitoring with Grafana

Tip: Gain confidence in your data pipeline by integrating with monitoring and alerting from your vendor of choice!

With more than 45 vendors to choose from, OpenTelemetry is a natural choice for secure, performant, and flexible observability.

Datavolo Runtimes provide the OpenTelemetry Reporting Task, which sends all of your Flow's metrics to any OTLP-compatible endpoint.

Follow this guide to monitor your Datavolo Runtime with Grafana Cloud.

Overview

In this guide, we'll complete the following steps:

  1. Register for a Grafana Cloud account
  2. Authorize your Datavolo Runtime with an Access Token for OTLP
  3. Configure your Datavolo Runtime to send metrics to Grafana Cloud
  4. Visualize your Datavolo Runtime metrics with a Grafana dashboard
  5. Alert for common data pipeline scenarios, like files stuck in connections

This guide generally follows the steps at https://grafana.com/docs/grafana-cloud/send-data/otlp/send-data-otlp/, then completes the integration by configuring a Datavolo Runtime to send OTLP metrics to the endpoint set up there.

Grafana Cloud Account

Grafana Cloud supports free, pay-as-you-go, and enterprise accounts. All account types support the same OTLP ingest.

Visit https://grafana.com for more information on how to register for a new account or sign in to an existing account.

Configure an OpenTelemetry Receiver

From the overview page of your Grafana Cloud account:

  1. Click the Configure button under the OpenTelemetry connector

  2. Copy the OTLP Endpoint and numeric Instance ID to use later in your Datavolo Runtime
  3. Click the Generate now link under Password / API Token
  4. Enter any Token name, such as datavolo_runtime, then create the token
  5. Copy the value of the generated token to use later in your Datavolo Runtime
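
Before moving on, it can help to confirm that the copied values work together. The following is a minimal Python sketch of one way to do that, not an official Datavolo or Grafana tool. It assumes that the gateway accepts JSON-encoded OTLP over HTTP at the /v1/metrics path, and that HTTP Basic authentication uses the Instance ID as the username and the token as the password (the same pairing used in the Runtime configuration later in this guide). The function names and the metric name datavolo_setup_check are hypothetical.

```python
import base64
import json
import time
import urllib.request


def basic_auth_header(instance_id: str, token: str) -> str:
    """Build an HTTP Basic Authorization header value from username:password."""
    raw = f"{instance_id}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_test_payload(service_name: str = "datavolo") -> dict:
    """Build a minimal OTLP/JSON metrics payload holding one gauge data point."""
    return {
        "resourceMetrics": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeMetrics": [{
                "scope": {"name": "setup-check"},
                "metrics": [{
                    "name": "datavolo_setup_check",  # hypothetical metric name
                    "gauge": {
                        "dataPoints": [{
                            "asDouble": 1.0,
                            "timeUnixNano": str(time.time_ns()),
                        }]
                    },
                }],
            }],
        }]
    }


def send_test_metric(otlp_endpoint: str, instance_id: str, token: str) -> int:
    """POST the test payload and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a bad
    token or Instance ID surfaces as an exception rather than a return value.
    """
    request = urllib.request.Request(
        otlp_endpoint.rstrip("/") + "/v1/metrics",
        data=json.dumps(build_test_payload()).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(instance_id, token),
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling send_test_metric with your own OTLP Endpoint, Instance ID, and token should return a 2xx status if the credentials are accepted. Nothing in the sketch stores the token, so pass it in from an environment variable or secret store rather than hard-coding it.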

Configure Datavolo Runtime

Datavolo Runtimes support Reporting Tasks that allow you to send metrics and other events to various external systems.

Datavolo's OpenTelemetryReportingTask sends OTLP-compatible metrics through a variety of authentication mechanisms.

  1. Log in to your Datavolo Runtime

  2. From the top-right menu, open Controller Settings

  3. Switch to the Reporting Tasks tab

  4. Click the + button to add a new Reporting Task

  5. Select the OpenTelemetryReportingTask

  6. Click the Add button

    • The Reporting Task will initially show as Invalid until it is configured in the following steps
  7. Click the pencil icon on the right-hand side to edit the properties of the Reporting Task.

  8. For each property, enter:

    • Export Endpoint
      • The endpoint from the Grafana OTLP Configuration with /v1/metrics appended to the end
      • For example, https://otlp-gateway-prod-us-east-0.grafana.net/otlp/v1/metrics
    • Export Protocol
      • HTTP
    • Authentication Type
      • Basic Authentication
    • Basic Authentication Username
      • The Instance ID from the Grafana OTLP Configuration
    • Basic Authentication Password
      • The API Token that was generated in the Grafana OTLP Configuration
    • Resource Attributes
      • Any comma-separated set of key=value pairs that identifies your instance.
      • For example, service.name=datavolo,environment=production

    By default, the OpenTelemetryReportingTask exports metrics every 10 seconds, which may be too frequent for some use cases. To change the interval, edit the Run Schedule field on the Settings tab.

  9. When you're ready, click Apply in the bottom-right of the modal to save all of these changes.

  10. Start the Reporting Task using the play button on the right-hand side
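
The Export Endpoint and Resource Attributes values are easy to mistype, so here is a small Python sketch that derives and checks them before they are pasted into the Reporting Task. It only mirrors the formats shown in the property list above (the /v1/metrics suffix and comma-separated key=value pairs). How the Reporting Task itself parses these values is not covered in this guide, and both function names are hypothetical.

```python
def export_endpoint(otlp_endpoint: str) -> str:
    """Append /v1/metrics to the Grafana OTLP endpoint without doubling slashes."""
    base = otlp_endpoint.rstrip("/")
    if base.endswith("/v1/metrics"):
        return base  # already a full metrics endpoint
    return base + "/v1/metrics"


def parse_resource_attributes(text: str) -> dict[str, str]:
    """Parse 'key=value,key=value' into a dict, rejecting malformed pairs."""
    attributes: dict[str, str] = {}
    for pair in text.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        key, separator, value = pair.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        attributes[key.strip()] = value.strip()
    return attributes


if __name__ == "__main__":
    print(export_endpoint("https://otlp-gateway-prod-us-east-0.grafana.net/otlp"))
    print(parse_resource_attributes("service.name=datavolo,environment=production"))
```

A ValueError from parse_resource_attributes points at the pair that is missing its = sign, which is usually quicker to spot here than in the Reporting Task's property editor.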

Info: Congratulations! Your Flow metrics are now being published to Grafana Cloud.

Figure: Running Reporting Task

Visualizing Flow Metrics

Getting started quickly? Import the Datavolo Runtime Overview dashboard (ID: 21172) and select the Prometheus data source that was created with your OpenTelemetry configuration.

Figure: Datavolo Runtime Dashboard in Grafana

Need more? You can build your own Grafana dashboard with all of the metrics available from this Reporting Task.

Alerting on Flow Metrics

Grafana provides many ways to alert on metrics from your Datavolo Runtime.

One example is alerting whenever a FlowFile has been queued for longer than some period of time (e.g., 1 minute).

To receive a notification from Grafana whenever this happens:

  1. Navigate to the Alerts & IRM > Alerting > Alert Rules section
  2. Click "New Alert Rule"
  3. Enter a name, such as "Stuck File"
  4. Choose the Prometheus data source that contains your OTLP metrics.
    • For example, grafanacloud-username-prom
  5. Enter the query connection_queued_duration_max_millisecond
  6. Set the Threshold expression to Input A IS ABOVE 60000 (the metric is reported in milliseconds, so 60000 corresponds to 1 minute)
  7. In the Set Evaluation Behavior section, create a new evaluation group named All Connections
  8. Further down in the Configure labels and notifications section, you can opt to receive an email whenever this alert fires.
  9. Once you're satisfied with the setup, click the Save rule and exit button in the top-right.
  10. Back on the Alert Rules page, you'll now see a preview of any pending alerts from connections with FlowFiles that have been queued for more than 1 minute.
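
To make the threshold arithmetic concrete, the sketch below reproduces the rule's condition in plain Python: convert the desired wait time to milliseconds, then keep every connection whose maximum queued duration is strictly above it. This is only an illustration of the comparison, not how Grafana evaluates rules internally, and the connection names and sample values are made up.

```python
def minutes_to_ms(minutes: float) -> int:
    """Convert minutes to milliseconds, the unit the metric is reported in."""
    return int(minutes * 60 * 1000)


def stuck_connections(
    queued_ms_by_connection: dict[str, float],
    threshold_ms: int = minutes_to_ms(1),
) -> list[str]:
    """Return the connections whose max queued duration is above the threshold.

    The comparison is strict, mirroring IS ABOVE: a value exactly equal to
    the threshold does not count as stuck.
    """
    return sorted(
        name
        for name, queued_ms in queued_ms_by_connection.items()
        if queued_ms > threshold_ms
    )


if __name__ == "__main__":
    # Hypothetical samples of connection_queued_duration_max_millisecond
    samples = {"ingest-to-parse": 1_200, "parse-to-store": 95_000}
    print(stuck_connections(samples))
```

Raising the wait time to 5 minutes would mean a threshold of minutes_to_ms(5), or 300000, in step 6.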