Packaging Python Processors

Introduction

Apache NiFi version 2.0.0-M1 introduced support for Processors written in Python. NiFi 2.0.0-M3 added support for loading Python Processors from NAR files, which bundle Processors and their dependencies together.

Summary

Hatch is a project management tool for Python that supports common development lifecycle operations using a straightforward command-line interface. In addition to supporting standard Python packaging formats, Hatch provides a configurable plugin system for other types of distribution strategies.

The Hatch Datavolo NAR plugin supports building packaged archives for Apache NiFi. Adding the plugin to the build system configuration of a Python project using Hatch provides a new build target for packaging Processors and dependencies in a NAR.

Prerequisites

Packaging Python Processors with Hatch and the NAR plugin requires Python 3.11 or later. Python binaries are available through package managers for popular operating systems.

The Hatch installation walkthrough includes instructions for Linux distributions, macOS, and Windows.

The pipx command also supports installing Hatch regardless of operating system.

pipx install hatch

Packaging

Creating an Apache NiFi NAR with a Python Processor involves three steps:

  1. Create a project with Hatch
  2. Configure the project with the Hatch Datavolo NAR plugin
  3. Build the project with the NAR target specified

Project Creation

The Hatch new command supports creating a new project from templates.

hatch new processors

The command creates a project directory structure in the current directory using the src layout strategy and prints the details.

processors
├── src
│   └── processors
│       ├── __about__.py
│       └── __init__.py
├── tests
│   └── __init__.py
├── LICENSE.txt
├── README.md
└── pyproject.toml

Apache NiFi Python Processors should be added to the src/processors directory for subsequent packaging.
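
As a rough illustration of what such a file might contain, the following sketch defines a FlowFileTransform Processor that upper-cases FlowFile content. The file name uppercase_content.py and the class name UppercaseContent are hypothetical. The nifiapi module is only available inside the NiFi Python runtime, so the sketch falls back to minimal stand-in classes when the import fails; a real Processor would normally keep only the plain import.

```python
# Hypothetical file: src/processors/uppercase_content.py
try:
    # Provided by the NiFi Python runtime.
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Stand-ins (an assumption, for local experiments only) that accept the
    # same constructor arguments used below.
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, attributes=None, contents=None):
            self.relationship = relationship
            self.attributes = attributes
            self.contents = contents


class UppercaseContent(FlowFileTransform):
    class Java:
        # Declares which NiFi interface this Python class fulfils.
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    class ProcessorDetails:
        version = "0.0.1"
        description = "Converts FlowFile text content to upper case."

    def __init__(self, **kwargs):
        pass

    def transform(self, context, flowfile):
        # Read the incoming content, upper-case it, and route to success.
        text = flowfile.getContentsAsBytes().decode("utf-8")
        return FlowFileTransformResult(
            relationship="success",
            contents=text.upper().encode("utf-8"),
        )
```

The transform method can be exercised outside NiFi by passing any object that offers a getContentsAsBytes method, which is a quick way to check the logic before packaging.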

Project Configuration

The pyproject.toml contains standard metadata and should be updated to reflect current project information.

Configuring the Hatch Datavolo NAR plugin requires updating the build-system.requires field to include the hatch-datavolo-nar dependency.

[build-system]
requires = ["hatchling", "hatch-datavolo-nar"]

Adding the hatch-datavolo-nar dependency enables the nar target argument for the Hatch build command.

The NAR plugin requires an explicit set of source directories when building a bundle. The Python project configuration should include a tool section with a packages field that matches the source directory containing Python Processors.

[tool.hatch.build.targets.nar]
packages = ["src/processors"]

The NAR plugin uses the dependencies field in the [project] section to retrieve and package external libraries. The field should be empty for Python Processors that do not need additional dependencies.

The default pyproject.toml configuration reads the version number from the __about__.py file in the source directory.
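
To make the version lookup concrete, the sketch below extracts a version string from the text of an __about__.py file. The pattern and the parse_version helper are simplified assumptions for illustration, not the implementation Hatch uses.

```python
import re

# Simplified pattern (assumption): matches lines such as __version__ = "0.0.1"
VERSION_PATTERN = re.compile(
    r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE
)


def parse_version(about_source: str) -> str:
    """Return the version assigned to __version__ in the given source text."""
    match = VERSION_PATTERN.search(about_source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


# Hypothetical file content, similar to what a new project might contain.
SAMPLE_ABOUT = '# SPDX-License-Identifier: MIT\n__version__ = "0.0.1"\n'
print(parse_version(SAMPLE_ABOUT))
```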

The following pyproject.toml configuration omits other metadata, but includes the minimum information required for building a NAR.

[build-system]
requires = ["hatchling", "hatch-datavolo-nar"]
build-backend = "hatchling.build"

[project]
name = "processors"
dynamic = ["version"]
dependencies = []

[tool.hatch.version]
path = "src/processors/__about__.py"

[tool.hatch.build.targets.nar]
packages = ["src/processors"]

Building

When run without arguments, the Hatch build command produces the standard Python source distribution and wheel.

With the Hatch Datavolo NAR plugin configured, the --target argument supports nar for Apache NiFi NAR bundles.

hatch build --target nar

The command prints the specified build targets and lists the files produced.

dist/processors-0.0.1.nar
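
One way to check what ended up in the bundle is to list its entries. The sketch below assumes the NAR is a ZIP-format archive, as NiFi NAR files generally are, and uses the standard zipfile module. The list_nar_entries helper is hypothetical.

```python
import zipfile


def list_nar_entries(nar_path):
    """Return the sorted entry names of a ZIP-format archive."""
    with zipfile.ZipFile(nar_path) as archive:
        return sorted(archive.namelist())


# Example call, assuming the build output shown above exists:
# for name in list_nar_entries("dist/processors-0.0.1.nar"):
#     print(name)
```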

The packaged NAR can be loaded on a system that provides a compatible runtime version of Python.

The Apache NiFi Python Developer's Guide provides details on deploying a packaged NAR.

Apache NiFi 2.0.0-M3 or later is required for loading Python Processor NAR packages.