Packaging Python Processors
Introduction
Apache NiFi version 2.0.0-M1 introduced support for Processors written using Python. NiFi 2.0.0-M3 added support for loading Python Processors from NAR files, bundling Processors and dependencies together.
Summary
Hatch is a project management tool for Python that supports common development lifecycle operations using a straightforward command-line interface. In addition to supporting standard Python packaging formats, Hatch provides a configurable plugin system for other types of distribution strategies.
The Hatch Datavolo NAR plugin supports building packaged archives for Apache NiFi. Adding the plugin to the build system configuration of a Python project using Hatch provides a new build target for packaging Processors and dependencies in a NAR.
Prerequisites
Packaging Python Processors with Hatch and the NAR plugin requires Python 3.11 or later. Python binaries are available through package managers for popular operating systems.
The Hatch installation walkthrough includes instructions for Linux distributions, macOS, and Windows.
The pipx command also supports installing Hatch regardless of operating system.
pipx install hatch
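The prerequisites above can be checked with a short script before starting. This is a minimal sketch, not part of Hatch or the NAR plugin: the function name find_problems and the message wording are hypothetical, and the only assumptions are the Python 3.11 minimum stated above and that Hatch is installed as a hatch executable on the PATH.

```python
import shutil
import sys

# Minimum interpreter version stated in the prerequisites.
MINIMUM_PYTHON = (3, 11)


def find_problems(python_version, hatch_path):
    """Return a list of readable problems; empty when the prerequisites look satisfied."""
    problems = []
    found = tuple(python_version[:2])
    if found < MINIMUM_PYTHON:
        problems.append(
            "Python %d.%d or later is required, found %d.%d" % (MINIMUM_PYTHON + found)
        )
    if hatch_path is None:
        problems.append("hatch was not found on PATH; try: pipx install hatch")
    return problems


if __name__ == "__main__":
    # Check the running interpreter and look up the hatch executable.
    for problem in find_problems(sys.version_info, shutil.which("hatch")):
        print(problem)
```

Running the script prints nothing when both checks pass.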
Packaging
Creating an Apache NiFi NAR with a Python Processor involves three steps:
- Create a project with Hatch
- Configure the project with the Hatch Datavolo NAR plugin
- Build the project with the NAR target specified
Project Creation
The hatch new command supports creating a new project from templates.
hatch new processors
The command creates a project directory structure in the current directory using the src layout strategy and prints the details.
processors
├── src
│   └── processors
│       ├── __about__.py
│       └── __init__.py
├── tests
│   └── __init__.py
├── LICENSE.txt
├── README.md
└── pyproject.toml
Apache NiFi Python Processors should be added to the src/processors directory for subsequent packaging.
Project Configuration
The pyproject.toml contains standard metadata and should be updated to reflect current project information.
Configuring the Hatch Datavolo NAR plugin requires updating the build-system.requires field to include the hatch-datavolo-nar dependency.
[build-system]
requires = ["hatchling", "hatch-datavolo-nar"]
Adding the hatch-datavolo-nar dependency enables the nar target argument for the hatch build command.
The NAR plugin needs a defined set of source directories when building bundle files. The project configuration should include a tool section with a packages field that matches the source directory containing the Python Processors.
[tool.hatch.build.targets.nar]
packages = ["src/processors"]
The NAR plugin uses the dependencies field in the [project] section to retrieve and package external libraries.
The field should be empty for Python Processors that do not need additional dependencies.
The default pyproject.toml configuration reads the version number from the __about__.py file in the source directory.
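To illustrate what reading the version from a file involves, the following sketch extracts a version string with a regular expression. It is not Hatch's own implementation: the function name read_version and the pattern are hypothetical, and the sketch assumes the file declares __version__ as a plain string assignment at the start of a line.

```python
import re
from pathlib import Path

# Matches a line such as: __version__ = "0.0.1"
VERSION_PATTERN = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def read_version(about_path):
    """Return the version string declared in the file, or None when none is found."""
    text = Path(about_path).read_text(encoding="utf-8")
    match = VERSION_PATTERN.search(text)
    return match.group(1) if match else None
```

Calling read_version("src/processors/__about__.py") from the project root returns the declared version string, or None if the file has no matching assignment.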
The following pyproject.toml configuration omits other metadata, but includes the minimum information required for building a NAR.
[build-system]
requires = ["hatchling", "hatch-datavolo-nar"]
build-backend = "hatchling.build"
[project]
name = "processors"
dynamic = ["version"]
dependencies = []
[tool.hatch.version]
path = "src/processors/__about__.py"
[tool.hatch.build.targets.nar]
packages = ["src/processors"]
Building
In the absence of any arguments, the hatch build command produces standard Python source and wheel distribution packages.
With the Hatch Datavolo NAR plugin configured, the --target argument supports nar for Apache NiFi NAR bundles.
hatch build --target nar
The command prints the specified build targets and lists the files produced.
dist/processors-0.0.1.nar
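The contents of the built file can be inspected before deployment. The sketch below assumes the NAR produced by the plugin is a standard ZIP archive, as Apache NiFi NAR files generally are. The function name list_nar_entries is hypothetical, and the sketch makes no claim about which entries the plugin writes.

```python
import zipfile


def list_nar_entries(nar_path):
    """Return the sorted entry names inside a NAR, treating it as a ZIP archive."""
    with zipfile.ZipFile(nar_path) as archive:
        return sorted(archive.namelist())
```

Calling list_nar_entries("dist/processors-0.0.1.nar") from the project root returns the names of the packaged files, which makes it easy to confirm that the expected Processor modules were included.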
The packaged NAR can be loaded on a system that provides a compatible runtime version of Python.
The Apache NiFi Python Developer's Guide provides details on deploying a packaged NAR.
Apache NiFi 2.0.0-M3 or later is required for loading Python Processor NAR packages.