DetectDocumentPII
Description
This processor accepts a parsed document and returns it with metadata containing the text positions of recognized PII entities. The processor can optionally redact those entities in the document text and titles.
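To illustrate what position-based redaction involves, here is a minimal sketch in Python. It is not the processor's implementation: the `PiiSpan` record and the `redact` function are hypothetical names invented for this example, and the metadata schema the processor actually writes is not shown on this page. The sketch assumes each detected entity is described by a start offset, an end offset, and an entity name, and that spans do not overlap.

```python
from typing import List, NamedTuple, Optional


class PiiSpan(NamedTuple):
    """Hypothetical record for one detected entity (not the processor's schema)."""
    start: int        # index of the first character of the entity
    end: int          # index one past the last character of the entity
    entity_type: str  # entity name, for example "PHONE_NUMBER"


def redact(text: str, spans: List[PiiSpan], replacement: Optional[str] = None) -> str:
    """Replace each span with `replacement`, or with its entity name if none is given.

    Assumes the spans do not overlap.
    """
    # Process spans from the end of the text so earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        value = replacement if replacement is not None else span.entity_type
        text = text[:span.start] + value + text[span.end:]
    return text


sample = "Call Ana at 555-0100."
spans = [PiiSpan(5, 8, "PERSON"), PiiSpan(12, 20, "PHONE_NUMBER")]
print(redact(sample, spans))         # Call PERSON at PHONE_NUMBER.
print(redact(sample, spans, "***"))  # Call *** at ***.
```

Replacing from the last span backwards avoids recomputing offsets after each substitution, which matters because the replacement is usually a different length from the original text.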
Tags
datavolo, document, pii
Properties
In the table below, required properties are marked with an asterisk (*). All other properties are optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Service Location Strategy * | Service Location Strategy | Default | | Determines how service locations are configured within this processor for the PII detection and redaction services |
Custom Presidio Analyzer Service Host * | Custom Presidio Analyzer Service Host | | | Custom URL for connecting to the Presidio Analyzer. This property is only considered if: |
Custom Presidio Anonymizer Service Host * | Custom Presidio Anonymizer Service Host | | | Custom URL for connecting to the Presidio Anonymizer. This property is only considered if: |
Communication Timeout * | Communication Timeout | 60 sec | | The amount of time to wait for a response from the microservices before timing out. |
Default Language * | Default Language | en | | Default language to use for PII detection if the language is not detected |
Entity Inclusion List * | Entity Inclusion List | CREDIT_CARD, CRYPTO, DATE_TIME, EMAIL_ADDRESS, IBAN_CODE, IP_ADDRESS, NRP, LOCATION, PERSON, PHONE_NUMBER, MEDICAL_LICENSE, URL, US_BANK_NUMBER, US_DRIVER_LICENSE, US_ITIN, US_PASSPORT, US_SSN | | List of entities to include, chosen from the following: CREDIT_CARD, CRYPTO, DATE_TIME, EMAIL_ADDRESS, IBAN_CODE, IP_ADDRESS, NRP, LOCATION, PERSON, PHONE_NUMBER, MEDICAL_LICENSE, URL, US_BANK_NUMBER, US_DRIVER_LICENSE, US_ITIN, US_PASSPORT, US_SSN, UK_NHS, ES_NIF, IT_FISCAL_CODE, IT_DRIVER_LICENSE, IT_VAT_CODE, IT_PASSPORT, IT_IDENTITY_CARD, PL_PESEL, SG_NRIC_FIN, SG_UEN, AU_ABN, AU_ACN, AU_TFN, AU_MEDICARE, IN_PAN, IN_AADHAAR, IN_VEHICLE_REGISTRATION |
PII Redaction Strategy * | PII Redaction Strategy | No Redaction | | Strategy to use to redact PII from the document |
Add Original Text to Metadata | Add Original Text to Metadata | false | | Adds the original text or title to the container's metadata. Use this only for debugging, since other redaction strategies such as ENCRYPT allow data to be de-anonymized safely. |
PII Redaction Encryption Key | PII Redaction Encryption Key | | | Key to use to encrypt PII in the document. Keys must be 16, 24, or 32 characters long (128, 192, or 256 bits). This property is only considered if: |
Replacement Value | Replacement Value | | | Value to use to replace PII in the document. If unset, the entity name is used (e.g. PHONE_NUMBER). This property is only considered if: |
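Two of the properties above have constraints that are easy to check before configuring the processor: the Entity Inclusion List must draw from the supported entity names, and the PII Redaction Encryption Key must be 16, 24, or 32 characters long. The following Python sketch shows one way to pre-validate such values. The function names `parse_entity_list` and `key_size_bits` are hypothetical and are not part of the processor. The sketch also assumes the entity list is supplied as a comma-separated string, as the default value suggests, and that each key character occupies one byte.

```python
from typing import List

# Entity names copied from the Entity Inclusion List description above.
SUPPORTED_ENTITIES = {
    "CREDIT_CARD", "CRYPTO", "DATE_TIME", "EMAIL_ADDRESS", "IBAN_CODE",
    "IP_ADDRESS", "NRP", "LOCATION", "PERSON", "PHONE_NUMBER",
    "MEDICAL_LICENSE", "URL", "US_BANK_NUMBER", "US_DRIVER_LICENSE",
    "US_ITIN", "US_PASSPORT", "US_SSN", "UK_NHS", "ES_NIF",
    "IT_FISCAL_CODE", "IT_DRIVER_LICENSE", "IT_VAT_CODE", "IT_PASSPORT",
    "IT_IDENTITY_CARD", "PL_PESEL", "SG_NRIC_FIN", "SG_UEN", "AU_ABN",
    "AU_ACN", "AU_TFN", "AU_MEDICARE", "IN_PAN", "IN_AADHAAR",
    "IN_VEHICLE_REGISTRATION",
}

VALID_KEY_LENGTHS = (16, 24, 32)


def parse_entity_list(value: str) -> List[str]:
    """Split a comma-separated entity list and reject unsupported names."""
    entities = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [name for name in entities if name not in SUPPORTED_ENTITIES]
    if unknown:
        raise ValueError(f"Unsupported entities: {', '.join(unknown)}")
    return entities


def key_size_bits(key: str) -> int:
    """Return the key size in bits, or raise if the length is not allowed."""
    if len(key) not in VALID_KEY_LENGTHS:
        raise ValueError("Key must be 16, 24, or 32 characters long")
    # Assumes single-byte characters, so each character contributes 8 bits.
    return len(key) * 8


print(parse_entity_list("PERSON, US_SSN, EMAIL_ADDRESS"))
print(key_size_bits("0123456789abcdef"))
```

A check like this catches a misspelled entity name or a key of the wrong length before a flow is started, rather than when the first FlowFile is processed.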
Dynamic Properties
This component does not support dynamic properties.
Relationships
Name | Description |
---|---|
comms.failure | If the processor is unable to communicate with one of the necessary services, the input FlowFile will be routed to this relationship. |
failure | If the PII in the FlowFile cannot be extracted for any reason, the input FlowFile will be routed to this relationship. |
success | The document is routed to the success relationship. |
Reads Attributes
This processor does not read attributes.
Writes Attributes
Name | Description |
---|---|
pii.detected | Attribute that indicates PII was found in a document |
State Management
This component does not store state.
Restricted
This component is not restricted.
Input Requirement
This component requires an incoming relationship.
System Resource Considerations
This component does not specify system resource considerations.