
DetectDocumentPII

Description

This processor accepts a parsed document and returns it with metadata containing the text positions of recognized PII entities. The processor can optionally redact those entities in the document text and titles.
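To illustrate how text-position metadata can drive redaction, here is a minimal Python sketch. It is not the processor's implementation: the `EntitySpan` class, its field names, and the `replace_entities` function are hypothetical, and the sketch assumes spans do not overlap. The fallback to the entity name mirrors the behavior documented for the Replacement Value property.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntitySpan:
    """Hypothetical shape for one recognized entity; field names are assumptions."""
    entity_type: str  # for example "PHONE_NUMBER"
    start: int        # inclusive character offset into the text
    end: int          # exclusive character offset into the text


def replace_entities(text: str, spans: list[EntitySpan],
                     replacement: Optional[str] = None) -> str:
    """Replace each span with `replacement`, or with its entity type when unset."""
    # Work right to left so earlier offsets stay valid after each substitution.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        value = replacement if replacement is not None else span.entity_type
        text = text[:span.start] + value + text[span.end:]
    return text


text = "Call Ana at 555-0100."
spans = [EntitySpan("PERSON", 5, 8), EntitySpan("PHONE_NUMBER", 12, 20)]
print(replace_entities(text, spans))  # prints "Call PERSON at PHONE_NUMBER."
```

Processing spans from the end of the text backward avoids recomputing offsets after each replacement changes the string length.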

Tags

datavolo, document, pii

Properties

In the list below, required properties are marked with an asterisk (*). All other properties are optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Service Location Strategy *
  API Name: Service Location Strategy
  Default Value: Default
  Allowable Values:
  • Default
  • Custom
  Description: Determines how service locations are configured within this processor for the PII detection and redaction services.

Custom Presidio Analyzer Service Host *
  API Name: Custom Presidio Analyzer Service Host
  Description: Custom URL for connecting to the Presidio Analyzer.
  This property is only considered if:
  • the property Service Location Strategy has a value of Custom

Custom Presidio Anonymizer Service Host *
  API Name: Custom Presidio Anonymizer Service Host
  Description: Custom URL for connecting to the Presidio Anonymizer.
  This property is only considered if:
  • the property Service Location Strategy has a value of Custom

Communication Timeout *
  API Name: Communication Timeout
  Default Value: 60 sec
  Description: The amount of time to wait for a response from the microservices before timing out.

Default Language *
  API Name: Default Language
  Default Value: en
  Description: Default language to use for PII detection if the language is not detected.

Entity Inclusion List *
  API Name: Entity Inclusion List
  Default Value: CREDIT_CARD, CRYPTO, DATE_TIME, EMAIL_ADDRESS, IBAN_CODE, IP_ADDRESS, NRP, LOCATION, PERSON, PHONE_NUMBER, MEDICAL_LICENSE, URL, US_BANK_NUMBER, US_DRIVER_LICENSE, US_ITIN, US_PASSPORT, US_SSN
  Description: List of entities to include, chosen from the following list: CREDIT_CARD, CRYPTO, DATE_TIME, EMAIL_ADDRESS, IBAN_CODE, IP_ADDRESS, NRP, LOCATION, PERSON, PHONE_NUMBER, MEDICAL_LICENSE, URL, US_BANK_NUMBER, US_DRIVER_LICENSE, US_ITIN, US_PASSPORT, US_SSN, UK_NHS, ES_NIF, IT_FISCAL_CODE, IT_DRIVER_LICENSE, IT_VAT_CODE, IT_PASSPORT, IT_IDENTITY_CARD, PL_PESEL, SG_NRIC_FIN, SG_UEN, AU_ABN, AU_ACN, AU_TFN, AU_MEDICARE, IN_PAN, IN_AADHAAR, IN_VEHICLE_REGISTRATION

PII Redaction Strategy *
  API Name: PII Redaction Strategy
  Default Value: No Redaction
  Allowable Values:
  • No Redaction
  • Redact Entities
  • Replace Entries
  • Encrypt PII
  Description: Strategy used to redact PII from the document.

Add Original Text to Metadata
  API Name: Add Original Text to Metadata
  Default Value: false
  Allowable Values:
  • false
  • true
  Description: Adds the original text or title to the container's metadata. This should be used for debugging, as other redaction strategies such as ENCRYPT allow data to be de-anonymized safely.

PII Redaction Encryption Key
  API Name: PII Redaction Encryption Key
  Description: Key used to encrypt PII in the document. Keys must be 16, 24, or 32 characters long (128, 192, or 256 bits).
  This property is only considered if:
  • the property PII Redaction Strategy has a value of ENCRYPT

Replacement Value
  API Name: Replacement Value
  Description: Value used to replace PII in the document. If unset, the entity name is used (for example, PHONE_NUMBER).
  This property is only considered if:
  • the property PII Redaction Strategy has a value of REPLACE
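
The constraints on the Entity Inclusion List and PII Redaction Encryption Key properties can be checked before a flow is configured. The following Python sketch shows one way to do that. The function names are hypothetical and not part of the processor, and the key check assumes single-byte characters, so that 16, 24, or 32 characters correspond to 128, 192, or 256 bits.

```python
from __future__ import annotations

# Entity names copied from the Entity Inclusion List description.
SUPPORTED_ENTITIES = frozenset(
    "CREDIT_CARD CRYPTO DATE_TIME EMAIL_ADDRESS IBAN_CODE IP_ADDRESS NRP "
    "LOCATION PERSON PHONE_NUMBER MEDICAL_LICENSE URL US_BANK_NUMBER "
    "US_DRIVER_LICENSE US_ITIN US_PASSPORT US_SSN UK_NHS ES_NIF "
    "IT_FISCAL_CODE IT_DRIVER_LICENSE IT_VAT_CODE IT_PASSPORT "
    "IT_IDENTITY_CARD PL_PESEL SG_NRIC_FIN SG_UEN AU_ABN AU_ACN AU_TFN "
    "AU_MEDICARE IN_PAN IN_AADHAAR IN_VEHICLE_REGISTRATION".split()
)

VALID_KEY_LENGTHS = (16, 24, 32)  # characters, per the property description


def parse_entity_list(value: str) -> list[str]:
    """Split a comma-separated entity list and reject unknown names."""
    entities = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [name for name in entities if name not in SUPPORTED_ENTITIES]
    if unknown:
        raise ValueError(f"Unsupported entities: {', '.join(unknown)}")
    return entities


def validate_encryption_key(key: str) -> int:
    """Return the key size in bits, assuming one byte per character."""
    if len(key) not in VALID_KEY_LENGTHS:
        raise ValueError("Key must be 16, 24, or 32 characters long")
    return len(key) * 8


print(parse_entity_list("PERSON, US_SSN"))
print(validate_encryption_key("0123456789abcdef"))
```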

Dynamic Properties

This component does not support dynamic properties.

Relationships

comms.failure
  Description: If the processor is unable to communicate with one of the necessary services, the input FlowFile will be routed to this relationship.

failure
  Description: If the PII in the FlowFile cannot be extracted for any reason, the input FlowFile will be routed to this relationship.

success
  Description: The document is routed to the success relationship.
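
The routing rules above can be summarized with a short Python sketch. It only illustrates the documented outcomes: the `choose_relationship` function and the `detect` callable are hypothetical, and mapping `ConnectionError` and `TimeoutError` to communication failures is an assumption about how such errors might surface.

```python
from typing import Callable


def choose_relationship(text: str, detect: Callable[[str], list]) -> str:
    """Return the relationship name a document would be routed to."""
    try:
        detect(text)  # hypothetical call to the PII detection service
    except (ConnectionError, TimeoutError):
        # The service could not be reached or did not answer in time.
        return "comms.failure"
    except Exception:
        # Any other problem while extracting PII.
        return "failure"
    return "success"


def unreachable_service(text: str) -> list:
    raise ConnectionError("service unavailable")


def broken_extraction(text: str) -> list:
    raise ValueError("malformed document")


print(choose_relationship("hello", lambda text: []))    # prints "success"
print(choose_relationship("hello", unreachable_service))  # prints "comms.failure"
print(choose_relationship("hello", broken_extraction))    # prints "failure"
```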

Reads Attributes

This processor does not read attributes.

Writes Attributes

pii.detected
  Description: Attribute that indicates PII was found in a document.

State Management

This component does not store state.

Restricted

This component is not restricted.

Input Requirement

This component requires an incoming relationship.

System Resource Considerations

This component does not specify system resource considerations.

See Also

ParsePdfDocument