CaptureSharepointChanges

Description

Captures changes from a Sharepoint Document Library and emits a FlowFile for each change that occurs. This includes additions and deletions of files and folders, as well as changes to permissions, metadata, and file content.

Tags

cdc, document, experimental, graph, library, microsoft, sharepoint, unstructured

Properties

In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
|---|---|---|---|---|
| Site Name * | Site Name | | | The name of the Sharepoint Site that data will be retrieved from. |
| Document Library Name | Document Library Name | | | The name of the Document Library to list. If not specified, all Document Libraries associated with the Site will be listed. |
| Folder Name * | Folder Name | / | | The name of the Folder/Directory to list. |
| Authentication Service * | Authentication Service | | Controller Service: MicrosoftGraphAuthenticationService; Implementations: MicrosoftGraphAuthenticationProvider | The service that provides authentication for the SharePoint API. |
| Change Capture Initial Action * | Change Capture Initial Action | List Existing Items | List Existing Items, Skip Existing Items | If the Processor is run without having any prior state, this property dictates how the Processor should treat existing Sharepoint items. |
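
As a rough illustration of how the required flags and default values in the table interact, the following Python sketch resolves a set of configured values against them. The property names, defaults, and allowable values come from the table. The dictionary layout, the `resolve_properties` helper, and the sample values (`Engineering`, `graph-auth`) are hypothetical and are not part of NiFi or of this processor.

```python
# Property names, defaults, and allowable values are taken from the table above.
# Everything else (data layout, helper name) is an illustrative assumption.
REQUIRED = (
    "Site Name",
    "Folder Name",
    "Authentication Service",
    "Change Capture Initial Action",
)
DEFAULTS = {
    "Folder Name": "/",
    "Change Capture Initial Action": "List Existing Items",
}
INITIAL_ACTIONS = ("List Existing Items", "Skip Existing Items")


def resolve_properties(configured):
    """Return (resolved, problems) after applying the documented defaults."""
    resolved = {**DEFAULTS, **configured}
    problems = [
        f"missing required property: {name}"
        for name in REQUIRED
        if not resolved.get(name)
    ]
    action = resolved["Change Capture Initial Action"]
    if action not in INITIAL_ACTIONS:
        problems.append(f"unsupported initial action: {action}")
    return resolved, problems


# Hypothetical values; "Document Library Name" is left unset on purpose.
resolved, problems = resolve_properties(
    {"Site Name": "Engineering", "Authentication Service": "graph-auth"}
)
print(resolved["Folder Name"], problems)
```

Because "Folder Name" and "Change Capture Initial Action" carry defaults, only "Site Name" and "Authentication Service" have to be supplied explicitly in this sketch.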

Dynamic Properties

This component does not support dynamic properties.

Relationships

| Name | Description |
|---|---|
| created | A FlowFile is routed to this relationship for each Sharepoint item that is created. |
| deleted | A FlowFile is routed to this relationship for each Sharepoint item that is deleted. |
| updated | A FlowFile is routed to this relationship for each Sharepoint item that is updated. |

Reads Attributes

This processor does not read attributes.

Writes Attributes

| Name | Description |
|---|---|
| filename | The name of the Sharepoint item that was changed. This attribute is not available for 'Deleted' changes. |
| hash.crc32 | The CRC32 hash of the Sharepoint item that was changed. This attribute is not always available. |
| hash.quickxor | The QuickXor hash of the Sharepoint item that was changed. This attribute is not always available. |
| hash.sha1 | The SHA-1 hash of the Sharepoint item that was changed. This attribute is not always available. |
| hash.sha256 | The SHA-256 hash of the Sharepoint item that was changed. This attribute is not always available. |
| mime.type | The MIME type of the Sharepoint item that was changed. This attribute is only available for 'File' items. |
| path | The path of the Sharepoint item that was changed. This is the path relative to the root of the Document Library. |
| sharepoint.change.type | The type of change that occurred. Possible values are 'Created', 'Updated', 'PermissionsUpdated', and 'Deleted'. |
| sharepoint.ctag | The CTag of the Sharepoint item that was changed. |
| sharepoint.drive.id | The ID of the Sharepoint Drive that contains the item that was changed. |
| sharepoint.drive.name | The name of the Sharepoint Drive that contains the item that was changed. |
| sharepoint.etag | The ETag of the Sharepoint item that was changed. |
| sharepoint.filename | The name of the Sharepoint item that was changed. This attribute is not available for 'Deleted' changes. |
| sharepoint.item.id | The ID of the Sharepoint item that was changed. |
| sharepoint.item.type | The type of the Sharepoint item that was changed. Possible values are 'File' and 'Folder'. |
| sharepoint.lastModified | The last modified timestamp of the Sharepoint item that was changed. |
| sharepoint.path | The path of the Sharepoint item that was changed. This is the path relative to the root of the Document Library. |
| sharepoint.permissions.read.groups | A comma-separated list of groups that have read permissions on the Sharepoint item that was changed. For each group, if an e-mail address is available in Sharepoint, it will be included. Additionally, the group principal, such as mygroup@mytenant.onmicrosoft.com, is included. |
| sharepoint.permissions.read.users | A comma-separated list of users that have read permissions on the Sharepoint item that was changed. For each user, if an e-mail address is available in Sharepoint, it will be included. Additionally, the user principal, such as johndoe@mytenant.onmicrosoft.com, is included. |
| sharepoint.site.id | The ID of the Sharepoint Site that contains the item that was changed. |
| sharepoint.site.name | The name of the Sharepoint Site that contains the item that was changed. |
| sharepoint.size | The size of the Sharepoint item that was changed. |
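
A downstream script may need to turn these attributes into structured values, for example to mirror read permissions in a destination system. The sketch below assumes the attributes have already been read into a plain Python dictionary. `split_principals` and `summarize_change` are hypothetical helpers, and the sample values are made up apart from the principal format quoted in the table. Whether the comma-separated lists contain spaces after the commas is not stated, so the helper tolerates both forms.

```python
def split_principals(value):
    """Split a comma-separated attribute value into a list, dropping blanks."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def summarize_change(attributes):
    """Collect a few fields from one FlowFile's attributes into a dictionary."""
    return {
        "change_type": attributes.get("sharepoint.change.type"),
        "item_id": attributes.get("sharepoint.item.id"),
        # The filename is documented as unavailable for 'Deleted' changes,
        # so a missing value is returned as None rather than raising.
        "filename": attributes.get("sharepoint.filename"),
        "readers": split_principals(
            attributes.get("sharepoint.permissions.read.users")
        ),
        "reader_groups": split_principals(
            attributes.get("sharepoint.permissions.read.groups")
        ),
    }


# Made-up attribute values for illustration only.
example = {
    "sharepoint.change.type": "Deleted",
    "sharepoint.item.id": "item-123",
    "sharepoint.permissions.read.users": (
        "johndoe@mytenant.onmicrosoft.com, jane@example.com"
    ),
}
print(summarize_change(example))
```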

State Management

| Scope | Description |
|---|---|
| CLUSTER | Stores tokens for each Sharepoint folder to track state about which events have already been captured. |

Restricted

This component is not restricted.

Input Requirement

This component does not allow an incoming relationship.

Example Use Cases Involving Other Components

Multiprocessor Use Case 1

Perform Change Data Capture on a Sharepoint Document Library, retrieving all data in the Document Library, including permissions, in order to keep a destination system in sync with Sharepoint.

This use case describes how to retrieve the data from Sharepoint continually. The FetchSharepointFile processor can be connected to a destination system, or it may be connected to a processing flow, such as a Retrieval Augmented Generation (RAG) specific flow, that parses PDF files and keeps a Vector Store in sync with Sharepoint.

Components Involved

  • CaptureSharepointChanges
    1. Configure the "Authentication Service" property to use an instance of MicrosoftGraphAuthenticationProvider that is configured with appropriate credentials.
    2. Configure the "Site Name" property to specify the name of the Sharepoint Site to keep in sync.
    3. In order to synchronize a specific Document Library, configure the "Document Library Name" property to specify the name of the Document Library.
    4. To synchronize all Document Libraries that are associated with the Sharepoint Site, leave the "Document Library Name" property unset.
    5. Configure the "Folder Name" property to specify the name of the Folder/Directory to list, or leave it set to the default value of / to monitor the entire Document Library.
    6. Set "Change Capture Initial Action" to "List Existing Items".
    7. Connect the 'created' and 'updated' relationships to the FetchSharepointFile Processor.
    8. Connect the 'deleted' relationship to the appropriate follow-on processor that is capable of deleting from the destination system.
  • FetchSharepointFile
    1. Configure the "Authentication Service" property to use the same instance of MicrosoftGraphAuthenticationProvider that is used by the CaptureSharepointChanges processor.
    2. Leave the "Drive ID" property set to the default value of ${sharepoint.drive.id}.
    3. Leave the "Item ID" property set to the default value of ${sharepoint.item.id}.
    4. If performing a RAG type of flow that requires processing, set the "Download PDF/HTML Version" property to true.
    5. If pushing files to the destination system unchanged, set the "Download PDF/HTML Version" property to false.
    6. Connect the 'success' relationship to the appropriate follow-on processor that is capable of processing the file or delivering to the destination system.
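
The configuration steps above can be summarized as plain data. The following Python sketch is only a description of the flow, not a call into any NiFi API: the `build_flow` function, its parameters, the sample values, and the angle-bracket placeholder names for the follow-on processors are all hypothetical. The property names, property values, and relationship names are the ones listed in the steps.

```python
def build_flow(site_name, auth_service, library_name=None,
               folder_name="/", rag_flow=False):
    """Describe the two processors and their connections as plain data."""
    capture_properties = {
        "Authentication Service": auth_service,
        "Site Name": site_name,
        "Folder Name": folder_name,
        "Change Capture Initial Action": "List Existing Items",
    }
    # Leaving "Document Library Name" unset synchronizes every Document
    # Library associated with the Site.
    if library_name is not None:
        capture_properties["Document Library Name"] = library_name

    fetch_properties = {
        # Same controller service instance as CaptureSharepointChanges.
        "Authentication Service": auth_service,
        "Drive ID": "${sharepoint.drive.id}",
        "Item ID": "${sharepoint.item.id}",
        # true for a RAG-style flow, false when files are delivered unchanged.
        "Download PDF/HTML Version": "true" if rag_flow else "false",
    }
    connections = [
        ("CaptureSharepointChanges", "created", "FetchSharepointFile"),
        ("CaptureSharepointChanges", "updated", "FetchSharepointFile"),
        # Placeholders: the follow-on processors depend on the destination.
        ("CaptureSharepointChanges", "deleted", "<delete-from-destination>"),
        ("FetchSharepointFile", "success", "<process-or-deliver>"),
    ]
    return {
        "CaptureSharepointChanges": capture_properties,
        "FetchSharepointFile": fetch_properties,
        "connections": connections,
    }


# Hypothetical site and controller service names.
flow = build_flow("Engineering", "graph-auth", rag_flow=True)
for source, relationship, target in flow["connections"]:
    print(f"{source} --{relationship}--> {target}")
```

Keeping the "Drive ID" and "Item ID" values as Expression Language references means FetchSharepointFile reads them from the attributes written by CaptureSharepointChanges for each FlowFile.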

System Resource Considerations

This component does not specify system resource considerations.

See Also

FetchSharepointFile