CaptureSharepointChanges
Description
Captures changes from a Sharepoint Document Library and emits a FlowFile for each change that occurs. This includes additions and deletions of files and folders, as well as changes to permissions, metadata, and file content.
Tags
cdc, document, experimental, graph, library, microsoft, sharepoint, unstructured
Properties
In the list below required Properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Site Name * | Site Name | | | The name of the Sharepoint Site that data will be retrieved from. |
Document Library Name | Document Library Name | | | The name of the Document Library to list. If not specified, all Document Libraries associated with the Site will be listed. |
Folder Name * | Folder Name | / | | The name of the Folder/Directory to list |
Authentication Service * | Authentication Service | | Controller Service: MicrosoftGraphAuthenticationService Implementations: MicrosoftGraphAuthenticationProvider | The service that provides authentication for the SharePoint API |
Change Capture Initial Action * | Change Capture Initial Action | List Existing Items | | If the Processor is run without having any prior state, this property dictates how the Processor should treat existing Sharepoint items. |
Dynamic Properties
This component does not support dynamic properties.
Relationships
Name | Description |
---|---|
created | A FlowFile is routed to this relationship for each Sharepoint item that is created. |
deleted | A FlowFile is routed to this relationship for each Sharepoint item that is deleted. |
updated | A FlowFile is routed to this relationship for each Sharepoint item that is updated. |
Reads Attributes
This processor does not read attributes.
Writes Attributes
Name | Description |
---|---|
filename | The name of the Sharepoint item that was changed. This attribute is not available for 'Deleted' changes. |
hash.crc32 | The CRC32 hash of the Sharepoint item that was changed. This attribute is not always available. |
hash.quickxor | The QuickXor hash of the Sharepoint item that was changed. This attribute is not always available. |
hash.sha1 | The SHA-1 hash of the Sharepoint item that was changed. This attribute is not always available. |
hash.sha256 | The SHA-256 hash of the Sharepoint item that was changed. This attribute is not always available. |
mime.type | The MIME type of the Sharepoint item that was changed. This attribute is only available for 'File' items. |
path | The path of the Sharepoint item that was changed. This is the path relative to the root of the Document Library. |
sharepoint.change.type | The type of change that occurred. Possible values are 'Created', 'Updated', 'PermissionsUpdated', 'Deleted'. |
sharepoint.ctag | The CTag of the Sharepoint item that was changed. |
sharepoint.drive.id | The ID of the Sharepoint Drive that contains the item that was changed. |
sharepoint.drive.name | The name of the Sharepoint Drive that contains the item that was changed. |
sharepoint.etag | The ETag of the Sharepoint item that was changed. |
sharepoint.filename | The name of the Sharepoint item that was changed. This attribute is not available for 'Deleted' changes. |
sharepoint.item.id | The ID of the Sharepoint item that was changed. |
sharepoint.item.type | The type of the Sharepoint item that was changed. Possible values are 'File' and 'Folder'. |
sharepoint.lastModified | The last modified timestamp of the Sharepoint item that was changed. |
sharepoint.path | The path of the Sharepoint item that was changed. This is the path relative to the root of the Document Library. |
sharepoint.permissions.read.groups | A comma-separated list of groups that have read permissions on the Sharepoint item that was changed. For each group, if an e-mail address is available in Sharepoint, it will be included. Additionally, the group principal, such as mygroup@mytenant.onmicrosoft.com , is included. |
sharepoint.permissions.read.users | A comma-separated list of users that have read permissions on the Sharepoint item that was changed. For each user, if an e-mail address is available in Sharepoint, it will be included. Additionally, the user principal, such as johndoe@mytenant.onmicrosoft.com , is included. |
sharepoint.site.id | The ID of the Sharepoint Site that contains the item that was changed. |
sharepoint.site.name | The name of the Sharepoint Site that contains the item that was changed. |
sharepoint.size | The size of the Sharepoint item that was changed. |
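Downstream code usually needs to split the two permission attributes back into individual principals. The following is a minimal Python sketch, assuming the FlowFile attributes are already available as a plain dict. The helper name `read_principals` and the sample values are hypothetical, and the sketch assumes that no principal contains a comma.

```python
def read_principals(attributes: dict) -> dict:
    """Split the comma-separated read-permission attributes into sets."""
    def split(value):
        # A missing or empty attribute yields an empty set.
        return {p.strip() for p in (value or "").split(",") if p.strip()}

    return {
        "users": split(attributes.get("sharepoint.permissions.read.users")),
        "groups": split(attributes.get("sharepoint.permissions.read.groups")),
    }


# Hypothetical attribute values, for illustration only.
example = {
    "sharepoint.permissions.read.users": "johndoe@mytenant.onmicrosoft.com, jane@example.com",
    "sharepoint.permissions.read.groups": "mygroup@mytenant.onmicrosoft.com",
}
principals = read_principals(example)
```

Using sets removes duplicates, which can matter because both an e-mail address and a principal may be listed for the same user or group.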
State Management
Scope | Description |
---|---|
CLUSTER | Stores tokens for each Sharepoint folder to track state about which events have already been captured. |
Restricted
This component is not restricted.
Input Requirement
This component does not allow an incoming relationship.
Example Use Cases Involving Other Components
Multiprocessor Use Case 1
Perform Change Data Capture on a Sharepoint Document Library, retrieving all data in the Document Library, including permissions, in order to keep a destination system in sync with Sharepoint.
This use case describes how to retrieve the data from Sharepoint continually. The FetchSharepointFile processor can be connected to a destination system, or it may be connected to a processing flow, such as a Retrieval Augmented Generation (RAG) specific flow, that parses PDF files and keeps a Vector Store in sync with Sharepoint.
Components Involved
- CaptureSharepointChanges
  - Configure the "Authentication Service" property to use an instance of MSGraphAuthenticationProvider that is configured with appropriate credentials.
  - Configure the "Site Name" property to specify the name of the Sharepoint Site to keep in sync.
  - In order to synchronize a specific Document Library, configure the "Document Library Name" property to specify the name of the Document Library.
  - To synchronize all Document Libraries that are associated with the Sharepoint Site, leave the "Document Library Name" property unset.
  - Configure the "Folder Name" property to specify the name of the Folder/Directory to list, or leave it set to the default value of / to monitor the entire Document Library.
  - Set "Change Capture Initial Action" to "List Existing Items".
  - Connect the 'created' and 'updated' relationships to the FetchSharepointFile Processor.
  - Connect the 'deleted' relationship to the appropriate follow-on processor that is capable of deleting from the destination system.
- FetchSharepointFile
  - Configure the "Authentication Service" property to use the same instance of MSGraphAuthenticationProvider that is used by the CaptureSharepointChanges processor.
  - Leave the "Drive ID" property set to the default value of ${sharepoint.drive.id}.
  - Leave the "Item ID" property set to the default value of ${sharepoint.item.id}.
  - If performing a RAG type of flow that requires processing, set the "Download PDF/HTML Version" property to true.
  - If pushing files to the destination system unchanged, set the "Download PDF/HTML Version" property to false.
  - Connect the 'success' relationship to the appropriate follow-on processor that is capable of processing the file or delivering to the destination system.
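The effect of this flow on a destination system can be pictured as applying each captured change to a store keyed by item ID. The following is a rough Python sketch and not part of either processor: the `apply_change` function and the in-memory `destination` dict are hypothetical stand-ins for a real destination system. It keys on `sharepoint.item.id` because the `filename` attribute is not available for 'Deleted' changes. Treating 'PermissionsUpdated' as an attribute-only refresh is an assumption made for this example.

```python
from typing import Optional


def apply_change(destination: dict, attributes: dict,
                 content: Optional[bytes] = None) -> None:
    """Apply one captured change to a dict keyed by sharepoint.item.id."""
    item_id = attributes["sharepoint.item.id"]
    change = attributes["sharepoint.change.type"]

    if change == "Deleted":
        # Deleted changes carry no filename, so remove by item ID.
        destination.pop(item_id, None)
    elif change in ("Created", "Updated"):
        # Store the fetched content together with the item's attributes.
        destination[item_id] = {"attributes": dict(attributes), "content": content}
    elif change == "PermissionsUpdated":
        # Assumption: keep any existing content and refresh attributes only.
        entry = destination.setdefault(item_id, {"attributes": {}, "content": None})
        entry["attributes"] = dict(attributes)
    else:
        raise ValueError(f"Unexpected change type: {change}")


# Hypothetical sequence of changes, for illustration only.
destination = {}
apply_change(destination, {"sharepoint.item.id": "1",
                           "sharepoint.change.type": "Created",
                           "filename": "a.pdf"}, b"v1")
apply_change(destination, {"sharepoint.item.id": "1",
                           "sharepoint.change.type": "PermissionsUpdated"})
apply_change(destination, {"sharepoint.item.id": "2",
                           "sharepoint.change.type": "Created",
                           "filename": "b.pdf"}, b"x")
apply_change(destination, {"sharepoint.item.id": "2",
                           "sharepoint.change.type": "Deleted"})
```

After these four calls, only item "1" remains, with its original content and the attributes from the most recent change.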
System Resource Considerations
This component does not specify system resource considerations.