FetchGCSObject
Description
Fetches a file from a Google Cloud Storage bucket. Designed to be used in tandem with ListGCSBucket.
Tags
fetch, gcs, google, google cloud, storage
Properties
In the table below, required properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
GCP Credentials Provider Service * | GCP Credentials Provider Service | | Controller Service: GCPCredentialsService; Implementations: GCPCredentialsControllerService | The Controller Service used to obtain Google Cloud Platform credentials. |
Project ID | gcp-project-id | | | Google Cloud Project ID. Supports Expression Language, using Environment variables. |
Bucket * | gcs-bucket | ${gcs.bucket} | | Bucket of the object. Supports Expression Language, using FlowFile attributes and Environment variables. |
Key * | gcs-key | ${filename} | | Name of the object. Supports Expression Language, using FlowFile attributes and Environment variables. |
Object Generation | gcs-generation | | | The generation of the Object to download. If not set, the latest generation will be downloaded. Supports Expression Language, using FlowFile attributes and Environment variables. |
Server Side Encryption Key | gcs-server-side-encryption-key | | | An AES256 key (encoded in base64) with which the object has been encrypted. Supports Expression Language, using FlowFile attributes and Environment variables. |
Range Start | gcs-object-range-start | | | The byte position at which to start reading from the object. An empty value or a value of zero will start reading at the beginning of the object. Supports Expression Language, using FlowFile attributes and Environment variables. |
Range Length | gcs-object-range-length | | | The number of bytes to download from the object, starting from the Range Start. An empty value or a value that extends beyond the end of the object will read to the end of the object. Supports Expression Language, using FlowFile attributes and Environment variables. |
Number of retries * | gcp-retry-count | 6 | | How many retry attempts should be made before routing to the failure relationship. |
Storage API URL | storage-api-url | | | Overrides the default storage URL. Configuring an alternative Storage API URL also overrides the HTTP Host header on requests as described in the Google documentation for Private Service Connections. Supports Expression Language, using Environment variables. |
Proxy Configuration Service | proxy-configuration-service | | Controller Service: ProxyConfigurationService; Implementations: StandardProxyConfigurationService | Specifies the Proxy Configuration Controller Service to proxy network requests. Supported proxies: HTTP + AuthN |
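
The interaction between Range Start and Range Length can be easier to follow with a small worked example. The sketch below is illustrative only: `resolve_range` is a hypothetical helper that mirrors the semantics described in the table, not the processor's actual implementation, and its handling of a start position past the end of the object is an assumption.

```python
from typing import Optional, Tuple


def resolve_range(
    object_size: int,
    range_start: Optional[int],
    range_length: Optional[int],
) -> Tuple[int, int]:
    """Return the half-open byte interval [start, end) that would be read.

    Hypothetical helper that mirrors the documented property semantics.
    """
    # An empty (None) or zero Range Start begins at the first byte.
    start = range_start or 0
    # Assumption: a start beyond the object is clamped to the object size,
    # which yields an empty read. The documentation does not specify this case.
    start = min(start, object_size)
    if range_length is None:
        # An empty Range Length reads to the end of the object.
        end = object_size
    else:
        # A length that extends beyond the object also reads to the end.
        end = min(start + range_length, object_size)
    return start, end


# For a 100-byte object:
whole_object = resolve_range(100, None, None)   # both properties left empty
first_ten = resolve_range(100, 0, 10)           # bytes 0 through 9
tail = resolve_range(100, 90, 50)               # length overshoots the end
```

Under these assumptions, leaving both properties empty covers the whole object, and an oversized Range Length is simply truncated at the last byte.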
Dynamic Properties
This component does not support dynamic properties.
Relationships
Name | Description |
---|---|
failure | FlowFiles are routed to this relationship if the Google Cloud Storage operation fails. |
success | FlowFiles are routed to this relationship after a successful Google Cloud Storage operation. |
Reads Attributes
This processor does not read attributes.
Writes Attributes
Name | Description |
---|---|
filename | The name of the file, parsed if possible from the Content-Disposition response header |
gcs.bucket | Bucket of the object. |
gcs.cache.control | Data cache control of the object. |
gcs.component.count | The number of components which make up the object. |
gcs.content.disposition | The data content disposition of the object. |
gcs.content.encoding | The content encoding of the object. |
gcs.content.language | The content language of the object. |
gcs.crc32c | The CRC32C checksum of the object's data, encoded in base64 in big-endian order. |
gcs.create.time | The creation time of the object (milliseconds) |
gcs.encryption.algorithm | The algorithm used to encrypt the object. |
gcs.encryption.sha256 | The SHA256 hash of the key used to encrypt the object |
gcs.etag | The HTTP 1.1 Entity tag for the object. |
gcs.generated.id | The service-generated ID for the object. |
gcs.generation | The data generation of the object. |
gcs.key | Name of the object. |
gcs.md5 | The MD5 hash of the object's data encoded in base64. |
gcs.media.link | The media download link to the object. |
gcs.metageneration | The metageneration of the object. |
gcs.owner | The owner (uploader) of the object. |
gcs.owner.type | The ACL entity type of the uploader of the object. |
gcs.size | Size of the object. |
gcs.update.time | The last modification time of the object (milliseconds) |
gcs.uri | The URI of the object as a string. |
mime.type | The MIME/Content-Type of the object |
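
A downstream script can use some of these attributes to sanity-check fetched content. The sketch below is a minimal illustration, not part of the processor: the helper names are hypothetical, and it assumes that `gcs.md5` holds the base64-encoded raw MD5 digest and that `gcs.create.time` and `gcs.update.time` are milliseconds since the Unix epoch.

```python
import base64
import hashlib
from datetime import datetime, timezone
from typing import Dict, Optional


def md5_base64(content: bytes) -> str:
    """Base64-encode the raw MD5 digest of the content."""
    return base64.b64encode(hashlib.md5(content).digest()).decode("ascii")


def content_matches_md5(content: bytes, attributes: Dict[str, str]) -> Optional[bool]:
    """Compare content against the gcs.md5 attribute.

    Returns None when the attribute is absent, so the caller can decide
    how to treat objects that carry no MD5 value.
    """
    expected = attributes.get("gcs.md5")
    if expected is None:
        return None
    return md5_base64(content) == expected


def millis_to_utc(millis: str) -> datetime:
    """Convert an epoch-milliseconds attribute value to an aware UTC datetime."""
    return datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)


# Hypothetical attribute values for an empty object.
attributes = {"gcs.md5": md5_base64(b""), "gcs.update.time": "1700000000000"}
is_intact = content_matches_md5(b"", attributes)
updated_at = millis_to_utc(attributes["gcs.update.time"])
```

Returning `None` for a missing attribute keeps "no checksum available" distinct from "checksum mismatch", which is usually the safer design when routing on the result.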
State Management
This component does not store state.
Restricted
This component is not restricted.
Input Requirement
This component requires an incoming relationship.
Example Use Cases Involving Other Components
Multiprocessor Use Case 1
Retrieve all files in a Google Cloud Storage (GCS) bucket
Components Involved
- ListGCSBucket
  - The "Bucket" property should be set to the name of the GCS bucket that files reside in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like #{GCS_SOURCE_BUCKET}.
  - Configure the "Project ID" property to reflect the ID of your Google Cloud project.
  - The "GCP Credentials Provider Service" property should specify an instance of the GCPCredentialsService in order to provide credentials for accessing the bucket.
  - The 'success' Relationship of this Processor is then connected to FetchGCSObject.
- FetchGCSObject
  - "Bucket" = "${gcs.bucket}"
  - "Key" = "${filename}"
  - The "GCP Credentials Provider Service" property should specify an instance of the GCPCredentialsService in order to provide credentials for accessing the bucket.
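
To make the hand-off between the two processors concrete, the sketch below imitates how the default FetchGCSObject values `${gcs.bucket}` and `${filename}` would resolve against the attributes of one listed FlowFile. It is a deliberately simplified stand-in for the NiFi Expression Language that handles plain `${attribute}` references only. The `resolve` helper, the attribute values, and the assumption that ListGCSBucket writes `gcs.bucket` and `filename` for each listed object are all illustrative. "Key" refers to the object name property as listed in the Properties table.

```python
import re
from typing import Dict

# Matches simple ${attribute.name} references only (no functions or nesting).
_ATTRIBUTE_REF = re.compile(r"\$\{([^}]+)\}")


def resolve(template: str, attributes: Dict[str, str]) -> str:
    """Substitute each ${name} with the matching FlowFile attribute.

    Simplification: a missing attribute resolves to an empty string.
    """
    return _ATTRIBUTE_REF.sub(lambda m: attributes.get(m.group(1), ""), template)


# Hypothetical attributes on one FlowFile emitted by the listing step.
listed_flowfile = {
    "gcs.bucket": "example-bucket",
    "filename": "reports/2024/summary.csv",
}

# FetchGCSObject property values as configured in the use case above.
fetch_properties = {"Bucket": "${gcs.bucket}", "Key": "${filename}"}

resolved = {
    name: resolve(value, listed_flowfile)
    for name, value in fetch_properties.items()
}
```

Because the values are resolved per FlowFile, one FetchGCSObject instance can serve every object that the listing step emits.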
System Resource Considerations
This component does not specify system resource considerations.