Legacy Google Cloud Storage support

Object storage connectors can access Google Cloud Storage data using the gs:// URI prefix.

Requirements

To use Google Cloud Storage with non-anonymous access objects, you need:

Configuration

To use Google Cloud Storage as the storage location for an object storage catalog, you must set a configuration property that defines the authentication method for non-anonymous access objects. Access methods cannot be combined, so configure only one of them.

The default root path for the gs:// prefix is determined by the contents of the specified key file, or of the key file used to create the OAuth token.

Google Cloud Storage configuration properties

| Property name | Description |
| --- | --- |
| hive.gcs.json-key-file-path | JSON key file used to authenticate your Google Cloud service account with Google Cloud Storage. |
| hive.gcs.use-access-token | Use a client-provided OAuth token to access Google Cloud Storage. |

The following example shows a minimal configuration file for an object storage catalog that uses the Delta Lake connector and authenticates with a JSON key file:

```properties
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
hive.gcs.json-key-file-path=${ENV:GCP_CREDENTIALS_FILE_PATH}
```
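For comparison, here is a minimal sketch of the same catalog configured to authenticate with a client-provided OAuth token instead of a JSON key file. It reuses the example metastore URI from the key file example. Because access methods cannot be combined, the key file property is left out:

```properties
# Delta Lake catalog using the legacy Google Cloud Storage support
connector.name=delta_lake
# Example metastore URI, reused from the JSON key file example
hive.metastore.uri=thrift://example.net:9083
# Authenticate with the OAuth token supplied by the client;
# do not also set hive.gcs.json-key-file-path
hive.gcs.use-access-token=true
```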

General usage

Create a schema to use if one does not already exist, as in the following example:

```sql
CREATE SCHEMA storage_catalog.sales_data_in_gcs WITH (location = 'gs://example_location');
```

Once you have created a schema, you can create tables in the schema, as in the following example:

```sql
CREATE TABLE storage_catalog.sales_data_in_gcs.orders (
    orderkey BIGINT,
    custkey BIGINT,
    orderstatus VARCHAR(1),
    totalprice DOUBLE,
    orderdate DATE,
    orderpriority VARCHAR(15),
    clerk VARCHAR(15),
    shippriority INTEGER,
    comment VARCHAR(79)
);
```

This statement creates a folder for the orders table under the schema location, gs://example_location.

Your table is now ready to populate with data using INSERT statements. Alternatively, you can use CREATE TABLE AS statements to create and populate the table in a single statement.

Migration to native Google Cloud Storage implementation

SEP includes a native Google Cloud Storage file system implementation that does not rely on the legacy Hadoop implementation. Starburst recommends migrating existing deployments to the native implementation at your earliest convenience.

To migrate a catalog to use the native filesystem implementation for Google Cloud Storage, make the following edits to your catalog configuration:

  1. Set the fs.hadoop.enabled=false catalog configuration property.

  2. Add the fs.native-gcs.enabled=true catalog configuration property.

  3. Refer to the following table to rename your existing legacy catalog configuration properties to the corresponding native configuration properties. Supported configuration values are identical unless otherwise noted.

| Legacy property | Native property | Notes |
| --- | --- | --- |
| hive.gcs.use-access-token | gcs.use-access-token | |
| hive.gcs.json-key-file-path | gcs.json-key-file-path | Also see gcs.json-key in Google Cloud Storage file system support. |

  4. Remove the following legacy configuration properties if they exist in your catalog configuration:

    • warp-speed.config.internal-communication.shared-secret

    • warp-speed.use-http-server-port

    • warp-speed.config.http-rest-port-enabled
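The following sketch applies these steps to the minimal JSON key file example from the Configuration section. The metastore URI and the GCP_CREDENTIALS_FILE_PATH environment variable are the same placeholders used in that example:

```properties
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
# Step 1: disable the legacy Hadoop file system implementation
fs.hadoop.enabled=false
# Step 2: enable the native Google Cloud Storage implementation
fs.native-gcs.enabled=true
# Step 3: renamed from hive.gcs.json-key-file-path
gcs.json-key-file-path=${ENV:GCP_CREDENTIALS_FILE_PATH}
```

A catalog that authenticated with hive.gcs.use-access-token=true would set gcs.use-access-token=true in place of the key file property.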