Legacy Google Cloud Storage support#
Object storage connectors can access Google Cloud Storage data using the gs:// URI prefix.
Requirements#
To use Google Cloud Storage with non-anonymous access objects, you need:
The key for the service account in JSON format
Configuration#
The use of Google Cloud Storage as a storage location for an object storage catalog requires setting a configuration property that defines the authentication method for any non-anonymous access object. Access methods cannot be combined.
The default root path used by the gs:// prefix is set in the catalog by the contents of the specified key file, or the key file used to create the OAuth token.
Property Name | Description |
---|---|
 | JSON key file used to authenticate your Google Cloud service account with Google Cloud Storage. |
 | Use client-provided OAuth token to access Google Cloud Storage. |
The following uses the Delta Lake connector in an example of a minimal configuration file for an object storage catalog using a JSON key file:
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
hive.gcs.json-key-file-path=${ENV:GCP_CREDENTIALS_FILE_PATH}
General usage#
Create a schema to use if one does not already exist, as in the following example:
CREATE SCHEMA storage_catalog.sales_data_in_gcs WITH (location = 'gs://example_location');
Once you have created a schema, you can create tables in the schema, as in the following example:
CREATE TABLE storage_catalog.sales_data_in_gcs.orders (
    orderkey BIGINT,
    custkey BIGINT,
    orderstatus VARCHAR(1),
    totalprice DOUBLE,
    orderdate DATE,
    orderpriority VARCHAR(15),
    clerk VARCHAR(15),
    shippriority INTEGER,
    comment VARCHAR(79)
);
This statement creates the folder gs://sales_data_in_gcs/orders in the root folder defined in the JSON key file.
Your table is now ready to populate with data using INSERT statements. Alternatively, you can use CREATE TABLE AS statements to create and populate the table in a single statement.
Migration to native Google Cloud Storage implementation#
SEP includes a native implementation that does not rely on the legacy Hadoop implementation. Starburst recommends upgrading existing deployments to the new native implementation at your earliest convenience.
To migrate a catalog to use the native filesystem implementation for Google Cloud Storage, make the following edits to your catalog configuration:
Set the fs.hadoop.enabled=false catalog configuration property.
Add the fs.native-gcs.enabled=true catalog configuration property.
Refer to the following table to rename your existing legacy catalog configuration properties to the corresponding native configuration properties. Supported configuration values are identical unless otherwise noted.
Legacy property | Native property | Notes |
---|---|---|
 |  |  |
 |  | Also see |
Remove the following legacy configuration properties if they exist in your catalog configuration:
warp-speed.config.internal-communication.shared-secret
warp-speed.use-http-server-port
warp-speed.config.http-rest-port-enabled
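The migration edits described in this section can be sketched as a small script that rewrites the text of a catalog properties file. This is a minimal illustration, not a Starburst tool: the function name migrate_catalog_properties and the constant LEGACY_REMOVALS are hypothetical, the script assumes every property is written on one line in key=value form, and the renames mapping must be filled in by you from the legacy-to-native property table.

```python
# Hypothetical helper: applies the migration edits from this section to the
# text of a catalog properties file. Assumes one key=value pair per line.

# Legacy properties that this section says to remove if present.
LEGACY_REMOVALS = {
    "warp-speed.config.internal-communication.shared-secret",
    "warp-speed.use-http-server-port",
    "warp-speed.config.http-rest-port-enabled",
}


def migrate_catalog_properties(text, renames, removals):
    """Return the catalog properties text edited for the native implementation.

    renames  -- dict mapping legacy property names to native property names,
                taken from the rename table (not guessed here)
    removals -- set of property names to drop entirely
    """
    # Values the migration steps require, regardless of the current setting.
    forced = {
        "fs.hadoop.enabled": "false",
        "fs.native-gcs.enabled": "true",
    }
    seen = set()
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        # Pass blank lines, comments, and anything that is not key=value through.
        if not stripped or stripped.startswith(("#", "!")) or "=" not in stripped:
            out.append(line)
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in removals:
            continue  # drop legacy properties that must be removed
        key = renames.get(key, key)  # rename legacy property if a mapping exists
        if key in forced:
            value = forced[key]
            seen.add(key)
        out.append(f"{key}={value}")
    # Add any required property that was not already in the file.
    for key, value in forced.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = (
        "connector.name=delta_lake\n"
        "hive.metastore.uri=thrift://example.net:9083\n"
        "warp-speed.use-http-server-port=true\n"
    )
    # renames is left empty here; populate it from the rename table.
    print(migrate_catalog_properties(example, renames={}, removals=LEGACY_REMOVALS))
```

Keeping the rename mapping as an argument means the script never assumes a native property name. Review the output against the rename table before deploying it, because the script does not check property values.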