Metastores

Object storage access is mediated through a metastore. Metastores provide information on directory structure, file format, and metadata about the stored data. Object storage connectors support the use of one or more metastores. A supported metastore is required to use any object storage connector.

Additional configuration is required to access tables with Athena partition projection metadata or to implement first-class support for Avro tables. These requirements are discussed later in this topic.

General metastore configuration properties

The following table describes general metastore configuration properties, most of which apply to both the Thrift and the Glue metastores.

At a minimum, each Delta Lake, Hive or Hudi object storage catalog file must set the hive.metastore configuration property to define the type of metastore to use. Iceberg catalogs instead use the iceberg.catalog.type configuration property to define the type of metastore to use.
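As a minimal sketch, the following fragment shows the one required metastore property for a Hive catalog and for an Iceberg catalog. The file names in the comments and the metastore URI are placeholders; hive.metastore.uri is described in the Thrift metastore section of this topic.

```properties
# Hypothetical file etc/catalog/hive.properties:
# Delta Lake, Hive, and Hudi catalogs select the metastore type with hive.metastore
connector.name=hive
hive.metastore=thrift
hive.metastore.uri=thrift://192.0.2.3:9083

# Hypothetical file etc/catalog/iceberg.properties:
# Iceberg catalogs select the metastore type with iceberg.catalog.type instead
connector.name=iceberg
iceberg.catalog.type=hive_metastore
hive.metastore.uri=thrift://192.0.2.3:9083
```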

Note

SEP supports the Hive Metastore Service (HMS) version 3.1.3. HMS version 4.x is not supported.

Additional configuration properties specific to the Thrift and Glue Metastores are also available. They are discussed later in this topic.

Warning

The Glue v1 SDK is deprecated and will be removed in a future release. When Glue v1 is removed, the glue value will default to Glue v2 across all connectors, including Iceberg.

For Iceberg, setting iceberg.catalog.type=glue still selects Glue v1. For the other lake connectors, hive.metastore=glue already maps to Glue v2 by default.

To explicitly opt into Glue v2 for Iceberg:

iceberg.catalog.type=glue_v2

General metastore configuration properties
Property Name	Description	Default
`hive.metastore`	The type of Hive metastore to use. Trino currently supports the default Hive Thrift metastore (`thrift`), and the AWS Glue Catalog (`glue`) as metadata sources. Set this property in all object storage catalogs except Iceberg catalogs.	`thrift`
`iceberg.catalog.type`	The Iceberg table format manages most metadata in metadata files in the object storage itself. A small amount of metadata, however, still requires the use of a metastore. In the Iceberg ecosystem, these smaller metastores are called Iceberg metadata catalogs, or just catalogs. The examples in each subsection depict the contents of a Trino catalog file that uses the Iceberg connector to configure different Iceberg metadata catalogs. You must set this property in all Iceberg catalog property files. Valid values are `hive_metastore`, `glue`, `glue_v2`, `jdbc`, `rest`, `nessie`, and `snowflake`.	`hive_metastore`
`hive.metastore-cache.cache-partitions`	Enable caching for partition metadata. You can disable caching to avoid inconsistent behavior that results from it. This property is not compatible with the default Glue V2 REST interface, and can only be set when `hive.metastore` is set to `glue-v1`.	`true`
`hive.metastore-cache.cache-missing`	Enable caching the fact that a table is missing to prevent future metastore calls for that table. This property is not compatible with the default Glue V2 REST interface, and can only be set when `hive.metastore` is set to `glue-v1`.	`true`
`hive.metastore-cache.cache-missing-partitions`	Enable caching the fact that a partition is missing to prevent future metastore calls for that partition. This property is not compatible with the default Glue V2 REST interface, and can only be set when `hive.metastore` is set to `glue-v1`.	`false`
`hive.metastore-cache.cache-missing-stats`	Enable caching the fact that table statistics for a specific table are missing to prevent future metastore calls. This property is not compatible with the default Glue V2 REST interface, and can only be set when `hive.metastore` is set to `glue-v1`.	`false`
`hive.metastore-cache-ttl`	How long cached metastore data is considered valid.	`20m`
`hive.metastore-stats-cache-ttl`	How long cached metastore statistics are considered valid.	`5m`
`hive.metastore-cache-maximum-size`	Maximum number of metastore data objects in the Hive metastore cache.	`100000`
`hive.metastore-refresh-interval`	Asynchronously refresh cached metastore data after access if it is older than this but is not yet expired, allowing subsequent accesses to see fresh data.	`10m`
`hive.metastore-refresh-max-threads`	Maximum threads used to refresh cached metastore data.	`10`
`hive.user-metastore-cache-ttl`	Duration of how long cached metastore statistics, which are user specific in user impersonation scenarios, are considered valid.	`20m`
`hive.user-metastore-cache-maximum-size`	Maximum number of metastore data objects in the Hive metastore cache, which are user specific in user impersonation scenarios.	`1000`
`hive.hide-delta-lake-tables`	Controls whether to hide Delta Lake tables in table listings. Currently applies only when using the AWS Glue metastore.	`false`
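As a hedged sketch, the following lines shorten the cache lifetimes while keeping the refresh interval below the TTL, so that entries that are accessed again are refreshed in the background before they expire. The values are illustrative, not recommendations.

```properties
# Illustrative values only; tune for your workload
# Cached metadata expires after 10 minutes...
hive.metastore-cache-ttl=10m
# ...and is refreshed asynchronously on access once it is older than 5 minutes
hive.metastore-refresh-interval=5m
# Cached statistics expire after 2 minutes
hive.metastore-stats-cache-ttl=2m
# Cap the number of cached metastore objects
hive.metastore-cache-maximum-size=50000
```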

Thrift metastore configuration properties

To use a Hive Thrift metastore, set hive.metastore=thrift in the catalog file and provide further details with the following properties:

Thrift metastore configuration properties
Property name	Description	Default
`hive.metastore.uri`	The URIs of the Hive metastore to connect to using the Thrift protocol. If a comma-separated list of URIs is provided, the first URI is used by default, and the rest of the URIs are fallback metastores. This property is required. Example: `thrift://192.0.2.3:9083` or `thrift://192.0.2.3:9083,thrift://192.0.2.4:9083`
`hive.metastore.username`	The username Trino uses to access the Hive metastore.
`hive.metastore.authentication.type`	Hive metastore authentication type. Possible values are `NONE` or `KERBEROS`.	`NONE`
`hive.metastore.thrift.client.connect-timeout`	Socket connect timeout for metastore client.	`10s`
`hive.metastore.thrift.client.read-timeout`	Socket read timeout for metastore client.	`10s`
`hive.metastore.thrift.impersonation.enabled`	Enable Hive metastore end user impersonation.	`false`
`hive.metastore.thrift.use-spark-table-statistics-fallback`	Enable usage of table statistics generated by Apache Spark when Hive table statistics are not available.	`true`
`hive.metastore.thrift.delegation-token.cache-ttl`	Time to live for the metastore delegation token cache.	`1h`
`hive.metastore.thrift.delegation-token.cache-maximum-size`	Delegation token cache maximum size.	`1000`
`hive.metastore.thrift.client.ssl.enabled`	Use SSL when connecting to metastore.	`false`
`hive.metastore.thrift.client.ssl.key`	Path to the private key and client certificate (key store).
`hive.metastore.thrift.client.ssl.key-password`	Password for the private key.
`hive.metastore.thrift.client.ssl.trust-certificate`	Path to the server certificate chain (trust store). Required when SSL is enabled.
`hive.metastore.thrift.client.ssl.trust-certificate-password`	Password for the trust store.
`hive.metastore.service.principal`	The Kerberos principal of the Hive metastore service.
`hive.metastore.client.principal`	The Kerberos principal that Trino uses when connecting to the Hive metastore service.
`hive.metastore.client.keytab`	Hive metastore client keytab location.
`hive.metastore.thrift.delete-files-on-drop`	Actively delete the files for managed tables when performing drop table or partition operations, for cases when the metastore does not delete the files.	`false`
`hive.metastore.thrift.assume-canonical-partition-keys`	Allow the metastore to assume that the values of partition columns can be converted to string values. This can lead to performance improvements in queries which apply filters on the partition columns. Partition keys with a `TIMESTAMP` type do not get canonicalized.	`false`
`hive.metastore.thrift.client.socks-proxy`	SOCKS proxy to use for the Thrift Hive metastore.
`hive.metastore.thrift.client.max-retries`	Maximum number of retry attempts for metastore requests.	`9`
`hive.metastore.thrift.client.backoff-scale-factor`	Scale factor for metastore request retry delay.	`2.0`
`hive.metastore.thrift.client.max-retry-time`	Total allowed time limit for a metastore request to be retried.	`30s`
`hive.metastore.thrift.client.min-backoff-delay`	Minimum delay between metastore request retries.	`1s`
`hive.metastore.thrift.client.max-backoff-delay`	Maximum delay between metastore request retries.	`1s`
`hive.metastore.thrift.txn-lock-max-wait`	Maximum time to wait to acquire a Hive transaction lock.	`10m`
`hive.metastore.thrift.catalog-name`	The name of the Hive metastore catalog to use. In Hive, a metastore catalog is an abstraction that lets different systems connect to distinct, independent catalogs stored in the same metastore. The default catalog of the Hive metastore is named `hive`. When this property is left empty, the default catalog of the Hive metastore is accessed.
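The following sketch combines several of these properties: a fallback metastore URI, SSL with a trust store, and a longer read timeout. The trust store path and password are hypothetical placeholders.

```properties
connector.name=hive
hive.metastore=thrift
# The first URI is used by default; the second is a fallback metastore
hive.metastore.uri=thrift://192.0.2.3:9083,thrift://192.0.2.4:9083
# Connect to the metastore over SSL, verifying it against a trust store
hive.metastore.thrift.client.ssl.enabled=true
# Hypothetical path and password placeholder
hive.metastore.thrift.client.ssl.trust-certificate=/etc/trino/metastore-truststore
hive.metastore.thrift.client.ssl.trust-certificate-password=<truststore-password>
# Allow more time than the 10s default for slow metastore responses
hive.metastore.thrift.client.read-timeout=30s
```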

Iceberg-specific Hive catalog configuration properties

When using the Hive catalog, the Iceberg connector supports the same general Thrift metastore configuration properties as previously described with the following additional property:

Iceberg Hive catalog configuration property
Property name	Description	Default
`iceberg.hive-catalog.locking-enabled`	Commit to tables using Hive locks.	`true`

Warning

Setting iceberg.hive-catalog.locking-enabled=false causes the catalog to commit to tables without using Hive locks. Only set this property to false if all of the following conditions are met:

HIVE-26882 is available on the Hive metastore server. Requires version 2.3.10, 4.0.0-beta-1, or later.
HIVE-28121 is available on the Hive metastore server, if it is backed by MySQL or MariaDB. Requires version 2.3.10, 4.1.0, 4.0.1, or later.
All other catalogs that commit to the same tables as this catalog also use Iceberg 1.3 or later and have disabled Hive locks on commit.
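A minimal sketch of an Iceberg catalog that commits without Hive locks follows. The metastore URI is a placeholder, and the setting is only safe when every condition in the preceding warning holds.

```properties
connector.name=iceberg
iceberg.catalog.type=hive_metastore
hive.metastore.uri=thrift://192.0.2.3:9083
# Commit without Hive locks; only safe when the metastore includes
# HIVE-26882 (and HIVE-28121 for MySQL or MariaDB backends) and all
# other writers to these tables have also disabled Hive locks
iceberg.hive-catalog.locking-enabled=false
```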

Thrift metastore authentication

In a Kerberized Hadoop cluster, Trino connects to the Hive metastore Thrift service using SASL and authenticates using Kerberos. Kerberos authentication for the metastore is configured in the connector’s properties file using the following optional properties:

Hive metastore Thrift service authentication properties
Property name	Description	Default
`hive.metastore.authentication.type`	Hive metastore authentication type. One of `NONE` or `KERBEROS`. When using the default value of `NONE`, Kerberos authentication is disabled, and no other properties need to be configured. When set to `KERBEROS`, the Hive connector connects to the Hive metastore Thrift service using SASL and authenticates using Kerberos.	`NONE`
`hive.metastore.thrift.impersonation.enabled`	Enable Hive metastore end user impersonation. See KERBEROS authentication with impersonation for more information.	`false`
`hive.metastore.service.principal`	The Kerberos principal of the Hive metastore service. The coordinator uses this to authenticate the Hive metastore. The `_HOST` placeholder can be used in this property value. When connecting to the Hive metastore, the Hive connector substitutes in the hostname of the metastore server it is connecting to. This is useful if the metastore runs on multiple hosts. Example: `hive/hive-server-host@EXAMPLE.COM` or `hive/_HOST@EXAMPLE.COM`.
`hive.metastore.client.principal`	The Kerberos principal that Trino uses when connecting to the Hive metastore service. Example: `trino/trino-server-node@EXAMPLE.COM` or `trino/_HOST@EXAMPLE.COM`. The `_HOST` placeholder can be used in this property value. When connecting to the Hive metastore, the Hive connector substitutes in the hostname of the worker node Trino is running on. This is useful if each worker node has its own Kerberos principal. Unless KERBEROS authentication with impersonation is enabled, the principal specified by `hive.metastore.client.principal` must have sufficient privileges to remove files and directories within the `hive/warehouse` directory. Warning: If the principal does not have sufficient permissions, only the metadata is removed, and the data continues to consume disk space. This occurs because the Hive metastore is responsible for deleting the internal table data. When the metastore is configured to use Kerberos authentication, all of the HDFS operations performed by the metastore are impersonated. Errors deleting data are silently ignored.
`hive.metastore.client.keytab`	The path to the keytab file that contains a key for the principal specified by `hive.metastore.client.principal`. This file must be readable by the operating system user running Trino.

The following sections describe the configuration properties and values required for each authentication configuration of the Hive metastore Thrift service with the Hive connector.

Default `NONE` authentication without impersonation

hive.metastore.authentication.type=NONE

The default authentication type for the Hive metastore is NONE. When the authentication type is NONE, Trino connects to an unsecured Hive metastore. Kerberos is not used.

`KERBEROS` authentication with impersonation

hive.metastore.authentication.type=KERBEROS
hive.metastore.thrift.impersonation.enabled=true
hive.metastore.service.principal=hive/hive-metastore-host.example.com@EXAMPLE.COM
hive.metastore.client.principal=trino@EXAMPLE.COM
hive.metastore.client.keytab=/etc/trino/hive.keytab

When the authentication type for the Hive metastore Thrift service is KERBEROS, Trino connects as the Kerberos principal specified by the property hive.metastore.client.principal. Trino authenticates this principal using the keytab specified by the hive.metastore.client.keytab property, and verifies that the identity of the metastore matches hive.metastore.service.principal.

When using KERBEROS Metastore authentication with impersonation, the principal specified by the hive.metastore.client.principal property must be allowed to impersonate the current Trino user, as discussed in the section HDFS impersonation.

Keytab files must be distributed to every node in the Trino cluster.
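For comparison, KERBEROS authentication without impersonation might look like the following sketch. It keeps the default hive.metastore.thrift.impersonation.enabled=false and uses the _HOST placeholder described in the preceding table; the principals and keytab path are placeholders. Without impersonation, the client principal needs sufficient privileges to remove files and directories within the hive/warehouse directory.

```properties
hive.metastore.authentication.type=KERBEROS
# Default value, shown explicitly: Trino does not impersonate end users
hive.metastore.thrift.impersonation.enabled=false
# _HOST is replaced with the hostname of the metastore server being contacted
hive.metastore.service.principal=hive/_HOST@EXAMPLE.COM
# _HOST is replaced with the hostname of the node Trino runs on
hive.metastore.client.principal=trino/_HOST@EXAMPLE.COM
# Placeholder path; the keytab must exist on every node
hive.metastore.client.keytab=/etc/trino/hive.keytab
```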

AWS Glue catalog configuration properties

To use an AWS Glue catalog, set hive.metastore=glue in your catalog file and provide further details with the following properties:

AWS Glue catalog configuration properties
Property Name	Description	Default
`hive.metastore.glue.region`	AWS region of the Glue Catalog. This is required when not running in EC2, or when the catalog is in a different region. Example: `us-east-1`
`hive.metastore.glue.endpoint-url`	Glue API endpoint URL (optional). Example: `https://glue.us-east-1.amazonaws.com`
`hive.metastore.glue.sts.region`	AWS region of the STS service to authenticate with. This is required when running in a GovCloud region. Example: `us-gov-east-1`
`hive.metastore.glue.sts.endpoint`	STS endpoint URL to use when authenticating to Glue (optional). Example: `https://sts.us-gov-east-1.amazonaws.com`
`hive.metastore.glue.pin-client-to-current-region`	Pin Glue requests to the same region as the EC2 instance where Trino is running.	`false`
`hive.metastore.glue.max-connections`	Maximum number of concurrent connections to Glue.	`30`
`hive.metastore.glue.max-error-retries`	Maximum number of error retries for the Glue client.	`10`
`hive.metastore.glue.default-warehouse-dir`	Default warehouse directory for schemas created without an explicit `location` property.
`hive.metastore.glue.use-web-identity-token-credentials-provider`	If you are running Trino on Amazon EKS and authenticate using a Kubernetes service account, you can set this property to `true`. Setting it to `true` forces Trino to use the credentials from the service account directly, instead of trying the other credential providers in the default credential provider chain.	`false`
`hive.metastore.glue.aws-access-key`	AWS access key to use to connect to the Glue Catalog. If specified along with `hive.metastore.glue.aws-secret-key`, this parameter takes precedence over `hive.metastore.glue.iam-role`.
`hive.metastore.glue.aws-secret-key`	AWS secret key to use to connect to the Glue Catalog. If specified along with `hive.metastore.glue.aws-access-key`, this parameter takes precedence over `hive.metastore.glue.iam-role`.
`hive.metastore.glue.catalogid`	The ID of the Glue Catalog in which the metadata database resides.
`hive.metastore.glue.iam-role`	ARN of an IAM role to assume when connecting to the Glue Catalog.
`hive.metastore.glue.external-id`	External ID for the IAM role trust policy when connecting to the Glue Catalog.
`hive.metastore.glue.partitions-segments`	Number of segments for partitioned Glue tables.	`5`
`hive.metastore.glue.skip-archive`	AWS Glue can archive older table versions, and users can roll back a table to any historical version if needed. By default, the Hive connector backed by Glue does not skip archiving older table versions.	`false`
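As a sketch, a Hive catalog that reaches Glue by assuming an IAM role might look like the following. The role ARN, external ID, and bucket are placeholders. If hive.metastore.glue.aws-access-key and hive.metastore.glue.aws-secret-key were also set, they would take precedence over the IAM role.

```properties
connector.name=hive
hive.metastore=glue
# Required when not running in EC2 or when the catalog is in another region
hive.metastore.glue.region=us-east-1
# Placeholder role ARN and external ID for the role's trust policy
hive.metastore.glue.iam-role=arn:aws:iam::<account-id>:role/<role-name>
hive.metastore.glue.external-id=<external-id>
# Used for schemas created without an explicit location property
hive.metastore.glue.default-warehouse-dir=s3://<bucket>/warehouse
```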

Iceberg-specific Glue catalog configuration properties

When using the Glue catalog, the Iceberg connector supports the same general Glue configuration properties as previously described with the following additional property:

Iceberg Glue catalog configuration property
Property name	Description	Default
`iceberg.glue.cache-table-metadata`	While updating the table in AWS Glue, store the table metadata with the purpose of accelerating `information_schema.columns` and `system.metadata.table_comments` queries.	`true`
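A minimal sketch of an Iceberg catalog on Glue follows. It opts into Glue v2 explicitly and reuses a general Glue property; the region is a placeholder.

```properties
connector.name=iceberg
# Explicitly opt into Glue v2 for Iceberg
iceberg.catalog.type=glue_v2
# General Glue properties also apply to Iceberg catalogs
hive.metastore.glue.region=us-east-1
# Default value, shown explicitly: store table metadata in Glue to speed up
# information_schema.columns and system.metadata.table_comments queries
iceberg.glue.cache-table-metadata=true
```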

Starburst data catalog

Starburst offers a self-hosted metastore called Starburst data catalog. Starburst data catalog provides a Glue-compatible API and an Iceberg REST API, and is deployable in Kubernetes environments using Helm. It lets you run self-managed SEP clusters with metadata management without requiring access to AWS infrastructure.

Installation

Deploy Starburst data catalog with Helm. As of version 476-e, Starburst data catalog and Starburst Gateway are deployed together using a unified Helm chart called starburst-portal. This new deployment artifact simplifies installation and ensures compatibility with the latest features and fixes. Standalone installation of Starburst data catalog using the old starburst-catalog chart is no longer supported.

You can install from the OCI registry with the following command, replacing <version> with the appropriate Helm chart version:

helm install starburst-portal oci://harbor.starburstdata.net/starburstdata/charts/starburst-portal --version <version> --values registry-access.yaml --values <your-values-file>.yaml --namespace <namespace>

Alternatively, you can install from a local chart file using the following command:

helm install starburst-portal starburst-portal-<version>.tgz

Configuration

The values.yaml file defines how Starburst data catalog is configured during Helm deployment. You must specify settings such as the metadata backend, service ports, and credentials. The following example shows a basic configuration that sets the service environment and HTTP port, points to a PostgreSQL database used for metadata storage, and specifies credentials for SEP to authenticate with the Glue-compatible API:

etcFiles:
  config: |
    node.environment=starburst_portal
    http-server.http.port=8080
    http-server.http.enabled=true
    http-server.https.port=8443
    http-server.https.enabled=false
    credentials-provider.type=file
    credentials-provider.credentials-file-path=/etc/starburst/catalog-credentials.json
    persistence.jdbc.url=jdbc:postgresql://postgresql:5432/<db/schema-name>
    persistence.jdbc.user=<db-username>
    persistence.jdbc.password=<db-password>

  catalogCredentials: |
    [
      {
        "emulated": {
          "accessKey": "<sep access-key>",
          "secretKey": "<sep secret>"
        }
      }
    ]

Secrets

Starburst data catalog supports using secrets for sensitive configuration values such as passwords, keys, and credentials.

For more information on configuring secrets, see Secrets.

Logging configuration

To set log levels, add the logProperties field within the etcFiles section of your Helm values:

etcFiles:
  logProperties: |
    io.starburst=INFO

Configuration properties

Starburst data catalog includes the following optional configuration properties.

Starburst data catalog configuration properties
Property name	Description
`node.environment`	Sets the environment name for the Starburst Data Catalog service.
`log.path`	Specifies the file path for Starburst Data Catalog server logs.
`http-server.http.port`	Sets the HTTP port where Starburst Data Catalog listens.
`http-server.log.path`	Specifies the file path for HTTP server logs.
`credentials-provider.type`	Defines the type of credentials provider to use.
`credentials-provider.credentials-file-path`	Specifies the path to the credentials file when using file-based credentials provider.
`persistence.jdbc.url`	Specifies the JDBC URL for the metadata database connection.
`persistence.jdbc.user`	Sets the username for authenticating to the metadata database.
`persistence.jdbc.password`	Sets the password for authenticating to the metadata database.
`catalog.default.catalog.id`	Sets the default namespace to use for SEP catalogs that do not specify a `catalogid`. Starburst data catalog uses namespaces to separate metadata from different SEP catalogs. The default value is `starburst-catalog`.
`catalog.default.location.uri`	Specifies a default storage location URI for schemas in SEP catalogs that do not specify a `location` property. It is used as a fallback location prefix for storage-backed metadata. When you set this property, database names are automatically sanitized before being appended to the S3 location URI to ensure compliance with AWS Glue requirements. The generated S3 path may not exactly match the database name. Note: Starburst data catalog only applies this URI as the database location when the database’s `catalogid` matches the value you configure in `catalog.default.catalog.id`. If the `catalogid` does not match, Starburst data catalog creates the database without a location, and you must specify a location for each table explicitly.
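As a sketch, the following lines set a default namespace and location in Starburst data catalog, and point an SEP catalog at the same namespace so that the default location applies to new databases. The bucket is a placeholder.

```properties
# In the config entry under etcFiles in values.yaml:
catalog.default.catalog.id=starburst-catalog
# Placeholder bucket; database names are sanitized before being appended
catalog.default.location.uri=s3://<bucket>/warehouse

# In the SEP catalog file, use the matching namespace:
hive.metastore.glue.catalogid=starburst-catalog
```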

Configuration examples

After you deploy and configure Starburst data catalog, point SEP to it using a connector that supports AWS Glue as a metastore. Use the hive.metastore.glue.catalogid property to specify different namespaces within the same Starburst data catalog instance.

Hive connector

The following example configures the Hive connector to use Starburst data catalog:

connector.name=hive
hive.metastore=glue
hive.metastore.glue.endpoint-url=http://starburst-portal:8080/api/v1/glue
hive.metastore.glue.region=us-east-1
hive.metastore.glue.catalogid=<catalog-namespace>
hive.metastore.glue.aws-access-key=<sep-access-key>
hive.metastore.glue.aws-secret-key=<sep-secret-key>

Iceberg connector

The following example configures the Iceberg connector to use Starburst data catalog:

connector.name=iceberg
iceberg.catalog.type=glue_v2
hive.metastore.glue.endpoint-url=http://starburst-portal:8080/api/v1/glue
hive.metastore.glue.region=us-east-1
hive.metastore.glue.catalogid=<catalog-namespace>
hive.metastore.glue.aws-access-key=<sep-access-key>
hive.metastore.glue.aws-secret-key=<sep-secret-key>
iceberg.security=system
iceberg.register-table-procedure.enabled=true

Delta Lake connector

The following example configures the Delta Lake connector to use Starburst data catalog:

connector.name=delta_lake
hive.metastore=glue
hive.metastore.glue.endpoint-url=http://starburst-portal:8080/api/v1/glue
hive.metastore.glue.region=us-east-1
hive.metastore.glue.catalogid=<catalog-namespace>
hive.metastore.glue.aws-access-key=<sep-access-key>
hive.metastore.glue.aws-secret-key=<sep-secret-key>
delta.security=starburst
delta.register-table-procedure.enabled=true
delta.enable-non-concurrent-writes=true

Known behavioral differences with AWS Glue

The following describes behavior differences between the Starburst data catalog and AWS Glue APIs:

Table LastAccessTime - set when the table is created. It is not automatically updated on subsequent access.
Partition LastAccessTime - set when the partition is created. It is not automatically updated on subsequent access.

Iceberg REST API

Starburst data catalog supports the Iceberg REST API specification, allowing you to use Starburst data catalog as an Iceberg REST catalog backend for Trino and Spark workloads. This functionality is disabled by default.

To enable Iceberg REST API in Starburst data catalog, set the catalog.iceberg.enabled configuration property to true in your Starburst Portal configuration.

Authentication

OAuth2 is the only supported authentication mechanism for the Iceberg REST API with Starburst data catalog. Keycloak is the only officially supported identity provider (IdP).

Caution

Authentication mechanisms are specific to the API in use. OAuth2 authentication is supported only for the Iceberg REST API, while AWS authentication (SigV4) is supported only for the Glue API. The Glue API does not support OAuth2, and the Iceberg REST API does not support SigV4.

OAuth2 authentication

The Iceberg library allows configuring either a bearer token or client credentials to authenticate with the OAuth2-secured server. When a bearer token is used, token refresh functionality can be enabled to let the Iceberg client automatically refresh the token when necessary. When the client credentials flow is used, the scope that Iceberg includes in generated tokens must be configured. Optionally, the audience can also be configured. See the official Iceberg documentation or SEP Iceberg connector documentation for more information on configuring the client.
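As a sketch, an SEP Iceberg catalog that authenticates with a pre-issued bearer token, rather than with client credentials, might look like the following. This assumes the connector's iceberg.rest-catalog.oauth2.token property; the token and namespace are placeholders, and token refresh options are covered in the SEP Iceberg connector documentation.

```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://starburst-portal:8080/iceberg
iceberg.rest-catalog.warehouse=<catalog-namespace>
iceberg.rest-catalog.security=oauth2
# Assumed property: a bearer token issued by the IdP, used instead of
# iceberg.rest-catalog.oauth2.credential
iceberg.rest-catalog.oauth2.token=<bearer-token>
```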

Configuration properties

The following configuration properties are available:

Starburst data catalog Iceberg REST API configuration properties
Property	Default	Description
`catalog.iceberg.catalog-name`	`<empty>`	Defines the name for an Iceberg catalog that exists by default. Required.
`catalog.iceberg.location`	`<empty>`	Defines the default location for schemas (Iceberg namespaces) created in the default catalog. Required.
`catalog.iceberg.enabled`	`false`	Enables Iceberg catalog support and Iceberg REST API server.
`catalog.iceberg.authentication.type`	`none`	Authentication mechanism to use for the Iceberg REST API. Allowed values: `none`, `oauth2`
`catalog.iceberg.oauth2.issuer`	`<empty>`	The required issuer of a token. Required when OAuth2 authentication is enabled for the Iceberg REST API.
`catalog.iceberg.oauth2.audience`	`<empty>`	Additional audiences to validate in the token. Accepts a comma-separated list of strings. All provided audiences must be present for a token to be accepted.
`catalog.iceberg.oauth2.jwk-url`	`<empty>`	The URI of the JSON Web Key Set (JWKS) endpoint. Required when OAuth2 authentication is enabled for the Iceberg REST API.
`fs.native-s3.enabled`	`false`	Enables native S3 file system for file access in Starburst data catalog. Refer to S3 file system support for a detailed list of configuration options and more information.
`fs.native-gcs.enabled`	`false`	Enables native Google Cloud Storage file system for file access in Starburst data catalog. Refer to Google Cloud Storage file system support for a detailed list of configuration options and more information.
`fs.native-azure.enabled`	`false`	Enables native Azure Storage file system for file access in Starburst data catalog. Refer to Azure Storage file system support for a detailed list of configuration options and more information.
`fs.native-local.enabled`	`false`	Enables native local file system for file access in Starburst data catalog. Refer to local file system support for a detailed list of configuration options and more information.

File system support

Starburst data catalog uses the same file system functionality as SEP object storage connectors. Only native file system implementations are supported. The following file systems are available:

Native S3 (fs.native-s3.enabled)
Native Google Cloud Storage (fs.native-gcs.enabled)
Native Azure Storage (fs.native-azure.enabled)
Native local file system (fs.native-local.enabled)

Note

The Alluxio file system and OAuth2 pass-through for ABFS are not supported with Starburst data catalog.

See the native file system documentation for detailed configuration options for each file system type.
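For example, a deployment that stores table data on an S3-compatible on-premises system might enable the native S3 file system as in the following sketch. The endpoint and credentials are placeholders, and the sketch assumes the s3.endpoint, s3.region, and s3.path-style-access properties of the native S3 file system; check the S3 file system documentation for the options your storage requires.

```properties
# Lines for the config entry under etcFiles in values.yaml
fs.native-s3.enabled=true
# Placeholder S3-compatible endpoint
s3.endpoint=https://<s3-compatible-endpoint>
s3.region=us-east-1
# Path-style bucket addressing is often needed for S3-compatible systems
s3.path-style-access=true
s3.aws-access-key=<access-key>
s3.aws-secret-key=<secret-key>
```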

Configuration examples

View the following sections for example configurations for using Starburst Portal, Trino Iceberg connector, and Spark with the Iceberg REST API.

Starburst Portal

The following examples show Starburst Portal configurations with the Iceberg REST API and the native S3 file system enabled:

With OAuth2 authentication:

etcFiles:
  config: |
    node.environment=starburst_portal
    http-server.http.port=8080
    http-server.http.enabled=true
    http-server.https.port=8443
    http-server.https.enabled=false
    persistence.jdbc.url=jdbc:postgresql://postgresql:5432/<db/schema-name>
    persistence.jdbc.user=<db-username>
    persistence.jdbc.password=<db-password>
    catalog.iceberg.catalog-name=starburst-catalog
    catalog.iceberg.location=s3://catalog-data/starburst-catalog
    catalog.iceberg.enabled=true
    catalog.iceberg.authentication.type=oauth2
    catalog.iceberg.oauth2.issuer=https://keycloak-idp/realms/<realm>
    catalog.iceberg.oauth2.jwk-url=https://keycloak-idp/realms/<realm>/protocol/openid-connect/certs
    fs.native-s3.enabled=true
    s3.aws-access-key=<aws_access_key_id>
    s3.aws-secret-key=<aws_secret_access_key>

Without authentication:

etcFiles:
  config: |
    node.environment=starburst_portal
    http-server.http.port=8080
    http-server.http.enabled=true
    http-server.https.port=8443
    http-server.https.enabled=false
    persistence.jdbc.url=jdbc:postgresql://postgresql:5432/<db/schema-name>
    persistence.jdbc.user=<db-username>
    persistence.jdbc.password=<db-password>
    catalog.iceberg.catalog-name=starburst-catalog
    catalog.iceberg.location=s3://catalog-data/starburst-catalog
    catalog.iceberg.enabled=true
    fs.native-s3.enabled=true
    s3.aws-access-key=<aws_access_key_id>
    s3.aws-secret-key=<aws_secret_access_key>

Trino Iceberg connector

The following example shows a configuration for using the Iceberg connector with Starburst data catalog and the Iceberg REST API:

With OAuth2 authentication:

connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://starburst-portal:8080/iceberg
iceberg.rest-catalog.warehouse=<catalog-namespace>
iceberg.rest-catalog.security=oauth2
iceberg.rest-catalog.oauth2.server-uri=https://keycloak-idp/realms/<realm>/protocol/openid-connect/token
iceberg.rest-catalog.oauth2.credential=<oauth2_client_id>:<oauth2_client_secret>
iceberg.rest-catalog.oauth2.scope=<scope_for_the_token>
iceberg.security=system
iceberg.register-table-procedure.enabled=true

Without authentication:

connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://starburst-portal:8080/iceberg
iceberg.rest-catalog.warehouse=<catalog-namespace>
iceberg.security=system
iceberg.register-table-procedure.enabled=true

Spark

The following example shows a configuration for submitting Spark jobs with Starburst data catalog and the Iceberg REST API:

With OAuth2 authentication:

spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.defaultCatalog=iceberg
spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.iceberg.catalog-impl=org.apache.iceberg.rest.RESTCatalog
spark.sql.catalog.iceberg.uri=http://starburst-portal:8080/iceberg
spark.sql.catalog.iceberg.warehouse=<catalog-namespace>
spark.sql.catalog.iceberg.rest.auth.type=oauth2
spark.sql.catalog.iceberg.oauth2-server-uri=https://keycloak-idp/realms/<realm>/protocol/openid-connect/token
spark.sql.catalog.iceberg.scope=<scope_for_the_token>
spark.sql.catalog.iceberg.credential=<oauth2_client_id>:<oauth2_client_secret>
spark.sql.catalog.iceberg.audience=<token_audience>
spark.sql.catalog.iceberg.io-impl=org.apache.iceberg.hadoop.HadoopFileIO

spark.hadoop.fs.s3a.access.key=<s3_governance_access_key>
spark.hadoop.fs.s3a.secret.key=<s3_governance_secret_key>

Without OAuth2 authentication:

spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.defaultCatalog=iceberg
spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.iceberg.catalog-impl=org.apache.iceberg.rest.RESTCatalog
spark.sql.catalog.iceberg.uri=http://starburst-portal:8080/iceberg
spark.sql.catalog.iceberg.warehouse=<catalog-namespace>
spark.sql.catalog.iceberg.io-impl=org.apache.iceberg.hadoop.HadoopFileIO

spark.hadoop.fs.s3a.access.key=<s3_governance_access_key>
spark.hadoop.fs.s3a.secret.key=<s3_governance_secret_key>

The above examples configure file system authentication. They assume the job is submitted using Dell Data Processing Engine with object storage governance enabled.

Note

Set the spark.sql.catalog.iceberg.audience property only if catalog.iceberg.oauth2.audience is configured in Starburst data catalog with a matching value. When that audience is configured, Starburst data catalog accepts only tokens that include it. The Spark property makes the Iceberg library request tokens with that audience.

Setting the audience property in Starburst data catalog currently prevents access from SEP, because the SEP Iceberg connector does not support configuring a token audience.

Limitations#

Starburst data catalog configures file system access globally at the deployment level, using a single endpoint and authentication configuration. As a result, all Iceberg catalogs managed by the same Starburst data catalog instance must store their table data on the same object storage system, reachable through that single configuration. For example, when using Dell ECS for on-premises storage, all catalogs must use the same ECS instance and endpoint.

For Trino Iceberg connections, CREATE SCHEMA statements must specify the location schema property.

Unity catalog#

Starburst supports integration with the Unity Catalog as a metastore for the Hive, Delta Lake, and Iceberg connectors.

The following example shows a minimal catalog configuration using the Unity Catalog as a metastore for the Delta Lake connector:

delta.security=unity
hive.metastore.unity.host=host
hive.metastore.unity.token=token
hive.metastore.unity.catalog-name=main

For more information, read the Unity catalog with the Delta Lake connector or Unity catalog with the Hive connector.

When connecting to Unity Catalog as a metastore using an Iceberg REST catalog, you must set iceberg.security to read_only:

connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://dbc-12345678-9999.cloud.databricks.com/api/2.1/unity-catalog/iceberg
iceberg.security=read_only
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.token=***

Read the REST catalogs section for more information when connecting to Databricks Unity Catalog using an Iceberg REST catalog.

Iceberg-specific metastores#

The Iceberg table format manages most metadata in metadata files in the object storage itself. A small amount of metadata still requires the use of a metastore. In the Iceberg ecosystem, these smaller metastores are called Iceberg metadata catalogs, or just catalogs.

You can use a general metastore such as HMS or AWS Glue. Alternatively, you can use one of the Iceberg-specific metadata catalogs discussed in this section: a REST catalog implementation such as Polaris, a JDBC catalog, a Nessie catalog, or a Snowflake catalog.

REST catalogs#

To use an Iceberg REST catalog implementation such as Polaris, set the catalog type with iceberg.catalog.type=rest and provide further details with the following properties:

Iceberg REST catalog configuration properties#
Property name	Description
`iceberg.rest-catalog.uri`	REST server API endpoint URI (required). Example: `http://iceberg-with-rest:8181`
`iceberg.rest-catalog.prefix`	The prefix for the resource path to use with the REST catalog server (optional). Example: `dev`
`iceberg.rest-catalog.warehouse`	Warehouse identifier/location for the catalog (optional). Example: `s3://my_bucket/warehouse_location`
`iceberg.rest-catalog.security`	The type of security to use (default: `NONE`). `OAUTH2` requires either a `token` or `credential`. Example: `OAUTH2`
`iceberg.rest-catalog.session`	Session information included when communicating with the REST Catalog. Options are `NONE` or `USER` (default: `NONE`).
`iceberg.rest-catalog.session-timeout`	Duration to keep the authentication session in cache. Defaults to `1h`.
`iceberg.rest-catalog.oauth2.token`	The bearer token used for interactions with the server. A `token` or `credential` is required for `OAUTH2` security. Example: `AbCdEf123456`
`iceberg.rest-catalog.oauth2.credential`	The credential to exchange for a token in the OAuth2 client credentials flow with the server. A `token` or `credential` is required for `OAUTH2` security. Example: `AbCdEf123456`
`iceberg.rest-catalog.oauth2.scope`	Scope to be used when communicating with the REST Catalog. Applicable only when using `credential`.
`iceberg.rest-catalog.oauth2.server-uri`	The OAuth2 server endpoint used to retrieve the access token.
`iceberg.rest-catalog.oauth2.token-refresh-enabled`	Controls whether a token should be refreshed if information about its expiration time is available. Defaults to `true`.
`iceberg.rest-catalog.vended-credentials-enabled`	Use credentials provided by the REST backend for file system access. Defaults to `false`.
`iceberg.rest-catalog.nested-namespace-enabled`	Support querying objects under nested namespaces. Defaults to `false`.
`iceberg.rest-catalog.case-insensitive-name-matching`	Match namespace, table, and view names case-insensitively. Defaults to `false`.
`iceberg.rest-catalog.case-insensitive-name-matching.cache-ttl`	Duration for which case-insensitive namespace, table, and view names are cached. Defaults to `1m`.
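The OAuth2 rules in the table above can be summarized as a configuration check. The following Python sketch is a hypothetical pre-flight validator. It is not part of Trino or SEP. It assumes the catalog properties are already loaded into a dictionary, and it checks only the rules stated in the table:

```python
def validate_rest_catalog(props: dict[str, str]) -> list[str]:
    """Return problems found in Iceberg REST catalog properties (hypothetical helper)."""
    problems: list[str] = []
    if not props.get("iceberg.rest-catalog.uri"):
        problems.append("iceberg.rest-catalog.uri is required")

    # Examples on this page use both OAUTH2 and oauth2, so compare case-insensitively.
    security = props.get("iceberg.rest-catalog.security", "NONE").upper()
    has_token = "iceberg.rest-catalog.oauth2.token" in props
    has_credential = "iceberg.rest-catalog.oauth2.credential" in props
    if security == "OAUTH2" and not (has_token or has_credential):
        problems.append("OAUTH2 security requires oauth2.token or oauth2.credential")

    # The scope only applies to the client credentials flow.
    if "iceberg.rest-catalog.oauth2.scope" in props and not has_credential:
        problems.append("oauth2.scope is only applicable when oauth2.credential is set")

    # The session property accepts NONE (default) or USER.
    session = props.get("iceberg.rest-catalog.session", "NONE").upper()
    if session not in ("NONE", "USER"):
        problems.append(f"unsupported session value: {session}")
    return problems


minimal = {
    "connector.name": "iceberg",
    "iceberg.catalog.type": "rest",
    "iceberg.rest-catalog.uri": "http://iceberg-with-rest:8181",
}
print(validate_rest_catalog(minimal))  # prints []
print(validate_rest_catalog({**minimal, "iceberg.rest-catalog.security": "oauth2"}))
```

The second call reports the missing token or credential. A real deployment may apply further checks on the server side.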

The following example shows a minimal catalog configuration using an Iceberg REST metadata catalog:

connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://iceberg-with-rest:8181

The REST catalog supports view management using the Iceberg View specification.

The REST catalog does not support materialized view management.

JDBC catalog#

The Iceberg JDBC catalog is supported for the Iceberg connector. At a minimum, iceberg.jdbc-catalog.driver-class, iceberg.jdbc-catalog.connection-url, iceberg.jdbc-catalog.default-warehouse-dir, and iceberg.jdbc-catalog.catalog-name must be configured. When using any database besides PostgreSQL, a JDBC driver jar file must be placed in the plugin directory.

JDBC catalog configuration properties#
Property name	Description
`iceberg.jdbc-catalog.driver-class`	JDBC driver class name.
`iceberg.jdbc-catalog.connection-url`	The URI to connect to the JDBC server.
`iceberg.jdbc-catalog.connection-user`	User name for JDBC client.
`iceberg.jdbc-catalog.connection-password`	Password for JDBC client.
`iceberg.jdbc-catalog.catalog-name`	Iceberg JDBC metastore catalog name.
`iceberg.jdbc-catalog.default-warehouse-dir`	The default warehouse directory to use for JDBC.
`iceberg.jdbc-catalog.schema-version`	JDBC catalog schema version. Valid values are `V0` or `V1`. Defaults to `V1`.
`iceberg.jdbc-catalog.retryable-status-codes`	Additional JDBC status codes to retry on when a connection error to the JDBC metastore occurs. Valid value is a comma-separated list of status codes. Note: the JDBC catalog always retries the following status codes: `08000,08003,08006,08007,40001`. Specify only additional codes here, such as `57000,57P03,57P04` when using the PostgreSQL driver.
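The retry behavior described for iceberg.jdbc-catalog.retryable-status-codes works as a set union: the always-retried codes plus any codes you list. The following Python sketch illustrates that union and the minimum required properties. The function names are hypothetical, and the sketch does not reproduce the connector's code:

```python
# Status codes the JDBC catalog always retries, per the table above.
ALWAYS_RETRIED = {"08000", "08003", "08006", "08007", "40001"}

# Minimum properties a JDBC catalog must configure.
REQUIRED = (
    "iceberg.jdbc-catalog.driver-class",
    "iceberg.jdbc-catalog.connection-url",
    "iceberg.jdbc-catalog.default-warehouse-dir",
    "iceberg.jdbc-catalog.catalog-name",
)


def missing_required(props: dict[str, str]) -> list[str]:
    """List the minimum JDBC catalog properties that are not set."""
    return [name for name in REQUIRED if not props.get(name)]


def effective_retryable_codes(props: dict[str, str]) -> set[str]:
    """Combine the always-retried status codes with any configured extra codes."""
    raw = props.get("iceberg.jdbc-catalog.retryable-status-codes", "")
    extra = {code.strip() for code in raw.split(",") if code.strip()}
    return ALWAYS_RETRIED | extra


props = {
    "iceberg.jdbc-catalog.driver-class": "org.postgresql.Driver",
    "iceberg.jdbc-catalog.connection-url": "jdbc:postgresql://example.net:5432/database",
    "iceberg.jdbc-catalog.catalog-name": "test",
    "iceberg.jdbc-catalog.retryable-status-codes": "57000,57P03,57P04",
}
print(missing_required(props))  # prints ['iceberg.jdbc-catalog.default-warehouse-dir']
print(sorted(effective_retryable_codes(props)))
```

Listing an always-retried code again in the property does no harm in this model, because the union removes duplicates.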

Warning

The JDBC catalog may have compatibility issues if Iceberg introduces breaking changes in the future. Consider the REST catalog as an alternative solution.

The JDBC catalog requires the metadata tables to already exist. Refer to the Iceberg repository for instructions on creating those tables.

The following example shows a minimal catalog configuration using an Iceberg JDBC metadata catalog:

connector.name=iceberg
iceberg.catalog.type=jdbc
iceberg.jdbc-catalog.catalog-name=test
iceberg.jdbc-catalog.driver-class=org.postgresql.Driver
iceberg.jdbc-catalog.connection-url=jdbc:postgresql://example.net:5432/database
iceberg.jdbc-catalog.connection-user=admin
iceberg.jdbc-catalog.connection-password=test
iceberg.jdbc-catalog.default-warehouse-dir=s3://bucket

The JDBC catalog does not support materialized view management.

Nessie catalog#

In order to use a Nessie catalog, configure the catalog type with iceberg.catalog.type=nessie and provide further details with the following properties:

Nessie catalog configuration properties#
Property name	Description
`iceberg.nessie-catalog.uri`	Nessie API endpoint URI (required). Example: `https://localhost:19120/api/v1`
`iceberg.nessie-catalog.ref`	The branch/tag to use for Nessie. Defaults to `main`.
`iceberg.nessie-catalog.default-warehouse-dir`	Default warehouse directory for schemas created without an explicit `location` property. Example: `/tmp`
`iceberg.nessie-catalog.read-timeout`	The read timeout duration for requests to the Nessie server. Defaults to `25s`.
`iceberg.nessie-catalog.connection-timeout`	The connection timeout duration for connection requests to the Nessie server. Defaults to `5s`.
`iceberg.nessie-catalog.enable-compression`	Configure whether compression should be enabled or not for requests to the Nessie server. Defaults to `true`.
`iceberg.nessie-catalog.authentication.type`	The authentication type to use. The only available value is `BEARER`. Defaults to no authentication.
`iceberg.nessie-catalog.authentication.token`	The token to use with `BEARER` authentication. Example: `SXVLUXUhIExFQ0tFUiEK`
`iceberg.nessie-catalog.client-api-version`	Optional Nessie client API version to use. By default, it is inferred from the `iceberg.nessie-catalog.uri` value. Valid values are `V1` or `V2`.
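By default, the client API version is inferred from the URI. The following Python sketch shows one plausible form of that inference. It assumes the version comes from a trailing /api/v1 or /api/v2 path segment, as in the example URIs on this page. It is an illustration, not the Nessie client's actual logic. If your URI does not end that way, set iceberg.nessie-catalog.client-api-version explicitly:

```python
from typing import Optional
from urllib.parse import urlparse


def infer_nessie_api_version(uri: str, explicit: Optional[str] = None) -> str:
    """Pick the Nessie client API version (hypothetical helper).

    An explicit client-api-version value always wins. Otherwise the version
    is guessed from the last path segment of the URI.
    """
    if explicit is not None:
        if explicit not in ("V1", "V2"):
            raise ValueError(f"invalid client API version: {explicit}")
        return explicit
    last_segment = urlparse(uri).path.rstrip("/").rsplit("/", 1)[-1]
    if last_segment in ("v1", "v2"):
        return last_segment.upper()
    raise ValueError(f"cannot infer API version from {uri}; set it explicitly")


print(infer_nessie_api_version("https://localhost:19120/api/v2"))  # prints V2
```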

connector.name=iceberg
iceberg.catalog.type=nessie
iceberg.nessie-catalog.uri=https://localhost:19120/api/v2
iceberg.nessie-catalog.default-warehouse-dir=/tmp

The Nessie catalog does not support view management or materialized view management.

Snowflake catalog#

In order to use a Snowflake catalog, configure the catalog type with iceberg.catalog.type=snowflake and provide further details with the following properties:

Snowflake catalog configuration properties#
Property name	Description
`iceberg.snowflake-catalog.account-uri`	Snowflake JDBC account URI (required). Example: `jdbc:snowflake://example123456789.snowflakecomputing.com`
`iceberg.snowflake-catalog.user`	Snowflake user (required).
`iceberg.snowflake-catalog.password`	Snowflake password (required).
`iceberg.snowflake-catalog.database`	Snowflake database name (required).
`iceberg.snowflake-catalog.role`	Snowflake role name (optional).

connector.name=iceberg
iceberg.catalog.type=snowflake
iceberg.snowflake-catalog.account-uri=jdbc:snowflake://example1234567890.snowflakecomputing.com
iceberg.snowflake-catalog.user=user
iceberg.snowflake-catalog.password=secret
iceberg.snowflake-catalog.database=db

When using the Snowflake catalog, perform data management tasks, such as creating tables, in Snowflake. External systems like Trino can only use the catalog for SELECT queries and other read operations.

Additionally, Iceberg tables created by Snowflake do not expose partitioning information. This prevents efficient parallel reads and can significantly degrade query performance.

The Snowflake catalog does not support view management or materialized view management.

Further information is available in the Snowflake catalog documentation.

Access Amazon S3 Tables#

To use an Iceberg REST catalog to access Amazon S3 Tables in an S3 table bucket, configure the catalog type with iceberg.catalog.type=rest. In addition to the REST catalog properties, include the following configuration properties:

AWS S3 Tables configuration properties#
Property name	Description
`iceberg.rest-catalog.view-endpoints-enabled`	Enables or disables view endpoints for the Iceberg REST catalog. Must be set to `false`, because S3 Tables does not support views.
`iceberg.rest-catalog.sigv4-enabled`	Enables SigV4 authentication for AWS S3, which is required to access S3 buckets. Must be set to `true`.
`iceberg.rest-catalog.signing-name`	Specifies the AWS SigV4 signing service name. For example, set to `glue` when using AWS Glue as the catalog service.
`s3.region`	The AWS region where your S3 table bucket is located.
`s3.aws-access-key`	The AWS access key ID used for authentication to access the S3 table bucket.
`s3.aws-secret-key`	The AWS secret access key used for authentication to access the S3 table bucket.

To use and manage Amazon S3 Tables in an S3 table bucket using the AWS Glue Iceberg REST endpoint, add the following configuration properties to your catalog configuration file:

iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://glue.<aws-region>.amazonaws.com/iceberg
iceberg.rest-catalog.warehouse=<aws-account-id>:s3tablescatalog/<s3-tables-bucket-name>
iceberg.rest-catalog.view-endpoints-enabled=false
iceberg.rest-catalog.sigv4-enabled=true
iceberg.rest-catalog.signing-name=glue
fs.hadoop.enabled=false
fs.native-s3.enabled=true
s3.region=<aws-region>
s3.aws-access-key=<access_key>
s3.aws-secret-key=<secret_key>

To connect directly to the Amazon S3 Tables Iceberg REST endpoint instead, use the following configuration:

iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://s3tables.<aws-region>.amazonaws.com/iceberg
iceberg.rest-catalog.warehouse=arn:aws:s3tables:<aws-region>:<aws-account-id>:bucket/<s3-tables-bucket-name>
iceberg.rest-catalog.view-endpoints-enabled=false
iceberg.rest-catalog.sigv4-enabled=true
iceberg.rest-catalog.signing-name=s3tables
fs.hadoop.enabled=false
fs.native-s3.enabled=true
s3.region=<aws-region>
s3.aws-access-key=<access_key>
s3.aws-secret-key=<secret_key>
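The two configurations above differ only in the endpoint URI, the warehouse identifier format, and the SigV4 signing name. The following Python sketch builds either property set from a region, account ID, and bucket name. The function and its arguments are hypothetical, the account ID and bucket name in the usage line are illustrative, and the credentials stay as placeholders:

```python
def s3_tables_properties(region: str, account_id: str, bucket: str, via_glue: bool) -> dict[str, str]:
    """Build Iceberg REST catalog properties for Amazon S3 Tables (hypothetical helper)."""
    if via_glue:
        # AWS Glue Iceberg REST endpoint: warehouse is <account-id>:s3tablescatalog/<bucket>.
        uri = f"https://glue.{region}.amazonaws.com/iceberg"
        warehouse = f"{account_id}:s3tablescatalog/{bucket}"
        signing_name = "glue"
    else:
        # S3 Tables Iceberg REST endpoint: warehouse is the table bucket ARN.
        uri = f"https://s3tables.{region}.amazonaws.com/iceberg"
        warehouse = f"arn:aws:s3tables:{region}:{account_id}:bucket/{bucket}"
        signing_name = "s3tables"
    return {
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": uri,
        "iceberg.rest-catalog.warehouse": warehouse,
        # S3 Tables does not support views, and SigV4 is required.
        "iceberg.rest-catalog.view-endpoints-enabled": "false",
        "iceberg.rest-catalog.sigv4-enabled": "true",
        "iceberg.rest-catalog.signing-name": signing_name,
        "fs.hadoop.enabled": "false",
        "fs.native-s3.enabled": "true",
        "s3.region": region,
        "s3.aws-access-key": "<access_key>",
        "s3.aws-secret-key": "<secret_key>",
    }


props = s3_tables_properties("us-east-1", "111122223333", "my-table-bucket", via_glue=False)
# Print the properties in catalog file format.
print("\n".join(f"{key}={value}" for key, value in props.items()))
```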

For more information, see the Iceberg connector documentation.

Access tables with Athena partition projection metadata#

Partition projection is an AWS Athena feature that is often used to speed up query processing on highly partitioned tables. In Trino, it applies to tables accessed with the Hive connector.

Trino reimplements this functionality and supports partition projection table properties stored in the Hive metastore or Glue catalog. Compared to AWS Athena, date projection is currently limited: it only supports intervals of DAYS, HOURS, MINUTES, and SECONDS.
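The following Python sketch shows what a date projection with one of the supported interval units produces. It enumerates projected partition values between two bounds and rejects any other unit. It assumes an inclusive range and a strftime-style format string, and it is a simplified illustration rather than Trino's implementation:

```python
from datetime import datetime, timedelta

# Only the interval units that Trino's date projection supports.
SUPPORTED_UNITS = {
    "DAYS": timedelta(days=1),
    "HOURS": timedelta(hours=1),
    "MINUTES": timedelta(minutes=1),
    "SECONDS": timedelta(seconds=1),
}


def project_dates(start: datetime, end: datetime, interval: int, unit: str, fmt: str) -> list[str]:
    """Enumerate projected partition values from start to end, inclusive."""
    if unit not in SUPPORTED_UNITS:
        # For example, MONTHS is accepted by Athena but not by Trino's date projection.
        raise ValueError(f"unsupported interval unit: {unit}")
    if interval <= 0:
        raise ValueError("interval must be positive")
    step = SUPPORTED_UNITS[unit] * interval
    values = []
    current = start
    while current <= end:
        values.append(current.strftime(fmt))
        current += step
    return values


print(project_dates(datetime(2024, 1, 1), datetime(2024, 1, 3), 1, "DAYS", "%Y-%m-%d"))
# prints ['2024-01-01', '2024-01-02', '2024-01-03']
```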

If compatibility issues block access to a table with partition projection enabled, set the partition_projection_ignore table property to true for that table to bypass the errors.

Refer to Table properties and Column properties for configuration of partition projection.

Configure metastore for Avro#

To enable first-class support for Avro tables in catalogs that use the Hive connector with Hive 3.x, add the following property definition to the Hive metastore configuration file hive-site.xml, and then restart the metastore service:

<property>
    <!-- https://community.hortonworks.com/content/supportkb/247055/errorjavalangunsupportedoperationexception-storage.html -->
    <name>metastore.storage.schema.reader.impl</name>
    <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
</property>
