Starburst Delta Lake connector
The Starburst Delta Lake connector is an extended version of the Delta Lake connector. Its configuration and usage are identical.
Requirements
To connect to Databricks Delta Lake, you must:
Fulfill the Delta Lake connector requirements.
Have a valid Starburst Enterprise license.
Extensions
The connector includes all the functionality described in the Delta Lake connector as well as the features and integrations detailed in the following sections.
Unity Catalog
The connector supports reading from managed, internal tables when using the Databricks Unity Catalog as a metastore.
Note
The Databricks Unity Catalog metastore is available for Delta Lake as a public preview. Contact Starburst Support with questions or feedback.
To use Unity Catalog metastore, add the following configuration to your Delta Lake catalog:
hive.metastore=unity
delta.security=read_only
delta.metastore.unity.host=<unity catalog hostname>
delta.metastore.unity.access-token=<token>
The following table shows the configuration properties used to connect SEP to Unity Catalog as a metastore.
| Property name | Description |
|---|---|
| delta.metastore.unity.host | Name of the host, without http(s) prefix, for example: |
| delta.metastore.unity.access-token | The token used to authenticate a connection to the Unity Catalog metastore. For more information about generating access tokens, see the Databricks documentation. |
| | (Optional) Name of the catalog in Databricks. Default is |
SQL support
The connector supports all of the SQL statements listed in the Delta Lake connector documentation.
The following improvements are included:
Security operations, see also SQL security
Table replacement, see Replacing tables
SQL security
You must set the delta.security property in your catalog properties file to sql-standard in order to use SQL security operation statements. See SQL standard based authorization for more information.
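As an illustration, once the catalog properties file contains delta.security=sql-standard, security operation statements such as the following can be issued. This is a minimal sketch, not a definitive reference: the catalog name example, the schema sales, the table orders, and the role analyst are all hypothetical, and the exact set of supported statements may depend on your SEP version.

```sql
-- Hypothetical names: catalog "example", schema "sales",
-- table "orders", role "analyst".

-- Create a role scoped to the Delta Lake catalog.
CREATE ROLE analyst IN example;

-- Allow members of the role to read the table.
GRANT SELECT ON example.sales.orders TO ROLE analyst;

-- Review the privileges currently granted on the table.
SHOW GRANTS ON TABLE example.sales.orders;

-- Withdraw the privilege again.
REVOKE SELECT ON example.sales.orders FROM ROLE analyst;
```

With any other delta.security setting, these statements are expected to be rejected by the catalog.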
Replacing tables
The connector supports replacing a table as an atomic operation. Atomic table replacement creates a new snapshot with the new table definition (see CREATE TABLE and CREATE TABLE AS), but keeps table history.
The new table after replacement is completely new and separate from the old table. Only the name of the table remains identical.
For example, a partitioned table my_table can be replaced by a completely new definition.
CREATE TABLE my_table (
    a BIGINT,
    b DATE,
    c BIGINT)
WITH (partitioning = ARRAY['a']);

CREATE OR REPLACE TABLE my_table
WITH (sorted_by = ARRAY['a'])
AS SELECT * FROM another_table;
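Because replacement keeps the table history, one way to check the result is to inspect the commit log of the table after the replacement. The following is a sketch that assumes the $history metadata table of the underlying Delta Lake connector is available in your environment. The column selection is only an example.

```sql
-- Assumption: the "$history" metadata table is exposed for my_table.
-- The earlier commits should still be listed after the replacement,
-- followed by a newer entry for the replace operation.
SELECT version, timestamp, operation
FROM "my_table$history"
ORDER BY version DESC;
```

The double quotes are required because the metadata table name contains a $ character.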
Table replacement in the Starburst Delta Lake connector has the following limitations:
Table replacement does not work on append-only Delta Lake tables.
Table replacement does not work for tables with the change_data_feed_enabled property set to true.
Table replacement does not work if the new table after replacement has the change_data_feed_enabled property set to true.
Table replacement does not work if the location specified in the property is different from the location of the existing table.
Table types must stay the same. For example, table replacement cannot be used to replace a managed table with an external table.
Performance
The connector includes a number of performance improvements, detailed in the following sections:
Dynamic row filtering
Dynamic filtering, including dynamic row filtering, is enabled by default. Row filtering improves the effectiveness of dynamic filtering for a connector by using dynamic filters to remove unnecessary rows during a table scan. It is especially powerful for selective filters on columns that are not used for partitioning or bucketing, or whose values do not naturally appear in any clustered order.
As a result, the amount of data read from storage and transferred across the network is further reduced, which improves query performance and reduces cost.
You can use the following properties to configure dynamic row filtering:
| Property name | Description |
|---|---|
| | Toggle dynamic row filtering. Defaults to |
| | Control the threshold for the fraction of the selected rows from the overall table above which dynamic row filters are not used. Defaults to 0.7. Catalog session property name is |
| | Duration to wait for completion of dynamic row filtering. Defaults to 0. The default causes query processing to proceed without waiting for the dynamic row filter; it is collected asynchronously and used as soon as it becomes available. Catalog session property name is |
Starburst Cached Views
The connector supports table scan redirection to improve performance and reduce load on the data source.
Security
The connector includes a number of security-related features, detailed in the following sections.
Built-in access control
If you have enabled built-in access control for SEP, you must add the following configuration to all Delta Lake catalogs:
delta.security=starburst