Starburst SQL Server connector#
The Starburst SQL Server Connector is an extended version of the SQL Server connector with configuration and usage identical.
The following improvements are included:
Requirements#
Fulfill the SQL Server connector requirements.
Additional features of the connector require a valid Starburst Enterprise license, unless otherwise noted.
Configuration#
The connector configuration is identical to the configuration for the base SQL Server connector.
A minimal configuration, with the connector-name sqlserver
and the JDBC
connections details, is shown in the following snippet. If you do not have the
necessary configuration established, you can disable encryption in the
connection string with the encrypt property:
connector.name=sqlserver
connection-url=jdbc:sqlserver://<host>:<port>;database=<database>;encrypt=false
connection-user=sqlserver_username
connection-password=sqlserver_password
The connection-url
parameter can use any valid JDBC connection string
supported by the SQL Server JDBC
driver.
This allows you to specify a SQL Server instance and other parameters as desired
in your specific use case.
Dynamic catalog selection#
The default configuration, such as in a file etc/example.properties
,
enables the connection to one database running on a SQL Server instance.
connector.name=sqlserver
connection-url=jdbc:sqlserver://dbserver.example.com:1443/exampledb;encrypt=false
The connector supports connecting to multiple SQL Server databases using a
single catalog by setting an override_catalog
session property. This support
has to be enabled in the catalog properties file with the
sqlserver.override-catalog.enabled
property:
connector.name=sqlserver
connection-url=jdbc:sqlserver://dbserver.example.com:1443/exampledb;encrypt=false
sqlserver.override-catalog.enabled=true
With sufficient access rights, available to the user specified in
connection-user
, the preceding example allows you to query any table in any
schema in the database exampledb
on the SQL Server dbserver
:
SELECT * FROM example.exampleschema.exampletable;
In order to query another database, such as testdb
, you have to override the
database configured in the catalog. From then on you can query that database in
the current user session:
SET SESSION example.override_catalog=testdb;
SELECT * FROM example.testdbschema.testdbtable;
Note
The access rights to the databases, schemas, tables and actual rows, continue to be determined by the configured user for the connection, and any other security setup like impersonation or Apache Ranger integration. Make sure these access rights are as restrictive as required.
Type mapping#
Because SEP and SQL Server each support types that the other does not, this connector modifies some types when reading or writing data.
SQL Server to SEP type mapping#
The following read type mapping applies when data is read from existing tables in SQL Server, or inserted into existing tables in SQL Server from SEP.
SQL Server type |
SEP type |
Notes |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
for |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
No other types are supported.
SEP to SQL Server type mapping#
The following write type mapping applies when tables are created in SQL Server from SEP.
SEP type |
SQL Server type |
Notes |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
SQL support#
The connector supports all of the SQL statements listed in the SQL Server connector documentation.
The following section describes additional SQL operations that are supported by SEP enhancements to the Trino connector.
Insert#
The connector supports adding rows using INSERT statements.
By default, data insertion is performed by writing data to a temporary table.
You can skip this step to improve performance and write directly to the target
table. Set the catalog property sqlserver.non-transactional-insert.enabled
or the corresponding catalog session property non_transactional_insert
to
true
.
In rare cases when exceptions occur during the insert operation, data in the target table can be corrupted. Since transactions have been disabled, no rollback can be performed.
ALTER TABLE EXECUTE#
The connector supports the following commands for use with ALTER TABLE EXECUTE:
collect_statistics#
The collect_statistics
command is used with
Managed statistics to collect statistics for a table
and its columns.
The following statement collects statistics for the example_table
table
and all of its columns:
ALTER TABLE example_table EXECUTE collect_statistics;
Collecting statistics for all columns in a table may be unnecessarily
performance-intensive, especially for wide tables. To only collect statistics
for a subset of columns, you can include the columns
parameter with an
array of column names. For example:
ALTER TABLE example_table
EXECUTE collect_statistics(columns => ARRAY['customer','line_item']);
Performance#
The connector includes a number of performance improvements, detailed in the following sections.
Managed statistics#
The connector supports Managed statistics allowing SEP to collect and store its own table and column statistics that can then be used for performance optimizations in query planning.
Statistics must be collected manually using the built-in collect_statistics
command, see collect_statistics for details and
examples.
Parallelism#
The connector is able to read data from SQL Server using multiple parallel connections for tables partitioned as described in the SQL Server partitioning documentation.
Property name |
Description |
Default |
---|---|---|
|
Defines the maximum number of parallel splits during query execution. Use
this property to limit the number of splits for tables with a large number
of partitions to avoid opening a large number of connections. The default
value of |
1 |
In the event that multiple parallel connections result in a deadlocked state, the connector attempts to retry the operation up to 3 times.
Dynamic filtering#
Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.
You can disable dynamic filtering by setting the dynamic-filtering.enabled
property in your catalog configuration file to false
.
Wait timeout#
By default, table scans on the connector are delayed up to 20 seconds until dynamic filters are collected from the build side of joins. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries.
You can configure the dynamic-filtering.wait-timeout
property in your
catalog properties file:
dynamic-filtering.wait-timeout=1m
You can use the dynamic_filtering_wait_timeout
catalog session property in a specific session:
SET SESSION example.dynamic_filtering_wait_timeout = 1s;
Compaction#
The maximum size of dynamic filter predicate, that is pushed down to the
connector during table scan for a column, is configured using the
domain-compaction-threshold
property in the catalog
properties file:
domain-compaction-threshold=100
You can use the domain_compaction_threshold
catalog
session property:
SET SESSION domain_compaction_threshold = 10;
By default, domain-compaction-threshold
is set to 32
.
When the dynamic predicate for a column exceeds this threshold, it is compacted
into a single range predicate.
For example, if the dynamic filter collected for a date column dt
on the
fact table selects more than 32 days, the filtering condition is simplified from
dt IN ('2020-01-10', '2020-01-12',..., '2020-05-30')
to dt BETWEEN '2020-01-10' AND '2020-05-30'
. Using a large threshold can result in increased
table scan overhead due to a large IN
list getting pushed down to the data
source.
Metrics#
Metrics about dynamic filtering are reported in a JMX table for each catalog:
jmx.current."io.trino.plugin.jdbc:name=example,type=dynamicfilteringstats"
Metrics include information about the total number of dynamic filters, the number of completed dynamic filters, the number of available dynamic filters and the time spent waiting for dynamic filters.
Pushdown#
The connector supports pushdown of all operations listed in the SQL Server connector documentation.
Starburst Cached Views#
The connectors supports table scan redirection to improve performance and reduce load on the data source.
Security#
The connector includes a number of security-related features, detailed in the following sections.
User impersonation#
The SQL Server connector supports user impersonation.
User impersonation can be enabled in the catalog file:
sqlserver.impersonation.enabled=true
User impersonation in SQL Server connector is based on EXECUTE AS USER
. For
more details visit:
docs.microsoft.com.
Kerberos authentication#
The connector supports Kerberos authentication. Use the following properties in the catalog properties file to configure it.
sqlserver.authentication.type=KERBEROS
kerberos.client.principal=example@example.com
kerberos.client.keytab=etc/kerberos/example.keytab
kerberos.config=etc/kerberos/krb5.conf
In this example the user example@example.com
, as defined
in the kerberos.client.principal
property, is used to connect to the
database. The related Kerberos service ticket is located in the file defined in
the kerberos.client.keytab
property.
Kerberos credential pass-through#
The connector can be configured to pass through Kerberos credentials, received by SEP, to the SQL server database. This allows you to apply Kerberos-defined permissions to SQL Server connections through SEP.
To configure credential pass-through in Kerberos and SEP, see Kerberos credential pass-through.
After you configure Kerberos and SEP, edit the catalog properties file to enable the connector to pass the credentials to the SQL Server database. Configure the following Kerberos client configuration properties in the catalog properties file:
sqlserver.authentication.type=KERBEROS_PASS_THROUGH
http.authentication.krb5.config=/etc/krb5.conf
http-server.authentication.krb5.service-name=exampleServiceName
http-server.authentication.krb5.keytab=/path/to/Keytab/File
Any SQL server database accessed using SEP is now subject to the Kerberos-defined data access restrictions and permissions.
Password credential pass-through#
The connector supports password credential pass-through. To enable it, edit the catalog properties file to include the authentication type:
sqlserver.authentication.type=PASSWORD_PASS_THROUGH
NTLM authentication#
The connector supports NTLM authentication as an alternative to Kerberos authentication for environments where Kerberos auth is not feasible.
To enable NTLM authentication, configure the following properties in the catalog properties file:
connection-user=ad_username
connection-password=ad_password
sqlserver.authentication.type=NTLM_PASSWORD
In this example, the user ad_username
and password ad_password
are
Active Directory credentials for a single user who has underlying access to the
data source.
TLS/HTTPS settings#
Microsoft strongly suggests encrypting traffic using TLS when using NTLM authentication.
If you have globally-trusted certificates installed on both SEP and the data
source, enable TLS by appending the encrypt=true
parameter to the
connection-url
catalog configuration property:
connection-url=jdbc:sqlserver://<host>:<port>;database=<database>;encrypt=true
If you do not have globally-trusted certificates installed, you can instead use certificates trusted by a custom truststore. The following catalog configuration properties manage custom truststore configuration for the SQL Server connector:
Property name |
Description |
---|---|
|
Path to the truststore file in SEP. |
|
Password used to generate the truststore file |
|
The type of truststore. Supports either |
The following catalog properties configuration specifies a custom JKS truststore to enable TLS for NTLM authentication:
sqlserver.tls.truststore-path=path/to/truststore.jks
sqlserver.tls.truststore-password=insecurepassword
sqlserver.tls.truststore-type=JKS
NTLM credential pass-through#
THe connector supports NTLM credential pass-through. To enable this, edit the catalog properties file to include the authentication type:
sqlserver.authentication.type=NTLM_PASSWORD_PASS_THROUGH