Starburst SQL Server connector#

The Starburst SQL Server Connector is an extended version of the SQL Server connector with configuration and usage identical.

The following improvements are included:

Requirements#

Fulfill the SQL Server connector requirements.
Additional features of the connector require a valid Starburst Enterprise license, unless otherwise noted.

Configuration#

The connector configuration is identical to the configuration for the base SQL Server connector.

A minimal configuration, with the connector-name sqlserver and the JDBC connections details, is shown in the following snippet. If you do not have the necessary configuration established, you can disable encryption in the connection string with the encrypt property:

connector.name=sqlserver
connection-url=jdbc:sqlserver://<host>:<port>;database=<database>;encrypt=false
connection-user=sqlserver_username
connection-password=sqlserver_password

The connection-url parameter can use any valid JDBC connection string supported by the SQL Server JDBC driver. This allows you to specify a SQL Server instance and other parameters as desired in your specific use case.

Dynamic catalog selection#

The default configuration, such as in a file etc/example.properties, enables the connection to one database running on a SQL Server instance.

connector.name=sqlserver
connection-url=jdbc:sqlserver://dbserver.example.com:1443/exampledb;encrypt=false

The connector supports connecting to multiple SQL Server databases using a single catalog by setting an override_catalog session property. This support has to be enabled in the catalog properties file with the sqlserver.override-catalog.enabled property:

connector.name=sqlserver
connection-url=jdbc:sqlserver://dbserver.example.com:1443/exampledb;encrypt=false
sqlserver.override-catalog.enabled=true

With sufficient access rights, available to the user specified in connection-user, the preceding example allows you to query any table in any schema in the database exampledb on the SQL Server dbserver:

SELECT * FROM example.exampleschema.exampletable;

In order to query another database, such as testdb, you have to override the database configured in the catalog. From then on you can query that database in the current user session:

SET SESSION example.override_catalog=testdb;
SELECT * FROM example.testdbschema.testdbtable;

Note

The access rights to the databases, schemas, tables and actual rows, continue to be determined by the configured user for the connection, and any other security setup like impersonation or Apache Ranger integration. Make sure these access rights are as restrictive as required.

Type mapping#

Because SEP and SQL Server each support types that the other does not, this connector modifies some types when reading or writing data.

SQL Server to SEP type mapping#

The following read type mapping applies when data is read from existing tables in SQL Server, or inserted into existing tables in SQL Server from SEP.

SQL Server to SEP read type mapping#
SQL Server type	SEP type	Notes
`BIT`	`BOOLEAN`
`BIGINT`	`BIGINT`
`SMALLINT`	`SMALLINT`
`TINYINT`	`TINYINT`
`INT`	`INTEGER`
`DECIMAL(p, s),` NUMERIC(p, s)``	`DECIMAL(p, s)`	for `p` <= 38
`DOUBLE PRECISION`	`DOUBLE`
`FLOAT`	`DOUBLE`
`BINARY`	`VARBINARY`
`CHAR(n)`	`CHAR(n)`
`VARCHAR(n)`	`VARCHAR(n)`
`NCHAR(n)`	`CHAR(n)`
`NVARCHAR(n)`	`VARCHAR(n)`
`DATE`	`DATE`
`DATETIME2(n)`	`TIMESTAMP(n)`
`TIME`	`TIME`

No other types are supported.

SEP to SQL Server type mapping#

The following write type mapping applies when tables are created in SQL Server from SEP.

SEP to SQL Server write type mapping#
SEP type	SQL Server type	Notes
`BOOLEAN`	`BIT`
`BIGINT`	`BIGINT`
`INTEGER`	`INT`
`SMALLINT`	`SMALLINT`
`DOUBLE`	`DOUBLE PRECISION`
`CHAR(n <= 4000)`	`NCHAR(n)`
`CHAR(n > 4000)`	`NVARCHAR(max)`
`VARCHAR(n <= 4000)`	`NVARCHAR(n)`
`VARCHAR, VARCHAR(n > 4000)`	`NVARCHAR(max)`
`VARBINARY`	`VARBINARY(max)`
`DATE`	`DATE`
`TIME`	`TIME`
`TIMESTAMP(n <= 7)` without timezone	`DATETIME2(n)`
`TIMESTAMP(n > 7)` without timezone	`DATETIME2(7)`

SQL support#

The connector supports all of the SQL statements listed in the SQL Server connector documentation.

The following section describes additional SQL operations that are supported by SEP enhancements to the Trino connector.

Insert#

The connector supports adding rows using INSERT statements. By default, data insertion is performed by writing data to a temporary table. You can skip this step to improve performance and write directly to the target table. Set the catalog property sqlserver.non-transactional-insert.enabled or the corresponding catalog session property non_transactional_insert to true.

In rare cases when exceptions occur during the insert operation, data in the target table can be corrupted. Since transactions have been disabled, no rollback can be performed.

ALTER TABLE EXECUTE#

The connector supports the following commands for use with ALTER TABLE EXECUTE:

collect_statistics#

The collect_statistics command is used with Managed statistics to collect statistics for a table and its columns.

The following statement collects statistics for the example_table table and all of its columns:

ALTER TABLE example_table EXECUTE collect_statistics;

Collecting statistics for all columns in a table may be unnecessarily performance-intensive, especially for wide tables. To only collect statistics for a subset of columns, you can include the columns parameter with an array of column names. For example:

ALTER TABLE example_table
    EXECUTE collect_statistics(columns => ARRAY['customer','line_item']);

Performance#

The connector includes a number of performance improvements, detailed in the following sections.

Managed statistics#

The connector supports Managed statistics allowing SEP to collect and store its own table and column statistics that can then be used for performance optimizations in query planning.

Statistics must be collected manually using the built-in collect_statistics command, see collect_statistics for details and examples.

Parallelism#

The connector is able to read data from SQL Server using multiple parallel connections for tables partitioned as described in the SQL Server partitioning documentation.

SQL Server parallelism configuration properties#
Property name	Description	Default
`sqlserver.parallel.connections_count`	Defines the maximum number of parallel splits during query execution. Use this property to limit the number of splits for tables with a large number of partitions to avoid opening a large number of connections. The default value of `1` disables parallelism. The maximum number of connections is automatically limited to the number of partitions in the table; if this number is set higher, it is ignored, and the number of partitions used instead. The equivalent catalog session property is `parallel_connections_count`.	1

In the event that multiple parallel connections result in a deadlocked state, the connector attempts to retry the operation up to 3 times.

Dynamic filtering#

Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.

You can disable dynamic filtering by setting the dynamic-filtering.enabled property in your catalog configuration file to false.

Wait timeout#

By default, table scans on the connector are delayed up to 20 seconds until dynamic filters are collected from the build side of joins. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries.

You can configure the dynamic-filtering.wait-timeout property in your catalog properties file:

dynamic-filtering.wait-timeout=1m

You can use the dynamic_filtering_wait_timeout catalog session property in a specific session:

SET SESSION example.dynamic_filtering_wait_timeout = 1s;

Compaction#

The maximum size of dynamic filter predicate, that is pushed down to the connector during table scan for a column, is configured using the domain-compaction-threshold property in the catalog properties file:

domain-compaction-threshold=100

You can use the domain_compaction_threshold catalog session property:

SET SESSION domain_compaction_threshold = 10;

By default, domain-compaction-threshold is set to 32. When the dynamic predicate for a column exceeds this threshold, it is compacted into a single range predicate.

For example, if the dynamic filter collected for a date column dt on the fact table selects more than 32 days, the filtering condition is simplified from dt IN ('2020-01-10', '2020-01-12',..., '2020-05-30') to dt BETWEEN '2020-01-10' AND '2020-05-30'. Using a large threshold can result in increased table scan overhead due to a large IN list getting pushed down to the data source.

Metrics#

Metrics about dynamic filtering are reported in a JMX table for each catalog:

jmx.current."io.trino.plugin.jdbc:name=example,type=dynamicfilteringstats"

Metrics include information about the total number of dynamic filters, the number of completed dynamic filters, the number of available dynamic filters and the time spent waiting for dynamic filters.

Pushdown#

The connector supports pushdown of all operations listed in the SQL Server connector documentation.

Starburst Cached Views#

The connectors supports table scan redirection to improve performance and reduce load on the data source.

Security#

The connector includes a number of security-related features, detailed in the following sections.

User impersonation#

The SQL Server connector supports user impersonation.

User impersonation can be enabled in the catalog file:

sqlserver.impersonation.enabled=true

User impersonation in SQL Server connector is based on EXECUTE AS USER. For more details visit: docs.microsoft.com.

Kerberos authentication#

The connector supports Kerberos authentication. Use the following properties in the catalog properties file to configure it.

sqlserver.authentication.type=KERBEROS
kerberos.client.principal=example@example.com
kerberos.client.keytab=etc/kerberos/example.keytab
kerberos.config=etc/kerberos/krb5.conf

In this example the user example@example.com, as defined in the kerberos.client.principal property, is used to connect to the database. The related Kerberos service ticket is located in the file defined in the kerberos.client.keytab property.

Kerberos credential pass-through#

The connector can be configured to pass through Kerberos credentials, received by SEP, to the SQL server database. This allows you to apply Kerberos-defined permissions to SQL Server connections through SEP.

To configure credential pass-through in Kerberos and SEP, see Kerberos credential pass-through.

After you configure Kerberos and SEP, edit the catalog properties file to enable the connector to pass the credentials to the SQL Server database. Configure the following Kerberos client configuration properties in the catalog properties file:

sqlserver.authentication.type=KERBEROS_PASS_THROUGH
http.authentication.krb5.config=/etc/krb5.conf
http-server.authentication.krb5.service-name=exampleServiceName
http-server.authentication.krb5.keytab=/path/to/Keytab/File

Any SQL server database accessed using SEP is now subject to the Kerberos-defined data access restrictions and permissions.

Password credential pass-through#

The connector supports password credential pass-through. To enable it, edit the catalog properties file to include the authentication type:

sqlserver.authentication.type=PASSWORD_PASS_THROUGH

NTLM authentication#

The connector supports NTLM authentication as an alternative to Kerberos authentication for environments where Kerberos auth is not feasible.

To enable NTLM authentication, configure the following properties in the catalog properties file:

connection-user=ad_username
connection-password=ad_password
sqlserver.authentication.type=NTLM_PASSWORD

In this example, the user ad_username and password ad_password are Active Directory credentials for a single user who has underlying access to the data source.

TLS/HTTPS settings#

Microsoft strongly suggests encrypting traffic using TLS when using NTLM authentication.

If you have globally-trusted certificates installed on both SEP and the data source, enable TLS by appending the encrypt=true parameter to the connection-url catalog configuration property:

connection-url=jdbc:sqlserver://<host>:<port>;database=<database>;encrypt=true

If you do not have globally-trusted certificates installed, you can instead use certificates trusted by a custom truststore. The following catalog configuration properties manage custom truststore configuration for the SQL Server connector:

TLS truststore configuration properties#
Property name	Description
`sqlserver.tls.truststore-path`	Path to the truststore file in SEP.
`sqlserver.tls.truststore-password`	Password used to generate the truststore file
`sqlserver.tls.truststore-type`	The type of truststore. Supports either `JKS` or `PKCS12` as values.

The following catalog properties configuration specifies a custom JKS truststore to enable TLS for NTLM authentication:

sqlserver.tls.truststore-path=path/to/truststore.jks
sqlserver.tls.truststore-password=insecurepassword
sqlserver.tls.truststore-type=JKS

NTLM credential pass-through#

THe connector supports NTLM credential pass-through. To enable this, edit the catalog properties file to include the authentication type:

sqlserver.authentication.type=NTLM_PASSWORD_PASS_THROUGH