Starburst SingleStore connector#
The Starburst SingleStore (formerly known as MemSQL) connector is an extended version of the SingleStore connector. It can be used to connect to and query SingleStore databases. Configuration and usage are identical.
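For illustration, a minimal catalog properties file could look like the following sketch. The host, port, and credentials are assumptions, and the exact connection URL format depends on your SingleStore JDBC driver version; refer to the SingleStore connector documentation for the authoritative configuration reference:
connector.name=singlestore
connection-url=jdbc:singlestore://example.host:3306
connection-user=example_user
connection-password=example_password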
Requirements#
Fulfill the SingleStore connector requirements.
Additional features of the connector require a valid Starburst Enterprise license, unless otherwise noted.
SQL support#
The connector supports all of the SQL statements listed in the SingleStore connector documentation.
ALTER TABLE EXECUTE#
The connector supports the following commands for use with ALTER TABLE EXECUTE:
collect_statistics#
The collect_statistics command is used with Managed statistics to collect statistics for a table and its columns.
The following statement collects statistics for the example_table table and all of its columns:
ALTER TABLE example_table EXECUTE collect_statistics;
Collecting statistics for all columns in a table may be unnecessarily performance-intensive, especially for wide tables. To collect statistics for only a subset of columns, you can include the columns parameter with an array of column names. For example:
ALTER TABLE example_table
EXECUTE collect_statistics(columns => ARRAY['customer','line_item']);
Fault-tolerant execution support#
The connector supports Fault-tolerant execution of query processing. Read and write operations are both supported with any retry policy.
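As a minimal sketch, a QUERY retry policy can be enabled cluster-wide in config.properties; a TASK retry policy additionally requires an exchange manager to be configured:
retry-policy=QUERY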
Performance#
The connector includes a number of performance improvements, detailed in the following sections.
Table statistics#
This feature is available for free, and does not require a valid license.
The SingleStore connector can use table and column statistics for cost-based optimizations, to improve query processing performance based on the actual data in the data source.
The statistics are collected by SingleStore and retrieved by the connector.
Table-level statistics are based on SingleStore’s INFORMATION_SCHEMA.TABLE_STATISTICS table.
SingleStore can automatically update its table and column statistics. In some cases, you may want to force a statistics update, for example after creating new columns or after changing data in the table. You can do that by executing the following statement in the SingleStore database:
ANALYZE TABLE table_name;
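After SingleStore has updated its statistics, you can inspect the values visible to SEP with SHOW STATS. The table name is an assumption for illustration:
SHOW STATS FOR example_table;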
Note
SingleStore statistics are estimates, and SEP and SingleStore may use statistics information in different ways. For this reason, the accuracy of table and column statistics returned by the SingleStore connector might be lower than that of other connectors.
Improving statistics accuracy#
You can improve statistics accuracy and access column-level statistics with histogram statistics. Column-level statistics are based on SingleStore’s column statistics INFORMATION_SCHEMA.ADVANCED_HISTOGRAMS table, if available. If that table is not available, the information is based on the INFORMATION_SCHEMA.OPTIMIZER_STATISTICS table instead.
The ADVANCED_HISTOGRAMS table includes additional stats such as the MIN and MAX values for a column, which are not available in the LEGACY_HISTOGRAMS table. SingleStore’s ADVANCED_HISTOGRAMS feature requires version 6.5 or later, and a cardinality_estimation_level >= 6.5.
To check for these conditions and determine whether ADVANCED_HISTOGRAMS are available to you, execute the following statement in SingleStore:
SELECT
COLUMN_NAME,
IF(RANGE_STATS=1, true, false) as histograms_available,
IF(ADVANCED_HISTOGRAMS=1, 'Advanced', 'Legacy') as histogram_type
FROM INFORMATION_SCHEMA.OPTIMIZER_STATISTICS
WHERE DATABASE_NAME = 'db' AND TABLE_NAME = 'table';
If they are available in your SingleStore version, use the following statement to populate the ADVANCED_HISTOGRAMS table:
ANALYZE TABLE table_name COLUMNS ALL ENABLE;
Refer to the SingleStore documentation for information about options, limitations and additional considerations.
Managed statistics#
The connector supports Managed statistics, allowing SEP to collect and store its own table and column statistics that can then be used for performance optimizations in query planning.
Statistics must be collected manually using the built-in collect_statistics command; see collect_statistics for details and examples.
Pushdown#
The connector supports pushdown of all operations listed in the SingleStore connector documentation.
In addition, the connector supports pushdown for the following aggregate functions:
count(), also count(distinct x)
variance() and var_samp()
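As an illustration, the aggregation in a query such as the following can be pushed down to SingleStore, which you can verify by inspecting the query plan with EXPLAIN. The catalog, schema, table, and column names are assumptions:
EXPLAIN SELECT count(DISTINCT customer_id) FROM example.example_schema.orders;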
Cost-based join pushdown#
The connector supports cost-based Join pushdown to make intelligent decisions about whether to push down a join operation to the data source.
When cost-based join pushdown is enabled, the connector only pushes down join operations if the available Table statistics suggest that doing so improves performance. Note that if no table statistics are available, join operation pushdown does not occur to avoid a potential decrease in query performance.
The following table describes catalog configuration properties for join pushdown:
Property name | Description | Default value
---|---|---
join-pushdown.enabled | Enable join pushdown. Equivalent catalog session property is join_pushdown_enabled. | true
join-pushdown.strategy | Strategy used to evaluate whether join operations are pushed down. Set to AUTOMATIC to enable cost-based join pushdown, or to EAGER to push down joins whenever possible. EAGER pushes down joins even when table statistics are unavailable, which can degrade query performance, so it is recommended only for testing and troubleshooting. | AUTOMATIC
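For example, to state the defaults explicitly and use cost-based join pushdown, a catalog properties file can contain entries like the following sketch:
join-pushdown.enabled=true
join-pushdown.strategy=AUTOMATIC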
Dynamic filtering#
Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.
You can disable dynamic filtering by setting the dynamic-filtering.enabled property in your catalog configuration file to false.
Wait timeout#
By default, table scans on the connector are delayed up to 20 seconds until dynamic filters are collected from the build side of joins. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries.
You can configure the dynamic-filtering.wait-timeout property in your catalog properties file:
dynamic-filtering.wait-timeout=1m
You can use the dynamic_filtering_wait_timeout catalog session property in a specific session:
SET SESSION example.dynamic_filtering_wait_timeout = 1s;
Compaction#
The maximum size of the dynamic filter predicate that is pushed down to the connector during a table scan for a column is configured using the domain-compaction-threshold property in the catalog properties file:
domain-compaction-threshold=100
You can use the domain_compaction_threshold catalog session property:
SET SESSION domain_compaction_threshold = 10;
By default, domain-compaction-threshold is set to 32. When the dynamic predicate for a column exceeds this threshold, it is compacted into a single range predicate.
For example, if the dynamic filter collected for a date column dt on the fact table selects more than 32 days, the filtering condition is simplified from dt IN ('2020-01-10', '2020-01-12',..., '2020-05-30') to dt BETWEEN '2020-01-10' AND '2020-05-30'. Using a large threshold can result in increased table scan overhead due to a large IN list getting pushed down to the data source.
Metrics#
Metrics about dynamic filtering are reported in a JMX table for each catalog:
jmx.current."io.trino.plugin.jdbc:name=example,type=dynamicfilteringstats"
Metrics include information about the total number of dynamic filters, the number of completed dynamic filters, the number of available dynamic filters and the time spent waiting for dynamic filters.
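For example, if the JMX catalog is configured, you can query these metrics with SQL. The catalog name example matches the name used in the JMX table name above:
SELECT * FROM jmx.current."io.trino.plugin.jdbc:name=example,type=dynamicfilteringstats";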
Starburst Cached Views#
The connector supports table scan redirection to improve performance and reduce load on the data source.
Parallelism#
The connector can read data from SingleStore using multiple parallel connections for tables that are partitioned as described in the SingleStore documentation.
Parallelism is disabled by default. Set the following catalog configuration property to enable parallelism:
singlestore.parallelism-type=RESULT_TABLE_PARALLELISM
When this feature is enabled, SEP reads each partition of a SingleStore table in parallel and uses materialized result tables.
Security#
The SingleStore connector includes a number of security-related features, detailed in the following sections.
Kerberos authentication#
The connector supports Kerberos authentication. Use the following properties in the catalog properties file to configure it.
singlestore.authentication.type=KERBEROS
kerberos.client.principal=example@example.com
kerberos.client.keytab=etc/kerberos/example.keytab
kerberos.config=etc/kerberos/krb5.conf
With this configuration, the user example@example.com, defined in the principal property, is used to connect to the database, and the related Kerberos service ticket is located in the example.keytab file.
The SingleStore connector authenticates to Kerberos using the Java Authentication and Authorization Service (JAAS). The JAAS configuration file is set via the java.security.auth.login.config JVM system property. If this system property is not set, the connector automatically generates a file with the following contents using values from the catalog configuration, and sets the system property to the path of the generated file.
Krb5ConnectorContext {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
doNotPrompt=true
isInitiator=true
principal="${kerberos.client.principal}"
keyTab="${kerberos.client.keytab}";
};
A single JAAS configuration file is shared for the entire JVM. This means
multiple SingleStore catalogs must use the same principal and keytab.
Users can create their own JAAS configuration file and set the system property in the jvm.config configuration; however, it must include a Krb5ConnectorContext section for the connector to work correctly.
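For example, to point the JVM at a custom JAAS configuration file, add the system property to jvm.config. The file path is an assumption for illustration:
-Djava.security.auth.login.config=etc/kerberos/jaas.conf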