Release 429-e LTS (29 Nov 2023)#

The 429-e release includes all improvements from the following Trino releases:

Highlights since 423-e#

Breaking changes since 423-e#

  • The SEP backend service has been updated to require PostgreSQL 12.0+ when using PostgreSQL as the underlying RDBMS.

  • The TIMESTAMP type mapping between MySQL and Trino has changed. MySQL TIMESTAMP now maps to Trino TIMESTAMP WITH TIME ZONE instead of Trino TIMESTAMP. Depending on the query, mapping from MySQL TIMESTAMP to Trino TIMESTAMP may result in an error message.

  • The deprecated.hive.metastore.glue-read-properties-based-column-statistics Hive Metastore configuration property and its underlying functionality have been removed. You must remove this configuration property or the cluster fails to start.

  • The updated base Docker image for SEP no longer includes curl, vi, nano, sed, awk, grep, and other popular command line tools. Starburst recommends using an init container with a base image that includes your needed command line tools. Guidance on using init containers and selecting suitable base images can be found in our init container documentation.

  • SEP 427-e uses a base system image that does not contain a system-wide trust store. Trusted, self-signed certificates must now be added to the Java distribution CA certificates located under $JAVA_HOME/lib/security/cacerts.

  • The legacy parse-decimal-literals-as-double configuration property has been removed. You must remove this property from the configuration of any cluster that uses it, or the cluster does not start.

  • The following deprecated task writer configuration properties have been removed:

    • task.writer-count, replaced by task.min-writer-count.

    • task.partitioned-writer-count, replaced by task.max-writer-count.

    • task.scale-writers.max-writer-count, replaced by task.max-writer-count.

    • writer-min-size, replaced by writer-scaling-min-data-processed.

    You must remove these properties from the cluster configuration and use the replacement properties instead, or the cluster does not start.

  • The Snowflake distributed connector is now deprecated and is planned to be removed in a future SEP release, in favor of the improved Snowflake parallel connector. Existing catalogs that use the Snowflake distributed connector must be migrated to the Snowflake parallel connector.

  • The RPM package service daemon script is now deprecated and is planned to be removed in a future SEP release. Configurations that rely on this script must be updated to use the systemctl daemon script instead.

  • As of the 429-e release, table functions such as query are qualified with the system.builtin schema. This change results in Access Denied errors in circumstances where a role was granted a privilege to execute table functions not qualified under a schema. Permissions for these roles must now be updated accordingly to execute table functions when qualified with an appropriate schema.
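A pre-upgrade check for the removed and renamed properties listed above could be sketched as follows. This is a minimal illustration, not a Starburst tool: the function name migrate_properties is hypothetical, the replacement names task.min-writer-count and task.max-writer-count are assumed to be the actual property names behind the replacements listed above, and the handling of two old properties that map to the same replacement (drop the later one and flag it for review) is a design choice of this sketch.

```python
# Properties that must simply be deleted before upgrading to 429-e.
REMOVED = {
    "deprecated.hive.metastore.glue-read-properties-based-column-statistics",
    "parse-decimal-literals-as-double",
}

# Old name -> replacement name. The task.* replacement names are an
# assumption of this sketch; verify them against the properties reference.
RENAMED = {
    "task.writer-count": "task.min-writer-count",
    "task.partitioned-writer-count": "task.max-writer-count",
    "task.scale-writers.max-writer-count": "task.max-writer-count",
    "writer-min-size": "writer-scaling-min-data-processed",
}


def _key_of(line):
    """Return the property key of a 'key=value' line, or None otherwise."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or "=" not in stripped:
        return None
    return stripped.split("=", 1)[0].strip()


def migrate_properties(text):
    """Return (new_text, notes) for the content of one properties file."""
    lines = text.splitlines()
    # Keys already present in the file, so a rename never duplicates them.
    existing = {key for key in map(_key_of, lines) if key is not None}
    emitted = set()
    out, notes = [], []
    for line in lines:
        key = _key_of(line)
        if key is None:
            out.append(line)  # keep comments and blank lines untouched
            continue
        value = line.split("=", 1)[1].strip()
        if key in REMOVED:
            notes.append(f"removed {key}")
            continue
        if key in RENAMED:
            new_key = RENAMED[key]
            if new_key in existing or new_key in emitted:
                notes.append(f"dropped {key}: {new_key} already set, review manually")
                continue
            notes.append(f"renamed {key} -> {new_key}")
            key, line = new_key, f"{new_key}={value}"
        emitted.add(key)
        out.append(line)
    return "\n".join(out), notes
```

Run the function on a copy of each configuration file and review the returned notes before applying the result, since the values of the old and new properties are not guaranteed to have identical meaning.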

429-e initial changes#

General#

  • Added support for publishing data products that contain decimal literals.

  • Updated usage metrics to upload data collected between the previous upload and the coordinator shutdown or restart.

  • Fixed issue that prevented the Run and troubleshoot option in the query editor from working when built-in access control is enabled.

Security#

  • Added session logout to OAuth 2.0 providers when logging out from the SEP web UI.

  • Fixed issue that prevented tables and columns inside information_schema from being displayed when built-in access control is used.

  • Fixed JavaScript policy evaluation in Privacera.

Hive connector#

Delta Lake connector#

MongoDB connector#

Snowflake connector#

  • Updated connectors to use fully parallel mode by default for more query shapes.

SQL Server connector#

  • Added the sqlserver.database-prefix-for-schema.enabled catalog configuration property that allows SQL Server catalogs to access multiple databases.

429-e.0 changes (29 Nov 2023)#

  • Improved support for concurrent updates of table statistics in Glue.

  • Added masking for additional sensitive values in log files.

  • Added casting of char fields, if necessary, to varchar type in Hive view translations.

  • Added support for RENAME SCHEMA and RENAME TABLE when the snowflake.database-prefix-for-schema.enabled property is set to true.

  • Remediated CVE-2023-41900.

  • Fixed incorrect results for queries involving an aggregation in a correlated subquery.

  • Fixed incorrect results for queries involving ORDER BY and window functions with ordered frames.

  • Fixed launcher start command not working with default directories.

  • Fixed possible JVM crash when reading short decimal columns in Parquet files created by Impala. Applies to the Hive, Hudi, Delta, and Iceberg connectors.

  • Fixed incorrect results when a query contains several != or NOT IN predicates in MongoDB catalogs.

429-e.1 changes (21 Dec 2023)#

  • Improved query planning time on Hive tables without statistics generated.

  • Fixed long query planning times for queries with many local exchanges.

  • Fixed query failure when reading the Parquet column index for timestamp columns in Hive, Delta, Iceberg, and Hudi tables.

  • Fixed incorrect results for LIKE with some strings containing repeated substrings.

  • Fixed coordinator memory leak.

429-e.2 changes (18 Jan 2024)#

  • Fixed incorrect results on Parquet files containing page indexes when the query has filters on multiple columns in Hive, Delta, and Hudi tables.

  • Fixed an issue where the Run and troubleshoot option of the Run button wrote to empty directories even when the option was not selected.

429-e.3 changes (14 Feb 2024)#

  • Fixed Teradata custom dates format.

  • Fixed query failure when reading array columns.

  • Fixed a bug where an entire directory was skipped from schema discovery if at least one file matched the excludePatterns option.

  • Fixed out-of-bound (OOB) telemetry null pointer exception in parallel Snowflake connector.

  • Fixed complex expression pushdown in the Redshift connector.

  • Fixed a bug where query history displayed queries of another user.

429-e.4 changes (11 Mar 2024)#

  • Updated Kubernetes external secret operator.

  • Fixed UI authentication for large authentication tokens.

  • Fixed incorrect results for DATETIMEOFFSET values before the year 1400.

  • Fixed query failure when using char types with the reverse() function.

  • Fixed potential incorrect results when using the ST_Centroid() and ST_Buffer() functions for tiny geometries.

  • Fixed schema, table, and function visibility in BIAC filtering.

  • Fixed a bug where column statistics created in SEP would not be visible in Hive when using CDP 7.

429-e.5 changes (28 Mar 2024)#

  • Fixed an issue which caused the sync_partition_metadata operation to fail when partition paths had case changes.

  • Restored support for SymlinkTextInputFormat for text formats.

  • Fixed reading Delta Lake files with encoded characters on Azure.

  • Fixed failure when reading certain Avro data with UNION data types.

429-e.6 changes (17 Apr 2024)#

  • Enabled PyStarburst dataframe API by default.

  • Fixed possible worker crashes caused by out-of-memory errors when running aggregation queries.

  • Fixed incorrect results when querying a table being modified concurrently.

  • Fixed handling of union options in Hive and Avro to allow coercion to a single type.

  • Fixed a bug that caused the creation of materialized views to fail when using MySQL as the cache service backend database if materialized_view_definitions is longer than 64K characters.

429-e.7 changes (20 May 2024)#

  • Fixed potential query failure due to worker nodes running out of memory in concurrent scenarios.

  • Fixed incorrect results when reading partitioned Delta Lake tables with deletion vectors.

  • Fixed correctness bug in constant literal distinct aggregation.

  • Fixed Prometheus whiteListObjectNames being overwritten when KEDA is enabled.

429-e.8 changes (14 Jun 2024)#

  • Fixed potential failure when reading ORC files larger than 2GB.

  • Fixed startup failure when fault-tolerant execution is enabled with Google Cloud Storage exchange.

  • Fixed potential loss of a query completion event when multiple queries fail at the same time.

  • Backported IMDSv2 service metadata access.

429-e.9 changes (28 Jun 2024)#

  • Fixed incorrect results when specifying a value for the cassandra.partition-size-for-batch-select configuration property.

  • Fixed failure when writing to tables with Iceberg VARBINARY values.

  • Fixed correctness issue on receivers refresh that could cause queries to hang.

429-e.10 changes (11 Jul 2024)#

  • Added encoding to error code in OAuth2 callback handler.

  • Fixed reading empty files from S3 and GCS.

  • Fixed issue syncing partition metadata which could cause data deletion.

429-e.11 changes (29 Jul 2024)#

  • Fixed bug preventing use of Starburst security in Delta Lake connector.

429-e.12 changes (14 Aug 2024)#

  • Fixed optimizer timeout for certain queries involving aggregations and CASE expressions.

  • Fixed failure when adding new columns with a decimal type.

  • Fixed failure to read Hive tables migrated to Iceberg with Apache Spark.

  • Fixed issue that caused the error ‘Multiple masks on a single column are not supported’ to occur unintentionally.

429-e.13 changes (30 Aug 2024)#

  • Fixed query failure when file-based network topology is configured with the node-scheduler.network-topology.file configuration property.

429-e.14 changes (13 Sep 2024)#

  • Fixed a bug that caused cluster metrics to be created with incorrect intervals and subsequently led to loss of cluster metrics data.

  • Fixed the Run and troubleshoot feature when the insights.authorized-groups configuration property contains authorized groups.

  • Fixed numeric overflow during managed statistics computation for large tables in Teradata mode session.

429-e.15 was skipped.

429-e.16 changes (18 Oct 2024)#

  • Fixed OpenX JSON decoding of a JSON array line, which resulted in data being written to the wrong output column.

  • Fixed reading large Prometheus responses.

  • Fixed failures for count(*) queries with predicates containing non-ASCII strings. Applies to the Elasticsearch connector.

429-e.17 was skipped.

429-e.18 changes (4 Nov 2024)#

  • Updated the sync_partition_metadata procedure to use the value of the hive.metastore.partition-batch-size.max configuration property. The default batch size changed from 1000 to 100.
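To get a rough sense of what the smaller default batch size means, the sketch below splits a list of partition names into consecutive batches and counts them. It only illustrates generic batching arithmetic: how sync_partition_metadata actually groups partitions internally is not described in these notes, and the function names and sample partition names are hypothetical.

```python
from math import ceil


def partition_batches(partitions, batch_size=100):
    """Yield consecutive slices of at most batch_size partition names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(partitions), batch_size):
        yield partitions[start:start + batch_size]


def batch_count(partition_total, batch_size):
    """Number of batches needed to cover partition_total partitions."""
    return ceil(partition_total / batch_size)


# Hypothetical table with 2500 partitions.
partitions = [f"part={i}" for i in range(2500)]
new_default = list(partition_batches(partitions, batch_size=100))
old_default = list(partition_batches(partitions, batch_size=1000))
print(len(new_default), len(old_default))
```

Under this simplified model, a smaller batch size means more, smaller requests for the same number of partitions.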

429-e.19 changes (14 Nov 2024)#

  • Fixed memory leak in InMemoryEventClient within cache service.