Oracle connector#
The Oracle connector lets you query and create tables in an external Oracle database. This can be used to join data provided by different databases, like Oracle and Hive, or different Oracle database instances.
SEP includes additional enterprise features that are built on top of the existing Trino connector functionality. For more information on connector key feature differences between Trino and SEP, see the connectors feature matrix.
Requirements#
To connect to Oracle, you need:
Oracle 19 or higher.
Network access from the SEP coordinator and workers to Oracle. Port 1521 is the default port.
A valid Starburst Enterprise license.
Configuration#
To configure the Oracle connector, create a catalog properties file that
specifies the Oracle connector by setting the connector.name
to Oracle
.
For example, to access a database as the example
catalog, create the file
etc/catalog/example.properties
. Replace the connection properties as
appropriate for your setup:
connector.name=oracle
# The correct syntax of the connection-url varies by Oracle version and
# configuration. The following example URL connects to an Oracle SID named
# "orcl".
connection-url=jdbc:oracle:thin:@example.net:1521:orcl
connection-user=root
connection-password=secret
The connection-url
defines the connection information and parameters to pass
to the JDBC driver. The Oracle connector uses the Oracle JDBC Thin driver, and
the syntax of the URL may be different depending on your Oracle configuration.
For example, the connection URL is different if you are connecting to an Oracle
SID or an Oracle service name. See the Oracle Database JDBC driver
documentation
for more information.
The connection-user
and connection-password
are typically required and
determine the user credentials for the connection, often a service user. You can
use secrets to avoid exposing actual values in the
catalog properties files.
Note
Oracle does not expose metadata comment via REMARKS
column by default in JDBC
driver. You can enable it by using the oracle.remarks-reporting.enabled
config
option. See Additional Oracle Performance
Extensions
for more details.
By default, the Oracle connector uses connection pooling for performance improvement. The following configuration shows the typical default values:
oracle.connection-pool.max-size=30
oracle.connection-pool.min-size=1
oracle.connection-pool.inactive-timeout=20m
To disable connection pooling, update properties to include the following:
oracle.connection-pool.enabled=false
Data source authentication#
The connector can provide credentials for the data source connection in multiple ways:
inline, in the connector configuration file
in a separate properties file
in a key store file
as extra credentials set when connecting to Trino
You can use secrets to avoid storing sensitive values in the catalog properties files.
The following table describes configuration properties for connection credentials:
Property name |
Description |
---|---|
|
Type of the credential provider. Must be one of |
|
Connection user name. |
|
Connection password. |
|
Name of the extra credentials property, whose value to use as the user
name. See |
|
Name of the extra credentials property, whose value to use as the password. |
|
Location of the properties file where credentials are present. It must
contain the |
|
The location of the Java Keystore file, from which to read credentials. |
|
File format of the keystore file, for example |
|
Password for the key store. |
|
Name of the key store entity to use as the user name. |
|
Password for the user name key store entity. |
|
Name of the key store entity to use as the password. |
|
Password for the password key store entity. |
Multiple Oracle servers#
If you want to connect to multiple Oracle servers, configure another instance of the Oracle connector as a separate catalog.
To add another Oracle catalog, create a new properties file. For example, if
you name the property file sales.properties
, SEP creates a catalog named
sales
.
General configuration properties#
The following table describes general catalog configuration properties for the connector:
Property name |
Description |
---|---|
|
Support case insensitive schema and table names. Defaults to |
|
Duration for which case insensitive schema and table
names are cached. Defaults to |
|
Path to a name mapping configuration file in JSON format that allows
Trino to disambiguate between schemas and tables with similar names in
different cases. Defaults to |
|
Frequency with which Trino checks the name matching configuration file
for changes. The duration value defaults to |
|
Duration for which metadata, including table and
column statistics, is cached. Defaults to |
|
Cache the fact that metadata, including table and column statistics, is
not available. Defaults to |
|
Duration for which schema metadata is cached.
Defaults to the value of |
|
Duration for which table metadata is cached.
Defaults to the value of |
|
Duration for which tables statistics are cached.
Defaults to the value of |
|
Maximum number of objects stored in the metadata cache. Defaults to |
|
Maximum number of statements in a batched execution. Do not change
this setting from the default. Non-default values may negatively
impact performance. Defaults to |
|
Push down dynamic filters into JDBC queries. Defaults to |
|
Maximum duration for which Trino waits for dynamic
filters to be collected from the build side of joins before starting a
JDBC query. Using a large timeout can potentially result in more detailed
dynamic filters. However, it can also increase latency for some queries.
Defaults to |
Appending query metadata#
The optional parameter query.comment-format
allows you to configure a SQL
comment that is sent to the datasource with each query. The format of this
comment can contain any characters and the following metadata:
$QUERY_ID
: The identifier of the query.$USER
: The name of the user who submits the query to Trino.$SOURCE
: The identifier of the client tool used to submit the query, for exampletrino-cli
.$TRACE_TOKEN
: The trace token configured with the client tool.
The comment can provide more context about the query. This additional
information is available in the logs of the datasource. To include environment
variables from the Trino cluster with the comment , use the
${ENV:VARIABLE-NAME}
syntax.
The following example sets a simple comment that identifies each query sent by Trino:
query.comment-format=Query sent by Trino.
With this configuration, a query such as SELECT * FROM example_table;
is
sent to the datasource with the comment appended:
SELECT * FROM example_table; /*Query sent by Trino.*/
The following example improves on the preceding example by using metadata:
query.comment-format=Query $QUERY_ID sent by user $USER from Trino.
If Jane
sent the query with the query identifier
20230622_180528_00000_bkizg
, the following comment string is sent to the
datasource:
SELECT * FROM example_table; /*Query 20230622_180528_00000_bkizg sent by user Jane from Trino.*/
Note
Certain JDBC driver settings and logging configurations might cause the comment to be removed.
Domain compaction threshold#
Pushing down a large list of predicates to the data source can compromise
performance. Trino compacts large predicates into a simpler range predicate
by default to ensure a balance between performance and predicate pushdown.
If necessary, the threshold for this compaction can be increased to improve
performance when the data source is capable of taking advantage of large
predicates. Increasing this threshold may improve pushdown of large
dynamic filters.
The domain-compaction-threshold
catalog configuration property or the
domain_compaction_threshold
catalog session property can be used to adjust the default value of
32
for this threshold.
Procedures#
system.flush_metadata_cache()
Flush JDBC metadata caches. For example, the following system call flushes the metadata caches for all schemas in the
example
catalogUSE example.example_schema; CALL system.flush_metadata_cache();
Case insensitive matching#
When case-insensitive-name-matching
is set to true
, Trino
is able to query non-lowercase schemas and tables by maintaining a mapping of
the lowercase name to the actual name in the remote system. However, if two
schemas and/or tables have names that differ only in case (such as “customers”
and “Customers”) then Trino fails to query them due to ambiguity.
In these cases, use the case-insensitive-name-matching.config-file
catalog
configuration property to specify a configuration file that maps these remote
schemas/tables to their respective Trino schemas/tables:
{
"schemas": [
{
"remoteSchema": "CaseSensitiveName",
"mapping": "case_insensitive_1"
},
{
"remoteSchema": "cASEsENSITIVEnAME",
"mapping": "case_insensitive_2"
}],
"tables": [
{
"remoteSchema": "CaseSensitiveName",
"remoteTable": "tablex",
"mapping": "table_1"
},
{
"remoteSchema": "CaseSensitiveName",
"remoteTable": "TABLEX",
"mapping": "table_2"
}]
}
Queries against one of the tables or schemes defined in the mapping
attributes are run against the corresponding remote entity. For example, a query
against tables in the case_insensitive_1
schema is forwarded to the
CaseSensitiveName schema and a query against case_insensitive_2
is forwarded
to the cASEsENSITIVEnAME
schema.
At the table mapping level, a query on case_insensitive_1.table_1
as
configured above is forwarded to CaseSensitiveName.tablex
, and a query on
case_insensitive_1.table_2
is forwarded to CaseSensitiveName.TABLEX
.
By default, when a change is made to the mapping configuration file, Trino must
be restarted to load the changes. Optionally, you can set the
case-insensitive-name-mapping.refresh-period
to have Trino refresh the
properties without requiring a restart:
case-insensitive-name-mapping.refresh-period=30s
Non-transactional INSERT#
The connector supports adding rows using INSERT statements.
By default, data insertion is performed by writing data to a temporary table.
You can skip this step to improve performance and write directly to the target
table. Set the insert.non-transactional-insert.enabled
catalog property
or the corresponding non_transactional_insert
catalog session property to
true
.
Note that with this property enabled, data can be corrupted in rare cases where exceptions occur during the insert operation. With transactions disabled, no rollback can be performed.
Querying Oracle#
The Oracle connector provides a schema for every Oracle database.
Run SHOW SCHEMAS
to see the available Oracle databases:
SHOW SCHEMAS FROM example;
If you used a different name for your catalog properties file, use that catalog
name instead of example
.
Note
The Oracle user must have access to the table in order to access it from SEP. The user configuration in the connection properties file determines your privileges in these schemas.
Examples#
If you have an Oracle database named web
, run SHOW TABLES
to see the
tables it contains:
SHOW TABLES FROM example.web;
To see a list of the columns in the clicks
table in the web
database, run either of the following:
DESCRIBE example.web.clicks;
SHOW COLUMNS FROM example.web.clicks;
To access the clicks table in the web database, run the following:
SELECT * FROM example.web.clicks;
Type mapping#
Because Trino and Oracle each support types that the other does not, this connector modifies some types when reading or writing data. Data types may not map the same way in both directions between Trino and the data source. Refer to the following sections for type mapping in each direction.
Oracle to Trino type mapping#
The connector maps Oracle types to the corresponding Trino types following this table:
Oracle database type |
Trino type |
Notes |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
No other types are supported.
Trino to Oracle type mapping#
Trino supports creating tables with the following types in an Oracle database. The following table shows the mappings from Trino to Oracle data types:
Note
For types not listed in the table below, Trino can’t perform CREATE TABLE <table> AS SELECT
operations. When data is inserted into existing
tables, Oracle to Trino
type mapping is used.
Trino type |
Oracle database type |
Notes |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
No other types are supported.
Mapping numeric types#
An Oracle NUMBER(p, s)
maps to Trino’s DECIMAL(p, s)
except in these
conditions:
No precision is specified for the column (example:
NUMBER
orNUMBER(*)
), unlessoracle.number.default-scale
is set.Scale (
s
) is greater than precision.Precision (
p
) is greater than 38.Scale is negative and the difference between
p
ands
is greater than 38, unlessoracle.number.rounding-mode
is set to a different value thanUNNECESSARY
.
If s
is negative, NUMBER(p, s)
maps to DECIMAL(p + s, 0)
.
For Oracle NUMBER
(without precision and scale), you can change
oracle.number.default-scale=s
and map the column to DECIMAL(38, s)
.
Mapping datetime types#
Writing a timestamp with fractional second precision (p
) greater than 9
rounds the fractional seconds to nine digits.
Oracle DATE
type stores hours, minutes, and seconds, so it is mapped
to Trino TIMESTAMP(0)
.
Warning
Due to date and time differences in the libraries used by Trino and the
Oracle JDBC driver, attempting to insert or select a datetime value earlier
than 1582-10-15
results in an incorrect date inserted.
Mapping character types#
Trino’s VARCHAR(n)
maps to VARCHAR2(n CHAR)
if n
is less than or equal
to 4000. A larger or unbounded VARCHAR
maps to NCLOB
.
Trino’s CHAR(n)
maps to CHAR(n CHAR)
if n
is less than or equal to 2000.
A larger CHAR
maps to NCLOB
.
Using CREATE TABLE AS
to create an NCLOB
column from a CHAR
value
removes the trailing spaces from the initial values for the column. Inserting
CHAR
values into existing NCLOB
columns keeps the trailing spaces. For
example:
CREATE TABLE vals AS SELECT CAST('A' as CHAR(2001)) col;
INSERT INTO vals (col) VALUES (CAST('BB' as CHAR(2001)));
SELECT LENGTH(col) FROM vals;
_col0
-------
2001
1
(2 rows)
Attempting to write a CHAR
that doesn’t fit in the column’s actual size
fails. This is also true for the equivalent VARCHAR
types.
Type mapping configuration properties#
The following properties can be used to configure how data types from the connected data source are mapped to Trino data types and how the metadata is cached in Trino.
Property name |
Description |
Default value |
---|---|---|
|
Configure how unsupported column data types are handled:
The respective catalog session property is |
|
|
Allow forced mapping of comma separated lists of data types to convert to
unbounded |
Number to decimal configuration properties#
Configuration property name |
Session property name |
Description |
Default |
---|---|---|---|
|
|
Default Trino |
not set |
|
|
Rounding mode for the Oracle
|
|
SQL support#
The connector provides read access and write access to data and metadata in Oracle. In addition to the globally available and read operation statements, the connector supports the following statements:
UPDATE#
Only UPDATE
statements with constant assignments and predicates are
supported. For example, the following statement is supported because the values
assigned are constants:
UPDATE table SET col1 = 1 WHERE col3 = 1
Arithmetic expressions, function calls, and other non-constant UPDATE
statements are not supported. For example, the following statement is not
supported because arithmetic expressions cannot be used with the SET
command:
UPDATE table SET col1 = col2 + 2 WHERE col3 = 1
The =
, !=
, >
, <
, >=
, <=
, IN
, NOT IN
operators are supported in
predicates. The following statement is not supported because the AND
operator
cannot be used in predicates:
UPDATE table SET col1 = 1 WHERE col3 = 1 AND col2 = 3
All column values of a table row cannot be updated simultaneously. For a three column table, the following statement is not supported:
UPDATE table SET col1 = 1, col2 = 2, col3 = 3 WHERE col3 = 1
SQL DELETE#
If a WHERE
clause is specified, the DELETE
operation only works if the
predicate in the clause can be fully pushed down to the data source.
ALTER TABLE EXECUTE#
This connector supports the following commands for use with ALTER TABLE EXECUTE:
collect_statistics#
The collect_statistics
command is used with
Managed statistics to collect statistics for a table
and its columns.
The following statement collects statistics for the example_table
table
and all of its columns:
ALTER TABLE example_table EXECUTE collect_statistics;
Collecting statistics for all columns in a table may be unnecessarily
performance-intensive, especially for wide tables. To only collect statistics
for a subset of columns, you can include the columns
parameter with an
array of column names. For example:
ALTER TABLE example_table
EXECUTE collect_statistics(columns => ARRAY['customer','line_item']);
ALTER TABLE RENAME TO#
The connector does not support renaming tables across multiple schemas. For example, the following statement is supported:
ALTER TABLE example.schema_one.table_one RENAME TO example.schema_one.table_two
The following statement attempts to rename a table across schemas, and therefore is not supported:
ALTER TABLE example.schema_one.table_one RENAME TO example.schema_two.table_two
Fault-tolerant execution support#
The connector supports Fault-tolerant execution of query processing. Read and write operations are both supported with any retry policy.
Table functions#
The connector provides specific table functions to access Oracle.
query(varchar) -> table
#
The query
function lets you query the underlying database directly. It
requires syntax native to Oracle, because the full query is pushed down and
processed in Oracle. This can be useful for accessing native features which are
not available in SEP or for improving query performance in situations where
running a query natively may be faster.
The native query passed to the underlying data source is required to return a table as a result set. Only the data source performs validation or security checks for these queries using its own configuration. Trino does not perform these tasks. Only use passthrough queries to read data.
As a simple example, query the example
catalog and select an entire table:
SELECT
*
FROM
TABLE(
example.system.query(
query => 'SELECT
*
FROM
tpch.nation'
)
);
As a practical example, you can use the MODEL clause from Oracle SQL:
SELECT
SUBSTR(country, 1, 20) country,
SUBSTR(product, 1, 15) product,
year,
sales
FROM
TABLE(
example.system.query(
query => 'SELECT
*
FROM
sales_view
MODEL
RETURN UPDATED ROWS
MAIN
simple_model
PARTITION BY
country
MEASURES
sales
RULES
(sales['Bounce', 2001] = 1000,
sales['Bounce', 2002] = sales['Bounce', 2001] + sales['Bounce', 2000],
sales['Y Box', 2002] = sales['Y Box', 2001])
ORDER BY
country'
)
);
Note
The query engine does not preserve the order of the results of this
function. If the passed query contains an ORDER BY
clause, the
function result may not be ordered as expected.
Performance#
The connector includes a number of performance improvements, detailed in the following sections.
Synonyms#
To improve performance, SEP disables support for Oracle SYNONYM
. To
include SYNONYM
, add the following configuration property:
oracle.synonyms.enabled=true
Parallelism#
The connector is able to read data from Oracle using multiple parallel connections for tables partitioned as described in the Oracle partitioning documentation.
Property name |
Description |
Default |
---|---|---|
|
Determines the parallelism method. Possible values are:
|
|
|
Maximum number of parallel connections for a table scan |
10 |
Table statistics#
The Oracle connector can use table and column statistics for cost based optimizations to improve query processing performance based on the actual data in the data source.
The statistics are collected by Oracle and retrieved by the connector.
To collect statistics for a table, add the following statement to your Oracle database:
EXECUTE DBMS_STATS.GATHER_TABLE_STATS('USER_NAME', 'TABLE_NAME');
See Oracle’s documentation for additional options and instructions on invoking a procedure when you’re not using SQL*Plus.
Managed statistics#
The connector supports Managed statistics which lets SEP collect and store table and column statistics that can then be used for performance optimizations in query planning.
Statistics must be collected manually using the built-in collect_statistics
command, see collect_statistics for details
and examples.
Pushdown#
The connector supports pushdown for a number of operations:
In addition, the connector supports Aggregation pushdown for the following functions:
Pushdown is only supported for DOUBLE
type columns with the
following functions:
Pushdown is only supported for REAL
or DOUBLE
type column
with the following functions:
Note
The connector performs pushdown where performance may be improved, but in order to preserve correctness an operation may not be pushed down. When pushdown of an operation may result in better performance but risks correctness, the connector prioritizes correctness.
Join pushdown#
The join-pushdown.enabled
catalog configuration property or
join_pushdown_enabled
catalog session property control whether the connector pushes
down join operations. The property defaults to false
, and enabling join
pushdowns may negatively impact performance for some queries.
Cost-based join pushdown#
The connector supports cost-based Join pushdown to make intelligent decisions about whether to push down a join operation to the data source.
When cost-based join pushdown is enabled, the connector only pushes down join operations if the available Table statistics suggest that doing so improves performance. Note that if no table statistics are available, join operation pushdown does not occur to avoid a potential decrease in query performance.
The following table describes catalog configuration properties for join pushdown:
Property name |
Description |
Default value |
---|---|---|
|
Enable join pushdown. Equivalent catalog
session property is
|
|
|
Strategy used to evaluate whether join operations are pushed down. Set to
|
|
Predicate pushdown support#
The connector does not support pushdown of any predicates on columns that use
the CLOB
, NCLOB
, BLOB
, or RAW(n)
Oracle database types, or Trino data
types that map to these Oracle database types.
In the following example, the predicate is not pushed down for either query
because name
is a column of type VARCHAR
, which maps to NCLOB
in Oracle:
SHOW CREATE TABLE nation;
-- Create Table
----------------------------------------
-- CREATE TABLE oracle.trino_test.nation (
-- name VARCHAR
-- )
-- (1 row)
SELECT * FROM nation WHERE name > 'CANADA';
SELECT * FROM nation WHERE name = 'CANADA';
In the following example, the predicate is pushed down for both queries because
name
is a column of type VARCHAR(25)
, which maps to VARCHAR2(25)
in
Oracle:
SHOW CREATE TABLE nation;
-- Create Table
----------------------------------------
-- CREATE TABLE oracle.trino_test.nation (
-- name VARCHAR(25)
-- )
-- (1 row)
SELECT * FROM nation WHERE name > 'CANADA';
SELECT * FROM nation WHERE name = 'CANADA';
Dynamic filtering#
Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.
You can disable dynamic filtering by setting the dynamic-filtering.enabled
property in your catalog configuration file to false
.
Wait timeout#
By default, table scans on the connector are delayed up to 20 seconds until dynamic filters are collected from the build side of joins. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries.
You can configure the dynamic-filtering.wait-timeout
property in your
catalog properties file:
dynamic-filtering.wait-timeout=1m
You can use the dynamic_filtering_wait_timeout
catalog session property in a specific session:
SET SESSION example.dynamic_filtering_wait_timeout = 1s;
Compaction#
The maximum size of dynamic filter predicate, that is pushed down to the
connector during table scan for a column, is configured using the
domain-compaction-threshold
property in the catalog
properties file:
domain-compaction-threshold=100
You can use the domain_compaction_threshold
catalog
session property:
SET SESSION domain_compaction_threshold = 10;
By default, domain-compaction-threshold
is set to 32
.
When the dynamic predicate for a column exceeds this threshold, it is compacted
into a single range predicate.
For example, if the dynamic filter collected for a date column dt
on the
fact table selects more than 32 days, the filtering condition is simplified from
dt IN ('2020-01-10', '2020-01-12',..., '2020-05-30')
to dt BETWEEN '2020-01-10' AND '2020-05-30'
. Using a large threshold can result in increased
table scan overhead due to a large IN
list getting pushed down to the data
source.
Metrics#
Metrics about dynamic filtering are reported in a JMX table for each catalog:
jmx.current."io.trino.plugin.jdbc:name=example,type=dynamicfilteringstats"
Metrics include information about the total number of dynamic filters, the number of completed dynamic filters, the number of available dynamic filters and the time spent waiting for dynamic filters.
Starburst Cached Views#
The connector supports table scan redirection to improve performance and reduce load on the data source.
Security#
The connector includes a number of security-related features, detailed in the following sections.
User impersonation#
The connector supports user impersonation. In the Oracle connector, user impersonation creates proxy user accounts and authorizes users to connect through them in Oracle database.
To enable user impersonation in the catalog file, add the following property:
oracle.impersonation.enabled=true
For more information, go to docs.oracle.com.
Kerberos authentication#
The connector supports Kerberos authentication using either a keytab or credential cache.
To configure Kerberos authentication with a keytab, add the following catalog configuration properties to the catalog properties file:
oracle.authentication.type=KERBEROS
kerberos.client.principal=example@example.com
kerberos.client.keytab=etc/kerberos/example.keytab
kerberos.config=etc/kerberos/krb5.conf
To configure Kerberos authentication with a credential cache, add the following catalog configuration properties to the catalog properties file:
oracle.authentication.type=KERBEROS
kerberos.client.principal=example@example.com
kerberos.client.credential-cache.location=etc/kerberos/example.cache
kerberos.config=etc/kerberos/krb5.conf
In these configurations, the user example@example.com
connects to the
database. The related Kerberos service ticket is located in the
etc/kerberos/example.keytab
file or as cache credentials in the
etc/kerberos/example.cache
file.
Kerberos credential pass-through#
You can configure the Oracle connector to pass through Kerberos credentials, received by SEP, to the Oracle database. To configure Kerberos and SEP, see Kerberos credential pass-through.
After you configure Kerberos and SEP, edit the properties file to enable the connector to pass the credentials from the server to the database.
Confirm the correct Kerberos client configuration properties in the catalog properties file. For example:
oracle.authentication.type=KERBEROS_PASS_THROUGH
http.authentication.krb5.config=/etc/krb5.conf
http-server.authentication.krb5.service-name=exampleServiceName
http-server.authentication.krb5.keytab=/path/to/Keytab/File
Note
When delegated Kerberos authentication is configured
for the Starburst Enterprise web UI, make sure the http-server.authentication.krb5.service-name
value is set to HTTP
to match the configured Kerberos service name.
Now any database accessed using SEP is subject to the Kerberos-defined data access restrictions and permissions.
Password credential pass-through#
The connector supports password credential pass-through. To enable it, edit the catalog properties file to include the authentication type:
oracle.authentication.type=PASSWORD_PASS_THROUGH
For more information about configurations and limitations, see Password credential pass-through.