Starburst DynamoDB connector#
The Starburst DynamoDB connector allows querying data stored in Amazon DynamoDB.
Requirements#
To connect to DynamoDB, you need:
Amazon Web Services (AWS) access credentials with an attached policy to be able to read from DynamoDB.
Network access from the coordinator and workers to the AWS region hosting your DynamoDB tables.
A valid Starburst Enterprise license.
Configuration#
Create the example
catalog with a catalog properties file in etc/catalog
named example.properties
(replace example with your database name or some
other descriptive name of the catalog) with the following contents:
connector.name=dynamodb
dynamodb.aws-access-key=AKIAIOSFODNN7EXAMPLE
dynamodb.aws-secret-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
dynamodb.aws-region=us-east-1
dynamodb.generate-schema-files=NEVER
dynamodb.schema-directory=/path/to/schema/directory
Specify the connector.name
property as dynamodb
. Configure the catalog
using your AWS access key, secret key, and region for your account. Additional
authentication methods are documented in the Security section.
The connector can only access DynamoDB with the configured access credentials in the catalog. If you need to access DynamoDB with different credentials or a different region, configure a separate catalog. See more information about Type mapping and schemas session and catalog session properties.
General configuration properties#
The following table describes general catalog configuration properties for the connector:
Property name |
Description |
Default value |
---|---|---|
|
Support case insensitive schema and table names. |
|
|
This value should be a duration. |
|
|
Path to a name mapping configuration file in JSON format that allows Trino to disambiguate between schemas and tables with similar names in different cases. |
|
|
Frequency with which Trino checks the name matching configuration file for changes. This value should be a duration. |
(refresh disabled) |
|
The duration for which metadata, including table and column statistics, is cached. |
|
|
Cache the fact that metadata, including table and column statistics, is not available |
|
|
Maximum number of objects stored in the metadata cache |
|
|
Maximum number of statements in a batched execution. Do not change this setting from the default. Non-default values may negatively impact performance. |
|
|
Push down dynamic filters into JDBC queries |
|
|
Maximum duration for which Trino will wait for dynamic filters to be collected from the build side of joins before starting a JDBC query. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries. |
|
Generate a schema file#
The connector exposes the data in your DynamoDB instance in the
amazondynamodb
schema of the catalog.
A schema file for a DynamoDB details the SEP table name, columns, and data types. It additionally specifies which DynamoDB attribute (or nested attribute) the connector uses to populate data for the column.
If you do not specify a schema file, the connector generates a schema file. The connector samples up to 50 rows from the DynamoDB table and uses those to generate a schema file. The schema files are in RealSQLDatabase (RSD) format and accessible using SQL commands.
The value dictates the behavior:
NEVER
(default): A schema file is never generated.ON_USE
: A schema file is generated the first time a table is referenced, provided the schema file for the table does not already exist.ON_START
: A schema file is generated at connection time for any tables that do not currently have a schema file.
Several attributes detail the mapping from the SQL type to DynamoDB:
# Attribute name
Description
Default Value
name
The name of the column in the SEP table
xs:type
The SQL type of the SEP table. Supported types are below
columnsize
Length of a SQL column that supports variable widths like
VARCHAR
255
key
true
if this column is part of the DynamoDB primary key
false
other:keytype
The type of the DynamoDB key. Valid values are either
HASH
orRANGE
other:relativepath
The name of the attribute in DynamoDB relative to any parent map or string set
other:fullpath
The full name of the attribute in DynamoDB
name
other:dynamodatatype
The DynamoDB Type. Valid values are
N
,S
,SS
, orM
Caution
A breaking change in the schema definition files is introduced with the 365-e release of SEP. If you update from 364-e or an older release to 365-e or newer, all schema definition files need to be regenerated.
Example#
This is an example of a schema file for the TPC-H orders
table. The name
of the file is orders.rsd
and goes in the directory specified by the
dynamodb.schema-directory
configuration property on every SEP host.
The api:info
element definition includes the name of the table ,
orders
, set to the title
and other:tableapiname
attributes,
along with any additional table and schema attributes.
The api:info
element contains all columns in the SEP table definition
as child elements.
<api:script xmlns:api="http://apiscript.com/ns?v1" xmlns:xs="http://www.cdata.com/ns/rsbscript/2" xmlns:other="http://apiscript.com/ns?v1">
<api:info title="orders" other:catalog="CData" other:schema="AmazonDynamoDB" description="orders" other:tableapiname="orders" other:version="20">
<attr name="orderkey" xs:type="bigint" other:relativepath="orderkey" other:fullpath="orderkey" other:dynamodatatype="N" key="true" other:keytype="HASH" />
<attr name="custkey" xs:type="bigint" other:relativepath="custkey" other:fullpath="custkey" other:dynamodatatype="N" />
<attr name="orderstatus" xs:type="string" columnsize="1" other:relativepath="orderstatus" other:fullpath="orderstatus" other:dynamodatatype="S" />
<attr name="totalprice" xs:type="double" other:relativepath="totalprice" other:fullpath="totalprice" other:dynamodatatype="N" />
<attr name="orderdate" xs:type="date" other:relativepath="orderdate" other:fullpath="orderdate" other:dynamodatatype="S" />
<attr name="orderpriority" xs:type="string" columnsize="15" other:relativepath="orderpriority" other:fullpath="orderpriority" other:dynamodatatype="S" />
<attr name="clerk" xs:type="string" columnsize="15" other:relativepath="clerk" other:fullpath="clerk" other:dynamodatatype="S" />
<attr name="shippriority" xs:type="integer" other:relativepath="shippriority" other:fullpath="shippriority" other:dynamodatatype="N" />
<attr name="comment" xs:type="string" columnsize="79" other:relativepath="comment" other:fullpath="comment" other:dynamodatatype="S" />
</api:info>
</api:script>
Type mapping#
Because Trino and DynamoDB each support types that the other does not, this connector modifies some types when reading data.
DynamoDB to Trino type mapping#
The connector maps DynamoDB types to the corresponding Trino types according to the following table:
DynamoDB type |
Trino type |
---|---|
BOOLEAN |
BOOLEAN |
BIGINT |
BIGINT |
DOUBLE |
DOUBLE |
STRING |
VARCHAR(N) |
BINARY |
VARBINARY |
DATE |
DATE |
No other types are supported.
Type mapping and schemas#
Review the table to learn more about mapping the relational values of nested attributes and arrays. This is used when generating a schema file and dictates how the table displays.
Property name |
Description |
Default value |
---|---|---|
|
The relative or absolute path to a directory that contains schema files. The directory must exist on every host. More information on generating schema files. |
|
|
Set to |
|
|
The default returns the array as a single column of JSON. Set the integer
value to |
|
Maps and string sets#
DynamoDB supports nested attributes of maps and string sets, and the connector
supports flattening these elements to relational columns. By default, any
map or string set is returned as a single JSON column of type VARCHAR
.
Take the below table foo
which has a single key row_id
and one column
bar
which is a DynamoDB type M
(only the api:info
section is shown
below in the example section).
<api:info title="foo" other:catalog="CData" other:schema="AmazonDynamoDB" description="foo" other:tableapiname="foo" other:version="20">
<attr name="row_id" xs:type="string" columnsize="255" other:relativepath="row_id" other:fullpath="row_id" other:dynamodatatype="S" key="true" other:keytype="HASH" />
<attr name="bar" xs:type="string" columnsize="2000" other:relativepath="bar" other:fullpath="bar" other:dynamodatatype="M" />
</api:info>
The following exmaple shows a value of bar
is below. It contains three keys
a
, a string, b
, a number, and c
, a string set.
{
"a": "string value",
"b": 123,
"c": ["foo", "bar"]
}
To flatten this value into a map, change the M
column in our schema file
to four columns named bar.a
, bar.b
, bar.c.0
, and bar.c.1
,
setting the SQL types and DynamoDB types via the xs:type
and
other:dynamodatatype
attributes, respectively. Then set
other:relativepath
to a
and other:fullpath
to bar.a
, where .
is used as a path separator to project into the map to select the a
attribute in the top-level bar
attribute.
The string set uses the same separator as the map. Instead of specifying an
attribute, use [0]
and [1]
to map the first and second elements of the
string set to c.0
and c.1
SQL columns.
<api:info title="foo" other:catalog="CData" other:schema="AmazonDynamoDB" description="foo" other:tableapiname="foo" other:version="20">
<attr name="row_id" xs:type="string" columnsize="255" other:relativepath="row_id" other:fullpath="row_id" other:dynamodatatype="S" key="true" other:keytype="HASH" />
<attr name="bar.a" xs:type="string" columnsize="255" other:relativepath="a" other:fullpath="bar.a" other:dynamodatatype="S" />
<attr name="bar.b" xs:type="bigint" other:relativepath="b" other:fullpath="bar.b" other:dynamodatatype="N" />
<attr name="bar.c.0" xs:type="string" columnsize="255" other:relativepath="c[0]" other:fullpath="bar.c[0]" other:dynamodatatype="S" />
<attr name="bar.c.1" xs:type="string" columnsize="255" other:relativepath="c[1]" other:fullpath="bar.c[1]" other:dynamodatatype="S" />
</api:info>
For string set types, it uses the same method of projection. Below, baz
is
of type SS
.
<api:info title="foo" other:catalog="CData" other:schema="AmazonDynamoDB" description="foo" other:tableapiname="foo" other:version="20">
<attr name="row_id" xs:type="string" columnsize="255" other:relativepath="row_id" other:fullpath="row_id" other:dynamodatatype="S" key="true" other:keytype="HASH" />
<attr name="baz" xs:type="string" columnsize="2000" other:relativepath="value" other:fullpath="value" other:dynamodatatype="SS" />
</api:info>
It splits it into three columns:
<api:info title="foo" other:catalog="CData" other:schema="AmazonDynamoDB" description="foo" other:tableapiname="foo" other:version="20">
<attr name="row_id" xs:type="string" columnsize="255" other:relativepath="row_id" other:fullpath="row_id" other:dynamodatatype="S" key="true" other:keytype="HASH" />
<attr name="baz.0" xs:type="string" columnsize="2000" other:relativepath="baz[0]" other:fullpath="baz[0]" other:dynamodatatype="S" />
<attr name="baz.1" xs:type="string" columnsize="2000" other:relativepath="baz[1]" other:fullpath="baz[1]" other:dynamodatatype="S" />
<attr name="baz.2" xs:type="string" columnsize="2000" other:relativepath="baz[2]" other:fullpath="baz[2]" other:dynamodatatype="S" />
</api:info>
If baz
contains more than three elements, it is omitted from the result set.
If baz
contains fewer than three elements, the value of the column is
NULL
where an element is not present.
Using this method, you can project into many nested attributes in DynamoDB for both maps and string sets.
SQL support#
The connector supports globally available and read operation statements to access data and metadata in DynamoDB.
Performance#
The connector includes a number of performance improvements, detailed in the following sections.
Dynamic filtering#
Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.
You can disable dynamic filtering by setting the dynamic-filtering.enabled
property in your catalog configuration file to false
.
Wait timeout#
By default, table scans on the connector are delayed up to 20 seconds until dynamic filters are collected from the build side of joins. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries.
You can configure the dynamic-filtering.wait-timeout
property in your
catalog properties file:
dynamic-filtering.wait-timeout=1m
You can use the dynamic_filtering_wait_timeout
catalog session
property in a specific session:
SET SESSION example.dynamic_filtering_wait_timeout = 1s;
Compaction#
The maximum size of dynamic filter predicate, that is pushed down to the
connector during table scan for a column, is configured using the
domain-compaction-threshold
property in the catalog
properties file:
domain-compaction-threshold=100
You can use the domain_compaction_threshold
catalog
session property:
SET SESSION domain_compaction_threshold = 10;
By default, domain-compaction-threshold
is set to 32
.
When the dynamic predicate for a column exceeds this threshold, it is compacted
into a single range predicate.
For example, if the dynamic filter collected for a date column dt
on the
fact table selects more than 32 days, the filtering condition is simplified from
dt IN ('2020-01-10', '2020-01-12',..., '2020-05-30')
to dt BETWEEN
'2020-01-10' AND '2020-05-30'
. Using a large threshold can result in increased
table scan overhead due to a large IN
list getting pushed down to the data
source.
Metrics#
Metrics about dynamic filtering are reported in a JMX table for each catalog:
jmx.current."io.trino.plugin.jdbc:name=example,type=dynamicfilteringstats"
Metrics include information about the total number of dynamic filters, the number of completed dynamic filters, the number of available dynamic filters and the time spent waiting for dynamic filters.
Starburst Cached Views#
The connector supports table scan redirection to improve performance and reduce load on the data source.
Security#
The connector includes a number of security-related features, detailed in the following sections.
Authenticating with AWS user credentials#
To authenticate using root or user credentials, set the following in your catalog properties, substituting the access key and secret key associated with the AWS user in your account. If you are not using root credentials, the user must have the correct policy for accessing DynamoDB.
dynamodb.aws-access-key=AKIAIOSFSJKFDFODNN7EXAMPLE
dynamodb.aws-secret-key=wJalrXUtnFEMI/Kadadf7MDENG/bPxRfiCYEXAMPLEKEY
Caution
Production use of root credentials is strongly discouraged as the root credentials have full access to your AWS account.
Authenticating from an EC2 instance#
To authenticate using the EC2 instance role, do not set access key or secret key properties. The connector automatically uses the IAM role credentials attached to the running EC2 instance and authenticate with them. Note that the EC2 instance role must have a correct policy attached for accessing DynamoDB.
Authenticating using a role#
To authenticate assuming a role, specify the role ARN in the catalog properties file. Note that an access key and secret key must be specified to authenticate the account unless running on an EC2 instance. In this case, the IAM role credentials attached to the EC2 instance is used to authenticate prior to assuming the role. If an external ID is required, specify this as well. Note the correct trust relationship for the role and the policy to access DynamoDB must be configured. Roles cannot be used when specifying the access key and secret key of the root user.
dynamodb.aws-role-arn=arn:aws:iam::account:role/role-name
# Optionally specify access key and secret key if not on EC2
dynamodb.aws-access-key=AKIAIOSFODNN7EXAMPLE
dynamodb.aws-secret-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# Optionally specify the external ID to assume the role
dynamodb.aws-external-id=my-external-id