Starburst Salesforce connector#
The Starburst Salesforce connector allows Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP) to query Salesforce data available in your existing Salesforce account.
Requirements#
To connect to Salesforce, you need:
Network access from the coordinator and workers to the Salesforce API endpoints.
Security token for your Salesforce account, or configured range of IP addresses for logins.
A valid Starburst Enterprise license.
Configuration#
Create the example
catalog with a catalog properties file in etc/catalog
named example.properties
(replace example
with your database name or
some other descriptive name of the catalog) with the following contents:
connector.name=salesforce
salesforce.user=mysalesforceaccount@example.com
salesforce.password=secret
salesforce.security-token=<token-from-Salesforce>
salesforce.enable-sandbox=true
Specify the connector.name
property as salesforce
. Configure the
connector using your Salesforce account username, password, and the security
token for your account. See Security for more
information on authenticating with Salesforce.
You can optionally enable the Salesforce Sandbox as shown in the last line of
the preceding example. The default value is false
.
The connector can only access Salesforce as the configured user in the catalog. If you need to access Salesforce with a different user name, configure a separate catalog.
General configuration properties#
The following table describes general catalog configuration properties for the connector:
Property name |
Description |
---|---|
|
Support case insensitive schema and table names. Defaults to |
|
Duration for which case insensitive schema and table
names are cached. Defaults to |
|
Path to a name mapping configuration file in JSON format that allows
Trino to disambiguate between schemas and tables with similar names in
different cases. Defaults to |
|
Frequency with which Trino checks the name matching configuration file
for changes. The duration value defaults to |
|
Duration for which metadata, including table and
column statistics, is cached. Defaults to |
|
Cache the fact that metadata, including table and column statistics, is
not available. Defaults to |
|
Duration for which schema metadata is cached.
Defaults to the value of |
|
Duration for which table metadata is cached.
Defaults to the value of |
|
Duration for which tables statistics are cached.
Defaults to the value of |
|
Maximum number of objects stored in the metadata cache. Defaults to |
|
Maximum number of statements in a batched execution. Do not change
this setting from the default. Non-default values may negatively
impact performance. Defaults to |
|
Push down dynamic filters into JDBC queries. Defaults to |
|
Maximum duration for which Trino waits for dynamic
filters to be collected from the build side of joins before starting a
JDBC query. Using a large timeout can potentially result in more detailed
dynamic filters. However, it can also increase latency for some queries.
Defaults to |
Warning
There is a known issue causing Salesforce catalogs to occasionally fail to list tables or views within their schemas. As a workaround, you can disable the metadata cache built-in to the JDBC driver used by the connector:
salesforce.extra-jdbc-properties=Other="cachemetadatatable=false;cachemetadatatablecolumns=false;cachemetadataschema=false;cachametadataschemas=false;";
You can enable the separate metadata cache of the connector:
metadata.cache-ttl=10m
Type mapping#
Because Trino and Salesforce each support types that the other does not, this connector modifies some types when reading data.
Salesforce to Trino type mapping#
The connector maps Salesforce types to the corresponding Trino types according to the following table:
Salesforce type |
Trino type |
---|---|
Auto Number |
VARCHAR |
Lookup Relationship |
VARCHAR |
Master-Detail Relationship |
VARCHAR |
External Lookup Relationship |
VARCHAR |
Checkbox |
VARCHAR |
Currency |
DECIMAL |
Date |
DATE |
Date/Time |
TIMESTAMP |
VARCHAR |
|
Geolocation |
VARCHAR |
Number |
DOUBLE |
Percent |
DOUBLE |
Phone per catalog |
VARCHAR |
Picklist |
VARCHAR |
Picklist (Multi-Select) |
VARCHAR |
Text |
VARCHAR |
Text Area |
VARCHAR |
Time |
TIME |
URL |
VARCHAR |
No other typs are supported.
Decimal type handling#
Salesforce Currency
data types are mapped to SEP DECIMAL
types with
the same precision and scale as the Salesforce type. Values that require
rounding or truncation to fit cause a failure at runtime, but the connector can
be configured to allow rounding and truncation by setting the
decimal-mapping
catalog configuration property to allow_overflow
and the
decimal-rounding-mode
catalog configuration property to one of the
following:
UNNECESSARY
(default)UP
DOWN
CEILING
FLOOR
HALF_UP
HALF_DOWN
HALF_EVEN
You can change these settings for just the current session with the
decimal_mapping
and decimal_rounding_mode
catalog session properties.
SQL support#
The connector supports globally available and read operation statements to access data and metadata in Salesforce.
Performance#
System information#
The Salesforce connector uses the Salesforce API to access data and metadata. Salesforce limits the number of API calls for organizations. If the limits are reached, query processing fails.
The connector provides API limit and current usage numbers in the limits
table in the system
schema. You can use this information for monitoring your
usage to ensure a limit is not exceeded. The following query returns the data
from your example
catalog:
SELECT * FROM example.system.limits;
Dynamic filtering#
Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.
You can disable dynamic filtering by setting the dynamic-filtering.enabled
property in your catalog configuration file to false
.
Wait timeout#
By default, table scans on the connector are delayed up to 20 seconds until dynamic filters are collected from the build side of joins. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries.
You can configure the dynamic-filtering.wait-timeout
property in your
catalog properties file:
dynamic-filtering.wait-timeout=1m
You can use the dynamic_filtering_wait_timeout
catalog session property in a specific session:
SET SESSION example.dynamic_filtering_wait_timeout = 1s;
Compaction#
The maximum size of dynamic filter predicate, that is pushed down to the
connector during table scan for a column, is configured using the
domain-compaction-threshold
property in the catalog
properties file:
domain-compaction-threshold=100
You can use the domain_compaction_threshold
catalog
session property:
SET SESSION domain_compaction_threshold = 10;
By default, domain-compaction-threshold
is set to 32
.
When the dynamic predicate for a column exceeds this threshold, it is compacted
into a single range predicate.
For example, if the dynamic filter collected for a date column dt
on the
fact table selects more than 32 days, the filtering condition is simplified from
dt IN ('2020-01-10', '2020-01-12',..., '2020-05-30')
to dt BETWEEN '2020-01-10' AND '2020-05-30'
. Using a large threshold can result in increased
table scan overhead due to a large IN
list getting pushed down to the data
source.
Metrics#
Metrics about dynamic filtering are reported in a JMX table for each catalog:
jmx.current."io.trino.plugin.jdbc:name=example,type=dynamicfilteringstats"
Metrics include information about the total number of dynamic filters, the number of completed dynamic filters, the number of available dynamic filters and the time spent waiting for dynamic filters.
Table scan redirection#
The connector supports table scan redirection to improve performance and reduce load on the data source.
This is particularly recommended for the Salesforce connector as Salesforce has API limits, which can cause queries to fail.
Security#
The connector includes a number of security-related features, detailed in the following sections.
Password authentication#
The connector supports authenticating via a user name. Use the following properties in the catalog properties file to configure it.
salesforce.authentication.type=PASSWORD
salesforce.user=mysalesforceaccount@example.com
salesforce.password=secret
salesforce.security-token=<token-from-Salesforce>
salesforce.enable-sandbox=true
The default value of salesforce.authentication.type
is PASSWORD
.
The security token is generated by Salesforce and emailed to you. To generate an access token, at the top navigation bar in Salesforce, go to <your name> > My Settings > Personal > Reset My Security Token. Note that whenever the user’s password changes, including regularly scheduled password expiration, the security token also changes and requires the connector to be reconfigured using the new password and token.
You can use a configured range of IP addresses approved for login as an
alternative to the security token. In this case, do not specify
salesforce.security-token
in the catalog file. You must correctly configure
the IP ranges for the coordinator and all worker nodes in Salesforce’s
Security Controls > Network Access page for your organization.
OAuth JWT authentication#
The connector supports authenticating using OAuth 2.0. This method requires setup in Salesforce, but unlike password authentication it does not expire until access is explicitly revoked by Salesforce. Use the following properties in the catalog properties file to configure this method.
salesforce.authentication.type=OAUTH_JWT
salesforce.oauth.pkcs12-path=/path/to/salesforce-ca.p12
salesforce.oauth.pkcs12-password=pkcs12-password
salesforce.oauth.jwt-issuer=<salesforce connected app consumer key>
salesforce.oauth.jwt-subject=<salesforce user name or email address>
The trust is set up on the Salesforce side using a certificate. At a high level, you create a PKCS12 archive and certificate, then create an OAuth-enabled Connected App in Salesforce that uses that certificate. You then give permission to the user to use that Connected App and configure the Salesforce connector to use the PKCS12 archive and user name. This sets up a trust that never expires until it is explicitly revoked by a user in the Salesforce UI.
An example of setting up this configuration is as follows:
Create a secure password and export it as the
PKCS12_PASS
environment variable. Keep this password, as it is necessary to configure the PKCS12 arvhive in the catalog configuration.
export PKCS12_PASS=averysecurepassword
Create a PKCS12 archive named
salesforce-ca.p12
.
keytool -genkeypair -v \
-alias salesforce-ca \
-dname "CN=Starburst Salesforce Connector" \
-storetype PKCS12 \
-keystore salesforce-ca.p12 \
-keypass:env PKCS12_PASS \
-storepass:env PKCS12_PASS \
-keyalg RSA \
-keysize 4096 \
-ext KeyUsage:critical="keyCertSign" \
-ext BasicConstraints:critical="ca:true" \
-validity 9999
Export the certificate to
salesforce-ca.crt
.
keytool -export -v \
-alias salesforce-ca \
-file salesforce-ca.crt \
-keypass:env PKCS12_PASS \
-storepass:env PKCS12_PASS \
-keystore salesforce-ca.p12 \
-rfc
Log in to Salesforce using a username and password that has authorization to create a Connected App. Select the gear icon in the upper right and select Setup, then Apps -> App Manager, then New Connected App in the upper right. Enter a name for the Connected App, such as SEP Salesforce Connector and provide a contact email address. A Callback URL is required but unused by this method; enter a valid URL, such as
http://localhost
.Select Enable OAuth Settings and Use digital signatures, uploading
salesforce-ca.crt
to the site. For Selected OAuth Scopes use Access content resources (content) at a minimum. Copy and keep the resulting Consumer Key, which is needed to configure the connector.Create a new Permission Set. Navigate to Users -> Permissions Sets and select New. Enter a label, such as SEP Salesforce Connector Permission Set and select Save. Select Assigned Connected Apps then Edit. Add the connected app name created in step 4.
Add the Permission Set to the User. Select Users and locate the username you want to use to authenticate with Salesforce. When found, click to select that username. Select Permission Set Assignments then Edit Assignments. Add the permission set to the username and save.
This concludes the Salesforce part of the setup procedure.
Configure the Salesforce catalog to enable OAuth 2.0 JWT. You need the PKCS12 archive’s password, the Consumer Key from Salesforce, and the user’s email address login.
salesforce.authentication.type=OAUTH_JWT
salesforce.oauth.pkcs12-path=/path/to/salesforce-ca.p12
salesforce.oauth.pkcs12-password=pkcs12-password
salesforce.oauth.jwt-issuer=<Salesforce connected app consumer key>
salesforce.oauth.jwt-subject=<Salesforce user name or email address>