Cache service CLI#

The Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP) cache service command line interface (CLI) provides a terminal-based interface for listing and configuring redirections managed by the cache service.

Requirements#

The cache service CLI requires a Java virtual machine available on the path. It can be used with Java version 17 and higher. It is a self-executing JAR file, which means it acts like a normal UNIX executable.

Installation#

  • To download the cache service CLI file, contact Starburst Support

  • Support provides access to a JAR file named starburst-cache-cli-nnn.jar where nnn is the version number.

  • Rename this file to cache-cli

  • Make it executable with chmod +x cache-cli

  • Place it in a directory on the PATH, such as ~/bin, or /usr/local/bin

cache-cli current_redirection --server localhost:8180 --source=mysql.test.orders

Run the CLI with the --help option to see the available options.

Authentication#

You can override your username with the --user option. It defaults to your operating system username. If your cache service requires password authentication, use the --password option to have the CLI prompt for a password. You can set the CACHE_SERVICE_PASSWORD environment variable with the password value to avoid the prompt.

HTTP client properties, such as keystore and truststore, can be specified through a configuration file that can be passed to the CLI with the --properties option.

For example, when the cache service is running in HTTPS mode, the CLI can be configured to connect to it by adding the below properties to a file and providing it’s path in the --properties option to the CLI.

http-client.trust-store-path=etc/localhost.truststore
http-client.trust-store-password=changeit
http-client.https.hostname-verification=false
cache-cli current_redirection --server https://localhost:8543 --source=mysql.test.orders --properties=etc/http-client-config.properties --user test --password

Output formats#

The cache service CLI provides the option --output-format to control how the output is displayed when running in non-interactive mode. The available options shown in the following table must be entered in uppercase. The default value is ALIGNED.

Output format options#

Option

Description

JSON

Output rows emitted as JSON objects with name-value pairs.

ALIGNED

Output emitted as an ASCII character table with values.

Examples#

Consider the following command run as shown, or with --output-format ALIGNED:

cache-cli current_redirection --source=mysql.test.nation_tmp

The output is as follows:

 id | status | target_catalog | target_schema |                  target_table                   |       create_time        |  ttl  | grace_period | columns | partition_columns | bucket_columns | bucket_count | sort_columns | incremental_column | predicate | delete_predicate | delete_error_message
----+--------+----------------+---------------+-------------------------------------------------+--------------------------+-------+--------------+---------+-------------------+----------------+--------------+--------------+--------------------+-----------+------------------+----------------------
 1  | ACTIVE | hive           | cache         | nation_tmp_45d616e8_d662_447f_acc2_db2ddb19e14f | 2021-04-12T05:28:54.357Z | 1.50h | 15.00m       | *       | regionkey         |                |              |              |                    |           |                  |
(1 row)

The output with --output-format JSON is:

{"id":"1","status":"ACTIVE","target_catalog":"hive","target_schema":"cache","target_table":"nation_tmp_45d616e8_d662_447f_acc2_db2ddb19e14f","create_time":"2021-04-12T05:28:54.357Z","ttl":"1.50h","grace_period":"15.00m","columns":"*","partition_columns":"regionkey","bucket_columns":"","bucket_count":"","sort_columns":"","incremental_column":"","predicate":"","delete_predicate":"","delete_error_message":""}

Commands#

The cache service CLI implements the following sub-commands for viewing and adding cached tables.

Current redirection#

The current_redirection command shows whether the specified source table has a valid redirection. The source table is provided through the --source option in the format <catalog>.<schema>.<table>:

cache-cli current_redirection --source=mysql.test.nation_tmp

The output either shows a valid redirection or 0 rows as follows:

 id | status | target_catalog | target_schema |                  target_table                   |       create_time        |  ttl  | grace_period | columns | partition_columns | bucket_columns | bucket_count | sort_columns | incremental_column | predicate | delete_predicate | delete_error_message
----+--------+----------------+---------------+-------------------------------------------------+--------------------------+-------+--------------+---------+-------------------+----------------+--------------+--------------+--------------------+-----------+------------------+----------------------
 1  | ACTIVE | hive           | cache         | nation_tmp_45d616e8_d662_447f_acc2_db2ddb19e14f | 2021-04-12T05:28:54.357Z | 1.50h | 15.00m       | *       | regionkey         |                |              |              |                    |           |                  |
(1 row)

 id | status | target_catalog | target_schema | target_table | create_time | ttl | grace_period | columns | partition_columns | bucket_columns | bucket_count | sort_columns | incremental_column | predicate | delete_predicate | delete_error_message
----+--------+----------------+---------------+--------------+-------------+-----+--------------+---------+-------------------+----------------+--------------+--------------+--------------------+-----------+------------------+----------------------
(0 rows)

List redirections#

The list_redirections command shows the list of cached tables for the specified source table. The source table is provided through the --source option in the format <catalog>.<schema>.<table>:

cache-cli list_redirections --source=mysql.test.nation_tmp

The output is as follows:

 id | status  | target_catalog | target_schema |                  target_table                   |       create_time        |  ttl  | grace_period | columns | partition_columns | bucket_columns | bucket_count | sort_columns | incremental_column | predicate | delete_predicate | delete_error_message
----+---------+----------------+---------------+-------------------------------------------------+--------------------------+-------+--------------+---------+-------------------+----------------+--------------+--------------+--------------------+-----------+------------------+----------------------
 1  | EXPIRED | hive           | cache         | nation_tmp_86bd8195_d0f0_497d_b0f4_b8562e4d2256 | 2021-03-10T15:21:36.142Z | 7.67m | 15.00m       | foo     | regionkey         |                |              |              |                    |           |                  |
 2  | EXPIRED | hive           | cache         | nation_tmp_1eb57b45_20e5_4940_9724_c0f781cd539a | 2021-03-10T15:26:36.142Z | 7.67m | 15.00m       | *       | regionkey         |                |              |              |                    |           |                  |
 3  | ACTIVE  | hive           | cache         | nation_tmp_b869f396_5798_4f06_b943_0f6ff5b40855 | 2021-03-10T15:31:36.142Z | 7.67m | 15.00m       | *       | regionkey         |                |              |              |                    |           |                  |
 4  | ACTIVE  | hive           | cache         | nation_tmp_8e10ccca_fdc8_4a47_ac7a_48da2012cf30 | 2021-03-10T15:36:36.142Z | 7.67m | 15.00m       | *       | regionkey         |                |              |              |                    |           |                  |
(4 rows)

List imports#

The list_imports command shows the list of import jobs for the redirection corresponding to a given identifier. The column id from the output of list_redirections or current_redirection commands can be used to identify a redirection for expiry. The column error_message shows the reasons for any failure encountered while trying to create or populate the cached table:

cache-cli list_imports --id=1

The output for an bulk loaded cached table is as follows:

 id | redirection_id |      scheduled_time      | max_duration |        start_time        | finish_time | row_count |                                                                                                                                         error_message
----+----------------+--------------------------+--------------+--------------------------+-------------+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 4  | 1              | 2021-03-10T15:50:45.432Z | 30.00m       | 2021-03-10T15:50:45.537Z |             |           | java.sql.SQLException: Query failed (#20210310_155047_01264_cbv62): line 1:83: Column 'foo' cannot be resolved [statement:"CREATE TABLE hive.cache.nation_tmp_ba1b66f2_3be8_42dc_a955_e42a6db3ac7b  AS SELECT foo FROM mysql.test.nation_tmp", arguments:{positional:{}, named:{}, finder:[]}]
(1 row)

The output for an incrementally cached table is as follows:

 id | redirection_id |      scheduled_time      | max_duration |        start_time        |       finish_time        | row_count | error_message
----+----------------+--------------------------+--------------+--------------------------+--------------------------+-----------+---------------
 1  | 1              | 2021-03-10T14:31:45.257Z | 1.06m        | 2021-03-10T14:31:49.124Z | 2021-03-10T14:31:50.909Z | 10        |
 2  | 1              | 2021-03-10T14:33:45.257Z | 1.06m        | 2021-03-10T14:33:48.876Z | 2021-03-10T14:33:51.692Z | 10        |
 3  | 1              | 2021-03-10T14:35:45.257Z | 1.06m        | 2021-03-10T14:35:48.783Z | 2021-03-10T14:35:51.118Z | 5         |
 4  | 1              | 2021-03-10T14:37:45.257Z | 1.06m        | 2021-03-10T14:37:48.716Z | 2021-03-10T14:37:51.532Z | 0         |
(4 rows)

Add redirection#

The cache command adds caching of a given source table into the target catalog and schema. The cache service automatically generates a name for the target table with a random suffix. The cached table is bulk loaded from the specified source table. Incrementally loaded cached tables can be defined only through a JSON file. The cached table is used to redirect table scans on source tables for a time interval specified by the --cache-ttl option:

cache-cli cache --source=mysql.test.nation_tmp --target-catalog=hive --target-schema=cache --cache-ttl=1h

The output shows the target table as follows:

 catalog_name | schema_name |                   table_name
--------------+-------------+-------------------------------------------------
 hive         | cache       | nation_tmp_9be03071_e197_4a8e_8e4e_46a44a32096b

The grace period is 15m by default. It can be extended using the --grace-period option based on the maximum duration that the queries are expected to run.

Optional cache command options#

Option name

Description

Default

allow-multiple-imports

Allow parallel imports for the same source table. By default another command does not start another import, if one is already running for the same source table. This is useful to force another import, for example when an existing import seems to be stalled.

false

grace-period

The service removes cached tables when they are no longer needed, such as when a newer redirection is present or ttl expired. The service waits for grace-period before removing the cached table. This allows any running queries which started just before the cached table is expired to finish gracefully.

15m

max-import-duration

Maximum allowed execution time for cached table import job

30m

columns

Comma separated list of columns to be cached from source table. All columns are cached by default.

partition-columns

Comma separated list of columns for partitioning the data in the target table

bucket-columns

Comma separated list of columns for bucketing the data in the target table (requires bucket count)

bucket-count

Number of buckets in target table when bucket columns are specified

sort-columns

Comma separated list of columns for sorting the data in the target table (requires table bucketing)

use-preferred-write-partitioning

Use table partitioning to parallelize writes between worker nodes. This reduces import memory usage and improves cached table file sizes.

true

preferred-write-partitioning-min-number-of-partitions

The minimum number of written partitions that is required to use connector preferred write partitioning

1

writer-count

Number of writers per task when writing table

4

scale-writers

Scale writers when writing table

false

use-server-import-config

Use caching server table import configuration. The cache server has different default import configuration for partitioned and non-partitioned tables.

true

Incremental import#

The incremental_import command schedules an incremental import for the redirection corresponding to a given identifier. The redirection selected for incremental import must be configured for incremental imports through a JSON file. The cached table is used to redirect table scans on a source table for a time interval specified by the --freshness-duration option:

cache-cli incremental_import --id=1 --freshness-duration=1h

The output shows the import id if the import is successfully scheduled:

Incremental import id (3)
Optional incremental_import command options#

Option name

Description

Default

max-import-duration

Maximum allowed execution time for a cached table import job

30m

use-preferred-write-partitioning

Use table partitioning to parallelize writes using multiple worker nodes. This reduces import memory usage and improves cached table file sizes.

true

preferred-write-partitioning-min-number-of-partitions

The minimum number of written partitions that is required to use connector preferred write partitioning

1

writer-count

Number of writers per task when writing table

4

scale-writers

Scale writers when writing table

false

use-server-import-config

Use caching server table import configuration. The cache server has different default import configuration for partitioned and non-partitioned tables.

true

Expire redirections#

The expire_redirection command expires the redirection corresponding to a given identifier. The redirection grace period is still honored. Therefore queries that use cached tables do not fail. The column id from the output of list_redirections or current_redirection commands can be used to identify a redirection for expiry.

cache-cli expire_redirection --id 1

The output is either Success or an error message containing the reason for failure.