Cache service CLI#
The Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP) cache service command line interface (CLI) provides a terminal-based interface for listing and configuring redirections managed by the cache service.
Requirements#
The cache service CLI requires a Java virtual machine available on the path. It can be used with Java version 17 and higher. It is a self-executing JAR file, which means it acts like a normal UNIX executable.
Installation#
To download the cache service CLI file, contact Starburst Support
Support provides access to a JAR file named
starburst-cache-cli-nnn.jar
where nnn is the version number.Rename this file to
cache-cli
Make it executable with
chmod +x cache-cli
Place it in a directory on the
PATH
, such as~/bin
, or/usr/local/bin
cache-cli current_redirection --server localhost:8180 --source=mysql.test.orders
Run the CLI with the --help
option to see the available options.
Authentication#
You can override your username with the --user
option. It defaults to your
operating system username. If your cache service requires password
authentication, use the --password
option to have the CLI prompt for a
password. You can set the CACHE_SERVICE_PASSWORD
environment variable with
the password value to avoid the prompt.
HTTP client properties, such as keystore and truststore, can be specified
through a configuration file that can be passed to the CLI with the
--properties
option.
For example, when the cache service is running in HTTPS mode, the CLI can be
configured to connect to it by adding the below properties to a file and
providing it’s path in the --properties
option to the CLI.
http-client.trust-store-path=etc/localhost.truststore
http-client.trust-store-password=changeit
http-client.https.hostname-verification=false
cache-cli current_redirection --server https://localhost:8543 --source=mysql.test.orders --properties=etc/http-client-config.properties --user test --password
Output formats#
The cache service CLI provides the option --output-format
to control how
the output is displayed when running in non-interactive mode. The available
options shown in the following table must be entered in uppercase.
The default value is ALIGNED
.
Option |
Description |
---|---|
|
Output rows emitted as JSON objects with name-value pairs. |
|
Output emitted as an ASCII character table with values. |
Examples#
Consider the following command run as shown, or with --output-format
ALIGNED
:
cache-cli current_redirection --source=mysql.test.nation_tmp
The output is as follows:
id | status | target_catalog | target_schema | target_table | create_time | ttl | grace_period | columns | partition_columns | bucket_columns | bucket_count | sort_columns | incremental_column | predicate | delete_predicate | delete_error_message
----+--------+----------------+---------------+-------------------------------------------------+--------------------------+-------+--------------+---------+-------------------+----------------+--------------+--------------+--------------------+-----------+------------------+----------------------
1 | ACTIVE | hive | cache | nation_tmp_45d616e8_d662_447f_acc2_db2ddb19e14f | 2021-04-12T05:28:54.357Z | 1.50h | 15.00m | * | regionkey | | | | | | |
(1 row)
The output with --output-format JSON
is:
{"id":"1","status":"ACTIVE","target_catalog":"hive","target_schema":"cache","target_table":"nation_tmp_45d616e8_d662_447f_acc2_db2ddb19e14f","create_time":"2021-04-12T05:28:54.357Z","ttl":"1.50h","grace_period":"15.00m","columns":"*","partition_columns":"regionkey","bucket_columns":"","bucket_count":"","sort_columns":"","incremental_column":"","predicate":"","delete_predicate":"","delete_error_message":""}
Commands#
The cache service CLI implements the following sub-commands for viewing and adding cached tables.
Current redirection#
The current_redirection
command shows whether the specified source table has
a valid redirection. The source table is provided through the --source
option in the format <catalog>.<schema>.<table>
:
cache-cli current_redirection --source=mysql.test.nation_tmp
The output either shows a valid redirection or 0 rows
as follows:
id | status | target_catalog | target_schema | target_table | create_time | ttl | grace_period | columns | partition_columns | bucket_columns | bucket_count | sort_columns | incremental_column | predicate | delete_predicate | delete_error_message
----+--------+----------------+---------------+-------------------------------------------------+--------------------------+-------+--------------+---------+-------------------+----------------+--------------+--------------+--------------------+-----------+------------------+----------------------
1 | ACTIVE | hive | cache | nation_tmp_45d616e8_d662_447f_acc2_db2ddb19e14f | 2021-04-12T05:28:54.357Z | 1.50h | 15.00m | * | regionkey | | | | | | |
(1 row)
id | status | target_catalog | target_schema | target_table | create_time | ttl | grace_period | columns | partition_columns | bucket_columns | bucket_count | sort_columns | incremental_column | predicate | delete_predicate | delete_error_message
----+--------+----------------+---------------+--------------+-------------+-----+--------------+---------+-------------------+----------------+--------------+--------------+--------------------+-----------+------------------+----------------------
(0 rows)
List redirections#
The list_redirections
command shows the list of cached tables for the
specified source table. The source table is provided through the --source
option in the format <catalog>.<schema>.<table>
:
cache-cli list_redirections --source=mysql.test.nation_tmp
The output is as follows:
id | status | target_catalog | target_schema | target_table | create_time | ttl | grace_period | columns | partition_columns | bucket_columns | bucket_count | sort_columns | incremental_column | predicate | delete_predicate | delete_error_message
----+---------+----------------+---------------+-------------------------------------------------+--------------------------+-------+--------------+---------+-------------------+----------------+--------------+--------------+--------------------+-----------+------------------+----------------------
1 | EXPIRED | hive | cache | nation_tmp_86bd8195_d0f0_497d_b0f4_b8562e4d2256 | 2021-03-10T15:21:36.142Z | 7.67m | 15.00m | foo | regionkey | | | | | | |
2 | EXPIRED | hive | cache | nation_tmp_1eb57b45_20e5_4940_9724_c0f781cd539a | 2021-03-10T15:26:36.142Z | 7.67m | 15.00m | * | regionkey | | | | | | |
3 | ACTIVE | hive | cache | nation_tmp_b869f396_5798_4f06_b943_0f6ff5b40855 | 2021-03-10T15:31:36.142Z | 7.67m | 15.00m | * | regionkey | | | | | | |
4 | ACTIVE | hive | cache | nation_tmp_8e10ccca_fdc8_4a47_ac7a_48da2012cf30 | 2021-03-10T15:36:36.142Z | 7.67m | 15.00m | * | regionkey | | | | | | |
(4 rows)
List imports#
The list_imports
command shows the list of import jobs for the redirection
corresponding to a given identifier. The column id
from the output of
list_redirections
or current_redirection
commands can be used to
identify a redirection for expiry. The column error_message
shows the
reasons for any failure encountered while trying to create or populate the
cached table:
cache-cli list_imports --id=1
The output for an bulk loaded cached table is as follows:
id | redirection_id | scheduled_time | max_duration | start_time | finish_time | row_count | error_message
----+----------------+--------------------------+--------------+--------------------------+-------------+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
4 | 1 | 2021-03-10T15:50:45.432Z | 30.00m | 2021-03-10T15:50:45.537Z | | | java.sql.SQLException: Query failed (#20210310_155047_01264_cbv62): line 1:83: Column 'foo' cannot be resolved [statement:"CREATE TABLE hive.cache.nation_tmp_ba1b66f2_3be8_42dc_a955_e42a6db3ac7b AS SELECT foo FROM mysql.test.nation_tmp", arguments:{positional:{}, named:{}, finder:[]}]
(1 row)
The output for an incrementally cached table is as follows:
id | redirection_id | scheduled_time | max_duration | start_time | finish_time | row_count | error_message
----+----------------+--------------------------+--------------+--------------------------+--------------------------+-----------+---------------
1 | 1 | 2021-03-10T14:31:45.257Z | 1.06m | 2021-03-10T14:31:49.124Z | 2021-03-10T14:31:50.909Z | 10 |
2 | 1 | 2021-03-10T14:33:45.257Z | 1.06m | 2021-03-10T14:33:48.876Z | 2021-03-10T14:33:51.692Z | 10 |
3 | 1 | 2021-03-10T14:35:45.257Z | 1.06m | 2021-03-10T14:35:48.783Z | 2021-03-10T14:35:51.118Z | 5 |
4 | 1 | 2021-03-10T14:37:45.257Z | 1.06m | 2021-03-10T14:37:48.716Z | 2021-03-10T14:37:51.532Z | 0 |
(4 rows)
Add redirection#
The cache
command adds caching of a given source table into the target
catalog and schema. The cache service automatically generates a name for the
target table with a random suffix. The cached table is bulk loaded from the
specified source table. Incrementally loaded cached tables can be defined only
through a JSON file. The cached table is
used to redirect table scans on source tables for a time interval specified by
the --cache-ttl
option:
cache-cli cache --source=mysql.test.nation_tmp --target-catalog=hive --target-schema=cache --cache-ttl=1h
The output shows the target table as follows:
catalog_name | schema_name | table_name
--------------+-------------+-------------------------------------------------
hive | cache | nation_tmp_9be03071_e197_4a8e_8e4e_46a44a32096b
The grace period is 15m by default. It can be extended using the
--grace-period
option based on the maximum duration that the queries are
expected to run.
Option name |
Description |
Default |
---|---|---|
|
Allow parallel imports for the same source table. By default another command does not start another import, if one is already running for the same source table. This is useful to force another import, for example when an existing import seems to be stalled. |
false |
|
The service removes cached tables when they are no longer needed, such as
when a newer redirection is present or ttl expired. The service waits for
|
15m |
|
Maximum allowed execution time for cached table import job |
30m |
|
Comma separated list of columns to be cached from source table. All columns are cached by default. |
|
|
Comma separated list of columns for partitioning the data in the target table |
|
|
Comma separated list of columns for bucketing the data in the target table (requires bucket count) |
|
|
Number of buckets in target table when bucket columns are specified |
|
|
Comma separated list of columns for sorting the data in the target table (requires table bucketing) |
|
|
Use table partitioning to parallelize writes between worker nodes. This reduces import memory usage and improves cached table file sizes. |
true |
|
The minimum number of written partitions that is required to use connector
|
1 |
|
Number of writers per task when writing table |
4 |
|
Scale writers when writing table |
false |
|
Use caching server table import configuration. The cache server has different default import configuration for partitioned and non-partitioned tables. |
true |
Incremental import#
The incremental_import
command schedules an incremental import for the
redirection corresponding to a given identifier. The redirection selected for
incremental import must be configured for incremental imports through a
JSON file. The cached table is used to
redirect table scans on a source table for a time interval specified by the
--freshness-duration
option:
cache-cli incremental_import --id=1 --freshness-duration=1h
The output shows the import id if the import is successfully scheduled:
Incremental import id (3)
Option name |
Description |
Default |
---|---|---|
|
Maximum allowed execution time for a cached table import job |
30m |
|
Use table partitioning to parallelize writes using multiple worker nodes. This reduces import memory usage and improves cached table file sizes. |
true |
|
The minimum number of written partitions that is required to use connector
|
1 |
|
Number of writers per task when writing table |
4 |
|
Scale writers when writing table |
false |
|
Use caching server table import configuration. The cache server has different default import configuration for partitioned and non-partitioned tables. |
true |
Expire redirections#
The expire_redirection
command expires the redirection corresponding
to a given identifier. The redirection grace period is still honored. Therefore
queries that use cached tables do not fail. The column id
from the output of
list_redirections
or current_redirection
commands can be used to
identify a redirection for expiry.
cache-cli expire_redirection --id 1
The output is either Success
or an error message containing the reason
for failure.