Configure the cache service in Kubernetes#
This topic covers configuring the Starburst cache service after you have completed a basic installation of Starburst Enterprise. If you have not yet completed that, go to our installation guide.
The starburst-cache-service Helm chart configures a standalone cache service to use with Starburst Cached Views.
We strongly suggest that you follow best practices to customize your cluster by creating small, targeted files that override defaults and add configuration properties. Our deployment guide includes an example file set that shows the recommended way to manage your customizations.
You must configure the following to use the Starburst cache service:
The cache service
Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP), to use the cache service
The SEP backend service
Make sure that the cache service is available before configuring and restarting SEP to use it.
Requirements#
Externally managed, compatible relational database.
Full access to a dedicated schema on the database with username and password credentials.
Network access on the configured port between the database and the cache service container in the Kubernetes cluster.
Configure and start the cache service#
Get the latest starburst-cache-service Helm chart as described in our installation guide.
There are several top-level nodes in the cache service Helm chart that you must modify for a minimum cache service configuration.
config
resources
database
expose
For more information on configuring these nodes, see the YAML file properties section.
As with SEP, we strongly suggest that you initially deploy the cache service with the minimum configuration described in this topic, and ensure that it deploys and is accessible before making any additional customizations described in this documentation.
Note
Store customizations in a separate file containing only changed values as
recommended in our best practices guide. In
this topic, for example, customizations are stored in a file named
cache-service-values.yaml
that is used in the helm upgrade
command.
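As a rough illustration of the four nodes listed above, a minimal values file might look like the following sketch. It reuses default values documented elsewhere in this topic. The external database host db.example.com and its credentials are placeholders rather than values from the chart, so replace them with your own.

```yaml
# cache-service-values.yaml: illustrative minimum, not a tested production configuration
config:
  config.properties: |
    service-database.user=${ENV:SERVICE_DATABASE_USER}
    service-database.password=${ENV:SERVICE_DATABASE_PASSWORD}
    service-database.jdbc-url=${ENV:SERVICE_DATABASE_JDBC_URL}
    starburst.user=bob
    starburst.jdbc-url=jdbc:trino://coordinator:8080
    rules.file=etc/rules.json

resources:
  requests:
    memory: 2Gi
    cpu: 0.5
  limits:
    memory: 2Gi
    cpu: 4

database:
  type: external
  external:
    # Placeholder connection details for an externally managed database
    jdbcUrl: "jdbc:postgresql://db.example.com:5432/cacheservice"
    user: "cacheservice"
    password: "<your-password>"

expose:
  # clusterIp is the default and only exposes the service inside the cluster
  type: "clusterIp"
```

Keeping the file this small makes it easier to confirm that the service deploys and is reachable before you layer on further customizations.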
To deploy or update the cache service with any configuration changes, run the
helm upgrade
command with the updated YAML files and the --install
switch. For example:
$ helm upgrade my-caching-service starburstdata/starburst-cache-service \
    --install \
    --version 443.15.0 \
    --values ./registry-access.yaml \
    --values ./cache-service-prod.yaml
When you update the cache service, you can use the same command that you use to
upgrade to a new release. Helm compares all --values
files and the version,
and safely ignores any that are unchanged.
YAML file properties#
The nodes included in the values.yaml
file are described in the following
table.
Node name | Description |
---|---|
image | Contains the details for the cache service Docker image. Review our best practices for managing registry access across all Starburst products. NOTE: The |
keystore | Contains the secret to be mounted under the specified |
config | Specifies the configuration properties for the cache service under |
registryCredentials | Defines authentication details for Docker registry access. Typically, you need to use your username and password for the Starburst Harbor instance. Cannot be used if using imagePullSecrets. |
imagePullSecrets | Alternative authentication for selected Docker registry using secrets. Cannot be used if using registryCredentials. |
resources | The CPU and memory resources to use for the cache service. Request and limit values should be identical. These settings can be adjusted to match your workload and available node sizes in the cluster. |
expose | Defines the mechanisms and their options that expose the cache service to an outside network. |
database | Defines the database backend for the cache service. Defaults to internal. |
envFrom | Allows for the propagation of environment variables from different sources complying with the K8S schema specification. This can be used to deliver values to the cache service configuration properties files by creating a Kubernetes secret holding variable values. |
env | Allows you to define additional environment variables for the cache service. |
 | Configuration to allow Kubernetes to determine the node and pod to use. These nodes are left empty by default. |
commonLabels | Defines common labels to identify all cache service objects in a KRM to use with the kustomize utility and other tools. |
image
#
The following are the default values for the cache service
image
top level node in the values.yaml
file:
image:
  repository: "starburstdata/starburst-cache-service"
  tag: "443-e.15"
  pullPolicy: "IfNotPresent"
keystore
#
The keystore
section of the Helm chart configures keystore settings for the
cache service.
keystore:
  localFileLocation: null
  podFileLocation: null
Property name | Description |
---|---|
localFileLocation | Specifies the local path of the keystore file. |
podFileLocation | Specifies the path in the pod where the keystore file is mounted. You can provide a file path for a new folder or use an existing folder in the root directory such as |
Note
When you create a secret for the keystore file, you must name the secret
keystore-configuration
for the cache service pod to recognize it.
config
#
The defaults for nodes nested under config:
are described below.
config.properties
#
This node specifies configuration properties and their values for the cache service. You must define the service accounts and locators to be used by the cache service:
config:
  config.properties: |
    service-database.user=alice
    service-database.password=test123
    service-database.jdbc-url=jdbc:mysql://mysql-server:13306/cachesvc
    starburst.user=bob
    starburst.jdbc-url=jdbc:trino://coordinator:8080
    rules.file=etc/rules.json
config:jvm.config
#
This node specifies the command line configuration options for starting the Java Virtual Machine (JVM) used by the cache
service. The following are the default values for the cache service
config:jvm.config:
node in the values.yaml
file:
config:
  jvm.config: |
    -server
    --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
    --add-opens=java.base/java.nio=ALL-UNNAMED
    --add-opens=java.base/java.lang=ALL-UNNAMED
    --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED
    -XX:G1HeapRegionSize=32M
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    -XX:ReservedCodeCacheSize=512M
    -XX:PerMethodRecompilationCutoff=10000
    -XX:PerBytecodeRecompilationCutoff=10000
    -XX:+UnlockDiagnosticVMOptions
    -XX:+UseAESCTRIntrinsics
    -XX:InitialRAMPercentage=80
    -XX:MaxRAMPercentage=80
    -Djdk.nio.maxCachedBufferSize=2000000
    -Djdk.attach.allowAttachSelf=true
log.properties
#
This optional node specifies the logging configuration for the cache service. The following are the
default values for the cache service log.properties:
node in the
values.yaml
file:
config:
  log.properties: |
    io.airlift=INFO
rules.json
#
The rules.json
node is specific to table scan redirections. It specifies the
source tables and target connector for the cache, along with the schedule for
refreshing them. The following are the default values for the cache service
rules.json:
node in the values.yaml
file:
config:
  rules.json: |
    {
      "rules": []
    }
The following example demonstrates how to implement cache service refresh rules in this node by adding the JSON file content in a multi-line segment:
config:
  rules.json: |
    {
      "defaultGracePeriod": "42m",
      "defaultMaxImportDuration": "1m",
      "defaultCacheCatalog": "default_cache_catalog",
      "defaultCacheSchema": "default_cache_schema",
      "rules": [
        {
          "catalogName": "mysql",
          "schemaName": "marketing",
          "tableName": "events",
          "refreshInterval": "2m",
          "gracePeriod": "15m",
          "incrementalColumn": "event_id",
          "deletePredicate": "event_date < date_add('day', -31, CURRENT_DATE)"
        }
      ]
    }
type-mapping.json
#
The type-mapping.json
node specifies type mapping rules between source and target catalogs. This
node is not included in the values.yaml
file by default. The following
example maps three different timestamp types to TIMESTAMP(3)
in the target:
config:
  type-mapping.json: |
    {
      "rules": {
        "tpch": {
          "integer": "long"
        },
        "hive": {
          "timestamp(0)": "timestamp(3)",
          "timestamp(1)": "timestamp(3)",
          "timestamp(2)": "timestamp(3)"
        }
      }
    }
registryCredentials
#
The following are the default values for the cache service
registryCredentials:
top level node in the values.yaml
file:
registryCredentials:
  enabled: false
  registry:
  username:
  password:
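To enable these credentials, a values override might look like the following sketch. The registry host harbor.starburstdata.net is an assumption about the Starburst Harbor instance mentioned in the node overview, and the username and password are placeholders.

```yaml
registryCredentials:
  enabled: true
  # Assumed host name of the Starburst Harbor registry; confirm it for your account
  registry: harbor.starburstdata.net
  username: <your-username>
  password: <your-password>
```

Storing this fragment in its own small file, such as the registry-access.yaml file used in the helm upgrade example, keeps credentials separate from the rest of your customizations.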
imagePullSecrets
#
Instead of setting registryCredentials you can pass a list of secrets in the following format. This feature is disabled by default:
# imagePullSecrets:
# - name: secret1
# - name: secret2
imagePullSecrets:
resources
#
The following values must be defined in the resources
node of the cache
service Helm chart:
CPU resources for requests and limits: The defaults are sufficient for most environments; however, they must work with the instance type you are using.
Memory resources for requests and limits: The defaults are sufficient for most environments; however, they must work with the instance type you are using.
The following are the default values for the cache service resources
top
level node in the values.yaml
file:
resources:
  requests:
    memory: 2Gi
    cpu: 0.5
  limits:
    memory: 2Gi
    cpu: 4
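The node overview in this topic recommends identical request and limit values. An override that follows that recommendation could resemble the following sketch, where the sizes are chosen only for illustration and must fit the instance type you are using.

```yaml
resources:
  # Requests and limits are set to the same values, as the node overview recommends
  requests:
    memory: 4Gi
    cpu: 2
  limits:
    memory: 4Gi
    cpu: 2
```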
expose
#
You must expose the service to allow it to connect to the SEP coordinator, and
to reach it with tools such as the cache service CLI. This service-type configuration is defined by the
expose:
top level node. You can choose from four different mechanisms by
setting the type:
value to the common configurations in k8s.
Depending on your choice, you only have to configure the identically-named sections.
Type | Description |
---|---|
clusterIp | Default value. Only exposes the service internally within the k8s cluster using an IP address internal to the cluster. Use this in the early stages of configuration. |
nodePort | Configures the internal port number of the server for requests from outside the cluster on the |
loadBalancer | Used with platforms that provide a load balancer. This option automatically creates |
ingress | This option provides a production-level, securable configuration. It allows a load balancer to route to multiple apps in the cluster, and may provide load balancing, SSL termination, and name-based virtual hosting. |
The following are the default values for the cache service expose:
top level
node in the values.yaml
file:
expose:
  port: 8180
  # one of: nodePort, clusterIp, loadBalancer, ingress
  type: "clusterIp"
  clusterIp:
    name: "cache-service"
    ports:
      http:
        port: 8180
  nodePort:
    name: "cache-service"
    ports:
      http:
        port: 8180
        nodePort: 30180
  loadBalancer:
    name: "cache-service"
    IP: ""
    ports:
      http:
        port: 8180
    annotations: {}
    sourceRanges: []
  ingress:
    ingressName: "cache-service-ingress"
    serviceName: "cache-service"
    servicePort: 8180
    ingressClassName:
    tls:
      enabled: true
      secretName:
    host:
    path: "/"
    annotations: {}
Configure TLS (optional)#
Note
This is separate from configuring TLS on SEP itself.
If your organization uses TLS, you must enable and configure your cache service to work with it. The most straightforward way to handle TLS is to terminate TLS at the load balancer or ingress, using a signed certificate. We strongly suggest this method, which requires no additional configuration in the cache service.
If you choose not to handle TLS using that method, you can instead configure it in the expose top-level node of the cache service Helm chart:
expose:
  type: "[clusterIp|nodePort|loadBalancer|ingress]"
The default type
is clusterIp
. However, this is not suitable for
production environments. If you need help choosing which type is best, refer to
the expose documentation for SEP.
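If you terminate TLS at the ingress as suggested above, the override could resemble the following sketch. It only uses keys from the default expose node. The ingress class, TLS secret name, and host are hypothetical and must match resources that exist in your cluster.

```yaml
expose:
  type: "ingress"
  ingress:
    ingressName: "cache-service-ingress"
    serviceName: "cache-service"
    servicePort: 8180
    # Hypothetical ingress class; use the class of the controller installed in your cluster
    ingressClassName: "nginx"
    tls:
      enabled: true
      # Hypothetical Kubernetes TLS secret holding the signed certificate and key
      secretName: "cache-service-tls"
    # Hypothetical host name that clients use to reach the cache service
    host: "cache.example.com"
    path: "/"
    annotations: {}
```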
database.internal
#
The configuration properties for the backend PostgreSQL database internal to the cache service are found in the database top-level node. You can use either the default PostgreSQL database internal to this service or a self-managed, existing external database instance. We strongly suggest using the default internal database. Use the type property to select the type of database:
database:
  type: "[internal|external]"
Note
If you are using a self-managed, existing external database, ensure that it is available before proceeding.
As a minimal customization, you must ensure that the following are set correctly for your environment:
databaseName: "cacheservice"
databaseUser: "cacheservice"
databasePassword: "CacheServicePass1234"
You must also configure volume
persistence options, if desired, as well as
the resources
for the backing database itself in the database
node.
Note
The database.resources
node is separate from the top level resources
node. It defines the resources available to the backing database itself, not
the cache service.
In the .Values.config.config.properties
configuration it is required to
refer to appropriate environment variables to enable integration with the
database backend:
config.properties: |
  service-database.user=${ENV:SERVICE_DATABASE_USER}
  service-database.password=${ENV:SERVICE_DATABASE_PASSWORD}
  service-database.jdbc-url=${ENV:SERVICE_DATABASE_JDBC_URL}
The following snippet shows the default configuration for the internal cache service backend database:
database:
  type: internal
  internal:
    image:
      repository: "library/postgres"
      tag: "10.6"
      pullPolicy: "IfNotPresent"
    volume:
      # use one of:
      # - existingVolumeClaim to specify existing PVC
      # - persistentVolumeClaim to specify spec for new PVC
      # - other volume type inline configuration, e.g. emptyDir
      # Examples:
      # existingVolumeClaim: "my_claim"
      # persistentVolumeClaim:
      #   storageClassName:
      #   accessModes:
      #     - ReadWriteOnce
      #   resources:
      #     requests:
      #       storage: "2Gi"
      emptyDir: {}
    resources:
      requests:
        memory: "1Gi"
        cpu: 2
      limits:
        memory: "1Gi"
        cpu: 2
    driver: "org.postgresql.Driver"
    port: 5432
    databaseName: "cacheservice"
    databaseUser: "cacheservice"
    databasePassword: "CacheServicePass1234"
    envFrom: []
    env: []
Node name | Description |
---|---|
database.type | Set to internal. |
database.internal.image | Docker container images used for the PostgreSQL server |
database.internal.volume | Storage volume to persist the database. The default configuration requests a new persistent volume (PV). |
database.internal.volume.persistentVolumeClaim | The default configuration, which requests a new persistent volume (PV). |
database.internal.volume.existingVolumeClaim | Alternative volume configuration, which uses an existing volume claim by referencing the name as the value in quotes, e.g., "my_claim". |
database.internal.volume.emptyDir | Alternative volume configuration, which configures an empty directory on the pod. Keep in mind that a pod replacement loses the database content. |
 | |
database.internal.databaseName | Name of the internal database |
database.internal.databaseUser | User to connect to the internal database |
database.internal.databasePassword | Password to connect to the internal database |
database.internal.envFrom | YAML sequence of mappings to define a Secret or ConfigMap as a source of environment variables for the PostgreSQL container. |
database.internal.env | YAML sequence of mappings, each with two keys, to define environment variables for the PostgreSQL container. |
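Because an emptyDir volume loses the database content when the pod is replaced, you may prefer a persistent volume claim for the internal database. The following sketch fills in the commented persistentVolumeClaim example from the defaults. The storage class name standard is hypothetical, and the remaining values repeat the documented defaults.

```yaml
database:
  type: internal
  internal:
    volume:
      persistentVolumeClaim:
        # Hypothetical storage class; use one that exists in your cluster
        storageClassName: "standard"
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "2Gi"
    databaseName: "cacheservice"
    databaseUser: "cacheservice"
    # Replace the documented default password before deploying
    databasePassword: "<your-password>"
```

Depending on how the chart merges the default emptyDir entry with your override, you might also need to set emptyDir to null. Rendering the manifests with helm template before deploying is one way to verify which volume is actually used.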
database.external
#
This section shows the setup for using an external PostgreSQL or MySQL database instance. You must provide the
necessary details for the external server, and ensure that it can be reached
from the k8s cluster pod. Set the database.type
to external
and
configure the connection properties:
In the .Values.config.config.properties
configuration it is required to
refer to the appropriate environment variables in order to enable integration
with the database backend:
config.properties: |
  service-database.user=${ENV:SERVICE_DATABASE_USER}
  service-database.password=${ENV:SERVICE_DATABASE_PASSWORD}
  service-database.jdbc-url=${ENV:SERVICE_DATABASE_JDBC_URL}
database:
  type: external
  external:
    jdbcUrl:
    user:
    password:
Node name | Description |
---|---|
database.type | Set to external. |
database.external.jdbcUrl | JDBC URL to connect to the external database as required by the database and used driver, including hostname and port. Ensure you use a valid JDBC URL as required by the PostgreSQL or MySQL driver. Typically the syntax requires the host, port and database name |
database.external.user | Database user name to access the external database using JDBC. |
database.external.password | Password for the user configured to access the external database using JDBC. |
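A filled-in version of the external database node might look like the following sketch for a PostgreSQL instance. The host db.example.com, the database name, and the credentials are placeholders, not values provided by the chart.

```yaml
database:
  type: external
  external:
    # PostgreSQL JDBC URL with host, port, and database name (placeholder values)
    jdbcUrl: "jdbc:postgresql://db.example.com:5432/cacheservice"
    user: "cacheservice"
    password: "<your-password>"
```

Before deploying, confirm that the host and port are reachable from the cache service pod, as listed in the requirements.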
commonLabels
#
The following are the default values for the commonLabels:
top level node in
the cache service values.yaml
file:
commonLabels: {}
# environment: dev
# myLabel: labelValue
Configure SEP to use the cache service#
The cache service requires a database schema to store configuration data. Ensure that you have created the schema, and note the connection information. Once the cache service and the external RDBMS are configured and running, configure SEP as in the following example, which shows a PostgreSQL database providing the backing schema:
coordinator:
  etcFiles:
    properties:
      cache.properties: |
        service-database.user=postgres
        service-database.password=S3cr3t1v3
        service-database.jdbc-url=jdbc:postgresql://<your_rds_endpoint>:5432/redirections
        starburst.user=starburst_service
        starburst.password=
        starburst.jdbc-url=jdbc:trino://coordinator:8080
        rules.file=secretRef:cache-rules:cache-rules.json
        rules.refresh-period=1m
        refresh-initial-delay=1m
        refresh-interval=24h
Many connectors support the use of the cache service. For each supported catalog that you wish to use with the cache service, two lines must be added to the catalog properties configuration:
redirection.config-source=SERVICE
cache-service.uri=http://cache-service:8180
In the following example, the mysalesdata
catalog is configured to use the
cache service:
catalogs:
  mysalesdata: |
    connector.name=postgresql
    connection-url=jdbc:postgresql://<mydbhost>:5432/bootcamp
    connection-user=postgres
    connection-password=S3cr3t1v3
    statistics.enabled=true
    redirection.config-source=SERVICE
    cache-service.uri=http://cache-service:8180
Configuration examples#
External secret reference#
To configure the cache service to work with the cache rules as an external secret reference, first create a k8s secret holding the file:
kubectl create secret generic cache-rules --from-file=cache-rules.json
After the secret is created, you can reference it in the cache service configuration:
config:
  config.properties: |
    service-database.user=${ENV:SERVICE_DATABASE_USER}
    service-database.password=${ENV:SERVICE_DATABASE_PASSWORD}
    service-database.jdbc-url=${ENV:SERVICE_DATABASE_JDBC_URL}
    starburst.user=bob
    starburst.jdbc-url=jdbc:trino://coordinator:8080
    rules.file=secretRef:cache-rules:cache-rules.json
This mounts the secret named cache-rules
in the path
/mnt/secretsRef/cache-rules
and replaces the secretRef:cache-rules
occurrences with the absolute path, resulting in the following configuration
property setting:
rules.file=/mnt/secretRef/cache-rules/cache-rules.json
This mechanism can only be applied to properties files defined under the .Values.config node. Specific secret values, such as passwords, can be passed into properties files using .Values.envFrom.
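As a sketch of that approach, the following override reads the password of the SEP service user from a Kubernetes secret. The secret name cache-service-secrets and the variable name STARBURST_PASSWORD are hypothetical. The secret must contain a key with that variable name for the reference to resolve.

```yaml
envFrom:
  # Standard Kubernetes envFrom entry: each key in the secret becomes an environment variable
  - secretRef:
      name: cache-service-secrets
config:
  config.properties: |
    starburst.user=bob
    # Resolved from the STARBURST_PASSWORD key of the hypothetical secret above
    starburst.password=${ENV:STARBURST_PASSWORD}
    starburst.jdbc-url=jdbc:trino://coordinator:8080
```

This keeps the password out of the values file itself, so the file can be stored in version control alongside your other customizations.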
Next steps#
Review the following topics to enable important performance features for your SEP deployment: