Configuring Starburst Enterprise in Kubernetes#
The starburst-enterprise Helm chart configures the coordinator and worker nodes of the Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP), as well as the internal cluster communication. All of this is set in the values.yaml file detailed in the following sections.
We strongly suggest that you follow best practices when customizing your cluster, creating small, targeted files to override any defaults, and adding any configuration properties.
Note
Many sections on this page have links to examples of the relevant YAML in our YAML examples page. While you are configuring your cluster, it is helpful to refer to SEP’s directory structure to ensure you are configuring any file locations correctly.
Throughout this page there are links to more descriptive documentation for a given configuration area. In many cases, these refer to the legacy configuration files by name.
Applying configuration changes#
To update SEP with any configuration changes, run the helm upgrade command with the updated YAML files and the --install switch, as in this example:
$ helm upgrade my-sep-prod-cluster starburstdata/starburst-enterprise \
    --install \
    --version 423.13.0 \
    --values ./registry-access.yaml \
    --values ./sep-prod-setup.yaml
This is the same command that you use to update to a new release. Helm compares all --values files and the version, and safely ignores any that are unchanged.
Top level nodes#
The top level nodes are as follows, in order of appearance in the default
values.yaml
file embedded in the Helm chart. Click on a section header to
access the relevant documentation:
Node name |
Description |
---|---|
Docker images section |
|
|
Contains the details for the SEP Docker image. |
|
Contains the details for the SEP bootstrap Docker image. |
Docker registry access section |
|
|
Defines authentication details for Docker registry access. Cannot
be used if using |
|
Alternative authentication for selected Docker registry using secrets.
Cannot be used if using |
Internal communications section |
|
|
Sets the shared secret value for internal communications. |
|
The environment name for the SEP cluster. |
|
Specifies port numbers used for internal cluster communications. |
Ports and networking |
|
|
Defines the mechanisms and their options that expose the SEP cluster to an outside network. |
Coordinator and workers |
The coordinator and worker nodes have many properties in common. They are
detailed in the Coordinator section, and referenced in the Worker section,
rather than repeating them. Likewise, examples are provided for the
|
|
Extensive section that configures the SEP coordinator pod. |
|
Extensive section that configures the SEP worker pods. |
Startup options section |
|
|
Specifies a shell script to run after coordinator and worker pod launch, and before SEP startup on the pods. |
|
List of extra arguments to be passed to the |
|
Lists secret to be used for |
Advanced security options |
|
|
Defines mounted external secrets from a Kubernetes secrets manager. |
|
Configures the user database to use for file-based user authentication. This is disabled by default for versions 356-e and later. |
|
Sets the Kubernetes security context that defines privilege and access control settings for all SEP pods. |
Pod lifecycle management section |
|
|
Specifies the Kubernetes readiness probe that monitors containers for readiness to accept traffic. |
|
Specifies the Kubernetes liveness probe that monitors containers for the need to restart. |
SEP cache, disk and memory management |
|
|
Specifies query processing and management configuration properties. |
|
Configures spilling of query processing data to disk. |
|
Configures Hive storage caching. |
Catalogs to configure connected data sources |
|
|
Configures the catalog properties files. |
Miscellaneous |
|
|
Defines mounted volumes for various uses in the cluster. |
|
Configures cluster metrics gathering with Prometheus. |
|
Defines common labels to identify objects in a KRM to use with the
|
Docker images#
The default configuration automatically contains the details for the relevant Docker image on the Starburst Harbor instance. Typically you should not configure any overrides. An exception is if you host the Docker image on a different Docker registry.
Node name |
Description |
---|---|
|
The |
|
The |
Defaults and examples:
Docker registry access#
These nodes enable access to the Docker registry. They should be added to the
registry-access.yaml
file to access the registry as shown in the best
practices guide.
Node name |
Description |
---|---|
|
The |
|
Alternative authentication for selected Docker registry using secrets.
NOTE: Cannot be used if using |
Warning
You can only define authentication with registryCredentials
or
imagePullSecrets
. Defining both is invalid, and causes an error.
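As a minimal sketch, a registry-access.yaml file that uses registryCredentials might look like the following. The sub-key names (enabled, registry, username, password) and all values are assumptions for illustration and are not confirmed by this page; check the default values.yaml embedded in the chart for the exact names.

```yaml
# registry-access.yaml (illustrative sketch; sub-key names are assumptions)
registryCredentials:
  enabled: true
  registry: registry.example.com/starburstdata   # hypothetical registry location
  username: my-registry-user                     # placeholder
  password: my-registry-password                 # placeholder
# Do not define imagePullSecrets in the same configuration:
# the two authentication options are mutually exclusive.
```

Keeping these credentials in their own small file makes it easy to pass them with a separate --values argument and to exclude them from version control.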
Defaults and examples:
Internal communication#
SEP provides three top level nodes for internal cluster communication between the coordinator and the workers. Because internal communication configurations and credentials are unique, these are not configured by default.
Ensure that you configure the sharedSecret and environment values in all your clusters to use secured communication with internal authentication of the nodes.
Node name |
Description |
---|---|
|
The environment name for the SEP cluster, used to set the
|
|
Sets the shared secret value for secure communication
between coordinator and workers in the cluster to a long, random string that you
provide. If not set, the |
|
When set to |
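For illustration, the two values that this section asks you to set in every cluster could be supplied as follows. The values shown are placeholders, not defaults.

```yaml
# Illustrative values only; generate your own long, random secret
environment: production
sharedSecret: "replace-with-a-long-random-string"
```

Because the shared secret is a credential, consider keeping it in a separate override file that is not committed to source control.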
Defaults and examples:
Exposing the cluster#
You must expose the cluster to allow users to connect to the SEP coordinator
with tools such as the CLI, applications using JDBC/ODBC drivers, and any other
client application. This service-type configuration is defined by the expose
top level node. You can choose from four different mechanisms by setting the type
value to one of the common configurations in k8s.
Depending on your choice, you only have to configure the identically-named section.
Type |
Description |
---|---|
|
Default value. Only exposes the coordinator internally within the k8s cluster using an IP address internal to the cluster. Use this in the early stages of configuration. |
|
Configures the internal port number of the coordinator for requests from
outside the cluster on the |
|
Used with platforms that provide a load balancer. This
option automatically creates |
|
This option provides a production-level, securable configuration. It
allows a load balancer to route to multiple apps in the cluster, and
may provide load balancing, SSL termination, and name-based virtual
hosting. For example, the SEP coordinator and Immuta server can be
in the same cluster, and can be exposed via |
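A hedged sketch of how the type value and its identically-named section fit together is shown below. The type name loadBalancer and the sub-keys name, ports, http, and port are assumptions chosen for illustration; the authoritative names are in the chart's default values.yaml.

```yaml
# Illustrative sketch; the type name and sub-keys are assumptions
expose:
  type: "loadBalancer"
  # Only the section named after the chosen type needs to be configured
  loadBalancer:
    name: "starburst"
    ports:
      http:
        port: 8080
```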
Defaults and examples:
Coordinator#
The top level coordinator: node contains property nodes that configure the pod of the cluster that runs the
SEP coordinator. The default values are a reasonable starting point on a production-sized k8s
cluster.
Note
These property nodes also appear under the top level worker:
node unless
otherwise noted.
Node name |
Description |
---|---|
|
Specifies the content of the |
|
The CPU and memory resources to use for the coordinator pod. These settings can be adjusted to match your workload needs, and available node sizes in the cluster. |
|
The size of the container memory headroom. The value needs to be less than the
resource allocation limit for memory defined in |
|
Percentage of container memory reduced with headroom assigned to Java heap. Must be less than 100. |
|
Headroom of Java heap memory not tracked by SEP during query execution. Must be less than 100. |
|
Any properties described in the reference documentation that are not specific to any other YAML node are set here. Example usages are to set query time-out and memory values and to enable Starburst Insights. Configuration properties can be found throughout the reference documentation, including the Properties reference page. |
|
Allows for the propagation of environment variables from different sources that comply with the K8s schema. This can be used to deliver values to SEP configuration properties files by creating a Kubernetes secret that holds the variable values. |
|
Configuration to determine the node and pod to use. |
|
Configuration to annotate the coordinator deployment. |
|
Configuration to annotate the coordinator pod. |
|
Priority class for coordinator pod for setting k8s pod priority and preemption. |
|
Attach additional sidecar containers to the coordinator pod. |
|
Add extra init containers to the coordinator pod. |
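The following sketch shows how the memory-related properties in this table might relate to each other. The numbers are illustrative rather than the chart defaults, and the requests/limits layout under resources is assumed to follow the standard Kubernetes resources schema.

```yaml
# Illustrative values, not chart defaults
coordinator:
  resources:
    requests:
      memory: "60Gi"
      cpu: 16
    limits:
      memory: "60Gi"
      cpu: 16
  # Must be less than the memory limit above
  nodeMemoryHeadroom: "2Gi"
  # Share of (memory limit - headroom) assigned to the Java heap:
  # (60Gi - 2Gi) * 90% = 52.2Gi
  heapSizePercentage: 90
  # Share of the Java heap not tracked by SEP during query execution
  heapHeadroomPercentage: 30
  # Properties that do not belong to any other YAML node
  additionalProperties: |
    query.max-execution-time=1h
```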
Defaults and examples:
etcFiles
on the coordinator#
Note
These property nodes also appear under the top level worker:
node unless
otherwise noted.
Node name |
Description |
---|---|
|
Defines the content of the JVM config for the coordinator. |
|
Defines configuration files located in the |
|
Defines the content of the
default configuration file
for the coordinator. You can also use |
|
Defines the contents of the node.properties file
for the coordinator, including paths to installation and log directories.
The unique identifier and environment are pulled in from the |
|
Defines the contents of the log configuration files for the coordinator. |
|
Configures password authentication
for SEP on the coordinator.
NOTE: Does not apply to |
|
Enables and configures access control
for SEP on the coordinator. NOTE: Does not apply to
|
|
Configures an optional exchange manager for Fault-tolerant execution. |
|
Other files that need to be placed in the |
Node assignment#
All charts allow you to configure criteria to define which node and pod in the cluster is suitable to use for running the relevant container. In SEP, this may be useful if you have a cluster spanning availability zones, or if you are using an HMS in your cluster with smaller node sizes. The following configurations are available, and by default are not defined:
nodeSelector: {}
affinity: {}
tolerations: []
Example configurations are available in the k8s documentation. Specific usage and values are highly dependent on your k8s cluster configuration.
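As one possible illustration, the following pins the coordinator to a single availability zone and lets it run on nodes that carry a matching taint. The zone, taint key, and taint value are hypothetical and depend entirely on how your k8s nodes are labeled and tainted.

```yaml
# Hypothetical zone and taint; adjust to your cluster
coordinator:
  nodeSelector:
    topology.kubernetes.io/zone: us-east-1a
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "sep"
      effect: "NoSchedule"
```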
Further resources:
Coordinator defaults and examples#
Workers#
The top level worker: node contains property nodes that configure the pods of the cluster that run the SEP workers. The default values are a reasonable starting point on a production-sized k8s cluster.
Many of the properties for this node also appear in the coordinator:
top
level node. They are fully documented in the Coordinator
section:
worker.etcFiles.*
worker.resources
worker.nodeMemoryHeadroom
worker.heapSizePercentage
worker.heapHeadroomPercentage
worker.additionalProperties
worker.envFrom
worker.nodeSelector
worker.affinity
worker.tolerations
worker.deploymentAnnotations
worker.podAnnotations
worker.priorityClassName
worker.sidecars
worker.initContainers
The following properties apply only to worker
:
Node name |
Description |
---|---|
|
Specifies |
|
The number of worker pods for a static cluster. |
|
Configuration for the minimum and maximum number of workers. Ensure the
additional requirements for scaling are
fulfilled on your k8s cluster. Set Scaling down proceeds until WARNING: The autoscaling feature does not yet work with OpenShift clusters (as of the latest release 4.6), due to a known limitation with OpenShift clusters. HPA does not work with pods having init containers in OpenShift. |
|
An alternative method of scaling the number of workers is implemented using the KEDA scaler for Kubernetes workloads in the SEP Helm chart. Scaler configuration is described in a dedicated section. |
|
Specifies the termination grace period for workers. Workers are not terminated until queries running on the pod are finished and the grace period passes. |
|
Sets |
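A sketch of the two mutually exclusive sizing approaches described in this table follows. Only worker.autoscaling is named elsewhere on this page; the replicas key and the autoscaling sub-keys (enabled, minReplicas, maxReplicas, targetCPUUtilizationPercentage) are assumptions for illustration.

```yaml
# Illustrative sketch; key names other than worker.autoscaling are assumptions
worker:
  # Static cluster: a fixed number of worker pods
  replicas: 3
  # Alternative: let the cluster scale between a minimum and a maximum
  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 10
    targetCPUUtilizationPercentage: 80
```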
The remaining configuration properties for workers are identical to the coordinator properties documented in the preceding section.
Startup shell script for coordinator and worker nodes#
initFile:
is a top level node that allows you to create a startup shell
script (example) to customize how SEP is
started on the coordinator and workers, and pass additional arguments to it.
These are undefined by default.
Node name |
Description |
---|---|
|
A shell script to run before SEP is launched. The content of the file
has to be an inline string in the YAML file. The script is started as
|
|
List of extra arguments to be passed to the |
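A minimal sketch of an inline startup script follows. It assumes that the pod type is passed to the script as its first argument, that the script is responsible for starting SEP through the run-starburst script in /usr/lib/starburst/bin, and that the list of extra arguments is configured under a node named extraArguments. None of these details are confirmed on this page, so verify them against the chart before use.

```yaml
# Illustrative sketch; argument handling and node names are assumptions
initFile: |
  #!/bin/bash
  # $1 is assumed to identify the pod type
  echo "Custom init for $1"
  # Hand over to the SEP launcher script
  exec /usr/lib/starburst/bin/run-starburst
extraArguments:
  - TEST_ARG_1
```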
User security considerations#
SEP has extensive security options that allow you to specify how to authenticate users. Because user security configurations and credentials are unique, these are not configured by default.
Security context#
You can configure a security context to define privilege and access control settings for the SEP pods.
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 0
Note
These settings are typically not required, and are highly dependent upon your
Kubernetes environment. For example, OpenShift requires audit_log
to be
set in securityContext
in order to run sudo
, while other platforms use
arbitrary user IDs. Refer to the Kubernetes documentation and to the
documentation for your particular platform to ensure your securityContext
is configured correctly.
Service account#
You can configure a service account for the SEP pods using:
serviceAccountName:
External secret reference#
There are several locations where properties require pointing to files delivered outside of SEP, such as CA certificates. In such cases, you can use a special notation that allows you to point to a k8s secret.
For example, you can configure password authentication using LDAP (example). This requires k8s to create the
etc/password-authenticator.properties
configuration file, which in turn
points to the ca.crt
certificate file.
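The LDAP case described above could be sketched as follows. The secretRef:<secret-name>:<key> form, the etcFiles.properties location, the secret name ldap-ca, and the LDAP server details are all assumptions made for illustration.

```yaml
# Illustrative sketch; notation details and names are assumptions
coordinator:
  etcFiles:
    properties:
      password-authenticator.properties: |
        password-authenticator.name=ldap
        ldap.url=ldaps://ldap.example.com:636
        ldap.user-bind-pattern=uid=${USER},ou=people,dc=example,dc=com
        # Points to the ca.crt key of a hypothetical k8s secret named ldap-ca
        ldap.ssl.truststore.path=secretRef:ldap-ca:ca.crt
```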
Defining external secrets#
You can automatically mount external secrets, for example from the AWS Secrets
Manager, using the secretRef
or secretEnv
notation.
Node name |
Description |
---|---|
|
Type of the external secret provider. Currently, only |
|
Prefix of all secrets that need to be mapped to external secret. |
|
The external-secrets backend type, such as
|
The Helm chart scans for all secretRef
or secretEnv
references in the
referenced YAML files which start with the configured secretPrefix
string.
For each secret found, it generates an ExternalSecret K8s manifest
(example).
Note
The selected external secrets provider needs to be deployed and configured
separately. The secret names in the external storage must match names of K8s
secrets you reference. When using secretEnv
, the external storage secret
must contain only a single value. For each external secret a single K8s secret
is created, including one key with external secret value.
File-based authentication#
The Unix command htpasswd
can generate a user database, which can be used to
configure file-based user authentication. It creates the file under
/usr/lib/starburst/etc/auth/{{.Values.userDatabase.name}}
. This allows you to
statically deliver user credentials to the file
etc/password-authenticator.properties
(example).
Alternatively, you can use a secret to add an externally created user database
file to SEP. Set the file.password-file
property to the user database file,
and ensure that you disable the built-in user database
(example).
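A sketch of the built-in user database follows. Only the userDatabase.name value is referenced on this page; the enabled and users keys, the file name, and the credentials are assumptions for illustration.

```yaml
# Illustrative sketch; keys other than name are assumptions
userDatabase:
  enabled: true
  # File is created as /usr/lib/starburst/etc/auth/password.db
  name: password.db
  users:
    - username: admin
      password: "replace-with-a-strong-password"
```

When you supply your own user database file through a secret instead, the built-in database is disabled and the file.password-file property points to the mounted file.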
Performance considerations#
SEP has performance management configuration options that help you to optimize aspects of SEP’s performance.
Query memory usage control#
The top level query:
node lets you set the maxConcurrentQueries
configuration property. The maxConcurrentQueries
configuration property
divides the query memory space by the value set. By setting this value, you
establish a limit on the maximum memory usage for each query. If a query exceeds
the allocated memory, it fails with an out-of-memory error.
This configuration property defaults to 3
, which means that a single query
can use up to one-third of the available memory space. To allocate more memory
to each query, decrease the value of this property.
All other query processing properties must be set using the
additionalProperties:
node on the coordinator.
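For example, lowering the value from the default of 3 to 2 raises the per-query limit from one-third to one-half of the query memory space:

```yaml
query:
  # Each query may use up to 1/2 of the available query memory space
  maxConcurrentQueries: 2
```

The trade-off is that fewer queries can run at their maximum memory allocation at the same time.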
Spilling#
The top level spilling:
node (example) allows
you to configure SEP’s spilling properties.
Spilling uses internal node storage, which is mounted within the container.
Warning
Spilling is disabled by default, and we strongly suggest leaving it disabled. Enable spilling only as a last resort, to allow rare, memory-intensive queries to succeed on a smaller cluster at the expense of query performance and overall cluster performance.
Hive connector storage caching#
The cache:
top level node (example) allows
you to configure Hive connector storage caching. It is disabled by default.
Node name |
Description |
---|---|
|
Enable or disable caching for all catalogs using the Hive connector. If
you want to only enable it for a specific catalog, you have to configure
it with the catalog configuration and
|
|
Set the value for the |
|
Set the value for the |
|
Configure the volume in which to store the cached files. |
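A possible shape for this node is sketched below. The sub-key names (enabled, diskUsagePercentage, ttl, volume) and the values are assumptions for illustration; the property names were lost from the table above, so confirm them in the chart's default values.yaml.

```yaml
# Illustrative sketch; sub-key names and values are assumptions
cache:
  enabled: true
  diskUsagePercentage: 80
  ttl: "7d"
  # Volume in which the cached files are stored
  volume:
    emptyDir: {}
```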
Catalogs#
The catalogs:
top level node allows you to configure a catalog for each of
your data sources. The catalogs defined in this node are used to create catalog
properties files, which contain key-value pairs for catalog properties.
Information for specific properties supported in each catalog can be found with
the documentation for the connectors. At the very minimum, a
catalog definition must consist of the name of the catalog, and the
connector.name
property.
For best practices, use the YAML multi-line syntax shown in the examples to configure the content in multiple lines indented below the catalog name.
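A short sketch of two catalog definitions using the multi-line syntax follows. The catalog names, host name, and credentials are placeholders; the first catalog shows the minimum definition, consisting of only the catalog name and connector.name.

```yaml
catalogs:
  # Minimal catalog: name plus connector.name
  tpch: |
    connector.name=tpch
  # Hypothetical PostgreSQL data source with placeholder connection details
  salesdb: |
    connector.name=postgresql
    connection-url=jdbc:postgresql://postgres.example.com:5432/sales
    connection-user=sep_user
    connection-password=replace-me
```

Each key under catalogs becomes one catalog properties file, so the key name is the catalog name that users reference in queries.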
Defaults and examples:
Additional volumes#
Additional k8s volumes can be
necessary for persisting files, for Hive object storage caching, and for a number of other use cases. These can be
defined in the additionalVolumes
section (example). None are defined by default.
Node name |
Description |
---|---|
|
Specifies the path to the mounted volume. If you specify |
|
A directory, with or without data, which is accessible to the containers in a pod. |
|
A volume with a |
|
REQUIRED when using |
|
When specified, a specific key named |
Adding files#
Various use cases around security and event listeners need additional config files as properties or XML files. You can add any file to a pod using config maps.
Types of files you may need to add:
LDAP authentication file
Hive site xml file
Alternatively, you can use additionalVolumes to mount the files and copy them to the appropriate location using the path and subPath parameters (example).
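As a sketch, the following defines a scratch directory and places a single file from a config map at a specific location. The ConfigMap name and target paths are hypothetical, and the volume block is assumed to accept a standard Kubernetes volume definition.

```yaml
# Illustrative sketch; ConfigMap name and paths are hypothetical
additionalVolumes:
  # Empty directory shared by the containers in a pod
  - path: /mnt/scratch
    volume:
      emptyDir: {}
  # Single file taken from a config map key
  - path: /usr/lib/starburst/etc/hive-site.xml
    subPath: hive-site.xml
    volume:
      configMap:
        name: hive-site-config
```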
Prometheus#
This top level node configures the cluster to create Prometheus metrics. It is not to be confused with the connector of the same name. It is enabled by default.
We strongly suggest reviewing our example Prometheus rules.
SEP built-in directories#
The location of specific directories in the SEP container is important if you configure additional files or otherwise want to tweak the container.
SEP is configured to use the following directories in the container:
/usr/lib/starburst: Top level folder for the SEP binaries.
/usr/lib/starburst/plugin: All plugins for SEP, including connectors.
/usr/lib/starburst/etc: etc folder for SEP configuration files such as config.properties and others.
/usr/lib/starburst/bin: Location of the run-starburst script, which is invoked with the container start.
/data/starburst/var/run: Contains the launcher.pid file used by the launcher script.
/data/starburst/var/log: Contains secondary, rotated log files. Main log is redirected to stdout as recommended on containers.
Sidecar and init containers#
Extra sidecar and init containers can be specified to run together with
coordinator or worker pods. You can use these containers for a number of use
cases such as to prepare an additional driver for the SEP runtime. To have a
directory available in all containers, define an emptyDir
volume in
additionalVolumes
(section: Additional volumes).
Sample configuration:
coordinator:
  sidecars:
    - name: sidecar-1
      image: alpine:3
      command: [ "ping" ]
      args: [ "127.0.0.1" ]
  initContainers:
    - name: init-1
      image: alpine:3
      # alpine images ship sh but not bash
      command: [ "sh", "-c", "printenv > /mnt/InContainer/printenv.txt" ]
worker:
  sidecars:
    - name: sidecar-3
      image: alpine:3
      command: [ "ping" ]
      args: [ "127.0.0.1" ]
# emptyDir volume made available in all containers
additionalVolumes:
  - path: /mnt/InContainer
    volume:
      emptyDir: {}
KEDA scaler#
KEDA is a Kubernetes-based Event Driven Autoscaler. SEP
can be configured with an external KEDA scaler to adjust the number of
workers automatically based on JVM performance metrics available via JMX. Once
it is enabled with the .Values.worker.kedaScaler.enabled
property, the
coordinator pod runs an additional container called keda-trino-scaler
. This
container works as a gRPC service, and communicates with KEDA to scale workers.
Note
The .Values.worker.autoscaling
and .Values.worker.kedaScaler.enabled
properties enable mutually-exclusive features that cannot be enabled together
as KEDA also uses HPA
to scale Kubernetes Deployments.
Prerequisites#
KEDA with Helm - versions 2.6.x and higher are required.
Prometheus JMX Exporter - The SEP Helm chart ensures that the exporter is enabled, and configured with all necessary rules.
Configuration#
The following nodes and node sections configure the KEDA scaler:
Node or section name |
Description |
---|---|
|
Set this node to |
|
Corresponds to the parameters such as |
|
Defines the scaling method implemented by this scaler. Only the
|
|
Defines the maximum number of concurrent queries handled by a single
worker. For example, if set to |
|
Reduces the number of workers to |
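A minimal sketch of enabling the scaler follows. Only worker.kedaScaler.enabled and worker.autoscaling are named on this page; the enabled sub-key under autoscaling is an assumption, and the remaining scaler settings from the table are omitted because their names are not shown here.

```yaml
# Illustrative sketch; autoscaling.enabled is an assumed sub-key
worker:
  # HPA-based autoscaling must stay off when the KEDA scaler is used
  autoscaling:
    enabled: false
  kedaScaler:
    enabled: true
```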
Troubleshooting#
When the KEDA scaler does not behave as expected, review the following logs:
keda-trino-scaler container logs in the coordinator pod
keda-operator logs in the keda namespace