Starburst Enterprise with Kubernetes requirements#

Kubernetes (k8s) and its related tools are vibrant open source projects. Their dynamic nature, together with commercial extensions and modifications, results in frequent change and a myriad of features and options.

Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP), cannot support all of these variations when deployed on Kubernetes. The following sections detail the specific requirements for SEP deployments on k8s.

K8s cluster requirements#

The following k8s versions are supported:

  • 1.25

  • 1.24

  • 1.23

  • 1.22

The following Kubernetes platform services are supported:

  • EKS

  • GKE

  • AKS

  • OpenShift 4.9 or higher

In all cases, the clusters and the usage and deployment described here support only standard k8s tools.

The nodes in the k8s cluster need to fulfill the following requirements:

  • 64 to 256 GB RAM

  • 16 to 64 cores

  • all nodes are identical

  • each node is dedicated to one SEP worker or coordinator only

  • nodes are not shared with other applications running in the cluster

  • x86_64 or ARM (Graviton) processor architecture
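
The sizing limits in the list above can be checked with a small shell helper. This is only a sketch: the function name `check_node_size` is hypothetical, and it expects the RAM in whole GB and the core count as plain integers that you supply yourself, for example from the instance specifications of your cloud provider.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks whether a node size falls inside the
# documented range of 64 to 256 GB RAM and 16 to 64 cores.
# Usage: check_node_size <ram_gb> <cores>
check_node_size() {
  local ram_gb="$1" cores="$2"
  if [ "$ram_gb" -lt 64 ] || [ "$ram_gb" -gt 256 ]; then
    echo "unsupported: ${ram_gb} GB RAM is outside 64-256 GB"
    return 1
  fi
  if [ "$cores" -lt 16 ] || [ "$cores" -gt 64 ]; then
    echo "unsupported: ${cores} cores is outside 16-64"
    return 1
  fi
  echo "ok: ${ram_gb} GB RAM, ${cores} cores"
}

check_node_size 128 32
```

The helper only validates the size of a single node type. It does not verify that all nodes are identical or dedicated to SEP.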

Warning

SEP is designed to work with homogeneous resource allocation in terms of network, CPU, and memory for all workers. Resource sharing on a node or pod is therefore not recommended. Failure to ensure this setup can result in unpredictable query performance and query processing failures.

A simple and recommended approach for optimal operation is to ensure that each node only hosts one coordinator or worker pod, and nodes and pods are not shared with other applications. SEP performs best with exclusive access to the underlying memory and CPU resources since it uses significant resources on each node.

The recommended approach to achieve this is to use a dedicated cluster or namespace for all SEP nodes.

If you operate SEP in a cluster shared with other applications, you must ensure that all nodes have reliable access to the same amount of CPU, memory, and network resources. You can use node groups, taints and tolerations, or pod affinity and anti-affinity, as well as CPU and memory requests and limits, to achieve consistency across nodes. This approach is not recommended, since it is more complex to implement, but can be used by experienced k8s administrators. Contact your account team for assistance if you are required to use this type of deployment.

We recommend that you take advantage of our SEP performance tuning training video for in-depth information on topics such as cluster and machine sizing, workload tuning and resource management to help you make informed choices while planning your implementation.

Scaling requirements#

If you plan to use automatic scaling of your SEP deployment, additional required components need to be installed:

Reference the documentation of the above tools for installation instructions.

The automatic scaling adds and removes worker nodes based on demand. This differs from the commonly used horizontal scaling where new pods are started on existing nodes, and is a result of the fact that workers require a full dedicated node. You need to ensure that your k8s cluster supports this addition of nodes and has access to the required resources.

Access requirements#

Access to SEP from outside the cluster using the Trino CLI, or any other application, requires the coordinator to be available via HTTPS and a DNS hostname.

You can achieve this with an external load balancer and DNS. The load balancer terminates HTTPS and forwards the requests as HTTP inside the cluster.

Alternatively, you can configure a DNS service for your k8s cluster and configure ingress appropriately.
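
Once the coordinator is exposed, a quick reachability check from outside the cluster might look like the following sketch. The hostname `sep.example.com` and the function name `check_coordinator` are placeholders. The sketch assumes that the `/v1/info` endpoint of the coordinator is reachable without additional authentication, which may not hold in your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verifies that a coordinator answers over HTTPS
# at a DNS hostname.
# Usage: check_coordinator <hostname>
check_coordinator() {
  local host="$1"
  # -f: fail on HTTP errors, -sS: stay quiet but still print errors.
  # /v1/info is assumed to return basic server information.
  curl -fsS --max-time 10 "https://${host}/v1/info"
}

# Example invocation (not run here because it needs a live deployment):
#   check_coordinator sep.example.com
#   trino --server https://sep.example.com --user yourusername
```

If the check fails with a certificate or DNS error, the problem is typically in the load balancer or ingress configuration rather than in SEP itself.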

Service database requirement#

The SEP query logger is a required service. You must provide an externally-managed database appropriate to your environment for the query logger to use.

Installation tool requirements#

  • kubectl, version identical to the k8s cluster version

  • helm, version 3.2.4 or newer
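
The two version requirements above can be sketched as shell checks. The function names are hypothetical, and the version strings are passed in as plain text, for example copied from the output of `kubectl version` and `helm version --short`. The sketch assumes that "identical" means the same major and minor version, and it relies on `sort -V` from GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; version strings are passed in as plain text.

# Prints the "major.minor" part of a version such as "v1.25.4".
major_minor() {
  local v="${1#v}"                    # drop an optional leading "v"
  printf '%s\n' "$v" | cut -d. -f1,2
}

# Succeeds when kubectl and the cluster share the same major.minor version.
# Usage: kubectl_matches_cluster <kubectl_version> <cluster_version>
kubectl_matches_cluster() {
  [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Succeeds when the given Helm version is 3.2.4 or newer.
# Usage: helm_is_new_enough <helm_version>
helm_is_new_enough() {
  local have="${1#v}" need="3.2.4"
  # With version sort, the required version comes first when have >= need.
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n1)" = "$need" ]
}

kubectl_matches_cluster "v1.25.4" "v1.25.9" && echo "kubectl matches cluster"
helm_is_new_enough "v3.11.2" && echo "helm is new enough"
```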

In addition, we strongly recommend Octant to simplify cluster workload visualization and management. The Octant Helm plugin can simplify usage further.

Note

Check out our Helm troubleshooting tips.

Helm chart repository#

The Helm charts and docker images required for deployment and operation are available in the Starburst Harbor instance at https://harbor.starburstdata.net.

Customer-specific user accounts to access Harbor are available from Starburst.

Installation and usage requires you to add the Helm repository on Harbor:

helm repo add \
  --username yourusername \
  --password yourpassword \
  --pass-credentials \
  starburstdata \
  https://harbor.starburstdata.net/chartrepo/starburstdata

Confirm success by listing the repository with the following command:

$ helm repo list
NAME           URL
starburstdata  https://harbor.starburstdata.net/chartrepo/starburstdata

If you search the repository, the available charts are listed:

$ helm search repo
NAME                                CHART VERSION  APP VERSION  DESCRIPTION
starburstdata/starburst-hive        413.18.0                    Helm chart for Apache Hive
starburstdata/starburst-enterprise  413.18.0       1.0          A Helm chart for Starburst Enterprise

After new releases from Starburst, you have to update the repository:

$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "starburstdata" chart repository
Update Complete. ⎈ Happy Helming!⎈

Docker registry#

The Helm charts reference the Docker registry on Harbor to download the relevant docker images.

Create a values.yml for each Helm chart you want to install in a cluster. This file contains the configuration for the specific chart in the specific cluster.

At a minimum, you need one YAML file for the registry credentials and a separate YAML file for each chart in each cluster.
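
As a rough sketch of this file layout, the following script writes a credentials file and defines a wrapper that passes it to Helm together with a chart-specific values file. The `registryCredentials` keys are an assumption about the chart's values schema, so confirm them against the chart documentation. The file names, the release name, and the function `install_chart` are hypothetical.

```shell
#!/usr/bin/env bash
# Writes a registry credentials file for illustration only.
# The registryCredentials keys are an assumption about the chart's
# values schema; replace the placeholder user name and password.
creds_file="$(mktemp -d)/registry-access.yaml"
cat > "$creds_file" <<'EOF'
registryCredentials:
  enabled: true
  registry: harbor.starburstdata.net
  username: yourusername
  password: yourpassword
EOF

# Hypothetical wrapper: installs or upgrades one chart with the shared
# credentials file plus a chart- and cluster-specific values file.
# Usage: install_chart <release> <chart> <values_file>
install_chart() {
  helm upgrade "$1" "$2" --install \
    --values "$creds_file" \
    --values "$3"
}

# Example (not run here because it needs a cluster and Harbor access):
#   install_chart sep-prod starburstdata/starburst-enterprise sep-prod.yaml
echo "wrote $creds_file"
```

Keeping the credentials in their own file lets you reuse them across charts and clusters while the remaining configuration stays specific to each deployment.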

More details, including using your own Docker registry, are available in the Docker image and registry section for the Helm chart for SEP.

License#

If you intend to use features of SEP, you must obtain a license file from Starburst and configure it.

RBAC-enabled clusters#

Kubernetes by default provides the ClusterRole edit. This role includes all necessary permissions to deploy and work with Helm charts that consist of common resources. SEP uses one additional custom type through use of the externalSecrets: node. However, this custom resource definition must be deployed to the cluster with its own chart. It is only required in specific use cases, as in this example.
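
On an RBAC-enabled cluster, granting the built-in `edit` role to the account that deploys the charts might look like the following sketch. The namespace `sep`, the service account `sep-deployer`, and both function names are hypothetical. Whether a namespaced RoleBinding is sufficient depends on your cluster policies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grants the built-in "edit" ClusterRole to a
# service account inside a single namespace.
# Usage: grant_edit <namespace> <service_account>
grant_edit() {
  local ns="$1" sa="$2"
  kubectl create rolebinding "${sa}-edit" \
    --clusterrole=edit \
    --serviceaccount="${ns}:${sa}" \
    --namespace="$ns"
}

# Hypothetical helper: asks the API server whether that service account
# may create Deployments in the namespace.
# Usage: can_deploy <namespace> <service_account>
can_deploy() {
  local ns="$1" sa="$2"
  kubectl auth can-i create deployments \
    --namespace="$ns" \
    --as="system:serviceaccount:${ns}:${sa}"
}

# Example (not run here because it needs a cluster):
#   grant_edit sep sep-deployer && can_deploy sep sep-deployer
```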

Helm troubleshooting tips#

Here are some things to keep in mind as you implement SEP with Kubernetes and Helm in your organization.

  • The YAML files that Helm reads are whitespace-sensitive, and tabs are not valid for indentation. Our default values.yaml file uses 2-space indents. Ensure that your code editor preserves the correct indentation when you copy and paste text.

  • If your YAML files fail to parse, Helm provides several tools to debug this issue.

  • Sometimes problems can arise from improperly formatted YAML files. SEP uses Helm to build its *.properties files from YAML both with YAML key-value pairs and with multi-line strings. We recommend that you review the YAML techniques used in Helm to ensure that you are comfortable with using the Helm multi-line strings feature, and to understand the importance of consistent indentation.

  • Units can also be a source of confusion. We strongly suggest that you pay close attention to any units regarding memory and storage. In general, units provided for multiline strings that feed into SEP configuration files are in traditional metric bytes, such as megabytes (MB) and gigabytes (GB) as used by SEP. However, machine sizing and other values are in binary multiples such as mebibytes (Mi) and gibibytes (Gi), since these are used directly by Helm and Kubernetes.
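
The first two tips can be sketched as a pair of shell helpers: one that finds tab characters in a values file, and one that renders the chart locally so that parse errors surface before anything is deployed. The function names, the release name `sep`, and the file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists lines in a YAML file that contain tab
# characters. Prints "line:content" for each offending line and
# returns non-zero when any tab is found.
# Usage: find_tabs <yaml_file>
find_tabs() {
  local tab
  tab="$(printf '\t')"
  if grep -n "$tab" "$1"; then
    return 1
  fi
  return 0
}

# Hypothetical wrapper: renders a chart locally with verbose output.
# Usage: render_chart <release> <chart> <values_file>
render_chart() {
  helm template "$1" "$2" --values "$3" --debug
}

# Example (render_chart needs Helm and repository access, so it is not run here):
#   find_tabs values.yaml && render_chart sep starburstdata/starburst-enterprise values.yaml
```

Running the tab check first is a cheap way to rule out the most common indentation problem before reading through the rendered templates.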