Plan your Kubernetes deployment

Kubernetes (k8s) support for Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP), allows you to run your SEP clusters alongside additional components such as the Hive Metastore Service (HMS).

The following sections detail the specific requirements and considerations when planning for SEP deployments using k8s.

K8s cluster requirements

The following k8s versions are supported:

  • 1.29

  • 1.28

  • 1.27

  • 1.26

  • 1.25
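To check a cluster against this list before you deploy, a small shell function can compare the server version that kubectl reports. This is a minimal sketch: the function name is hypothetical, and the usage line in the comments assumes jq is installed.

```shell
#!/usr/bin/env bash
# Sketch: return success if a Kubernetes server version is 1.25 to 1.29.
# Accepts "v1.28.3" as well as provider forms such as "v1.27.9-eks-5e0fdde".
is_supported_k8s_version() {
  local version="${1#v}"          # drop a leading "v"
  local major="${version%%.*}"    # text before the first dot
  local rest="${version#*.}"      # text after the first dot
  local minor="${rest%%[!0-9]*}"  # leading digits of the minor version
  [ "$major" = "1" ] && [ -n "$minor" ] &&
    [ "$minor" -ge 25 ] && [ "$minor" -le 29 ]
}

# Usage (assumes jq is installed):
# is_supported_k8s_version "$(kubectl version -o json | jq -r .serverVersion.gitVersion)" \
#   && echo "supported" || echo "not in the supported list"
```

Parsing only the major and minor numbers keeps the check independent of patch levels and provider suffixes.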

In addition to the SEP requirements listed in our deployment basics topic, your cluster nodes must fulfill the following requirements:

  • 64 to 256 GB RAM

  • 16 to 64 cores

  • All nodes must be identical.

  • Each node is dedicated to one SEP worker or coordinator only.

  • Nodes are not shared with other applications running in the cluster.

  • x86_64 (AMD64) or AArch64 (ARM64) processor architecture
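The core count, architecture, and uniformity requirements can be spot-checked from node status. The following sketch uses a hypothetical check_nodes function that reads one line per node in the kubectl custom-columns format shown in its comments. It only compares memory for equality, because the capacity a node reports is typically somewhat lower than its installed RAM, so verify the 64 to 256 GB range against your instance type instead.

```shell
#!/usr/bin/env bash
# Sketch: flag nodes outside 16-64 cores or not amd64/arm64, and flag nodes
# whose CPU, memory, or architecture differ from the first node listed.
# Input lines: NAME CPU MEMORY ARCH, for example produced by:
#   kubectl get nodes --no-headers -o custom-columns=\
#   NAME:.metadata.name,CPU:.status.capacity.cpu,\
#   MEM:.status.capacity.memory,ARCH:.status.nodeInfo.architecture
# Exits non-zero if any problem is found.
check_nodes() {
  awk '
    {
      name = $1; cpu = $2 + 0; arch = $4
      if (cpu < 16 || cpu > 64) {
        printf "%s: %d cores is outside 16-64\n", name, cpu; bad = 1
      }
      if (arch != "amd64" && arch != "arm64") {
        printf "%s: unsupported architecture %s\n", name, arch; bad = 1
      }
      shape = $2 " " $3 " " $4          # CPU, memory, architecture
      if (NR == 1) {
        first = shape
      } else if (shape != first) {
        printf "%s: differs from first node (%s vs %s)\n", name, shape, first; bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```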

K8s platform services

The following Kubernetes platform services are tested regularly and supported:

  • Amazon Elastic Kubernetes Service (EKS)

  • Google Kubernetes Engine (GKE)

  • Microsoft Azure Kubernetes Service (AKS)

  • Red Hat OpenShift

  • Rancher RKE2

This topic focuses on k8s usage and practices that apply to all of these services.

Other Kubernetes distributions and installations may work if they fulfill the requirements, but they are not tested and not supported.

Warning

SEP is designed to work with homogeneous resource allocation in terms of network, CPU, and memory for all workers. Resource sharing on a node or pod is therefore not recommended. If workers do not receive the same resources, query performance can become unpredictable and queries can fail during processing.

A simple and recommended approach for optimal operation is to ensure that each node hosts only one coordinator or worker pod, and that nodes and pods are not shared with other applications. SEP performs best with exclusive access to the underlying memory and CPU resources, since it uses significant resources on each node.

The recommended approach to achieve this is to use a dedicated cluster or namespace for all SEP nodes. You can read more about this in our cluster design guidelines.

If you operate SEP in a cluster shared with other applications, you must ensure that all nodes have reliable access to the same amount of CPU, memory, and network resources. You can use node groups, taints and tolerations, or pod affinity and anti-affinity, as well as CPU and memory requests and limits, to achieve consistency across nodes. This approach is not recommended because it is more complex to implement, but experienced k8s administrators can use it. Contact your account team for assistance if you must use this type of deployment.
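As one illustration of the taints and tolerations approach, the sketch below labels and taints a set of nodes so that the scheduler keeps pods without a matching toleration off them. The label starburst-dedicated=true, the taint dedicated=starburst:NoSchedule, and the function name are hypothetical; the SEP coordinator and worker pods still need a matching node selector and toleration in your deployment configuration.

```shell
#!/usr/bin/env bash
# Sketch: reserve the given nodes for SEP in a shared cluster.
# Label and taint names are hypothetical examples, not SEP defaults.
dedicate_nodes_to_sep() {
  local node
  for node in "$@"; do
    # Label lets SEP pods target these nodes with a node selector.
    kubectl label nodes "$node" starburst-dedicated=true --overwrite || return 1
    # NoSchedule taint keeps pods without a matching toleration off the node.
    kubectl taint nodes "$node" dedicated=starburst:NoSchedule --overwrite || return 1
  done
}

# Usage: dedicate_nodes_to_sep node-a node-b node-c
```

A taint alone only repels other workloads; the label is what lets SEP pods be steered onto the reserved nodes.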

Scaling requirements

In your SEP deployment, install the following components:

Automatic scaling adds and removes worker nodes based on demand. This differs from the more common horizontal pod scaling, in which new pods start on existing nodes: because each SEP worker requires a full dedicated node, scaling SEP means adding and removing nodes. You must ensure that your k8s cluster supports adding nodes and has access to the required resources.

Access requirements

Access to SEP from outside the cluster using the Trino CLI, or any other application, requires the coordinator to be available via HTTPS and a DNS hostname.

You can achieve this with an external load balancer and DNS entry, where the load balancer terminates HTTPS and forwards requests over HTTP to the coordinator inside the cluster.

Alternatively, you can configure a DNS service for your k8s cluster and configure ingress appropriately.
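Once DNS and HTTPS are in place, you can confirm that the coordinator is reachable from outside the cluster. This sketch requests the Trino /v1/info endpoint with curl and checks that the server reports it has finished starting. It assumes the endpoint is reachable without authentication in your configuration; the function name and the hostname sep.example.com are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: succeed if https://HOST/v1/info answers and reports "starting": false.
check_coordinator() {
  local host="$1" body
  # --fail makes HTTP errors (4xx/5xx) a non-zero exit status.
  body=$(curl --fail --silent --show-error --max-time 10 "https://${host}/v1/info") || return 1
  # Tolerate optional whitespace around the colon in the JSON response.
  printf '%s\n' "$body" | grep -Eq '"starting"[[:space:]]*:[[:space:]]*false'
}

# Usage: check_coordinator sep.example.com && echo "coordinator reachable"
```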

Service database requirement

The SEP backend service is required. You must provide an externally-managed database appropriate to your environment for the service to use.

Installation tool requirements

In addition, we strongly recommend Octant to simplify cluster workload visualization and management. The Octant Helm plugin can simplify usage further.

Note

Check out our Helm troubleshooting tips.

Helm chart repository

The Helm charts and Docker images required for deployment and operation are available in the Starburst Harbor instance at https://harbor.starburstdata.net.

To obtain a customer-specific Harbor account, contact Starburst Support.

To add the Helm repository to your local Helm configuration, use the following command:

$ helm repo add \
  --username yourusername \
  --password yourpassword \
  --pass-credentials \
  starburstdata \
  https://harbor.starburstdata.net/chartrepo/starburstdata
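Passing --password on the command line can leave the credential in shell history and process listings. Helm's --password-stdin flag reads it from standard input instead. This sketch wraps that in a function, assuming hypothetical HARBOR_USER and HARBOR_PASSWORD environment variables hold your Harbor credentials.

```shell
#!/usr/bin/env bash
# Sketch: add the starburstdata repository, reading the password from stdin.
# HARBOR_USER and HARBOR_PASSWORD are hypothetical variable names.
add_starburst_repo() {
  if [ -z "${HARBOR_USER:-}" ] || [ -z "${HARBOR_PASSWORD:-}" ]; then
    echo "set HARBOR_USER and HARBOR_PASSWORD first" >&2
    return 1
  fi
  printf '%s' "$HARBOR_PASSWORD" |
    helm repo add --username "$HARBOR_USER" --password-stdin --pass-credentials \
      starburstdata https://harbor.starburstdata.net/chartrepo/starburstdata
}
```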

To confirm the Helm repository was added successfully, use the following command:

$ helm repo list
NAME           URL
starburstdata  https://harbor.starburstdata.net/chartrepo/starburstdata

To view Helm charts in the repository, use the following command:

$ helm search repo
NAME                                 CHART VERSION  APP VERSION  DESCRIPTION
starburstdata/starburst-hive         453.8.0                     Helm chart for Apache Hive
starburstdata/starburst-enterprise   453.8.0        1.0          A Helm chart for Starburst Enterprise
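To see every published chart version rather than only the latest, and to save a version's default values as a starting point for your own configuration, you can combine helm search repo --versions with helm show values. The function name, version argument, and default output file name below are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: list all starburst-enterprise chart versions, then write one
# version's default values to a file for editing.
show_sep_chart() {
  local version="$1" out="${2:-sep-default-values.yaml}"
  helm search repo starburstdata/starburst-enterprise --versions || return 1
  if [ -n "$version" ]; then
    helm show values starburstdata/starburst-enterprise --version "$version" > "$out"
  else
    # Without --version, Helm uses the latest chart version.
    helm show values starburstdata/starburst-enterprise > "$out"
  fi
}

# Usage: show_sep_chart 453.8.0 values-453.8.0.yaml
```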

To refresh your local copy of the repository index, for example before upgrading to a new release of SEP, use the following command:

$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "starburstdata" chart repository
Update Complete. ⎈ Happy Helming!⎈
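After refreshing the index, you may want to see which chart version is now the newest. This sketch parses Helm's default table output, in the same layout as the helm search repo example above, and prints the CHART VERSION column for starburstdata/starburst-enterprise. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: refresh repositories, then print the newest starburst-enterprise
# chart version. Fails if the chart does not appear in the search output.
latest_sep_chart_version() {
  helm repo update >/dev/null || return 1
  helm search repo starburstdata/starburst-enterprise |
    awk '$1 == "starburstdata/starburst-enterprise" { print $2; found = 1; exit }
         END { exit !found }'
}
```

Matching on the exact chart name in the first column avoids picking up other charts whose names merely contain the search term.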

Docker registry

The Helm charts reference the Docker registry on Harbor to download the relevant Docker images. You can also use your own Docker registry.

For more information, see Docker images.
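If you use your own registry, one way to copy an image is to pull it from Harbor, retag it, and push it. In this sketch the function name, image path, tag, and target registry are illustrative assumptions; take the actual repositories and tags from your chart version, and log in to both registries with docker login first. docker pull fetches only the image for the local platform, so mirroring both AMD64 and ARM64 variants needs a multi-architecture copy tool instead.

```shell
#!/usr/bin/env bash
# Sketch: copy IMAGE:TAG from harbor.starburstdata.net to TARGET_REGISTRY,
# keeping the same repository path and tag. Single-platform copy only.
mirror_image() {
  local image="$1" tag="$2" target_registry="$3"
  local src="harbor.starburstdata.net/${image}:${tag}"
  local dst="${target_registry}/${image}:${tag}"
  docker pull "$src" &&
    docker tag "$src" "$dst" &&
    docker push "$dst"
}

# Usage (illustrative names): mirror_image starburstdata/starburst-enterprise 453.8.0 registry.example.com
```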

License

To use SEP, you must obtain a license file from Starburst Support. For information on how to configure your license file, see Docker images.
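Once you have the license file, a common pattern is to store it in a Kubernetes secret that your deployment configuration references. The secret name starburstdata, the default file name starburstdata.license, the namespace argument, and the function name are assumptions for this sketch; use the names your configuration expects.

```shell
#!/usr/bin/env bash
# Sketch: create a generic secret named "starburstdata" (assumed name)
# holding the license file, in the given namespace.
create_license_secret() {
  local license_file="${1:-starburstdata.license}" namespace="${2:-default}"
  if [ ! -f "$license_file" ]; then
    echo "license file not found: $license_file" >&2
    return 1
  fi
  # The secret key defaults to the file's base name.
  kubectl create secret generic starburstdata \
    --from-file="$license_file" \
    --namespace "$namespace"
}

# Usage: create_license_secret ./starburstdata.license starburst
```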

RBAC-enabled clusters

By default, Kubernetes provides the edit ClusterRole. This role includes all permissions necessary to deploy and work with Helm charts that consist of common resources. SEP uses one additional custom resource type through the externalSecrets: node. This custom resource definition must be deployed to the cluster separately, with its own chart, and is only required in specific use cases, as in this example.
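For a namespace-scoped deployment, you can bind the built-in edit ClusterRole to the deploying user with a RoleBinding, then verify a few permissions with kubectl auth can-i. The binding name sep-deployer-edit, the user, the namespace, and the function name are hypothetical. The --as impersonation checks require that your own account is allowed to impersonate users, and the create step fails if a binding with that name already exists.

```shell
#!/usr/bin/env bash
# Sketch: grant USER the edit ClusterRole within NAMESPACE, then check that
# USER can create a few resource types that Helm charts commonly contain.
grant_edit_and_check() {
  local user="$1" namespace="$2" verb_resource
  kubectl create rolebinding sep-deployer-edit \
    --clusterrole=edit --user="$user" --namespace "$namespace" || return 1
  for verb_resource in "create deployments" "create services" "create secrets"; do
    # Unquoted on purpose: "create deployments" splits into verb and resource.
    kubectl auth can-i $verb_resource --as="$user" --namespace "$namespace" || return 1
  done
}

# Usage: grant_edit_and_check alice starburst
```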