GPU acceleration with Spark RAPIDS#
Dell Data Processing Engine includes support for GPU acceleration using the NVIDIA Spark RAPIDS plugin. This enables Spark applications to leverage NVIDIA GPUs for faster data processing and improved performance on supported workloads.
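In upstream Apache Spark, the RAPIDS plugin is activated through the `spark.plugins` property. Depending on how Dell Data Processing Engine provisions GPU-enabled pools, this may already be set for you; the upstream form is shown below for reference only:

```
spark.plugins=com.nvidia.spark.SQLPlugin
```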
Requirements#
Software requirements#
- CUDA 12 installed in the `/usr/local/cuda` path.
- NVIDIA GPU Operator 25.3.2+ with the `driver.enabled` and `toolkit.enabled` configuration properties set to `false`.
Hardware requirements#
A supported GPU. The RAPIDS plugin is designed to run on NVIDIA Volta, Turing, Ampere, Ada Lovelace, Hopper, and Blackwell generation datacenter GPUs.
Configuration#
Spark RAPIDS GPU support can be enabled when configuring a resource pool. You can adjust the maximum number of GPU cores in total and the maximum number of GPU cores per application.
Configuration properties can be added to your GPU-enabled jobs via the job submission UI or the CLI.
The following is an example of a basic configuration:
spark.driver.memory=4g
spark.driver.cores=4
spark.driver.resource.gpu.amount=0
spark.executor.resource.gpu.amount=1
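If you are prototyping outside the product's submission flow, executor-side properties can also be set programmatically when a Spark session is created. This is a generic PySpark sketch, not an API specific to Dell Data Processing Engine; the application name is arbitrary:

```
from pyspark.sql import SparkSession

# Generic PySpark sketch mirroring the block above. Driver memory and
# cores must be fixed before the driver JVM starts, so set those at
# submission time; executor-side properties can go on the builder.
spark = (
    SparkSession.builder
    .appName("gpu-example")  # arbitrary name for this illustration
    .config("spark.executor.resource.gpu.amount", "1")
    .getOrCreate()
)
```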
| Property | Description |
|---|---|
| `spark.driver.memory` | Amount of memory to use for the driver process. |
| `spark.driver.cores` | Number of cores to use for the driver process. |
| `spark.driver.resource.gpu.amount` | Amount of the GPU resource to use on the driver. |
| `spark.executor.resource.gpu.amount` | Amount of the GPU resource to use per executor process. |
You can view all available properties in the official Spark RAPIDS documentation.
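To confirm at runtime that tasks are actually assigned a GPU, you can inspect the resources Spark allocated to each task. This is a generic PySpark sketch, not a product-specific check; depending on your Spark version and settings, `spark.task.resource.gpu.amount` may also need to be set before tasks receive GPU addresses:

```
from pyspark import TaskContext

# Runs one task per partition and reports the GPU addresses the
# scheduler assigned to it; an empty list means no GPU was allocated.
def assigned_gpus(_):
    res = TaskContext.get().resources()
    yield res["gpu"].addresses if "gpu" in res else []

print(spark.sparkContext.parallelize(range(2), 2).mapPartitions(assigned_gpus).collect())
```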
Performance tuning#
You can use the following practices to optimize GPU performance:
Caution
GPU performance tuning is highly environment-specific. The properties and values shown here are examples, not recommended defaults. Apply changes incrementally, measure impact, and revert if stability or performance degrades.
Ensure the GPU is being used. It is essential to eliminate GPU execution warnings that indicate a query cannot fully run on the GPU. Iteratively review logs and adjust the configuration, data layout, or query logic to address them.
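The plugin can report why operations stay on the CPU. One way to surface this during tuning is the upstream Spark RAPIDS `spark.rapids.sql.explain` property, shown here as an illustration; the `NOT_ON_GPU` value logs only the operations that could not be placed on the GPU:

```
spark.rapids.sql.explain=NOT_ON_GPU
```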
Avoid out-of-memory (OOM) conditions. OOM events can cause failed tasks to be retried, resulting in longer query times. Review logs or the Spark History Server for memory errors. The configuration properties below can help reduce the likelihood of OOM situations in some GPU workloads, but they are not guaranteed to prevent them and must be tuned for your environment.
# Sets the fraction of heap space allocated to Spark
spark.executor.memoryFraction=0.25
# Reserves memory for executor overhead
spark.executor.memoryOverhead=4g
Avoid GPU to CPU fallbacks caused by padding. Your job may fall back to the CPU if it processes padded strings. To avoid this behavior, add the following properties to your configuration:
spark.sql.legacy.charVarcharAsString=true
spark.sql.readSideCharPadding=false
Note
In some cases this may lead to compatibility issues, such as strict `CHAR` semantics when writing data.
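With both properties above set, `CHAR` columns behave like plain strings: values are neither blank-padded nor length-checked, which is the relaxation the note refers to. A minimal PySpark sketch of the effect (the table name `t` is arbitrary):

```
# With the two properties above set, CHAR(5) behaves like STRING:
# 'ab' is stored and read back unpadded, so length(c) returns 2.
# With default settings it would be padded to 'ab   ' (length 5).
spark.sql("CREATE TABLE t (c CHAR(5)) USING parquet")
spark.sql("INSERT INTO t VALUES ('ab')")
spark.sql("SELECT length(c) FROM t").show()
```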
Example configuration#
The following is an example configuration that includes the tuning properties detailed above:
spark.driver.memory=4g
spark.driver.cores=4
spark.driver.resource.gpu.amount=0
spark.executor.resource.gpu.amount=1
spark.rapids.sql.incompatibleOps.enabled=true
spark.rapids.sql.concurrentGpuTasks=2
spark.rapids.memory.pinnedPool.size=1g
# May prevent OOM kills on Spark executors
spark.executor.memory=8g
spark.executor.cores=4
spark.executor.memoryFraction=0.25
spark.executor.memoryOverhead=4g
# To prevent GPU to CPU fallback (optional)
spark.sql.readSideCharPadding=false
spark.sql.legacy.charVarcharAsString=true
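To verify that the configuration takes effect, check a query plan for GPU-enabled operators. A generic PySpark smoke test follows; the data and query are arbitrary:

```
# With the RAPIDS plugin active, GPU-enabled operators appear as Gpu*
# nodes (for example, GpuHashAggregate) in the physical plan.
df = spark.range(1000000).selectExpr("id % 10 AS k", "id AS v")
df.groupBy("k").sum("v").explain()
```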