Overview#
Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP), includes support for Dell Data Processing Engine, built for Apache Spark, the open-source distributed processing engine designed for large-scale data analytics.
About Dell Data Processing Engine#
User personas#
Dell Data Processing Engine is used primarily by two personas: platform administrators, who configure and manage the Dell Data Processing Engine environment, and data engineers, who develop and deploy data pipelines using Spark clusters. View the following documentation for each persona to learn more:
For platform administrators:
For data engineers:
Command line interface#
Dell Data Processing Engine includes a command line interface that you can use to interact with the Dell Data Processing Engine API. View the command line interface documentation.
User interface#
A user interface is available for many components of Dell Data Processing Engine:
Release notes#
The release notes page lists new features, upgraded functionality, bug fixes, and breaking changes.
Software version information#
The following software versions are included:
Component | Version |
---|---|
Spark | 3.5.3 |
Hive Metastore | 3.1.3 |
Hudi | 1.0.1 |
Iceberg | 1.8.0 |
Delta Lake | 3.3.0 |
Parquet | 1.13.1 |
The following lists show the included Python and Java libraries:
Python libraries
Library | Version |
---|---|
absl-py | 2.1.0 |
aiobotocore | 2.17.0 |
aiohappyeyeballs | 2.4.4 |
aiohttp | 3.11.11 |
aioitertools | 0.12.0 |
aiosignal | 1.3.2 |
astunparse | 1.6.3 |
async-timeout | 5.0.1 |
attrs | 24.3.0 |
blis | 1.2.0 |
boto3 | 1.35.93 |
botocore | 1.35.93 |
certifi | 2024.12.14 |
charset-normalizer | 3.4.1 |
cramjam | 2.9.1 |
et_xmlfile | 2.0.0 |
fastparquet | 2024.11.0 |
flatbuffers | 24.12.23 |
frozenlist | 1.5.0 |
fsspec | 2024.12.0 |
gast | 0.6.0 |
google-pasta | 0.2.0 |
greenlet | 3.1.1 |
grpcio | 1.69.0 |
h5py | 3.12.1 |
idna | 3.10 |
jmespath | 1.0.1 |
joblib | 1.4.2 |
keras | 3.8.0 |
libclang | 18.1.1 |
Markdown | 3.7 |
markdown-it-py | 3.0.0 |
MarkupSafe | 3.0.2 |
mdurl | 0.1.2 |
ml-dtypes | 0.4.1 |
multidict | 6.1.0 |
namex | 0.0.8 |
numpy | 2.0.2 |
openpyxl | 3.1.5 |
opt_einsum | 3.4.0 |
optree | 0.13.1 |
packaging | 24.2 |
pandas | 2.2.3 |
patsy | 1.0.1 |
pip | 24.3.1 |
propcache | 0.2.1 |
protobuf | 5.29.3 |
pyarrow | 18.1.0 |
Pygments | 2.19.1 |
python-dateutil | 2.9.0.post0 |
pytz | 2024.2 |
PyYAML | 6.0.2 |
requests | 2.32.3 |
rich | 13.9.4 |
s3fs | 2024.12.0 |
s3transfer | 0.10.4 |
scipy | 1.15.1 |
setuptools | 59.6.0 |
six | 1.17.0 |
SQLAlchemy | 2.0.37 |
statsmodels | 0.14.4 |
spacy | 3.8.3 |
tensorboard | 2.18.0 |
tensorboard-data-server | 0.7.2 |
tensorflow | 2.18.0 |
tensorflow-io-gcs-filesystem | 0.37.1 |
termcolor | 2.5.0 |
typing_extensions | 4.12.2 |
tzdata | 2024.2 |
urllib3 | 2.3.0 |
Werkzeug | 3.1.3 |
wheel | 0.37.1 |
wrapt | 1.17.2 |
xlrd | 2.0.1 |
yarl | 1.18.3 |
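
To confirm that a job's Python environment matches the versions listed above, a check along the following lines may help. This is a minimal sketch that uses only the standard library's `importlib.metadata`; the `EXPECTED` subset and the `find_mismatches` helper name are illustrative assumptions, not part of the product.

```python
from importlib.metadata import PackageNotFoundError, version

# A small, illustrative subset of the table above; extend as needed.
EXPECTED = {
    "numpy": "2.0.2",
    "pandas": "2.2.3",
    "pyarrow": "18.1.0",
}


def find_mismatches(expected):
    """Return {package: (expected, installed)} for every package whose
    installed version differs from the expected one. A package that is
    not installed is reported with an installed value of None."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means every checked package matches the table. Anything reported is worth reviewing before relying on version-specific behavior.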
Java libraries
Library | Artifact |
---|---|
Hudi | org.apache.hudi:hudi-spark3.3.x_2.12:1.0.0 |
Iceberg | org.apache.iceberg:iceberg-spark-runtime:0.13.2 |
Nessie | org.projectnessie:nessie-spark-extensions:0.45.0 |
Delta | io.delta:delta-spark_2.12:3.3.0 |
The Java libraries are stored in the `opt/spark/lib` directory and must be added to your jobs as needed.
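
One common way to add JARs to a Spark job is the standard `spark.jars` property, which takes a comma-separated list of paths. The sketch below builds that value. It assumes your job submission accepts ordinary Spark configuration properties, and the JAR file names and the `build_spark_jars` helper are hypothetical placeholders; check the directory for the actual file names.

```python
from pathlib import PurePosixPath

# Directory named in the text above, copied as written.
LIB_DIR = PurePosixPath("opt/spark/lib")


def build_spark_jars(jar_names, lib_dir=LIB_DIR):
    """Join JAR file names into the comma-separated value that Spark's
    standard spark.jars property expects."""
    return ",".join(str(lib_dir / name) for name in jar_names)


# Hypothetical file names, for illustration only.
jars = build_spark_jars(["example-iceberg-runtime.jar", "example-delta.jar"])
spark_conf = {"spark.jars": jars}
print(spark_conf)
```

The resulting dictionary can be merged into whatever Spark configuration your job definition already carries. Listing only the JARs a job actually uses keeps the classpath small.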