Overview

Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP), includes support for Dell Data Processing Engine, built for Apache Spark, the open-source distributed processing engine designed for large-scale data analytics.

About Dell Data Processing Engine

User personas

Dell Data Processing Engine is used primarily by two personas: platform administrators, who configure and manage the Dell Data Processing Engine environment, and data engineers, who develop and deploy data pipelines using Spark clusters. View the following documentation for each persona to learn more:

For platform administrators:

For data engineers:

Command line interface

Dell Data Processing Engine includes a command line interface for interacting with the Dell Data Processing Engine API. View the command line interface documentation.

Release notes

The release notes page lists new features, upgraded functionality, bug fixes, and breaking changes.

Included libraries

The following libraries are included:

Python:

absl-py==2.1.0
aiobotocore==2.17.0
aiohappyeyeballs==2.4.4
aiohttp==3.11.11
aioitertools==0.12.0
aiosignal==1.3.2
astunparse==1.6.3
async-timeout==5.0.1
attrs==24.3.0
blis==1.2.0
boto3==1.35.93
botocore==1.35.93
certifi==2024.12.14
charset-normalizer==3.4.1
cramjam==2.9.1
et_xmlfile==2.0.0
fastparquet==2024.11.0
flatbuffers==24.12.23
frozenlist==1.5.0
fsspec==2024.12.0
gast==0.6.0
google-pasta==0.2.0
greenlet==3.1.1
grpcio==1.69.0
h5py==3.12.1
idna==3.10
jmespath==1.0.1
joblib==1.4.2
keras==3.8.0
libclang==18.1.1
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==3.0.2
mdurl==0.1.2
ml-dtypes==0.4.1
multidict==6.1.0
namex==0.0.8
numpy==2.0.2
openpyxl==3.1.5
opt_einsum==3.4.0
optree==0.13.1
packaging==24.2
pandas==2.2.3
patsy==1.0.1
pip==24.3.1
propcache==0.2.1
protobuf==5.29.3
pyarrow==18.1.0
Pygments==2.19.1
python-dateutil==2.9.0.post0
pytz==2024.2
PyYAML==6.0.2
requests==2.32.3
rich==13.9.4
s3fs==2024.12.0
s3transfer==0.10.4
scipy==1.15.1
setuptools==59.6.0
six==1.17.0
SQLAlchemy==2.0.37
statsmodels==0.14.4
spacy==3.8.3
tensorboard==2.18.0
tensorboard-data-server==0.7.2
tensorflow==2.18.0
tensorflow-io-gcs-filesystem==0.37.1
termcolor==2.5.0
typing_extensions==4.12.2
tzdata==2024.2
urllib3==2.3.0
Werkzeug==3.1.3
wheel==0.37.1
wrapt==1.17.2
xlrd==2.0.1
yarl==1.18.3
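
To confirm that a running environment matches these pins, a short check with the standard library's `importlib.metadata` module can help. The following is a minimal sketch and not part of the product: `parse_pins`, `installed_version`, and `find_mismatches` are hypothetical helper names, and the sample pins are copied from the list above.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping anything else."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # A few pins taken from the list above; extend as needed.
    sample = """
    numpy==2.0.2
    pandas==2.2.3
    pyarrow==18.1.0
    """
    for name, (expected, found) in find_mismatches(parse_pins(sample)).items():
        print(f"{name}: expected {expected}, found {found}")
```

The `lookup` parameter exists only so the comparison can be exercised without touching the real environment; by default the check reads the versions that are actually installed.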

Java:

Hudi: org.apache.hudi:hudi-spark3.3.x_2.12:1.0.0
Iceberg: org.apache.iceberg:iceberg-spark-runtime:0.13.2
Nessie: org.projectnessie:nessie-spark-extensions:0.45.0
Delta: io.delta:delta-spark_2.12:3.3.0

The Java libraries are stored in the /opt/spark/lib directory and must be added to your jobs as needed.
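
One way to attach these libraries to a job is Spark's `spark.jars` property, which takes a comma-separated list of JAR paths. The sketch below only builds that value. The helper name `build_spark_jars` and the file-name prefixes are assumptions for illustration, so check the actual file names in /opt/spark/lib before relying on them.

```python
from pathlib import Path


def build_spark_jars(lib_dir, prefixes):
    """Return a comma-separated list of the JARs in lib_dir whose file
    names start with any of the given prefixes."""
    jars = sorted(
        str(path)
        for path in Path(lib_dir).glob("*.jar")
        if path.name.startswith(tuple(prefixes))
    )
    return ",".join(jars)


if __name__ == "__main__":
    # Hypothetical prefixes; the real JAR file names may differ.
    value = build_spark_jars(
        "/opt/spark/lib",
        ["iceberg-spark-runtime", "nessie-spark-extensions"],
    )
    print(f"spark.jars={value}")
```

The resulting string can be passed to `spark-submit` with `--conf spark.jars=...`, or set in PySpark with `SparkSession.builder.config("spark.jars", value)`. Selecting JARs by prefix keeps a job from loading table-format libraries it does not use.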