Atlas integration
Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP), features integration with Apache Atlas, a framework for the governance of data and metadata assets. This allows you to include changes to SEP catalogs, schemas, tables, columns, and queries as part of an overall enterprise data governance plan.
Introduction
The Atlas support in SEP is implemented as an event listener that detects changes to SEP objects and sends notice of those changes to an Atlas server by means of a Kafka message bus. Starburst also provides the atlas-cli command, which allows you to manage the relationship between your SEP cluster and Atlas.
Most home-grown and commercial data governance systems can import from and export to Apache Atlas. This means that enterprises using a non-Atlas data governance system can still take advantage of SEP’s Atlas support by using it as a bridge to their system.
Setup steps
To integrate Atlas with your SEP cluster, follow the numbered sections in order.
Setup summary
Set up Atlas support for a SEP cluster with the following steps:
1. The requirements must be in place before you begin.
2. Configure an Atlas plugin on your coordinator.
3. Register Atlas types for SEP objects with the atlas-cli command.
4. Register your SEP cluster on Atlas with atlas-cli.
5. Load catalogs and their components onto Atlas with atlas-cli.
6. Restart your cluster and verify Atlas connectivity.
1. Requirements
SEP’s support for Apache Atlas requires:
SEP cluster version 356 or later, configured and running.
Apache Atlas 2.1.0 or later, configured and running.
Apache Kafka, configured to consume and emit Atlas messages.
You must be able to contact the Atlas and Kafka servers at their specified ports from the SEP coordinator.
Atlas CLI, downloaded from Starburst Support, then installed and configured.
A valid Starburst Enterprise license for the Starburst Atlas plugin.
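The network requirement above can be checked from the coordinator before you continue. The following is a minimal sketch, not part of the Starburst tooling: the check_port helper is hypothetical, the Kafka host name and port 9092 are assumptions (9092 is Kafka's usual default), and it relies on bash's /dev/tcp feature plus the timeout utility from GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch: confirm that this host can open TCP connections to Atlas and Kafka.
# Host names and the Kafka port are placeholders; substitute your own values.

# check_port HOST PORT
# Prints OK and returns 0 if a TCP connection opens within 5 seconds,
# otherwise prints FAIL and returns 1.
check_port() {
  local host="$1" port="$2"
  # /dev/tcp/HOST/PORT is a bash pseudo-device; bash is invoked explicitly
  # so the helper also works when the calling shell is not bash.
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "OK    ${host}:${port}"
  else
    echo "FAIL  ${host}:${port}"
    return 1
  fi
}

# Example, run on the SEP coordinator:
#   check_port atlas.example.com 21000
#   check_port kafka.example.com 9092
```

A FAIL result only shows that the TCP connection could not be opened. It does not distinguish between a firewall rule, a wrong port, and a stopped service.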
2. Configure Atlas plugin
Follow the guidance for the Starburst Atlas plugin to create a configuration file that defines the properties of your cluster’s connection to Atlas and Kafka.
After preparing this configuration, do not restart your cluster yet! Wait for step 6 before you restart.
3. Register SEP types
The atlas-cli command keeps an internal registry of eight Atlas-format types that describe SEP objects. Run the following command to upload these SEP-specific definitions to Atlas.
atlas-cli types create --server https://atlas.example.com:21000 --user=admin --password
See the Atlas CLI reference for this command.
4. Register SEP cluster
One of the properties you configure for your cluster in step 2 is atlas.cluster.name, where you assign an arbitrary name for your SEP cluster. Use a command like the following to register this cluster name with Atlas.
atlas-cli cluster register --server https://atlas.example.com:21000 \
--cluster-name fastqueries --user admin --password
The value of the cluster-name parameter here must match the atlas.cluster.name property already configured.
See the Atlas CLI reference for this command.
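Because a mismatch between the two names is easy to miss, it may help to compare them before registering. The sketch below assumes the plugin configuration from step 2 is a key=value properties file. Both helper functions are hypothetical, and the file path in the example is a placeholder, not a documented location.

```shell
#!/usr/bin/env bash
# Sketch: compare atlas.cluster.name in the plugin configuration with the
# name you plan to pass to --cluster-name.

# read_prop FILE KEY
# Prints the value of KEY from a key=value properties file.
# If KEY appears more than once, the last definition wins.
read_prop() {
  awk -F= -v key="$2" '
    index($0, "=") > 0 {
      k = $1
      gsub(/^[ \t]+|[ \t]+$/, "", k)          # trim the key
      if (k == key) {
        v = substr($0, index($0, "=") + 1)    # everything after the first "="
        gsub(/^[ \t]+|[ \t]+$/, "", v)        # trim the value
        val = v
      }
    }
    END { if (val != "") print val }
  ' "$1"
}

# check_cluster_name FILE NAME
# Returns 0 if NAME equals the configured atlas.cluster.name, 1 otherwise.
check_cluster_name() {
  local file="$1" cli_name="$2" configured
  configured="$(read_prop "$file" atlas.cluster.name)"
  if [ "$configured" = "$cli_name" ]; then
    echo "match: $cli_name"
  else
    echo "mismatch: config has '$configured', command uses '$cli_name'" >&2
    return 1
  fi
}

# Example (the path is a placeholder for your own configuration file):
#   check_cluster_name /path/to/atlas-plugin.properties fastqueries
```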
5. Load catalogs on Atlas
You must tell Atlas which SEP catalogs, schemas, and tables you want tracked. This step loads the object names to be tracked. Thereafter, if there are any changes in these objects, the Starburst Atlas plugin running on your SEP cluster detects those changes and notifies Atlas.
“Change” here refers to a change in structure, such as a new column added to a table, or a table deleted from a schema. SEP does not store data, so it is not the job of the Atlas plugin to track changes in table data.
For each catalog on your SEP cluster whose objects you want to track in Atlas, run the atlas-cli catalog register command. For example:
atlas-cli catalog register --server https://atlas.example.com:21000 \
--cluster-name fastqueries --user admin --password \
--starburst-jdbc-url "jdbc:trino://cluster.example.com:8080?user=starburst_service" \
--catalog tpch --schema tiny --table nation
See the Atlas CLI reference for further options.
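When several tables need to be registered, the command above can be repeated from a list. The following sketch uses only the flags shown in the example. The register_all function, the targets file format (one "catalog schema table" triple per line), and the DRY_RUN switch are assumptions made for illustration. It also assumes that a bare --password flag prompts interactively, so the list is read on file descriptor 3 to leave standard input free for the prompt.

```shell
#!/usr/bin/env bash
# Sketch: run "atlas-cli catalog register" once per line of a targets file.
# All values below are the placeholders from the example above.
ATLAS_SERVER="${ATLAS_SERVER:-https://atlas.example.com:21000}"
CLUSTER_NAME="${CLUSTER_NAME:-fastqueries}"
ATLAS_USER="${ATLAS_USER:-admin}"
JDBC_URL="${JDBC_URL:-jdbc:trino://cluster.example.com:8080?user=starburst_service}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# register_all FILE
# FILE holds one "catalog schema table" triple per line.
# Blank lines and lines starting with "#" are skipped.
register_all() {
  local file="$1" catalog schema table
  while read -r catalog schema table <&3; do
    case "$catalog" in ''|'#'*) continue ;; esac
    # Build the argument list with the same flags as the documented example.
    set -- atlas-cli catalog register --server "$ATLAS_SERVER" \
      --cluster-name "$CLUSTER_NAME" --user "$ATLAS_USER" --password \
      --starburst-jdbc-url "$JDBC_URL" \
      --catalog "$catalog" --schema "$schema" --table "$table"
    if [ "$DRY_RUN" = "1" ]; then
      echo "$*"
    else
      "$@" || return 1   # stop at the first failed registration
    fi
  done 3< "$file"
}

# Example:
#   printf 'tpch tiny nation\ntpch tiny region\n' > targets.txt
#   register_all targets.txt              # prints the commands
#   DRY_RUN=0 register_all targets.txt    # runs them
```

Printing the commands first makes it easy to review the list before anything is sent to Atlas, which matters because registrations cannot be undone.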
6. Restart cluster and test
When all SEP cluster objects are registered in Atlas, restart your cluster.
Test the Atlas integration by browsing the Atlas web interface. Create a new table and register that table with Atlas. Then add a column to that table and make sure the change is reflected in Atlas.
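As an alternative to the web interface, the same check can be scripted against the basic search endpoint of the Apache Atlas REST API (/api/atlas/v2/search/basic). This is a sketch under stated assumptions: both helper functions are hypothetical, the search term is passed without URL encoding and so must be a simple name, and how SEP objects appear in the results depends on the types registered in step 3.

```shell
#!/usr/bin/env bash
# Sketch: look up a table name in Atlas through its REST search endpoint.
ATLAS_SERVER="${ATLAS_SERVER:-https://atlas.example.com:21000}"
ATLAS_USER="${ATLAS_USER:-admin}"

# atlas_search_url BASE_URL TERM [LIMIT]
# Prints the basic-search URL for TERM. TERM is not URL-encoded.
atlas_search_url() {
  printf '%s/api/atlas/v2/search/basic?query=%s&limit=%s\n' \
    "${1%/}" "$2" "${3:-10}"
}

# atlas_search TERM
# Queries Atlas and prints the JSON response.
# curl prompts for the password because --user is given a name only.
atlas_search() {
  curl --silent --show-error --fail --user "$ATLAS_USER" \
    "$(atlas_search_url "$ATLAS_SERVER" "$1")"
}

# Example, after creating and registering a table named nation_copy:
#   atlas_search nation_copy
```

Running the search once after registering the table and again after adding a column gives two responses to compare.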
Limitations
SEP’s support for Apache Atlas has the following limitations:
Once a cluster or catalog is registered on an Atlas server, it cannot be unregistered.
There is no attempt to de-duplicate tables. For example, on a cluster connected to other SEP clusters by means of the Starburst Stargate connector, it is possible for the same table’s structure metadata to be loaded twice, from a local catalog and from a remote catalog.