Starburst Atlas plugin

The Starburst Atlas plugin for Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP), automatically pushes changes in a SEP cluster's catalog, schema, table, or column configuration to an Apache Atlas server over an Apache Kafka message bus. The plugin publishes change information about SEP objects to the Kafka topic ATLAS_HOOK, and publishes any change in the configuration of Atlas entity names for SEP objects to the topic ATLAS_ENTITIES.

Note

The plugin requires a valid Starburst Enterprise license.

Configuration

The Starburst Atlas plugin is implemented as an event listener. To enable it, create a configuration file with any name on the coordinator, such as etc/atlas-listener.properties.

In this file, set the event-listener.name property to starburst-atlas, and add the Kafka broker details, the Atlas server URL, and the username and password for accessing the Atlas server.

If your SEP cluster has more than one event listener, identify all listener configuration files in a comma-separated list in the event-listener.config-files property of the coordinator’s config.properties file. For example:

event-listener.config-files=etc/atlas-listener.properties,etc/http-event-listener.properties

The following is an example of a simple Atlas plugin configuration file:

event-listener.name=starburst-atlas
atlas.cluster.name=fastqueries
atlas.kafka.bootstrap.servers=kafka.example.com:9092
atlas.server.url=https://atlas.example.com:21000
atlas.username=admin
atlas.password=s3cr3t1v3
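To catch missing settings before restarting the coordinator, a file like this can be checked against the properties marked as required in the Reference section. The following Python sketch is a hypothetical pre-flight check, not part of SEP: the function names are illustrative, the parser only handles simple key=value lines (no line continuations, escapes, or ':' separators), and flagging a non-https Atlas URL is an assumption based on the plugin using TLS for all Atlas traffic.

```python
# Hypothetical pre-flight check for an Atlas event listener file; not part of SEP.
REQUIRED = (
    "event-listener.name",
    "atlas.cluster.name",
    "atlas.kafka.bootstrap.servers",
    "atlas.server.url",
    "atlas.username",
)


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and #/! comments.

    Simplification: ignores line continuations, escapes, and ':' separators.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_listener_config(props: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = [f"missing required property: {k}" for k in REQUIRED if not props.get(k)]
    name = props.get("event-listener.name")
    if name and name != "starburst-atlas":
        problems.append("event-listener.name must be starburst-atlas")
    url = props.get("atlas.server.url", "")
    # Assumption: traffic to Atlas always uses TLS, so expect an https:// URL.
    if url and not url.startswith("https://"):
        problems.append("atlas.server.url should use https://")
    return problems


if __name__ == "__main__":
    example = """\
event-listener.name=starburst-atlas
atlas.cluster.name=fastqueries
atlas.kafka.bootstrap.servers=kafka.example.com:9092
atlas.server.url=https://atlas.example.com:21000
atlas.username=admin
atlas.password=s3cr3t1v3
"""
    print(check_listener_config(parse_properties(example)))  # prints []
```

Returning a list of problems rather than raising on the first one lets a single run report every missing property at once.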

The Properties for Atlas plugin configuration table in the Reference section describes the options available for different circumstances.

TLS/HTTPS settings

All network traffic between the Atlas plugin and the Atlas server uses TLS. If your Atlas server uses a globally trusted certificate and does not require client certificates, you only need to specify the server's https:// URL in atlas.server.url, along with atlas.username and atlas.password.

If your Atlas server uses a site-specific certificate or requires client certificates, configure those settings in an XML settings file. Specify the location of this file with the following property in the coordinator's config.properties file:

atlas.ssl-config-file=etc/atlas-tls-settings.xml

If your Atlas server requires a Hadoop credential file, specify the path to a JCEKS keystore in the hadoop.security.credential.provider.path setting. A JCEKS keystore is similar to a JKS file, but protects private keys with stronger Triple DES (3DES) encryption.

The following is a template for the TLS settings XML file. As is standard for TLS, if you provide a globally trusted certificate in the keystore setting, there is no need to provide a truststore path, because trust is established through the Certificate Authorities listed in the standard Java cacerts file.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>hadoop.ssl.require.client.cert</name>
        <value>true|false</value>
    </property>

    <property>
        <name>ssl.client.keystore.location</name>
        <value>Path to KeyStore location</value>
    </property>

    <property>
        <name>ssl.client.truststore.location</name>
        <value>Path to TrustStore location</value>
    </property>

    <property>
        <name>hadoop.security.credential.provider.path</name>
        <value>jceks://file/Path to Credential File</value>
    </property>
</configuration>
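The settings file follows the Hadoop configuration layout: a list of property elements, each with name and value children. As a hypothetical sanity check (the function names and the keystore rule are assumptions, not SEP behavior), the following Python sketch reads such a file with the standard library's xml.etree.ElementTree and flags a client-certificate requirement that has no keystore configured.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_text: str) -> dict[str, str]:
    """Collect <property><name>/<value> pairs from a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            conf[name] = value
    return conf


def tls_problems(conf: dict[str, str]) -> list[str]:
    """Hypothetical rule: a required client cert needs a keystore to hold it."""
    problems = []
    if conf.get("hadoop.ssl.require.client.cert", "false").lower() == "true":
        if not conf.get("ssl.client.keystore.location"):
            problems.append("client cert required but ssl.client.keystore.location is unset")
    return problems


if __name__ == "__main__":
    sample = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hadoop.ssl.require.client.cert</name>
        <value>true</value>
    </property>
</configuration>"""
    # prints ['client cert required but ssl.client.keystore.location is unset']
    print(tls_problems(read_hadoop_conf(sample)))
```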

Kerberos settings

If your Atlas server uses Kerberos authentication, specify the following required configuration properties:

atlas.authentication-type=KERBEROS
atlas.kerberos.principal=admin/atlas.cluster@EXAMPLE.COM
atlas.kerberos.keytab=/etc/krb5.keytab
atlas.kerberos.config=/etc/krb5.conf
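When atlas.authentication-type is KERBEROS, the three atlas.kerberos.* properties above are all required. The following Python sketch is a hypothetical check of that rule, not part of SEP; it also tests the principal against the common primary[/instance]@REALM shape, and that regular expression is an illustrative assumption rather than a full Kerberos name grammar.

```python
import re

KERBEROS_KEYS = (
    "atlas.kerberos.principal",
    "atlas.kerberos.keytab",
    "atlas.kerberos.config",
)

# Illustrative pattern for primary[/instance]@REALM, e.g. admin/atlas.cluster@EXAMPLE.COM
PRINCIPAL_RE = re.compile(r"^[^/@\s]+(?:/[^/@\s]+)?@[^/@\s]+$")


def kerberos_problems(props: dict[str, str]) -> list[str]:
    """Return problems with the Kerberos settings; empty if none apply."""
    if props.get("atlas.authentication-type") != "KERBEROS":
        return []  # Kerberos properties only matter when KERBEROS is selected
    problems = [f"missing {k}" for k in KERBEROS_KEYS if not props.get(k)]
    principal = props.get("atlas.kerberos.principal")
    if principal and not PRINCIPAL_RE.match(principal):
        problems.append(f"principal not in primary/instance@REALM form: {principal}")
    return problems
```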

Pass configuration to Kafka

You can pass additional Kafka properties, such as the security settings described in the Kafka documentation, in a separate properties file. Specify the path to this file with the atlas.config.resource property. For example:

atlas.config.resource=etc/kafka.properties

This file can also set the following two properties, which change the default names of the Kafka topics used by the Atlas plugin:

Kafka properties for topic names

| Property | Description |
|---|---|
| atlas.notification.hook.topic.name | Name of the Kafka topic to which the Starburst Atlas plugin publishes SEP change information. Default is ATLAS_HOOK. |
| atlas.notification.entities.topic.name | Name of the Kafka topic to which the Starburst Atlas plugin publishes any changes in Atlas entity names for SEP objects. Default is ATLAS_ENTITIES. |
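When creating the topics in Kafka or granting access to them, it helps to know which names are in effect. The following Python sketch shows the fallback behavior described in the table: an override from the file named by atlas.config.resource wins, otherwise the default applies. The function name and the idea of passing the file as an already-parsed dictionary are assumptions for illustration.

```python
# Defaults as documented for the Starburst Atlas plugin topic-name properties.
DEFAULT_TOPICS = {
    "atlas.notification.hook.topic.name": "ATLAS_HOOK",
    "atlas.notification.entities.topic.name": "ATLAS_ENTITIES",
}


def resolve_topics(kafka_props: dict[str, str]) -> dict[str, str]:
    """Return the effective topic for each property, falling back to defaults."""
    return {key: kafka_props.get(key) or default for key, default in DEFAULT_TOPICS.items()}


if __name__ == "__main__":
    # "SEP_HOOK" is a hypothetical override; the entities topic keeps its default.
    print(resolve_topics({"atlas.notification.hook.topic.name": "SEP_HOOK"}))
```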

Pre-built Atlas hooks

Some data systems that also have a SEP connector, namely Hive and Kafka, have an integrated Atlas hook at the metastore level. For such catalogs, SEP only needs to push change lineage details instead of pushing each change. Map such a catalog to its Atlas namespace with the following configuration property in the coordinator's config.properties:

atlas.catalog-cluster-mapping=catalogName,AtlasNamespace

The catalogName refers to one catalog on your SEP cluster.

The AtlasNamespace refers to a unique qualified name created by Atlas to categorize entities and types from the same source.
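The property value is a single catalogName,AtlasNamespace pair. The following Python sketch, a hypothetical helper rather than SEP's own parsing, splits and validates such a value; it assumes exactly one pair per value and trims surrounding whitespace, and the names hive and prod_cluster used in the usage note are made up for illustration.

```python
from typing import NamedTuple


class CatalogMapping(NamedTuple):
    catalog: str          # SEP catalog name
    atlas_namespace: str  # Atlas namespace used by the pre-built hook


def parse_catalog_mapping(value: str) -> CatalogMapping:
    """Split 'catalogName,AtlasNamespace' into its two parts (one pair assumed)."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'catalogName,AtlasNamespace', got {value!r}")
    return CatalogMapping(*parts)
```

For example, parse_catalog_mapping("hive,prod_cluster") returns CatalogMapping(catalog='hive', atlas_namespace='prod_cluster'), while a value without a comma raises ValueError.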

Reference

The Starburst Atlas plugin configuration uses the following properties:

Properties for Atlas plugin configuration

| Property | Description | Required |
|---|---|---|
| event-listener.name | Must be starburst-atlas. | yes |
| atlas.cluster.name | Arbitrary name for this SEP cluster. | yes |
| atlas.kafka.bootstrap.servers | Network name or IP address, and port, of the Kafka server. | yes |
| atlas.server.url | URL of the Atlas server, including the port. | yes |
| atlas.username | Atlas username. | yes |
| atlas.password | Atlas password, if required. | no |
| atlas.ssl-config-file | Path to an optional XML configuration file with custom TLS settings. | no |
| atlas.authentication-type | Set to KERBEROS to allow the Atlas plugin to connect to a Kerberos-enabled Atlas server. | no |
| atlas.kerberos.principal | Principal name on the Atlas server in standard Kerberos format. | no |
| atlas.kerberos.keytab | Path to a Kerberos keytab file. | no |
| atlas.kerberos.config | Path to a Kerberos configuration file, typically /etc/krb5.conf. | no |
| atlas.config.resource | Path to a properties file to pass to the associated Kafka server. | no |
| atlas.catalog-cluster-mapping | SEP catalog name and Atlas namespace of a system with a pre-built Atlas hook, separated by a comma. | no |
| atlas.exclude-client-tags | Comma-separated list of clientTags values. Atlas events are not generated for queries from connecting clients that have any of these clientTags. | no |
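The atlas.exclude-client-tags rule amounts to a set-intersection test: a query produces Atlas events only if none of its client tags appear in the excluded list. The following Python sketch illustrates that logic; the function names are hypothetical, and the whitespace trimming and exact, case-sensitive matching are assumptions rather than documented SEP behavior.

```python
def parse_excluded_tags(value: str) -> frozenset[str]:
    """Split the comma-separated atlas.exclude-client-tags value, dropping empties."""
    return frozenset(tag.strip() for tag in value.split(",") if tag.strip())


def should_emit_atlas_event(query_client_tags: set[str], excluded: frozenset[str]) -> bool:
    """Return False if the query carries any excluded client tag."""
    return excluded.isdisjoint(query_client_tags)


if __name__ == "__main__":
    # "dashboard" and "healthcheck" are hypothetical tag values.
    excluded = parse_excluded_tags("dashboard, healthcheck")
    print(should_emit_atlas_event({"healthcheck"}, excluded))  # prints False
    print(should_emit_atlas_event({"etl"}, excluded))  # prints True
```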

The Kafka network name and Atlas URL must be valid and accessible from the SEP coordinator.

Note that the Atlas server cannot receive SEP events until the SEP types are uploaded to Atlas with the Atlas CLI, so be sure to complete the Atlas setup steps.