Starburst Kafka connector#

The Starburst Kafka connector included in Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP), is an extended version of the Kafka connector with identical configuration and usage. It includes the following features:

  • TLS/SSL encryption (1-way SSL)

  • Additional authentication and access control mechanisms:
    • TLS/SSL authentication (2-way SSL)

    • Basic authentication

    • OAuth 2.0

    • OAuth 2.0 token pass-through

    • SCRAM

    • Kerberos, including access to the schema registry

Requirements#

Configuration#

The connector configuration is identical to the configuration for the base Kafka connector, with the exception of the Cloudera schema registry properties described in this section.

A minimal configuration sets connector.name to kafka, and adds configuration for the Kafka nodes and table names, as shown in the following snippet:

connector.name=kafka
kafka.table-names=table1,table2
kafka.nodes=host1:port,host2:port

Cloudera schema registry#

To use the Cloudera (CDP) schema registry, the following properties must be set:

kafka.table-description-supplier=cloudera
kafka.cloudera-schema-registry-url=http://schema-registry.example.com:8081/api/v1

To configure Kerberos authentication with a CDP schema registry, you must include the following additional properties:

kafka.cloudera-schema-registry.authentication.type=KERBEROS
kafka.cloudera-schema-registry.authentication.client.principal=kafka/broker1.example.com@EXAMPLE.COM
kafka.cloudera-schema-registry.authentication.client.keytab=/etc/kafka/kerberos/broker1.keytab
kafka.cloudera-schema-registry.authentication.config=/etc/krb5.conf
kafka.cloudera-schema-registry.authentication.service-name=kafka

The full list of configuration properties is as follows:

CDP schema registry properties#

kafka.cloudera-schema-registry-url
    URL for the Cloudera schema registry, e.g. http://example.com:8080/api/v1.

kafka.cloudera-schema-registry.authentication.type
    Credential source for the schema registry. Valid values are KERBEROS and NONE.

kafka.cloudera-subjects-cache-refresh-interval
    The interval at which the topic-to-subjects cache is refreshed.

kafka.empty-field-strategy
    How to handle struct types with no fields. Valid values are:

      • IGNORE - ignore structs with no fields. This strategy propagates to parents.

      • FAIL - fail the query if a struct with no fields is defined.

      • DUMMY - add a boolean field named dummy, which is null. This may be desired if the struct represents a marker field.

kafka.cloudera-schema-registry.authentication.client.principal
    Kerberos client principal name.

kafka.cloudera-schema-registry.authentication.client.keytab
    Kerberos client keytab location.

kafka.cloudera-schema-registry.authentication.config
    Kerberos configuration file location, typically /etc/krb5.conf.

kafka.cloudera-schema-registry.authentication.service-name
    Kerberos principal name of the Kafka service.

Warning

The connector only supports CDP Schema Registry Kafka messages serialized using protocol ID 3 (VERSION_ID_AS_INT_PROTOCOL). Attempting to use other protocols results in an error.

SQL support#

The connector supports all of the SQL statements listed in the Kafka connector documentation.
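For example, assuming a catalog named example that uses this connector and a topic exposed as the table default.purchases (both names are hypothetical), you can query the topic like any other table, including internal columns of the base Kafka connector such as _key and _message:

SELECT _key, _message
FROM example.default.purchases
LIMIT 10;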

Security for schema registry access#

The connector supports table schema and schema registry usage, and includes a number of security-related features, detailed in the following sections.

TLS/SSL authentication#

Typically, your schema registry is secured with TLS/SSL and is therefore accessed securely over the HTTPS protocol. The connector supports the 2-way authentication used by the protocol if you enable the HTTPS protocol in your catalog properties file:

kafka.confluent-schema-registry.security-protocol=HTTPS

If the TLS certificates on the schema registry and on SEP are signed by a globally trusted certificate authority, they are recognized as such, and no further configuration is necessary.

If you use custom certificates, you must add the relevant certificates to a truststore and keystore for SEP to use. After creating these files, place them on your cluster nodes and configure the relevant properties:

Truststore and keystore properties#

kafka.confluent-schema-registry.ssl.truststore.location
    Location of the truststore file. Absolute path or relative path to etc.

kafka.confluent-schema-registry.ssl.truststore.password
    Password to the truststore file.

kafka.confluent-schema-registry.ssl.truststore.type
    The file format of the truststore key, JKS or PKCS12.

kafka.confluent-schema-registry.ssl.keystore.location
    Location of the keystore file. Absolute path or relative path to etc.

kafka.confluent-schema-registry.ssl.keystore.password
    Password to the keystore file.

kafka.confluent-schema-registry.ssl.keystore.type
    The file format of the keystore key, JKS or PKCS12.

kafka.confluent-schema-registry.ssl.key.password
    Password of the private key stored in the keystore file.

You can use the secrets support to avoid plain text password values in the catalog file.
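For example, a minimal sketch using environment variable substitution, assuming the hypothetical SCHEMA_REGISTRY_TRUSTSTORE_PASSWORD environment variable is set on all cluster nodes:

kafka.confluent-schema-registry.ssl.truststore.password=${ENV:SCHEMA_REGISTRY_TRUSTSTORE_PASSWORD}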

Basic authentication#

The schema registry can be configured to require users to authenticate with a username and password via the HTTP Basic authentication type. The connector supports Basic authentication if you enable the PASSWORD authentication type and set the relevant properties in your catalog properties file:

kafka.confluent-schema-registry.authentication.type=PASSWORD
kafka.confluent-schema-registry.authentication.username=examplename
kafka.confluent-schema-registry.authentication.password=examplepassword

Kerberos authentication#

The schema registry can be configured to use the Kerberos service and the SASL GSSAPI authentication type. Add the following configuration to your catalog properties file to use the Kerberos authentication type for the schema registry:

kafka.confluent-schema-registry.authentication.type=KERBEROS
kafka.confluent-schema-registry.authentication.client.principal=kafka/host.your.org@YOUR.ORG
kafka.confluent-schema-registry.authentication.client.keytab=/etc/secrets/kafka_client.keytab
kafka.confluent-schema-registry.authentication.config=/etc/krb5.conf
kafka.confluent-schema-registry.authentication.service-name=kafka

Security#

The connector includes a number of security-related features, detailed in the following sections.

Password credential pass-through#

The connector supports password credential pass-through. To enable it, edit the catalog properties file to include the authentication type:

kafka.authentication.type=PASSWORD_PASS_THROUGH

For more information about configurations and limitations, see Password credential pass-through.

TLS/SSL encryption#

By default, the connector communicates with the Kafka server using the PLAINTEXT protocol, which means that data is sent unencrypted. To encrypt the communication between the connector and the server, change the kafka.security-protocol configuration property:

kafka.security-protocol=SSL

In addition, you can set the following optional configuration properties:

Optional SSL encryption configuration properties#

kafka.ssl.truststore.location
    Location of the truststore file.

kafka.ssl.truststore.password
    Password to the truststore file.

kafka.endpoint-identification-algorithm
    The endpoint identification algorithm used by SEP to validate the server host name. The default value is HTTPS. SEP verifies that the broker host name matches the host name in the broker’s certificate. To disable server host name verification, use disabled.

You can see a full example configuration with SSL encryption in the following snippet:

connector.name=kafka
...
kafka.security-protocol=SSL
kafka.ssl.truststore.location=/etc/secrets/kafka.broker.truststore.jks
kafka.ssl.truststore.password=truststore_password

TLS/SSL authentication#

With TLS/SSL authentication, the connector authenticates with the Kafka server/broker, which is also called 2-way authentication. Add the following configuration to your catalog file to use TLS/SSL authentication:

kafka.security-protocol=SSL

You must set the following required configuration properties:

Required settings#

kafka.ssl.keystore.location
    Location of the keystore file.

kafka.ssl.keystore.password
    Password to the keystore file.

kafka.ssl.key.password
    Password of the private key stored in the keystore file.

You can see a full example configuration using SSL authentication in the following snippet:

connector.name=kafka
...
kafka.security-protocol=SSL
kafka.ssl.keystore.location=/etc/secrets/kafka.broker.keystore.jks
kafka.ssl.keystore.password=keystore_password
kafka.ssl.key.password=private_key_password

SASL authentication#

With SASL authentication, the connector authenticates with the Kafka server using one of the supported authentication types in the following table:

Authentication types#

Authentication type name   Corresponding Kafka SASL mechanism   Documentation
PASSWORD                   PLAIN                                Password authentication
KERBEROS                   GSSAPI                               Kerberos authentication
OAUTH2                     OAUTHBEARER                          OAuth 2.0 authentication
DELEGATED-OAUTH2           OAUTHBEARER                          OAuth 2.0 token pass-through
SCRAM_SHA_256              SCRAM-SHA-256                        SCRAM authentication
SCRAM_SHA_512              SCRAM-SHA-512                        SCRAM authentication

SASL authentication can be enabled for both the PLAINTEXT and SSL protocols by setting kafka.security-protocol to SASL_PLAINTEXT or SASL_SSL, respectively.

Example configuration of Kerberos authentication over TLS/SSL:

kafka.security-protocol=SASL_SSL
kafka.authentication.type=KERBEROS

Note

If the SASL authentication type is enabled, then the SSL client authentication (2-way authentication) is disabled, but the client still verifies the server certificate (1-way authentication).

Password authentication#

Password authentication is simple username and password authentication that uses the SASL PLAIN authentication type.

Use password authentication only with SSL encryption enabled, to ensure that the password is not sent unencrypted.

Add the following configuration to your catalog properties file to use password authentication:

kafka.security-protocol=SASL_SSL
kafka.authentication.type=PASSWORD

Set the following required configuration properties:

Required settings#

kafka.authentication.username
    User name for Kafka access.

kafka.authentication.password
    Password for the user.
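Putting these together, a complete catalog configuration using password authentication might look like the following sketch; the username and password values are placeholders:

connector.name=kafka
...
kafka.security-protocol=SASL_SSL
kafka.authentication.type=PASSWORD
kafka.authentication.username=examplename
kafka.authentication.password=examplepassword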

Kerberos authentication#

Kerberos authentication uses the Kerberos service and the SASL GSSAPI authentication type to authenticate. Add the following configuration to your catalog properties file to use the Kerberos authentication type:

kafka.security-protocol=SASL_SSL
kafka.authentication.type=KERBEROS

Set the following required configuration properties:

Required settings#

kafka.authentication.client.principal
    Kerberos client principal name.

kafka.authentication.client.keytab
    Kerberos client keytab location.

kafka.authentication.config
    Kerberos configuration file location, typically /etc/krb5.conf.

kafka.authentication.service-name
    Kerberos principal name of the Kafka service.

Example configuration using the Kerberos authentication:

connector.name=kafka
...
kafka.security-protocol=SASL_SSL
kafka.authentication.type=KERBEROS
kafka.authentication.client.principal=kafka/broker1.your.org@YOUR.ORG
kafka.authentication.client.keytab=/etc/secrets/kafka_client.keytab
kafka.authentication.config=/etc/krb5.conf
kafka.authentication.service-name=kafka

OAuth 2.0 authentication#

OAuth 2.0 authentication uses an access token obtained from an OAuth 2.0-compliant authorization server and the SASL OAUTHBEARER authentication type to authenticate the Kafka connector. Only the client credentials flow is currently supported.

Add the following configuration to your catalog properties file to use OAuth 2.0 authentication:

kafka.security-protocol=SASL_SSL
kafka.authentication.type=OAUTH2

Set the following required configuration properties:

Required settings#

kafka.authentication.oauth2.token-url
    The token URL of an OAuth 2.0-compliant authorization server.

kafka.authentication.oauth2.client-id
    ID of the Kafka connector OAuth2 client.

kafka.authentication.oauth2.client-secret
    Secret for the client.

If the authorization server uses SSL with a self-signed certificate, set the following additional properties to use a custom truststore when validating the certificate:

Additional settings#

kafka.authentication.oauth2.ssl.truststore.path
    Location of the SSL truststore file used to verify the OAUTH2 authorization server certificate.

kafka.authentication.oauth2.ssl.truststore.password
    Password to the truststore file.

kafka.authentication.oauth2.ssl.truststore.type
    Type of the truststore file. Supported values are JKS and PKCS12.
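Putting these together, a complete catalog configuration using OAuth 2.0 authentication might look like the following sketch; the token URL, client ID, and client secret are placeholder values:

connector.name=kafka
...
kafka.security-protocol=SASL_SSL
kafka.authentication.type=OAUTH2
kafka.authentication.oauth2.token-url=https://authorization-server.example.com/oauth2/token
kafka.authentication.oauth2.client-id=example-client-id
kafka.authentication.oauth2.client-secret=example-client-secret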

OAuth 2.0 token pass-through#

The Kafka connector supports OAuth 2.0 token pass-through.

Configure this option the same way as OAuth 2.0 authentication, except for the additional settings described in this section.

Set the authentication type in the coordinator’s config properties file:

http-server.authentication.type=DELEGATED-OAUTH2

Additionally, enable OAUTH2_PASSTHROUGH in each catalog properties file that uses the Kafka connector:

kafka.authentication.type=OAUTH2_PASSTHROUGH

In addition, the SASL mechanism must be enabled with kafka.security-protocol=SASL_SSL or kafka.security-protocol=SASL_PLAINTEXT as described in the previous section.
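Putting these together, a minimal sketch of the two files might look like the following, assuming DELEGATED-OAUTH2 is the only authentication type enabled on the coordinator:

# coordinator config.properties
http-server.authentication.type=DELEGATED-OAUTH2

# catalog properties file
kafka.security-protocol=SASL_SSL
kafka.authentication.type=OAUTH2_PASSTHROUGH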

SCRAM authentication#

Salted Challenge Response Authentication Mechanism (SCRAM), or SASL/SCRAM, is a family of SASL mechanisms that addresses the security concerns with traditional mechanisms that perform username/password authentication like PLAIN. Kafka supports SCRAM-SHA-256 and SCRAM-SHA-512. All examples below use SCRAM-SHA-256, but you can substitute the configuration for SCRAM-SHA-512 as needed.

Add the following configuration to your catalog properties file to use SCRAM authentication:

kafka.security-protocol=SASL_SSL
kafka.authentication.type=SCRAM_SHA_256

Set the following required configuration properties:

Required settings#

kafka.authentication.username
    The user name.

kafka.authentication.password
    The password.
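For example, a complete catalog configuration using SCRAM-SHA-256 might look like the following sketch; the username and password values are placeholders:

connector.name=kafka
...
kafka.security-protocol=SASL_SSL
kafka.authentication.type=SCRAM_SHA_256
kafka.authentication.username=examplename
kafka.authentication.password=examplepassword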

Performance#

The connector includes a number of performance improvements, detailed in the following sections.

Starburst Cached Views#

The connector supports table scan redirection to improve performance and reduce load on the data source.