Starburst HBase connector#

The Starburst HBase connector allows querying an external Apache HBase database.

Note

The HBase connector is a limited preview. Contact Starburst support with questions or feedback.

Requirements#

To connect to HBase, you need:

  • Apache HBase version 2.3.x

  • Network access from the coordinator and workers to the HBase and Zookeeper servers. The default port is 2181.

Configuration#

Create a catalog properties file in etc/catalog named, for example, example.properties to access the configured HBase database in the example catalog. Configure the connector by specifying hbase as the connector name, and replace the connection properties as appropriate for your setup.

connector.name=hbase
hbase.zookeeper.quorum=host1,host2
hbase.zookeeper.port=2181
hbase.zookeeper.znode=/hbase
hbase.table-description-dir=/path/to/table-description-dir

Configuration properties#

The following configuration properties are available for the connector, to be used in catalog properties files:

HBase connector configuration properties#

Property name

Description

Default value

hbase.zookeeper.quorum

Comma-separated list of ZooKeeper server hostnames.

hbase.zookeeper.port

Server port for the ZooKeeper service.

2181

hbase.zookeeper.znode

The root znode containing all znodes created and/or used by HBase.

hbase.table-description-dir

Directory path that holds JSON description files for HBase tables. See Table descriptions for more information.

hbase.config.resources

Comma-separated paths to HBase configuration files for HBase clients. In the event of conflicting configuration options, the SEP connector catalog configuration overrides these files.

hbase.authentication.type

HBase authentication mechanism type, either UNSECURED or KERBEROS.

UNSECURED

hbase.impersonation.enabled

When set to true, enables session user impersonation.

false

hbase.projection-pushdown.enabled

When set to true, enables projection pushdown. See Projection pushdown.

false

hbase.allow-create

When set to true, enables CREATE TABLE, CREATE TABLE AS, and CREATE SCHEMA SQL statements. See CREATE, DROP, and INSERT support.

false

hbase.allow-drop

When set to true, enables DROP TABLE and DROP SCHEMA SQL statements. See CREATE, DROP, and INSERT support.

false

hbase.allow-insert

When set to true, enables INSERT SQL statements. See CREATE, DROP, and INSERT support.

false

hbase.connection-pool.enabled

Enable connection pooling. When set to true, the connector attempts to re-use connections to HBase instead of establishing a new connection.

true

hbase.connection-pool.ttl

If connection pooling is enabled, determines how long in seconds to keep a connection alive when not in active use.

30

Table descriptions#

Table descriptions are required to query HBase tables from SEP. A table description directory on all SEP nodes contains table description .json files that describe HBase table names, columns, and data types. The connector reads these files at runtime to create a 1:1 mapping between SEP and HBase tables. Any HBase tables without a corresponding table description file in this directory cannot be queried from SEP.

The table description directory must contain a subdirectory for each HBase namespace or SEP schema name, which then contain the corresponding table description files. The following shows an example directory and file structure:

table-description-dir/
  namespace1/
    table1.json
    table2.json
  namespace2/
    table1.json
    table2.json

Warning

Schema, table, and column names are case-insensitive. However, we recommend that you use lowercase identifiers in table description .json files and file names. If the HBase database has multiple tables with the same name in different cases, they cannot be mapped to SEP tables and therefore cause query failures.

Table description files use the following format:

HBase table description file format#

Property name

Description

Required

columns

List of the table’s columns along with their types and column families.

Yes

keyColumnNames

List of column names to be used as the HBase row key. This property is required but can be set to an empty list object, or [ ]. In this case a random UUID is used to automatically populate the HBase row key for each row during insertion. Does not support multiple columns with variable-length data types in case the key is composite. Instead, use multiple fixed-length columns or a single variable-length column.

Yes

saltBuckets

Number of salt buckets for this table, between 1 to 256.

No

The following block shows an example JSON table description file:

{
  "columns" : [ {
    "name" : "varchar_column",
    "type" : "varchar",
    "family" : "column_family",
    "comment" : "comment"
  }, {
    "name" : "int_column",
    "type" : "integer",
    "family" : "column_familia",
    "comment" : "comment"
  } ],
  "keyColumnNames" : [ "int_column" ],
  "saltBuckets" : 5,
  "comment" : "table_comment"
}

Create a namespace#

In SEP, HBase namespaces are represented by schemas. To create a namespace in HBase, use a CREATE SCHEMA SQL statement in SEP.

CREATE SCHEMA test_namespace;

Note

CREATE SCHEMA statements are disabled by default, but can be enabled with the hbase.allow-create configuration property. See CREATE, DROP, and INSERT support for more information.

Create a table#

In SEP, running a CREATE TABLE statement creates a table in HBase using an existing table’s schema as the new table’s namespace.

All columns within these created tables are part of the default cf column family. Column families cannot be modified in an SEP session, but can be changed in the table’s description .json file.

Note

CREATE TABLE statements are disabled by default, but can be enabled with the hbase.allow-create configuration property. See CREATE, DROP, and INSERT support for more information.

CREATE TABLE statements support the same properties as the table description files.

CREATE TABLE hbase.default.test (
  id1 BIGINT,
  id2 BIGINT,
  value VARCHAR
)
WITH (
  key_column_names = ARRAY['id1', 'id2'],
  salt_buckets = 10
);

Type mapping#

The connector supports the following SQL types:

  • BOOLEAN

  • TINYINT

  • SMALLINT

  • BIGINT

  • REAL

  • DOUBLE

  • VARCHAR

  • VARCHAR(N)

  • CHAR(N)

  • VARBINARY

  • DATE

  • TIMESTAMP

  • TIMESTAMP(N)

All other SQL types are not supported.

SQL support#

The connector provides read and write access to data and metadata in HBase. In addition to the globally available and read operation statements, the connector supports the following features:

CREATE, DROP, and INSERT support#

The following features are supported by the connector but disabled by default:

These operations are non-transactional and can therefore result in data loss if interrupted. To enable these SQL statements, set the corresponding configuration in the catalog properties file:

//Enable CREATE TABLE, CREATE TABLE AS, and CREATE SCHEMA
hbase.allow-create=true

//Enable DROP TABLE and DROP SCHEMA
hbase.allow-drop=true

//Enable INSERT
hbase.allow-insert=true

INSERT#

INSERT operations with the same values for key columns overwrite rows in HBase.

ALTER TABLE#

The connector does not support ALTER TABLE RENAME or ALTER TABLE SET AUTHORIZATION.

Performance#

The connector includes a number of performance improvements, detailed in the following sections.

Parallelism#

By default, the connector attempts to create multiple splits off available table regions for reading data. Each split is mapped to a single region and processed in parallel. Therefore, it is best practice to split HBase tables over multiple regions for better performance.

Table salting#

Sequential write operations can perform poorly across region servers when the row keys increase monotonically. SEP supports salting tables as a way to mitigate this problem.

If a table is not created through SEP and doesn’t follow the one byte salt key prefix, the connector won’t be able to distinguish the salt prefix from the actual key. In this case, resultant data includes the salt prefix as part of the row key column.

Projection pushdown#

The connector can improve performance of table scans by performing pushdown of column projections to HBase. This reduces the amount of data transferred between SEP and the region servers.

If projection pushdown is enabled, rows without any of the projected columns are skipped. This is because HBase is a non-relational database with an arbitrary number of columns per row, so it can only return rows where at least one of the projected columns exist.

Projection pushdown is disabled by default, but can be enabled by setting the hbase.projection-pushdown.enabled property in the catalog properties file to true:

hbase.projection-pushdown.enabled=true

Security#

The connector includes a number of security-related features, detailed in the following sections.

User impersonation#

The connector supports user impersonation. Enable user impersonation by setting the hbase.impersonation.enabled property in the catalog properties file to true:

hbase.impersonation.enabled=true

Kerberos authentication#

The connector supports Kerberos authentication. Use the following properties in the catalog properties file to configure it.

hbase.authentication.type=KERBEROS
hbase.master.kerberos.principal=hbase/_HOST@example.com
hbase.regionserver.kerberos.principal=hbase/_HOST@example.com
hbase.client.kerberos.principal=example@example.com
hbase.client.kerberos.keytab=etc/kerberos/example.keytab

With this configuration the user example@example.com is used to connect to the database, and the related Kerberos service ticket is stored in the example.keytab file.