Starburst HBase connector#
The Starburst HBase connector allows querying an external Apache HBase database.
Note
The HBase connector is a limited preview. Contact Starburst support with questions or feedback.
Requirements#
To connect to HBase, you need:
Apache HBase version 2.3.x
Network access from the coordinator and workers to the HBase and Zookeeper servers. The default port is 2181.
A valid Starburst Enterprise license
Configuration#
Create a catalog properties file in etc/catalog
named, for example,
example.properties
to access the configured HBase database in the
example
catalog. Configure the connector by specifying hbase
as the
connector name, and replace the connection properties as appropriate for your
setup.
connector.name=hbase
hbase.zookeeper.quorum=host1,host2
hbase.zookeeper.port=2181
hbase.zookeeper.znode=/hbase
hbase.table-description-dir=/path/to/table-description-dir
Configuration properties#
The following configuration properties are available for the connector, to be used in catalog properties files:
Property name |
Description |
Default value |
---|---|---|
|
Comma-separated list of ZooKeeper server hostnames. |
|
|
Server port for the ZooKeeper service. |
2181 |
|
The root znode containing all znodes created and/or used by HBase. |
|
|
Directory path that holds JSON description files for HBase tables. See Table descriptions for more information. |
|
|
Comma-separated paths to HBase configuration files for HBase clients. In the event of conflicting configuration options, the SEP connector catalog configuration overrides these files. |
|
|
HBase authentication mechanism type, either |
|
|
When set to true, enables session user impersonation. |
false |
|
When set to true, enables projection pushdown. See Projection pushdown. |
false |
|
When set to true, enables |
false |
|
When set to true, enables |
false |
|
When set to true, enables |
false |
|
Enable connection pooling. When set to |
true |
|
If connection pooling is enabled, determines how long in seconds to keep a connection alive when not in active use. |
30 |
Table descriptions#
Table descriptions are required to query HBase tables from SEP. A table
description directory on all SEP nodes contains table description .json
files that describe HBase table names, columns, and data types. The connector
reads these files at runtime to create a 1:1 mapping between SEP and HBase
tables. Any HBase tables without a corresponding table description file in this
directory cannot be queried from SEP.
The table description directory must contain a subdirectory for each HBase namespace or SEP schema name, which then contain the corresponding table description files. The following shows an example directory and file structure:
table-description-dir/
namespace1/
table1.json
table2.json
namespace2/
table1.json
table2.json
Warning
Schema, table, and column names are case-insensitive. If the HBase database has multiple tables with the same name in different cases, they cannot be mapped to SEP tables and therefore cause query failures.
Table description files use the following format:
Property name |
Description |
Required |
---|---|---|
|
List of the table’s columns along with their types and column families. |
Yes |
|
List of column names to be used as the HBase row key. This property is
required but can be set to an empty list object, or |
Yes |
|
Number of salt buckets for this table, between 1 to 256. |
No |
The following block shows an example JSON table description file:
{
"columns" : [ {
"name" : "varchar_column",
"type" : "varchar",
"family" : "column_family",
"comment" : "comment"
}, {
"name" : "int_column",
"type" : "integer",
"family" : "column_familia",
"comment" : "comment"
} ],
"keyColumnNames" : [ "int_column" ],
"saltBuckets" : 5,
"comment" : "table_comment"
}
Create a namespace#
In SEP, HBase
namespaces are
represented by schemas. To create a namespace in HBase, use a CREATE SCHEMA
SQL statement in SEP.
CREATE SCHEMA test_namespace;
Note
CREATE SCHEMA
statements are disabled by default, but can be enabled
with the hbase.allow-create
configuration property. See
CREATE, DROP, and INSERT support for more information.
Create a table#
In SEP, running a CREATE TABLE
statement creates a table in HBase using
an existing table’s schema as the new table’s namespace.
All columns within these created tables are part of the default cf
column
family. Column families cannot be modified in an SEP session, but can be
changed in the table’s description .json
file.
Note
CREATE TABLE
statements are disabled by default, but can be enabled
with the hbase.allow-create
configuration property. See
CREATE, DROP, and INSERT support for more information.
CREATE TABLE
statements support the same properties as the table description files.
CREATE TABLE hbase.default.test (
id1 BIGINT,
id2 BIGINT,
value VARCHAR
)
WITH (
key_column_names = ARRAY['id1', 'id2'],
salt_buckets = 10
);
Type mapping#
The connector supports the following SQL types:
BOOLEAN
TINYINT
SMALLINT
BIGINT
REAL
DOUBLE
VARCHAR
VARCHAR(N)
CHAR(N)
VARBINARY
DATE
TIMESTAMP
TIMESTAMP(N)
All other SQL types are not supported.
SQL support#
The connector provides read and write access to data and metadata in HBase. In addition to the globally available and read operation statements, the connector supports the following features:
CREATE, DROP, and INSERT support#
The following features are supported by the connector but disabled by default:
These operations are non-transactional and can therefore result in data loss if interrupted. To enable these SQL statements, set the corresponding configuration in the catalog properties file:
//Enable CREATE TABLE, CREATE TABLE AS, and CREATE SCHEMA
hbase.allow-create=true
//Enable DROP TABLE and DROP SCHEMA
hbase.allow-drop=true
//Enable INSERT
hbase.allow-insert=true
INSERT#
INSERT
operations with the same values for key columns overwrite rows in
HBase.
ALTER TABLE#
The connector does not support ALTER TABLE RENAME
or ALTER TABLE SET AUTHORIZATION
.
Performance#
The connector includes a number of performance improvements, detailed in the following sections.
Parallelism#
By default, the connector attempts to create multiple splits off available table regions for reading data. Each split is mapped to a single region and processed in parallel. Therefore, it is best practice to split HBase tables over multiple regions for better performance.
Table salting#
Sequential write operations can perform poorly across region servers when the row keys increase monotonically. SEP supports salting tables as a way to mitigate this problem.
If a table is not created through SEP and doesn’t follow the one byte salt key prefix, the connector won’t be able to distinguish the salt prefix from the actual key. In this case, resultant data includes the salt prefix as part of the row key column.
Projection pushdown#
The connector can improve performance of table scans by performing pushdown of column projections to HBase. This reduces the amount of data transferred between SEP and the region servers.
If projection pushdown is enabled, rows without any of the projected columns are skipped. This is because HBase is a non-relational database with an arbitrary number of columns per row, so it can only return rows where at least one of the projected columns exist.
Projection pushdown is disabled by default, but can be enabled by setting the
hbase.projection-pushdown.enabled
property in the catalog properties file
to true
:
hbase.projection-pushdown.enabled=true
Security#
The connector includes a number of security-related features, detailed in the following sections.
User impersonation#
The connector supports user impersonation.
Enable user impersonation by setting the hbase.impersonation.enabled
property in the catalog properties file to true
:
hbase.impersonation.enabled=true
Kerberos authentication#
The connector supports Kerberos authentication. Use the following properties in the catalog properties file to configure it.
hbase.authentication.type=KERBEROS
hbase.master.kerberos.principal=hbase/_HOST@example.com
hbase.regionserver.kerberos.principal=hbase/_HOST@example.com
hbase.client.kerberos.principal=example@example.com
hbase.client.kerberos.keytab=etc/kerberos/example.keytab
With this configuration the user example@example.com
is used to connect to
the database, and the related Kerberos service ticket is stored in the
example.keytab
file.