Define and configure catalogs#
In Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP), users query data sources exposed as catalogs.
This topic covers defining and configuring catalogs.
Connectors#
A connector is specific to the data source it supports. It transforms the underlying data into the SEP concepts of schemas, tables, columns, rows, and data types.
Connectors provide the following between a data source and SEP:
Secure communications link
Translation of data types
Handling of variances in the SQL implementation, adaptation to a provided API, or translation of data in raw files
SEP comes with supported enterprise connectors that allow you to configure catalogs that provide access to all your data sources. SEP’s architecture fully abstracts the data sources it can connect to; compute and storage are separated. You can scale your query engine and performance separately from your data storage.
Exposing all this data in one place creates an accessible data mesh, ready for your analysis.
Connector setup#
To connect to a data source, read the documentation for the specific connector, which includes a sample configuration for creating a catalog as well as additional configuration options for the connector.
If you’re using a connector that requires additional setup, such as adding a proprietary JDBC driver, that setup is documented with the specific connector.
Developing custom connectors#
Dell Data Analytics Engine, powered by Starburst Enterprise platform (SEP), comes with an array of built-in connectors for a variety of cloud-based and on-prem data sources.
Because SEP’s architecture separates the query engine from the data sources it connects to, you can use the SEP connector service provider interface (SPI) to build plugins for file systems and object stores, NoSQL stores, relational database systems, and custom services that have no off-the-shelf connector. As long as you can map data into relational concepts such as tables, columns, and rows, it is possible to create your own SEP connector.
To learn more, read our latest developer documentation.
Catalogs#
Your users do not need to worry about connectors or technical details for data sources unless they want to use catalog session properties, discussed later in this topic. They only need to think about what catalogs are defined in the SEP cluster.
Users can browse available catalogs in the query editor, in the SEP web UI, or, if connected to SEP through another client, with the SHOW CATALOGS statement.
Catalog configuration properties#
Each connector has a small set of properties required when defining a catalog. While these properties can vary, they minimally define the connection to the data source. Depending on your connector and data source, there are a number of different configuration properties available.
Optional properties enable further configuration of the catalog in areas such as security, performance, and query behavior. These connector-specific properties are defined in the documentation for each connector. For more information on connector-specific configuration properties, start with the list of all connectors.
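As a rough sketch of how required and optional properties sit together, a catalog for a PostgreSQL data source might look like the following. The host, port, database name, and credentials are placeholders, and case-insensitive-name-matching is shown only as one example of an optional property. Check the connector’s documentation for the properties that actually apply to your data source.

```properties
# Required: selects the connector and defines the connection (placeholder values)
connector.name=postgresql
connection-url=jdbc:postgresql://postgres.example.com:5432/sales
connection-user=sep_reader
connection-password=changeme

# Optional: one example of a property that adjusts query behavior
case-insensitive-name-matching=true
```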
Define a catalog#
Where a catalog is defined depends upon your deployment method:
In Starburst Admin deployments - Create the catalog properties file in etc/catalog/, for example etc/catalog/salesdatabase.properties. The first part of the file name, in this example salesdatabase, becomes the name used to access the data in a query.
In the SEP Helm chart - Create a child node under the catalogs: top level node, for example salesdatabase:. The child node name, in this example salesdatabase, becomes the name used to access the data in a query.
Configure access to a data source in your newly defined catalog with a few simple steps:
Specify the required connector with connector.name=, for example connector.name=postgresql
Add any other properties required by the connector
Add any optional properties as desired
Copy the file onto the coordinator and all worker nodes
Restart the coordinator and all workers
Confirm the catalog is available with SHOW CATALOGS;
List the available schemas with SHOW SCHEMAS FROM salesdatabase;
Start writing and running the desired queries
Many connectors are defined with similar properties. For example, most JDBC-based connectors require these minimum properties in their catalog files:
connector.name=<connectorname>
connection-url=<connectorprotocol>//<host>:<port>;database=<database>
connection-user=root
connection-password=secret
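For illustration only, the template above might be filled in for a MySQL data source as follows. The file name, host name, and credentials are hypothetical, and the exact URL syntax differs between connectors (the MySQL JDBC URL, for instance, does not use the ;database= suffix), so treat the connector’s own documentation as the authority.

```properties
# Hypothetical catalog file: etc/catalog/inventory.properties
# The catalog is then addressed as "inventory" in queries.
connector.name=mysql
# JDBC URL for the MySQL driver; host and port are placeholders
connection-url=jdbc:mysql://mysql.example.com:3306
connection-user=sep_reader
connection-password=changeme
```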
Apply changes#
To use a newly defined catalog, you must restart SEP for the changes to take effect. Refer to the documentation for your deployment type for restart instructions.
Next steps#
Familiarize yourself with the available SEP connectors.