Starburst Cosmos DB connector#

The Starburst Cosmos DB connector uses the API for NoSQL to read data stored in Azure Cosmos DB for NoSQL.

The Starburst Cosmos DB connector only supports connecting to Azure Cosmos DB for NoSQL. If you are using Azure Cosmos DB for PostgreSQL, MongoDB, or Apache Cassandra, use the native PostgreSQL, MongoDB, or Cassandra connectors instead.

Note

The Starburst Cosmos DB connector is a public preview. Contact Starburst Support with questions or feedback.

Requirements#

To connect to Azure Cosmos DB for NoSQL, you need:

  • Azure access credentials with an attached policy that allows reading from Cosmos DB.

  • Network access from the coordinator and workers to the Cosmos DB instance. By default this connection uses HTTPS over port 443.

  • A valid Starburst Enterprise license.

  • Data stored in Azure Cosmos DB for NoSQL.
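The network requirement above can be spot-checked before configuring the catalog. The following is a minimal sketch, not part of the connector: the helper name can_reach is hypothetical, and the check only confirms that a TCP connection opens on the given port. It does not validate TLS, credentials, or read permissions.

```python
import socket


def can_reach(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout.

    Hypothetical helper for illustration only. It checks TCP reachability,
    not TLS negotiation or Cosmos DB authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # ACCOUNT_NAME is a placeholder, as in the catalog example in this document.
    print(can_reach("ACCOUNT_NAME.documents.azure.com", 443))
```

Because both the coordinator and the workers need access, a check like this would have to run on every node of the cluster, not only on one of them.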

Configuration#

To create the example catalog, add a catalog properties file named example.properties to etc/catalog. Replace example with your database name or another descriptive name for the catalog. The file must have the following contents:

connector.name=cosmosdb
cosmosdb.connection-url=https://ACCOUNT_NAME.documents.azure.com:443/
cosmosdb.connection-key=sample-key

Specify the connector.name property as cosmosdb. Configure the catalog using your Azure Cosmos DB connection URL and access key. The connection URL may be formatted differently from the example provided here.
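When catalog files are generated by a script, the three properties above can be rendered from the account URL and key. The following is a minimal sketch under stated assumptions: the function name render_catalog_properties is hypothetical, and the HTTPS check is an assumption based on the default described in the requirements, so relax it if your connection URL is formatted differently.

```python
from urllib.parse import urlparse


def render_catalog_properties(connection_url, connection_key):
    """Return the text of a Cosmos DB catalog properties file.

    Hypothetical helper for illustration. The property names are the ones
    shown in this document; the URL validation is an assumption.
    """
    parsed = urlparse(connection_url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"expected an https URL with a host, got: {connection_url!r}")
    if not connection_key:
        raise ValueError("connection key must not be empty")
    lines = [
        "connector.name=cosmosdb",
        f"cosmosdb.connection-url={connection_url}",
        f"cosmosdb.connection-key={connection_key}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values, matching the example in this document.
    print(render_catalog_properties(
        "https://ACCOUNT_NAME.documents.azure.com:443/", "sample-key"), end="")
```

Writing the returned text to etc/catalog/example.properties produces the same file as the manual example. Avoid committing a real access key to version control.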

SQL support#

The connector provides globally available and read operation statements to access data and metadata in Cosmos DB databases.

Type mapping#

Because Trino and Cosmos DB each support types that the other does not, this connector modifies some types when reading data. Data types may not map the same way in both directions between SEP and the data source. Refer to the following sections for type mapping in each direction.

Cosmos DB to Trino type mapping#

The connector maps Cosmos DB types to the corresponding Trino types following this table:

Cosmos DB type    Trino type    Notes
--------------    ----------    -----
Boolean           BOOLEAN
Double            DOUBLE        Cosmos DB uses IEEE 754 double precision for its number type. All numeric types in Cosmos DB are mapped to DOUBLE.
String            VARCHAR
Object            ROW
Array             ARRAY         Mapped instead to ROW if the elements in the array are not all of the same type.

No other types are supported.
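To see which Trino types the columns of a given document are likely to have, the table can be applied to a decoded JSON document. The following is a minimal sketch of the top-level mapping only, not the connector's implementation: the function names trino_type_for and describe_document are hypothetical, the note about arrays that map to ROW is not modeled, and any value outside the table (including JSON null) is treated as unsupported.

```python
import json


def trino_type_for(value):
    """Map a decoded JSON value to the Trino type name from the table.

    Illustrative only. Nested field types and the ROW fallback for arrays
    are deliberately not modeled.
    """
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "DOUBLE"  # all Cosmos DB numbers map to DOUBLE
    if isinstance(value, str):
        return "VARCHAR"
    if isinstance(value, dict):
        return "ROW"
    if isinstance(value, list):
        return "ARRAY"
    raise TypeError(f"no mapping in the table for {type(value).__name__}")


def describe_document(doc_json):
    """Return {field name: Trino type name} for a JSON document string."""
    doc = json.loads(doc_json)
    return {name: trino_type_for(value) for name, value in doc.items()}


if __name__ == "__main__":
    sample = '{"id": "a1", "qty": 3, "active": true, "address": {"city": "Oslo"}}'
    print(describe_document(sample))
```

One consequence worth noting is that integer values such as qty in the sample are read as DOUBLE, so a query that needs integer semantics has to cast explicitly.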

Performance#

The connector includes a number of performance improvements, detailed in the following sections.

Pushdown#

The connector supports limit pushdown and pushdown of some predicates. Predicate pushdown is only supported for equality (=) and range (<, >) expressions on columns of type VARCHAR and BOOLEAN.
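The predicate rule above can be written as a small check, which is useful when reviewing queries for filters that are evaluated in the cluster instead of in Cosmos DB. This is a minimal sketch that encodes only the rule as stated in this section. The function name is_pushdown_candidate is hypothetical, and the actual decision is made by the query planner and may depend on factors not modeled here.

```python
# Operators and column types named in the pushdown rule of this section.
PUSHDOWN_OPERATORS = {"=", "<", ">"}
PUSHDOWN_TYPES = {"VARCHAR", "BOOLEAN"}


def is_pushdown_candidate(column_type, operator):
    """Return True if a predicate matches the documented pushdown rule.

    Illustrative only: the type name is compared case-insensitively and
    without parameters, so "varchar(20)" counts as VARCHAR.
    """
    base_type = column_type.strip().upper().split("(")[0]
    return base_type in PUSHDOWN_TYPES and operator.strip() in PUSHDOWN_OPERATORS


if __name__ == "__main__":
    print(is_pushdown_candidate("VARCHAR", "="))  # equality on a string column
    print(is_pushdown_candidate("DOUBLE", ">"))   # numeric columns are not covered
```

Because all Cosmos DB numbers map to DOUBLE, filters on numeric fields fall outside the rule. To confirm the behavior for a specific query, run EXPLAIN in Trino and check whether the predicate remains as a filter above the table scan.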