Configure the Data Source Connection

Alation Cloud Service: Applies to Alation Cloud Service instances of Alation

Customer Managed: Applies to customer-managed instances of Alation

After you install the Azure Cosmos DB OCF connector, you must configure the connection to the Azure Cosmos DB data source.

To configure the Azure Cosmos DB data source connection, complete the steps in the following sections.

Provide Access

To set the data source visibility, go to the Access tab on the Settings page of your Azure Cosmos DB data source and choose one of these options:

  • Public Data Source — The data source is visible to all users of the catalog.

  • Private Data Source — The data source is visible to the users allowed access to the data source by Data Source Admins.

You can add new Data Source Admin users in the Data Source Admins section.

Connect to Data Source

To establish a connection to the data source, you must:

Provide the JDBC URI

Important

We recommend that you provide the values in the corresponding fields on the General Settings page instead of the JDBC URI field in the Datasource Connection section. Leave this field empty if all the connection properties you need are available in the user interface.

JDBC URI Format for Account Key Authentication in Compose

Use the following JDBC URI format for account key authentication for Compose:

cosmosdb://AccountEndpoint=<myAccountEndpoint>;AccountKey=<myAccountKey>;
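As a minimal sketch, the format above can be assembled from its two values as follows. The function name, the placeholder endpoint, and the check that rejects a semicolon inside a value are illustrative assumptions and are not part of the connector.

```python
def build_cosmosdb_uri(account_endpoint: str, account_key: str) -> str:
    """Return a URI in the account key format shown above."""
    fields = (("account_endpoint", account_endpoint), ("account_key", account_key))
    for name, value in fields:
        if not value:
            raise ValueError(f"{name} must not be empty")
        # Assumption: ';' separates properties, so a value must not contain it.
        if ";" in value:
            raise ValueError(f"{name} must not contain ';'")
    return f"cosmosdb://AccountEndpoint={account_endpoint};AccountKey={account_key};"


# Placeholder values only; substitute your own endpoint and key.
print(build_cosmosdb_uri("https://myaccount.documents.azure.com:443/", "<myAccountKey>"))
```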

Configure Authentication

For metadata extraction (MDE), profiling and sampling, the connector supports the following authentication methods:

  • Account Key authentication

    • Account Key

    • Account Endpoint

    • Token Type: Master or Resource

  • Azure authentication

    • Azure Service Principal

      • Client Secret

      • Azure Tenant

      • Client ID

  • SSL authentication

    • SSL Client certificate file

    • SSL Client certificate file password

    • (Optional) SSL Server certificate

      Important

      SSL server certificate isn’t supported. If you use a server certificate for connection, contact Alation Support.

Configure Authentication Scheme Settings

  1. On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.

  2. In the Connector Settings section, provide the following details in the Authentication section.

    Field

    Description

    Auth Scheme

    Specify the type of authentication for connecting to Azure Cosmos DB.

    Default: AccountKey

    Available Values:

    • AccountKey: Set this to perform authentication with Account Key and Account Endpoint.

    • AzureServicePrincipal: Set this to authenticate as Azure Service Principal using a Client Secret.

      Important

      OAuth authentication isn’t supported. Use the OAuth fields only for the Azure Service Principal authentication scheme.

    Account Endpoint

    Specify the URL from the Keys blade of the Azure Cosmos DB account.

    Account Key

    Specify a master key token or a resource token for connecting to the Azure Cosmos DB REST API.

    Token Type

    Specify the type of token for authentication.

    Available Values:

    • master: Available when an account is created as a set of primary and secondary keys.

    • resource: Available when users in a database are set up with access permissions for precise access control on a resource, also known as a permission resource.

If you choose to authenticate using Azure, configure Azure Authentication. For details, see Configure Azure Authentication Settings.
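The relationship between Auth Scheme and the fields it relies on can be sketched as a small pre-save check. This is an illustration only: the dictionary keyed by field label is a hypothetical structure, and the set of fields treated as required for each scheme is an assumption inferred from the descriptions on this page. The connector may require more, for example Account Endpoint for both schemes.

```python
# Assumption: fields each Auth Scheme needs, inferred from the field descriptions.
REQUIRED_FIELDS = {
    "AccountKey": ["Account Endpoint", "Account Key"],
    "AzureServicePrincipal": ["OAuth Client ID", "OAuth Client Secret"],
}


def missing_fields(settings: dict) -> list:
    """Return the labels of required fields that are empty or absent."""
    # AccountKey is the documented default for Auth Scheme.
    scheme = settings.get("Auth Scheme", "AccountKey")
    if scheme not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown Auth Scheme: {scheme}")
    return [field for field in REQUIRED_FIELDS[scheme] if not settings.get(field)]


# Hypothetical settings with the Account Key left blank.
print(missing_fields({"Auth Scheme": "AccountKey", "Account Endpoint": "https://myaccount.example"}))
```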

Configure Azure Authentication Settings

If you choose Azure Service Principal for authentication, follow these steps:

  1. On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.

  2. In the Connector Settings section, provide the following details in the Azure Authentication section.

    Field

    Description

    Azure Tenant

    Specify the Microsoft online tenant to access data.

    If unspecified, the default tenant is used.

    Azure Environment

    Select the environment to use when establishing a connection.

    Available Values:

    • GLOBAL

    • CHINA

    • USGOVT

    • USGOVTDOD

  3. In the Connector Settings section, provide the following details in the OAuth section.

    Field

    Description

    Initiate OAuth

    Select this to initiate the process to obtain or refresh the OAuth access token when you connect.

    • GETANDREFRESH: Indicates that the entire OAuth Flow is handled by the provider. If no token currently exists, it is obtained by prompting the user through the browser. If a token exists, it gets refreshed when applicable.

    OAuth Client ID

    Specify the client ID that was assigned when you registered your application with an OAuth authorization server.

    OAuth Client Secret

    Specify the client secret that was assigned when you registered your application with an OAuth authorization server.

    Important

    OAuth authentication isn’t supported. Use the OAuth fields only for the Azure Service Principal authentication scheme.

If you choose to encrypt using SSL, configure SSL. For details, see Configure SSL Authentication Settings.

Configure SSL Authentication Settings

  1. On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.

  2. In the Authentication section, select Encrypt.

  3. In the Connector Settings section, provide the following details in the SSL section.

    Field

    Description

    SSL Client Cert

    Specify the certificate store for the client certificate and select the appropriate file type from the options under SSL Client Cert Type.

    SSL Client Cert Type

    Specify the type of key store that contains the SSL client certificate.

    Available Options:

    • USER: (Default) For Windows, this specifies that the certificate store is a certificate store owned by the current user. Note that this store type is not available in Java.

    • MACHINE: For Windows, this specifies that the certificate store is a machine store. Note that this store type is not available in Java.

    • PFXFILE: The certificate store is the name of a PFX (PKCS12) file containing certificates.

    • PFXBLOB: The certificate store is a string (base64 encoded) representing a certificate store in PFX (PKCS12) format.

    • JKSFILE: The certificate store is the name of a Java key store (JKS) file containing certificates. Note that this store type is only available in Java.

    • JKSBLOB: The certificate store is a string (base64 encoded) representing a certificate store in JKS format. Note that this store type is only available in Java.

    • PEMKEY_FILE: The certificate store is the name of a PEM-encoded file that contains a private key and an optional certificate.

    • PEMKEY_BLOB: The certificate store is a string (base64 encoded) that contains a private key and an optional certificate.

    • PUBLIC_KEY_FILE: The certificate store is the name of a file that contains a PEM- or DER-encoded public key certificate.

    • PUBLIC_KEY_BLOB: The certificate store is a string (base64 encoded) that contains a PEM- or DER-encoded public key certificate.

    • SSHPUBLIC_KEY_FILE: The certificate store is the name of a file that contains an SSH-style public key.

    • SSHPUBLIC_KEY_BLOB: The certificate store is a string (base64 encoded) that contains an SSH-style public key.

    • P7BFILE: The certificate store is the name of a PKCS7 file containing certificates.

    • PPKFILE: The certificate store is the name of a file that contains a PPK (PuTTY Private Key).

    • XMLFILE: The certificate store is the name of a file that contains a certificate in XML format.

    • XMLBLOB: The certificate store is a string that contains a certificate in XML format.

    SSL Client Cert Password

    Specify the password for the client certificate. If the certificate store is of a type that requires a password, this property is used to specify that password to open the certificate store.

    SSL Client Cert Subject

    Specify the subject of the client certificate. The subject is a comma-separated list of distinguished name fields and values.

    Consider the following points:

    • If an exact match is not found, the store is searched for subjects containing the value of the property.

    • If a match is still not found, the property is set to an empty string, and no certificate is selected.

    • The special value * picks the first certificate in the certificate store.

    SSL Server Cert

    Specify the TLS/SSL certificate to be accepted from the server.

    Accepted Values:

    • A full PEM Certificate

    • A path to a local file containing the certificate

    • The public key

    • The MD5 Thumbprint (hex values can also be either space or colon separated)

    • The SHA1 Thumbprint (hex values can also be either space or colon separated)

    If not specified, any certificate trusted by the machine is accepted.

    Certificates are validated as trusted by the machine based on the system’s trust store. The trust store used is the javax.net.ssl.trustStore value specified for the system.

    If no value is specified for this property, Java’s default trust store is used (for example, JAVA_HOME\lib\security\cacerts). Use * to accept all certificates. Note that this is not recommended due to security concerns.

    Important

    SSL server certificate isn’t supported. If you use a server certificate for connection, contact Alation Support.

  4. Save the details.
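The matching rules listed for SSL Client Cert Subject (exact match first, then a substring match, then no selection, with * picking the first certificate) can be sketched as follows. This is a minimal illustration, not the connector's implementation. The function name and the handling of an empty subject value are assumptions.

```python
def select_certificate_subject(store_subjects: list, wanted: str) -> str:
    """Pick a subject from the store using the documented matching order."""
    if not store_subjects or not wanted:
        # Assumption: with nothing to match, no certificate is selected.
        return ""
    if wanted == "*":
        # The special value * picks the first certificate in the store.
        return store_subjects[0]
    for subject in store_subjects:
        if subject == wanted:
            return subject
    # No exact match: search for subjects that contain the value.
    for subject in store_subjects:
        if wanted in subject:
            return subject
    # Still no match: empty string, so no certificate is selected.
    return ""


subjects = ["CN=alpha, O=Example", "CN=beta, O=Example"]
print(select_certificate_subject(subjects, "CN=beta"))
```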

Test the Connection

The connection test checks database connectivity.

After configuring authentication, test the connection.

To validate network connectivity, go to General Settings > Test Connection on the Settings page of your Azure Cosmos DB data source and click Test.

A dialog box appears confirming the status of the connection test.

Configure Additional Connection Settings

Apart from the mandatory configurations that you perform to connect to the data source on the General Settings tab, configure the following additional settings:

Note

In the General Settings tab, leave the Additional data source connection field blank and skip the Disable automatic lineage generation toggle as these options are not applicable to the Azure Cosmos DB OCF connector.

Configure Firewall Settings

  1. On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.

  2. In the Connector Settings section, provide the following details in the Firewall section.

    Field

    Description

    Firewall Type

    Specify the protocol used by the proxy-based firewall for traffic tunneling.

    Available Options:

    • NONE: Default.

    • TUNNEL: Opens a connection to Azure Cosmos DB and traffic flows back and forth through the proxy.

      The default port is 80.

    • SOCKS4: Sends data through the SOCKSv4 proxy as specified in Firewall Server and Firewall Port.

      The default port is 1080.

    • SOCKS5: Sends data through the SOCKSv5 proxy as specified in Firewall Server and Firewall Port.

      The default port is 1080.

    Firewall Server

    Specify the host name, DNS name, or IP address of the proxy-based firewall.

    Firewall Port

    Specify the TCP port of the proxy-based firewall.

    Firewall User

    Specify the user name to authenticate with the proxy-based firewall.

    Firewall Password

    Specify the password to authenticate with the proxy-based firewall.

  3. Save the details.
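The default ports listed for each Firewall Type can be summarized in a short sketch. The function name and the rule that an explicitly supplied Firewall Port overrides the default are assumptions made for illustration.

```python
# Default ports per Firewall Type, as listed in the table above.
DEFAULT_FIREWALL_PORTS = {"TUNNEL": 80, "SOCKS4": 1080, "SOCKS5": 1080}


def effective_firewall_port(firewall_type: str, firewall_port=None):
    """Return the port to use, or None when no firewall is configured."""
    kind = firewall_type.upper()
    if kind == "NONE":
        return None
    if kind not in DEFAULT_FIREWALL_PORTS:
        raise ValueError(f"Unknown Firewall Type: {firewall_type}")
    # Assumption: an explicit Firewall Port takes priority over the default.
    return firewall_port if firewall_port else DEFAULT_FIREWALL_PORTS[kind]


print(effective_firewall_port("SOCKS5"))
```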

Configure Proxy Settings

  1. On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.

  2. In the Connector Settings section, provide the following details in the Proxy section.

    Field

    Description

    Proxy Auto Detect

    Select this to use the system proxy settings. Don’t select this if you’re using custom proxy settings.

    For SOCKS proxy, select the appropriate value in Firewall Type.

    Proxy Server

    Specify the hostname or IP address of a proxy to route HTTP traffic.

    For SOCKS proxy, select the appropriate value in Firewall Type.

    Proxy Port

    Specify the TCP port the Proxy Server is running on.

    Default: 80

    Proxy Auth Scheme

    Specify the authentication type to use to authenticate to the proxy server.

    Available Values:

    • BASIC: (Default) Enables HTTP basic authentication.

    • DIGEST: Enables HTTP digest authentication.

    • NONE: No proxy authentication.

    • NEGOTIATE: Retrieves an NTLM or Kerberos token based on the applicable protocol for authentication.

    • NTLM: Retrieves only NTLM token based on the applicable protocol for authentication.

    • PROPRIETARY: Adds a custom token in the Authorization header of the HTTP request. It doesn’t generate NTLM or Kerberos token.

    Proxy User

    Specify the username to authenticate to the proxy server based on the chosen Proxy Auth Scheme.

    If you are using Windows or Kerberos authentication, set this property to a user name in one of the following formats:

    • user@domain

    • domain\user

    Proxy Password

    Specify the password to authenticate to the proxy server based on the chosen Proxy Auth Scheme.

    Proxy SSL Type

    Select the SSL type when connecting to the proxy server.

    Available Values:

    • AUTO: (Default) If the URL is an HTTPS URL, the connector uses the TUNNEL option. If the URL is an HTTP URL, the connector uses the NEVER option.

    • ALWAYS: The connection is always SSL enabled.

    • NEVER: The connection is not SSL enabled.

    • TUNNEL: The connection is established through a tunneling proxy. The proxy server opens a connection to the remote host and traffic flows through the proxy.

    Proxy Exceptions

    Specify a semicolon-separated list of destination hostnames or IP addresses that are exempt from connecting through the proxy server.
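Two of the proxy fields follow simple rules that a short sketch can make concrete: how the AUTO value of Proxy SSL Type resolves from the URL scheme, and how a Proxy Exceptions value splits into entries. The function names are hypothetical, and rejecting URLs with other schemes is an assumption.

```python
from urllib.parse import urlsplit

PROXY_SSL_TYPES = {"AUTO", "ALWAYS", "NEVER", "TUNNEL"}


def resolve_proxy_ssl_type(url: str, proxy_ssl_type: str = "AUTO") -> str:
    """Resolve AUTO to TUNNEL for HTTPS URLs and to NEVER for HTTP URLs."""
    choice = proxy_ssl_type.upper()
    if choice not in PROXY_SSL_TYPES:
        raise ValueError(f"Unknown Proxy SSL Type: {proxy_ssl_type}")
    if choice != "AUTO":
        return choice
    scheme = urlsplit(url).scheme.lower()
    if scheme == "https":
        return "TUNNEL"
    if scheme == "http":
        return "NEVER"
    # Assumption: other schemes are treated as an error in this sketch.
    raise ValueError(f"Unsupported URL scheme: {scheme!r}")


def parse_proxy_exceptions(value: str) -> list:
    """Split a semicolon-separated Proxy Exceptions value into entries."""
    return [entry.strip() for entry in value.split(";") if entry.strip()]


print(resolve_proxy_ssl_type("https://myaccount.documents.azure.com:443/"))
print(parse_proxy_exceptions("localhost; 10.0.0.5;"))
```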

Configure Logging Settings

  1. On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.

  2. In the Connector Settings section, provide the following details in the Logging section.

    Field

    Description

    Verbosity

    Specify a verbosity level from 1 to 5 to control the amount of detail included in the log file.

    Available Values:

    • 1: Logs the query, number of rows returned by a query, execution time, time of the start of execution, and any errors.

    • 2: Logs everything included in the Verbosity level 1, cache queries, and any additional information about the request.

    • 3: Logs everything included in the Verbosity level 2, HTTP headers, request body and response body.

    • 4: Logs everything included in the Verbosity level 3, transport-level communication with the data source. This includes SSL negotiation.

    • 5: Logs everything included in the Verbosity level 4, communication with the data source, and additional details that may be helpful in troubleshooting. This includes interface commands.

    Log Modules

    Specify the core modules to include in the log files as a list of module names separated by semicolons.

    By default, all modules are included.

    Max Log File Count

    Specify the maximum number of log files. When the limit is reached, the log file is rolled over with the time appended to the end of the file name, and the oldest log file is deleted.

    Minimum Value: 2

    Default: -1. A negative or zero value indicates unlimited files.
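Because each Verbosity level includes everything from the level below it, the levels can be modeled as cumulative additions. The sketch below only restates the table above as data. The function name and the wording of each entry are illustrative.

```python
# What each Verbosity level adds on top of the previous level.
VERBOSITY_ADDITIONS = {
    1: ["query", "number of rows returned", "execution time",
        "execution start time", "errors"],
    2: ["cache queries", "additional request information"],
    3: ["HTTP headers", "request body", "response body"],
    4: ["transport-level communication, including SSL negotiation"],
    5: ["interface commands and additional troubleshooting details"],
}


def logged_details(verbosity: int) -> list:
    """Return everything logged at the given level, lower levels included."""
    if verbosity not in VERBOSITY_ADDITIONS:
        raise ValueError("Verbosity must be between 1 and 5")
    details = []
    for level in range(1, verbosity + 1):
        details.extend(VERBOSITY_ADDITIONS[level])
    return details


print(logged_details(3))
```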

Configure Schema Settings

  1. On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.

  2. In the Connector Settings section, provide the following details in the Schema section.

    Field

    Description

    Browsable Schemas

    Specify the schemas as subset of the available schemas in a comma separated list. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC

    Tables

    Specify the fully qualified name of the table as a subset of the available tables in a comma separated list.

    For example, Tables=TableA,TableB,TableC

    Each table must be a valid SQL identifier that might contain special characters escaped using square brackets, double-quotes, or backticks.

    For example, Tables=TableA,[TableB/WithSlash],WithCatalog.WithSchema.`TableC With Space`.

    Views

    Specify the fully qualified names of the views as a subset of the available views in a comma-separated list.

    For example, Views=ViewA,ViewB,ViewC.

    Each view must be a valid SQL identifier that might contain special characters escaped using square brackets, double-quotes, or backticks.

    For example, Views=ViewA,[ViewB/WithSlash],WithCatalog.WithSchema.`ViewC With Space`.

    Schema

    Specify the Azure Cosmos DB database you want to work with.
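Because identifiers in the Tables and Views lists can be escaped with square brackets, double-quotes, or backticks, a plain split on commas is not always safe. The sketch below shows one way to split such a value while leaving escaped identifiers intact. It is an illustration under the assumption that commas inside an escaped identifier do not separate entries, and it is not the connector's own parser.

```python
def split_identifier_list(value: str) -> list:
    """Split a comma-separated identifier list, ignoring commas inside escapes."""
    closers = {"[": "]", '"': '"', "`": "`"}
    items, current, closing = [], [], None
    for ch in value:
        if closing is not None:
            # Inside an escaped identifier: copy characters until it closes.
            current.append(ch)
            if ch == closing:
                closing = None
        elif ch in closers:
            closing = closers[ch]
            current.append(ch)
        elif ch == ",":
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return [item for item in items if item]


example = "TableA,[TableB/WithSlash],WithCatalog.WithSchema.`TableC With Space`"
print(split_identifier_list(example))
```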

Configure Miscellaneous Settings

  1. On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.

  2. In the Connector Settings section, provide the following details in the Misc section.

    Field

    Description

    Batch Size

    Specify the maximum size of each batch operation.

    Default: 0

    Calculate Aggregates

    Specify whether to return the calculated value of the aggregates or the values grouped by partition range.

    Connection Lifetime

    Specify the maximum lifetime of a connection in seconds.

    Default: 0 indicates unlimited lifetime for a connection.

    Flatten Arrays

    Specify an arbitrary number to flatten the elements in a nested array into columns. By default, the nested arrays are returned as JSON strings.

    Set it to -1 to flatten all the elements.

    Flatten Objects

    Select this to flatten the object properties in a nested array into columns. By default, the nested arrays are returned as JSON strings.

    Force Query On Non Indexed Containers

    Force the use of an index scan to process the query if indexing is disabled or the right index path is not available.

    Generate Schema Files

    Specify when to generate and save the schema files.

    Available Options:

    • Never: Doesn’t generate a schema file.

    • OnUse: A schema file is generated the first time a table is referenced, provided the schema file for the table does not already exist. In SQL, the schemas are generated as you execute SELECT queries.

    • OnStart: A schema file is generated at connection time for any tables that do not currently have a schema file.

    • OnCreate: A schema file is generated when running a CREATE TABLE SQL query.

    Max Rows

    Specify the limit for the number of rows returned if no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses.

    Max Threads

    Specifies the maximum number of concurrent requests for Batch CUD (Create, Update, Delete) operations.

    Multi Thread Count

    Aggregate queries in partitioned collections will require parallel requests for different partition ranges.

    Set this to the number of parallel requests to be issued at the same time.

    Other

    Specify the caching, integration, or formatting properties in a list format separated by a semicolon.

    Available Options:

    • Caching Configuration:

      • CachePartial=True: Caches only a subset of columns specified in the query.

      • QueryPassthrough=True: Passes the specified query to the cache database instead of using the SQL parser of the provider.

    • Integration and Formatting:

      • DefaultColumnSize: Sets the default length of string fields when the data source does not provide column length in the metadata.

        The default value is 2000.

      • ConvertDateTimeToGMT: Converts date-time values to GMT instead of the local time of the machine.

      • RecordToFile=filename: Records the underlying socket data transfer to a specified file.

    Page size

    Specify the maximum number of results to return per page from Azure Cosmos DB.

    A higher value results in better performance but uses more memory.

    Pool Idle Timeout

    Specify the idle time for a connection in a pool.

    Default: 60 seconds

    Pool Max Size

    Specify the maximum number of connections in a pool.

    To disable, set the value to 0 or less.

    Default: 100

    Pool Min Size

    Specify the minimum number of connections in a pool.

    Default: 1

    Pool Wait Time

    Specify the maximum time to wait for a connection to become available. If a new connection request waits longer than this time for an available connection, an error is thrown. By default, new connection requests wait indefinitely for an available connection.

    Default: 60 seconds

    Pseudo Columns

    Specify, as a comma-separated list, the pseudo columns to add as columns to the table.

    For example, “Table1=Column1, Table1=Column2, Table2=Column3”.

    Use the * character to include all tables and columns in this format: *=*

    Read only

    Select this to enforce only SELECT queries to work on Azure Cosmos DB.

    Retry Wait Time

    Specify the minimum number of milliseconds the provider needs to wait to retry a request.

    Default: 2000

    Row Scan Depth

    Specify the maximum number of rows to scan for the available columns in a table.

    Set it to -1 to scan an arbitrary number of rows.

    Separator Character

    Specify the character or characters to denote hierarchy or separate columns.

    Default: .

    Note: If your data has columns that use a period (.) within the attribute name, specify any other character.

    Set Partition Key As PK

    Select this to use the collection’s Partition Key field as part of composite Primary Key for the corresponding exposed table.

    Timeout

    Specify the time limit in seconds after which the operation is canceled and an error is thrown.

    A value of 0 specifies that the operation never times out until completion or failure.

    Default: 60 seconds

    Type Detection Scheme

    Specify how to scan data to determine the fields and datatypes in a document collection.

    Available Values:

    • None: Returns all columns as strings.

    • Rowscan: Scans rows to heuristically determine the data type.

    • Recent: Scans the rows to heuristically determine the data type for the recent documents in a collection.

    Use Connection Pooling

    Select this to enable connection pooling.

    Use Consistent Reads

    Select this to always use Consistent Reads when querying Azure Cosmos DB.

    User Defined Views

    Specify the file path pointing to the JSON configuration file that contains custom views.

    Use Rid As PK

    Select this to use the default _rid column as the primary key instead of the id column.

    Write Throughput Budget

    Specify the Request Units (RU) budget per second that the Batch CUD (Create, Update, Delete) operations should not exceed.

    Default: 1000

  3. Click Save.
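The effect of Flatten Arrays together with Separator Character can be pictured with a small sketch. Everything connector-specific here is an assumption made for illustration: the column naming (array name, separator, element index), the use of 0 to stand for the default behavior, and the function name. Only the broad behavior comes from the table above: a positive number flattens that many elements into columns, -1 flattens all of them, and otherwise the array is returned as a JSON string.

```python
import json


def flatten_arrays(document: dict, limit: int, separator: str = ".") -> dict:
    """Flatten top-level arrays of a document into indexed columns."""
    row = {}
    for key, value in document.items():
        if not isinstance(value, list):
            row[key] = value
        elif limit == 0:
            # Assumed default: the array stays a single JSON string column.
            row[key] = json.dumps(value)
        else:
            # -1 (any negative value in this sketch) flattens every element.
            elements = value if limit < 0 else value[:limit]
            for index, element in enumerate(elements):
                row[f"{key}{separator}{index}"] = element
    return row


# Hypothetical document used only to show the shape of the result.
print(flatten_arrays({"id": "1", "tags": ["a", "b", "c"]}, 2))
```

If attribute names in your data already contain a period, the same sketch shows why a different Separator Character avoids ambiguous column names.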

Disable Obfuscate Literals

The Obfuscate literals option hides literal values in queries that are ingested with query log ingestion and displayed on the Queries tab of schema and table catalog objects.

To show literal values, go to General Settings > Obfuscate Literals on the Settings page of your Azure Cosmos DB data source and disable the Obfuscate literals toggle.

When enabled, literal values are substituted with placeholder values. Disable this option when you want literal values in queries to be visible to users.

By default, this option is disabled.
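To make the idea of substituting literal values with placeholders concrete, here is a rough sketch that replaces single-quoted strings and numbers in a query with a placeholder. It is not how Alation implements the option. The regular expressions, the ? placeholder, and the function name are all assumptions chosen for illustration.

```python
import re

# Single-quoted string literals, with '' allowed as an escaped quote.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# Standalone integer or decimal literals (digits inside identifiers are skipped).
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def obfuscate_literals(query: str, placeholder: str = "?") -> str:
    """Replace string and numeric literals in a query with a placeholder."""
    # Strings go first so that digits inside them are not matched separately.
    without_strings = _STRING_LITERAL.sub(placeholder, query)
    return _NUMBER_LITERAL.sub(placeholder, without_strings)


print(obfuscate_literals("SELECT * FROM c WHERE c.name = 'Alice' AND c.age > 30"))
```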

Configure Logging

To set the logging level for your Azure Cosmos DB OCF data source logs, perform these steps:

  1. On the Settings page of your Azure Cosmos DB OCF data source, go to General Settings > Logging configuration.

  2. Select a logging level for the connector logs and click Save.

    The available log levels are based on the Log4j framework.

You can view the connector logs in Admin Settings > Server Admin > Manage Connectors > Azure Cosmos DB OCF connector.