Databricks Unity Catalog Connector Extraction Queries

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Customer Managed Applies to customer-managed instances of Alation

Important

This section is applicable from connector versions 3.0.0 onwards.

The default extraction queries for Databricks Unity Catalog Connector OCF data sources are listed below. You can customize them to better suit your extraction requirements. Custom queries can be provided on the Metadata Extraction tab of the data source Settings page under the Custom extraction queries section.

You can customize all or some of the queries.

If specified, custom extraction queries will take precedence over the default MDE , selective extraction, and schema selection.

Catalog

Make sure that your query has a column labeled as CATALOG in the SELECT statement.

SELECT
        CATALOG_NAME AS CATALOG
FROM
        SYSTEM.INFORMATION_SCHEMA.CATALOGS
WHERE
        CATALOG_NAME NOT IN('system');

Schema

Make sure that your query has columns labeled as CATALOG, SCHEMA and REMARKS in the SELECT statement.

SELECT
        CATALOG_NAME AS CATALOG,
        SCHEMA_NAME AS SCHEMA,
        COMMENT AS REMARKS
FROM
        SYSTEM.INFORMATION_SCHEMA.SCHEMATA
WHERE
        CATALOG_NAME NOT IN ('system')
        AND CONCAT(CATALOG_NAME,
    '.',
        SCHEMA_NAME) NOT IN ('''')
        AND SCHEMA_NAME NOT IN('information_schema');

Table

Make sure that your query has columns labeled as CATALOG, SCHEMA, TABLE_NAME, TABLE_TYPE, and REMARKS in the SELECT statement.

SELECT
        TABLE_CATALOG AS CATALOG,
        TABLE_SCHEMA AS SCHEMA,
        TABLE_NAME AS TABLE_NAME,
        TABLE_TYPE AS TABLE_TYPE,
        COMMENT AS REMARKS
FROM
        SYSTEM.INFORMATION_SCHEMA.TABLES
WHERE
        TABLE_TYPE NOT IN( 'MATERIALIZED_VIEW', 'VIEW')
        AND TABLE_CATALOG NOT IN ('system')
        AND CONCAT(TABLE_CATALOG,
    '.',
        TABLE_SCHEMA)
NOT IN ('''')
        AND TABLE_SCHEMA NOT IN('information_schema');

View

Make sure that your query has columns labeled as CATALOG, SCHEMA, VIEW_NAME, VIEW_CREATE_STATEMENT, 'VIEW' AS VIEW_TYPE, and REMARKS in the SELECT statement.

SELECT
    V.TABLE_CATALOG AS CATALOG,
    V.TABLE_SCHEMA AS SCHEMA,
    V.TABLE_NAME AS VIEW_NAME,
    'VIEW' AS VIEW_TYPE,
    CASE
        WHEN lower(V.VIEW_DEFINITION) LIKE 'create view%' THEN V.VIEW_DEFINITION
        WHEN lower(V.VIEW_DEFINITION) LIKE 'create or replace view%' THEN V.VIEW_DEFINITION
        WHEN lower(V.VIEW_DEFINITION) LIKE 'create materialized view%' THEN V.VIEW_DEFINITION
        WHEN lower(V.VIEW_DEFINITION) LIKE 'create or replace materialized view%' THEN V.VIEW_DEFINITION
        WHEN T.TABLE_TYPE = 'MATERIALIZED_VIEW' THEN 'CREATE OR REPLACE MATERIALIZED VIEW ' || V.TABLE_CATALOG || '.' || V.TABLE_SCHEMA || '.' || V.TABLE_NAME || ' AS ' || V.VIEW_DEFINITION
        ELSE 'CREATE OR REPLACE VIEW ' || V.TABLE_CATALOG || '.' || V.TABLE_SCHEMA || '.' || V.TABLE_NAME || ' AS ' || V.VIEW_DEFINITION
    END AS VIEW_CREATE_STATEMENT,
    T.COMMENT AS REMARKS
FROM
    SYSTEM.INFORMATION_SCHEMA.VIEWS V
LEFT JOIN SYSTEM.INFORMATION_SCHEMA.TABLES T
ON
    V.TABLE_CATALOG = T.TABLE_CATALOG
    AND V.TABLE_SCHEMA = T.TABLE_SCHEMA
    AND T.TABLE_TYPE IN ('MATERIALIZED_VIEW', 'VIEW')
    AND V.TABLE_NAME = T.TABLE_NAME
WHERE
    V.TABLE_CATALOG NOT IN ('system')
    AND CONCAT(V.TABLE_CATALOG,
    '.',
    V.TABLE_SCHEMA) NOT IN ('''')
    AND V.TABLE_SCHEMA NOT IN('information_schema');

Column

Make sure that your query has columns labeled as CATALOG, SCHEMA, TABLE_NAME, TYPE_NAME, DATA_TYPE, COLUMN_NAME, ORDINAL_POSITION, IS_NULLABLE, REMARKS, and COLUMN_DEFAULT in the SELECT statement.

SELECT
    TABLE_CATALOG AS CATALOG,
    TABLE_SCHEMA AS SCHEMA,
    TABLE_NAME AS TABLE_NAME,
    COLUMN_NAME AS COLUMN_NAME,
    ORDINAL_POSITION AS ORDINAL_POSITION,
    DATA_TYPE AS TYPE_NAME,
    FULL_DATA_TYPE AS DATA_TYPE,
    FULL_DATA_TYPE AS COLUMN_DEF,
    COLUMN_DEFAULT AS COLUMN_DEFAULT,
    IS_NULLABLE AS IS_NULLABLE,
    COMMENT AS REMARKS
FROM
    SYSTEM.INFORMATION_SCHEMA.COLUMNS
WHERE
    TABLE_CATALOG NOT IN ('system')
    AND CONCAT(TABLE_CATALOG,
    '.',
    TABLE_SCHEMA) NOT IN ('''')
    AND TABLE_SCHEMA NOT IN('information_schema');

Primary Key

Make sure that your query has columns labeled as CATALOG, SCHEMA, TABLE_NAME, and COLUMN_NAME in the SELECT statement.

SELECT
    KCU.TABLE_CATALOG AS CATALOG,
    KCU.TABLE_NAME AS TABLE_NAME,
    KCU.COLUMN_NAME AS COLUMN_NAME,
    KCU.TABLE_SCHEMA AS SCHEMA
FROM
    SYSTEM.INFORMATION_SCHEMA.TABLE_CONSTRAINTS TC
JOIN SYSTEM.INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU
ON
    KCU.CONSTRAINT_SCHEMA = TC.CONSTRAINT_SCHEMA
    AND KCU.CONSTRAINT_NAME = TC.CONSTRAINT_NAME
    AND KCU.TABLE_SCHEMA = TC.TABLE_SCHEMA
    AND KCU.TABLE_NAME = TC.TABLE_NAME
WHERE
    KCU.TABLE_CATALOG NOT IN ('system')
    AND TC.CONSTRAINT_TYPE = 'PRIMARY KEY'
    AND
CONCAT(KCU.TABLE_CATALOG,
    '.',
    KCU.TABLE_SCHEMA)
NOT IN ('''')
    AND KCU.TABLE_SCHEMA NOT IN
('information_schema');

Foreign Key

Make sure that your query has columns labeled as PK_CATALOG, PK_SCHEMA, PK_TABLE, PK_COLUMN, FK_CATALOG, FK_SCHEMA, FK_TABLE, and FK_COLUMN in the SELECT statement.

SELECT
    KCUF.TABLE_CATALOG AS FK_CATALOG,
    RC.CONSTRAINT_NAME FK_NAME,
    KCUF.TABLE_SCHEMA FK_SCHEMA,
    KCUF.TABLE_NAME FK_TABLE,
    KCUF.COLUMN_NAME FK_COLUMN,
    RC.UNIQUE_CONSTRAINT_NAME PK_NAME,
    KCUP.TABLE_CATALOG AS PK_CATALOG,
    KCUP.TABLE_SCHEMA PK_SCHEMA,
    KCUP.TABLE_NAME PK_TABLE,
    KCUP.COLUMN_NAME PK_COLUMN
FROM
    SYSTEM.INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC
JOIN SYSTEM.INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCUF ON
    RC.CONSTRAINT_NAME = KCUF.CONSTRAINT_NAME
JOIN SYSTEM.INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCUP ON
    RC.UNIQUE_CONSTRAINT_NAME = KCUP.CONSTRAINT_NAME
JOIN SYSTEM.INFORMATION_SCHEMA.TABLE_CONSTRAINTS TC ON
    KCUP.CONSTRAINT_SCHEMA = TC.CONSTRAINT_SCHEMA
    AND KCUF.CONSTRAINT_NAME = TC.CONSTRAINT_NAME
WHERE
    KCUF.TABLE_CATALOG NOT IN ('system')
    AND TC.CONSTRAINT_TYPE = 'FOREIGN KEY'
    AND CONCAT(KCUF.TABLE_CATALOG,
    '.',
    KCUF.TABLE_SCHEMA)
NOT IN ('''')
    AND KCUF.TABLE_SCHEMA NOT IN
('information_schema');