Configure Sampling and Profiling

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Customer Managed Applies to customer-managed instances of Alation

Refer to Configure Sampling and Profiling for OCF Data Sources.

Sampling and Profiling

For Amazon Redshift, Alation supports a number of ways to retrieve data samples and column profiles. For general information about sampling and profiling, see Configure Sampling and Profiling for OCF Data Sources.

Column Profiling

The default queries for column profiling can be modified on the Custom Settings tab of the Settings page. Do not use fully qualified table names in custom profiling queries for Amazon Redshift. The following query template can be used to create a custom query:

Numeric columns:

SELECT
        MIN({column_name}) AS Minimum,
        MAX({column_name}) AS Maximum,
        AVG({column_name}) AS Average,
        MEDIAN({column_name}) AS Median,
        SUM(CASE WHEN {column_name} IS NULL THEN 1 ELSE 0 END) as "#NULL",
        (SUM(CASE WHEN {column_name} IS NULL THEN 1 ELSE 0 END) / COUNT(*)) * 100 AS "%NULL"
FROM {table_name};

Non-numeric columns:

SELECT
        SUM(CASE WHEN {column_name} IS NULL THEN 1 ELSE 0 END) as "#NULL",
        (SUM(CASE WHEN {column_name} IS NULL THEN 1 ELSE 0 END) / COUNT(*)) * 100 AS "%NULL"
FROM {table_name};