Explore Lineage

Applies to: Alation Cloud Service instances and customer-managed instances of Alation

Lineage is data about the origin of data and its movement through an organization’s data ecosystem. Lineage documents how target data objects are created from source data objects. Lineage is visually represented as a chart on the Lineage tab of a data source, BI source, or file system. Lineage charts frequently include dataflow objects, which can be used to document:

  • ETL and ELT processes

  • Stored procedures

  • SQL queries

  • Scripts that transform source data into target data

The lineage chart brings together a target data object, its upstream sources, and the dataflow objects that track its movement, to fully represent the data ecosystem.
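One way to picture the relationship between target objects, upstream sources, and dataflow objects is as a directed graph. The following is a minimal sketch, not Alation's implementation: the `LineageGraph` class, its methods, and the object names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage graph (hypothetical): maps each node to its direct upstream nodes."""

    def __init__(self):
        self._upstream = defaultdict(set)

    def add_dataflow(self, dataflow, sources, targets):
        """Record that `dataflow` reads from `sources` and writes to `targets`."""
        for target in targets:
            self._upstream[target].add(dataflow)
        for source in sources:
            self._upstream[dataflow].add(source)

    def upstream(self, node):
        """Return every node that `node` depends on, directly or indirectly."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in self._upstream.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = LineageGraph()
# An ETL job loads raw data into a staging table.
graph.add_dataflow("etl/load_orders", ["raw.orders"], ["staging.orders"])
# A SQL query joins two staging tables into a reporting table.
graph.add_dataflow(
    "sql/daily_sales",
    ["staging.orders", "staging.customers"],
    ["mart.daily_sales"],
)
chart_nodes = graph.upstream("mart.daily_sales")
```

In this sketch, `chart_nodes` contains both the source tables and the dataflow objects between them, which mirrors what a lineage chart displays for a target object.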

From version 2023.3, lineage can be displayed in either of two views: a classic view or a compound layout view. For more information, see Analyze the Lineage Chart.

Lineage Architecture

The lineage framework in Alation is built on Lineage V3, also called the lineage service, which was introduced in version 2021.4. The lineage service is a microservice that runs inside the Alation server and is responsible for creating, storing, and retrieving lineage data for the catalog.

The Alation server creates lineage data from multiple sources, such as metadata extraction (MDE), query log ingestion (QLI), Compose query history, and public APIs. Lineage events generated from these sources are sent to the lineage service via the Event Bus. In the lineage service:

  • The lineage write service consumes lineage events from the Event Bus and stores this lineage data into the lineage database.

  • The lineage read service retrieves the stored lineage data and powers the lineage diagrams in the Alation user interface.
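The flow described above (events published to a bus, consumed by a write service, and later served by a read service) can be sketched in a few lines. This is a simplified, in-memory model under stated assumptions: the class names, the `LineageEvent` fields, and the use of a plain list as the "lineage database" are all hypothetical and do not reflect Alation's internal code.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEvent:
    """Hypothetical event: one source-to-target link and where it came from."""
    origin: str  # for example "MDE", "QLI", "Compose", or "API"
    source: str
    target: str


class EventBus:
    """In-memory stand-in for the Event Bus."""

    def __init__(self):
        self._queue = deque()

    def publish(self, event):
        self._queue.append(event)

    def drain(self):
        # Yield queued events in arrival order until the queue is empty.
        while self._queue:
            yield self._queue.popleft()


class LineageWriteService:
    """Consumes events from the bus and stores them in the database."""

    def __init__(self, bus, database):
        self._bus = bus
        self._database = database

    def consume(self):
        count = 0
        for event in self._bus.drain():
            self._database.append(event)
            count += 1
        return count


class LineageReadService:
    """Retrieves stored lineage to answer questions such as 'what feeds this table?'."""

    def __init__(self, database):
        self._database = database

    def upstream_of(self, target):
        return sorted(e.source for e in self._database if e.target == target)


bus = EventBus()
database = []  # stand-in for the lineage database
bus.publish(LineageEvent("QLI", "staging.orders", "mart.daily_sales"))
bus.publish(LineageEvent("MDE", "raw.orders", "staging.orders"))

writer = LineageWriteService(bus, database)
reader = LineageReadService(database)
stored = writer.consume()
sources = reader.upstream_of("mart.daily_sales")
```

Separating the write path from the read path, as in this sketch, means that producers of lineage events never need to know how the user interface queries the stored data.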

The image below illustrates the lineage architecture for a customer-managed Alation instance.

../../_images/lineageV3_01.png

Types of Lineage

There are two main types of lineage: table-level and column-level. Table-level lineage is the more common because every type of lineage extraction can produce it. Column-level lineage depends on both the data source and its connector, and is calculated only for sources whose connectors support it. For a complete list of data sources that support column-level lineage, see the Support Matrix for your Alation version.

Both table-level and column-level lineage can be created automatically, manually, or through the API, as described in the following sections.

Automatic Lineage

Alation automatically calculates lineage using metadata sourced from metadata extraction (MDE), query log ingestion (QLI), and Compose queries. For most data sources, automatic lineage calculation requires query history data extracted and ingested with QLI. Lineage from Compose exposes only the data transformations performed in Compose. Some data sources, such as SAP HANA and Databricks Unity Catalog, support direct lineage extraction, in which lineage data is extracted from system tables during MDE.

Manual Lineage

Users can create and edit lineage charts manually in the Alation interface using the capabilities of the Manual Lineage feature. Learn more in Create Lineage Data Manually.

Creating Lineage via the API

Alation provides a public API to create and update lineage data in the data catalog. The Lineage API documentation can be found on the Developer Portal: Lineage APIs.
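As a rough illustration of what an API-created lineage record has to express, the sketch below assembles a request body that links source tables to target tables through a dataflow object. The field names (`dataflow_objects`, `paths`, `otype`, `key`, `external_id`, `content`), the key format, and the helper function are assumptions made for this example and are not confirmed by this page. Consult the Lineage API documentation on the Developer Portal for the actual endpoint and schema.

```python
import json


def build_lineage_payload(dataflow_id, description, sources, targets):
    """Assemble a hypothetical request body: sources -> dataflow -> targets."""

    def as_table(key):
        return {"otype": "table", "key": key}

    return {
        # Assumed shape: the dataflow object that documents the transformation.
        "dataflow_objects": [
            {"external_id": dataflow_id, "content": description}
        ],
        # Assumed shape: one path made of three segments, in flow order.
        "paths": [
            [
                [as_table(key) for key in sources],
                [{"otype": "dataflow", "key": dataflow_id}],
                [as_table(key) for key in targets],
            ]
        ],
    }


payload = build_lineage_payload(
    dataflow_id="api/nightly_load",  # hypothetical identifier
    description="Nightly load of orders into the reporting schema",
    sources=["1.raw.orders"],  # hypothetical object keys
    targets=["1.mart.daily_sales"],
)
body = json.dumps(payload, indent=2)
```

The resulting `body` string is what a client would send in an authenticated request. Authentication and the request itself are omitted because they depend on your instance and API version.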

Enabling Column-Level Lineage

For most connectors that support column-level lineage, column-level lineage is not calculated by default. You must first enable automatic extraction by setting a feature flag similar to the following on the Feature Configuration tab of Alation’s Admin Settings page:

../../_images/CLL_AutomaticExtract_FeatureFlags.png

If you still do not see column-level lineage, check with your Alation account manager to ensure that column-level lineage for the specified connector is part of your Alation license entitlement.

Column-Level Lineage from Custom SQL

Applies from version 2024.1.2

In BI systems like Tableau or Power BI, analysts often create data sources and datasets using SQL queries that transform source data for specific analyses. Starting with version 2024.1.2, Alation automatically captures such SQL queries and generates column-level lineage detailing data transformations between source and target systems. The lineage supports all types of SQL operations that can be used to create BI data sources, for example SELECT *, CREATE AS SELECT, joins, and unions.

Some conditions must be met for users to see lineage for SQL query-based BI data sources:

  • Upstream data sources must be cataloged in Alation using the appropriate OCF connector.

  • Upstream data sources and downstream BI sources must support column-level lineage, which needs to be enabled in the catalog.

  • Cross-source lineage must be configured either on the BI source or the data source.

Note

Cross-source lineage is a configuration that establishes a mapping between sources in the catalog. It enables Alation to identify (resolve) lineage objects more accurately and generate upstream lineage that traces data flows from one source to another. Without proper cross-source lineage configuration, upstream objects might not be identified correctly and could appear as temporary (TMP) nodes on lineage charts. If your connector supports cross-source lineage, the OCF connector documentation for that connector explains how to configure it.
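The effect of a missing cross-source mapping can be illustrated with a small lookup. This is a conceptual sketch only: the function, the mapping keyed by host and port, the key format, and the `TMP:` prefix are hypothetical and do not describe Alation's actual resolution logic.

```python
def resolve_upstream(reference, cross_source_map):
    """Return a catalog key for `reference`, or a TMP placeholder if unmapped."""
    connection, object_name = reference
    source_id = cross_source_map.get(connection)
    if source_id is None:
        # No mapping for this connection: the object cannot be matched
        # to a cataloged source, so it is shown as a temporary node.
        return f"TMP:{object_name}"
    return f"{source_id}.{object_name}"


# Hypothetical mapping from a connection (host:port) to a cataloged data source ID.
cross_source_map = {"warehouse.example.com:5432": 7}

resolved = resolve_upstream(
    ("warehouse.example.com:5432", "sales.orders"), cross_source_map
)
unresolved = resolve_upstream(
    ("legacy.example.com:1521", "sales.orders"), cross_source_map
)
```

Here `resolved` points at a cataloged object, while `unresolved` falls back to a placeholder because its connection has no mapping.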

Catalog users can find the SQL queries used to create a BI data source on the Connections tab of the catalog page for the BI datasource or BI dataset object.

Note

Lineage from SQL query-based BI data sources is generated automatically and does not require additional enablement on an instance. However, users with the Server Admin role may need to be aware of two alation_conf feature flags that control this feature on an instance. Both are set to True by default (enabled):

  • alation.resolution.DEV_no_hostport_lineage_resolution: Enables cross-source lineage when the target system doesn’t have the host and port information of the source system.

  • alation.resolution.DEV_sql_cll: Enables column-level lineage between BI and RDBMS systems for BI data sources of the SQL query type.

Known Issues with Custom SQL Lineage

  • When a column name is adjusted on the BI server by the user or due to auto-formatting, Alation can’t trace column-level lineage for the affected column.

  • When the custom SQL for a BI data source is modified on the BI server and columns are removed, this change will not be reflected in the lineage charts in Alation after a subsequent extraction. Columns remain in the lineage charts as previously extracted.