Version 2.3.0 or Newer

Applies to: Alation Cloud Service instances and customer-managed instances of Alation

Note

If you are on Alation versions prior to 2024.1.4 or connector versions prior to 2.3.0, refer to Alation Versions Prior to 2024.1.4 or Connector Versions Prior to 2.3.0.

Metadata Extraction (MDE) fetches BI source information, such as workspaces, dashboards, reports, datasets, and dataflows. The metadata that Alation retrieves during MDE becomes catalog objects.

You can initiate MDE on demand or schedule it for regular catalog updates.

Important

With Alation version 2024.1.4 or newer and Azure Power BI Scanner OCF connector version 2.3.0 or newer, Alation has introduced the Metadata Extraction tab in the user interface for configuring metadata extraction.

Configure MDE in Alation

Metadata extraction fetches data source information, such as workspaces, dashboards, reports, datasets, and dataflows.

Steps involved in metadata extraction are:

Test Access and Fetch Workspaces

Important

This step applies to Alation version 2024.1.4 or newer and connector version 2.3.0 or newer.

Before fetching the list of workspaces for extraction, Alation tests whether the user has all the configurations required to run metadata extraction. Ensure that you have completed the steps to set up the Azure Power BI Scanner (Prerequisites).

Perform these steps to test access and fetch workspaces:

  1. On the Settings page of BI Server source, go to the Metadata Extraction tab.

  2. In the Test access and fetch workspaces section, click Run.

    The retrieved list of workspaces appears in the Workspaces table under the Select workspaces for extraction section of the Metadata Extraction tab.

Select Workspaces for Extraction

By default, all the workspaces Alation fetches from the BI server source are selected for extraction. You can adjust the selection of workspaces by:

  • Selecting Workspaces Using Filters

  • Selecting Workspaces Manually

Important

If you do not select any workspace manually or using filters, Alation extracts all the workspaces when you run the metadata extraction.

Select Workspaces Using Filters

If you want to apply extraction filters, perform these steps:

  1. On the Settings page of your BI server source, go to the Metadata Extraction tab.

  2. Under the Select workspaces for extraction section, turn on the Enable advanced settings toggle.

  3. Select the required extraction filter option from the Extract drop-down list:

    1. Only selected workspaces — Extracts metadata only from the selected workspaces. This is the default value.

    2. All workspaces except selected — Extracts metadata from all workspaces except the selected workspaces.

  4. To delete workspaces from a previous extraction that are not part of the current selection, select the Keep the catalog synchronized with the current selection of workspaces checkbox.

  5. Create a filter.

    1. From the first drop-down list, select Workspace.

    2. Select the filter criteria (Contains, Starts with, Ends with, Regex).

    3. Specify the keyword to look for in the workspace name.

    Use this option if you frequently change workspaces or if you use extensive metadata.

    You can add multiple filters by clicking the Add another filter link.

Note

You must use filters if you plan to schedule MDE.

  6. Click Apply filters.

    The Workspaces table displays the selected workspaces that match the filters that you set.

Note

After applying filters, you cannot manually adjust the selection of workspaces.
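
The following Python sketch illustrates how the four filter criteria and the two Extract options could narrow a list of workspace names. It is an illustration only, not Alation's implementation: the function names are hypothetical, and it assumes case-sensitive matching, that a workspace counts as selected when any one of the filters matches, and that Regex matches anywhere in the name.

```python
import re


def matches(name, criterion, keyword):
    """Return True if a workspace name satisfies a single filter."""
    if criterion == "Contains":
        return keyword in name
    if criterion == "Starts with":
        return name.startswith(keyword)
    if criterion == "Ends with":
        return name.endswith(keyword)
    if criterion == "Regex":
        # Assumption: the pattern may match anywhere in the name.
        return re.search(keyword, name) is not None
    raise ValueError(f"unknown criterion: {criterion}")


def select_workspaces(names, filters, mode="Only selected workspaces"):
    """Apply (criterion, keyword) filters and the chosen Extract option."""
    # Assumption: multiple filters are combined with OR.
    selected = [n for n in names if any(matches(n, c, k) for c, k in filters)]
    if mode == "Only selected workspaces":
        return selected
    if mode == "All workspaces except selected":
        return [n for n in names if n not in selected]
    raise ValueError(f"unknown mode: {mode}")


workspaces = ["Sales EU", "Sales US", "Finance", "HR Archive"]
filters = [("Starts with", "Sales"), ("Regex", r"Archive$")]
print(select_workspaces(workspaces, filters))
print(select_workspaces(workspaces, filters, "All workspaces except selected"))
```

With these sample names, the first call keeps the two Sales workspaces and HR Archive, and the second call keeps only Finance.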

Select Workspaces Manually

If you opt to manually select the workspaces for extraction, perform these steps:

  1. On the Settings page of your BI Server source, go to the Metadata Extraction tab.

  2. Under the Select workspaces for extraction section, turn off the Enable advanced settings toggle, if not disabled already.

  3. Select the required workspaces from the list in the Workspaces table.

    Alternatively, search the table for a specific workspace by its name, or by any keyword or string in the name, and then select it.

    After you have selected the workspaces, your selection count is displayed on top of the Workspaces table.

Customize the Extraction Scope

Note

This is an optional step.

You can customize metadata extraction down to the level of specific metadata types, such as report fields. Consider the following point:

  • To include or exclude a metadata type, enable or disable the corresponding option.

To customize the extraction scope, perform these steps:

  1. On the Settings page of your BI Server source, go to the Metadata Extraction tab.

  2. Under the Customize additional extraction scope section, enable or disable the required additional available metadata types from the following options:

    • Extract all the workspaces from Azure Power BI - extracts all the workspaces from Azure Power BI.

      Note

      • When this option is turned on, all the workspaces from Azure Power BI are extracted, irrespective of the service principal’s access.

      • When this option is turned off, only the workspaces accessible to the configured service principal are extracted.

      • When this option is turned off, you must enable the Service principals can use Fabric APIs option in the Azure Power BI Tenant settings.

      • After you enable or disable the Extract all the workspaces from Azure Power BI option, rerun Step 1: Test access and fetch workspaces to retrieve the updated list of workspaces.

    • Enable apps extraction - extracts apps from Azure Power BI.

    • Enable report fields extraction - extracts report fields from Azure Power BI.

      Note

      To extract the report field information:

      • Select Export and Sharing Settings under Tenant Settings > Download Reports.

      • Add the security group.

      • Grant access to the configured service principal.

      • Turn on the Service principals can use Fabric APIs toggle in the Azure Power BI Tenant settings.

      Important

      Enabling report field extraction increases metadata extraction time, because Alation invokes the Azure Power BI API to download the reports and extract the report fields.

Run Extraction

Under the Run extraction section (General Settings > Metadata Extraction), click Run Extraction to extract metadata on demand.

The status of the extraction action is logged in the Extraction Job Status table under the MDE Job History tab.

Schedule Extraction

You can also schedule the extraction. To schedule the extraction, perform these steps:

  1. On the Settings page of your BI Server source, go to the Metadata Extraction tab.

  2. Under the Run extraction section, turn on the Enable extraction schedule toggle.

  3. Using the date and time widgets, select the recurrence period and day and time for the desired MDE schedule. The next metadata extraction job for your BI Server source will run on the schedule you have specified.

Note

Here are some recommended schedules for better performance:

  • Schedule extraction to run every 12 hours at the 30th minute of the hour.

  • Schedule extraction to run every 2 days at 11:30 PM.

  • Schedule extraction to run every week on Sunday and Wednesday.

  • Schedule extraction to run every 3 months on the 15th day of the month.

View the MDE Job History

You can view the status of the extraction actions after you run the extraction or after Alation triggers the MDE as per the schedule. Also, you can view the status of the workspaces retrieved from the Test Access and Fetch Workspaces step.

To view the status of extraction, go to Metadata Extraction > MDE Job History on the Settings page of your BI Server source. The Extraction job status table is displayed.

The Extraction job status table logs the following statuses:

  • Did Not Start - Indicates that the metadata extraction did not start due to configuration or other issues.

  • Succeeded - Indicates that the extraction was successful.

  • Partial Success - Indicates that the extraction was successful with warnings. If Alation fails to extract some of the objects during the metadata extraction process, it skips them and proceeds with the extraction process, resulting in partial success.

  • Failed - Indicates that the extraction failed with errors.
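
As a reading aid, the four statuses can be thought of as the outcome of a small decision rule. The sketch below is an assumption-laden illustration, not Alation's logic: the function and its parameters (`started`, `fatal_error`, `skipped_objects`) are hypothetical names chosen to mirror the descriptions above.

```python
def job_status(started, fatal_error, skipped_objects):
    """Map a hypothetical job outcome to one of the four documented statuses."""
    if not started:
        # Configuration or other issues prevented the job from starting.
        return "Did Not Start"
    if fatal_error:
        # The extraction failed with errors.
        return "Failed"
    if skipped_objects > 0:
        # Some objects could not be extracted and were skipped.
        return "Partial Success"
    return "Succeeded"


print(job_status(started=True, fatal_error=False, skipped_objects=3))
```

In this sketch, a job that runs to completion but skips three objects is reported as Partial Success.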

Click the View Details link to view a detailed report of metadata extraction. If there are errors, the Job errors table displays the error category, error message, and a hint (ways to resolve the issue). Follow the instructions under the Hints column to resolve the error.

In some cases, the Generate Error Report link is displayed above the Job errors table. Click this link to generate an archive (.zip) containing CSV files for different error categories, such as Data and Connection errors. Click Download Error Report to download the files.

Enable Raw Dump or Replay

You can enable or disable the Raw Metadata Dump or Replay feature for debugging MDE. By default, this feature is disabled. We recommend enabling it for extraction debugging only. The full use of this feature requires access to the Alation server.

If Raw Metadata Dump or Replay is enabled, Alation breaks MDE into these stages:

  • “Dump” the extracted metadata into files. You can access and review the files on the Alation server to debug extraction issues before attempting to ingest the metadata into the catalog.

  • Ingest the metadata from the files into the catalog (Replay).

Both stages are controlled manually from the user interface.

To enable the Raw Metadata Dump or Replay, perform these steps:

  1. On the Settings page of your BI Server source, go to the Metadata Extraction > Troubleshooting section.

  2. From the Enable Raw Metadata Dump or Replay dropdown list, select the Enable Raw Metadata Dump option.

  3. Click Save.

    This enables the first stage of MDE, where the extracted metadata is dumped into the following files in a subdirectory of the /opt/alation/site/tmp/ directory on the Alation server (inside the Alation shell):

    attribute.dump, function.dump, schema.dump, table.dump

  4. Click Run extraction.

    Alation performs a raw metadata dump into files. In the Extraction job status table on the MDE Job History tab, click the View Details link to display the details of the MDE job. The log lists the location of the .dump files for the MDE job. For example: /opt/alation/site/tmp/rosemeta/170/extraction_dump/5028.

  5. Access and review the metadata dump files to intercept any potential extraction issues.

  6. From the Enable Raw Metadata Dump or Replay dropdown list, select the option Enable Ingestion Replay.

  7. Click Save.

    This enables the second stage where the metadata from the files is ingested into the Alation catalog.

  8. Click Run extraction.

    The metadata from the files is ingested into the catalog.
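
When reviewing the dump files in step 5, a quick first check is whether each expected file exists and is non-empty. The sketch below lists the .dump files in a directory with their sizes, using only the Python standard library. It assumes that Python is available where you run it and that you have read access to the dump directory inside the Alation shell; the function name is hypothetical, and the internal format of the .dump files is not covered here.

```python
from pathlib import Path


def list_dump_files(dump_dir):
    """Return (file name, size in bytes) for each .dump file, sorted by name."""
    directory = Path(dump_dir)
    if not directory.is_dir():
        # Nothing to report if the directory does not exist.
        return []
    return sorted((p.name, p.stat().st_size) for p in directory.glob("*.dump"))


def report(dump_dir):
    """Print one line per dump file and flag empty files."""
    for name, size in list_dump_files(dump_dir):
        flag = "  <-- empty" if size == 0 else ""
        print(f"{name}\t{size} bytes{flag}")
```

For example, `report("/opt/alation/site/tmp/rosemeta/170/extraction_dump/5028")` would inspect the sample location shown in step 4; substitute the path that your own job log reports.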

Configure Extraction Batch Size

You can set the size of the workspace extraction batches.

Go to the Troubleshooting section under the Metadata Extraction tab of the Settings page of your BI server source and specify the batch size in the Workspace extraction batch size field.

Alation supports a maximum of 100 batches per extraction. Reducing the Workspace extraction batch size increases the number of API calls against the Azure Power BI Scanner to fetch the data, but decreases the size of the data fetched during each call. For large workspaces, Alation recommends a smaller batch size.
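
The trade-off between batch size and the number of API calls can be estimated with simple arithmetic. The sketch below assumes that each batch corresponds to one scanner request, which this section implies but does not state; the function names and the workspace counts are hypothetical.

```python
import math


def api_calls_needed(workspace_count, batch_size):
    """Estimate the number of scanner requests, assuming one request per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(workspace_count / batch_size)


def split_into_batches(workspaces, batch_size):
    """Split a list of workspaces into consecutive batches of at most batch_size."""
    return [workspaces[i:i + batch_size] for i in range(0, len(workspaces), batch_size)]


print(api_calls_needed(450, 100))  # 5 requests, each covering up to 100 workspaces
print(api_calls_needed(450, 25))   # 18 requests, each covering up to 25 workspaces
```

Under this assumption, lowering the batch size from 100 to 25 for 450 workspaces raises the request count from 5 to 18 while shrinking each response.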

Manage Extraction Based on API Limits

You can configure the extraction to pause and retry when the API limit is reached.

Go to the Troubleshooting section under the Metadata Extraction tab of the Settings page of your BI server source and turn on the Wait and retry extraction when API limit is reached toggle.

Enable this option to pause the extraction when the API limit is reached. The extraction remains paused until Power BI refreshes the API limit. If you turn off the toggle, the extraction process stops when the API limit is reached.
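
The difference between the two toggle states can be sketched as a retry loop. This is a conceptual illustration, not the connector's code: `ApiLimitReached`, `retry_after_seconds`, `extract`, and `fetch_batch` are hypothetical names, and the sketch assumes the wait time is known when the limit is hit.

```python
import time


class ApiLimitReached(Exception):
    """Hypothetical error raised when the API limit is hit."""

    def __init__(self, retry_after_seconds):
        super().__init__("API limit reached")
        self.retry_after_seconds = retry_after_seconds


def extract(batches, fetch_batch, wait_and_retry, sleep=time.sleep):
    """Fetch each batch; return (results, completed)."""
    results = []
    for batch in batches:
        while True:
            try:
                results.append(fetch_batch(batch))
                break
            except ApiLimitReached as err:
                if not wait_and_retry:
                    # Toggle off: stop the extraction at this point.
                    return results, False
                # Toggle on: pause until the limit refreshes, then retry the same batch.
                sleep(err.retry_after_seconds)
    return results, True
```

With `wait_and_retry=True`, the loop resumes from the batch that hit the limit, so no workspace is skipped; with `wait_and_retry=False`, the run ends early with whatever was fetched so far.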