Alation’s Use of AI for ALLIE AI Suggested Descriptions

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Overview

ALLIE AI Suggested Descriptions, a new Alation feature making use of generative AI technology, was launched into Private Preview on February 1, 2024 and is currently planned to go General Availability in June 2024. The feature accelerates curation of the Description field of RDBMS table objects in the data catalog. Catalog descriptions are particularly valuable to users, and with the advent of powerful pre-trained large language models (LLMs), we now have the opportunity to aid catalog stewards with the creation of descriptions in an automatic fashion. The feature is intended to ehance the curation process and user engagement and satisfaction with the data catalog.

ALLIE AI Suggested Descriptions are available as part of New User Experience on Alation Cloud Service instances on the cloud-native architecture.

How It Works

ALLIE AI Suggested Descriptions use generative AI, employing custom prompt engineering combined with a large language model (LLM). Suggested descriptions are natural language text descriptions of data objects generated by the LLM. In the catalog, users can initiate an AI-generated description, request revisions, or manually edit it to improve accuracy and relevance.

Description generation involves collecting pieces of metadata about a data object, inserting those into a prompt template, and then making a call to the LLM API to get a generated output. Templates are not exposed to users and are handled by internal logic. The response may be displayed to the Alation catalog users in the user interface or applied to internal logic of the Alation application.

For example, there may be a prompt template like the following:

Given the following table name and schema write a description for this table:

Name: <NAME>
Schema: <SCHEMA>

The template is populated with the relevant metadata about the table and sent to the LLM API. For an example table store_locations, the LLM might respond with something like the following:

<answer>
This table contains information about stores and where to find them.
</answer>

Alation integrates Amazon Bedrock to power ALLIE AI Suggested Descriptions. Amazon Bedrock is a fully managed service that simplifies the development of generative AI applications by providing access to high-performing foundation models (FMs) via a single API. Requests to Amazon Bedrock are made in Python via an authenticated boto3 client. This approach ensures secure and efficient interactions with the service.

Security

Content Privacy

With Amazon Bedrock, your content:

  • Is not used to improve the base models.

  • Is not shared with third-party model providers.

  • Is always encrypted in transit and at rest.

Where possible, Alation uses AWS PrivateLink as an extra layer of secure communication to establish private connectivity between Amazon Bedrock and your Amazon Virtual Private Cloud (VPC) without exposing your traffic to the internet.

Any customer content processed by Amazon Bedrock is encrypted and stored at rest in the AWS region where you are using Amazon Bedrock.

../../_images/Stewards_UseOfAI.png

Safety

Amazon Bedrock implements automated abuse detection mechanisms to identify and mitigate potential violations of AWS’s Acceptable Use Policy (AUP), Responsible AI Policy, or a third-party model provider’s AUP.

Abuse detection mechanisms are fully automated, so there is no human review of or access to user inputs or model outputs. Find out more in Amazon Bedrock abuse detection in AWS documentation.

Geographical Availability

The Amazon Bedrock-backed features aren’t available in all regions supported by Alation. You can learn more about its geographical availability in Amazon Bedrock endpoints and quotas in AWS documentation. To extend the AI features to more customers in different regions, Alation routes traffic cross-regionally to supported regions where possible. Requests are made from the Alation instance within the Alation Cloud Service VPCs to Amazon’s region-specific infrastructure. Cross-region calls are secured with the TLS 1.2 encryption, utilizing AWS’s private network to ensure data protection.

The table below illustrates the regions where Alation’s AI features are available and the corresponding target region routing.

Origin Region

Target Region

us-east-1 (US East, N. Virginia)

us-east-1 (US East, N. Virginia)

us-east-1 (US East, N. Virginia)

us-east-2 (US East, Ohio)

(expected availability timeline: second half of 2024)

ca-central-1 (Canada, Central)

(expected availability timeline: second half of 2024)

us-west-2 (US West, Oregon)

us-west-2 (US West, Oregon)

ap-southeast-1 (Asia Pacific, Singapore)

ap-southeast-1 (Asia Pacific, Singapore)

ap-southeast-1 (Asia Pacific, Singapore)

ap-northeast-1 (Asia Pacific, Tokyo)

ap-southeast-2 (Asia Pacific, Sydney)

(expected availability timeline: first half of 2024)

eu-central-1 (Europe, Frankfurt)

eu-central-1 (Europe, Frankfurt)

eu-west-1 (Europe, Ireland)

(expected availability timeline: first half of 2024)

Usage of Customer Metadata by Alation

Features like ALLIE AI Suggested Descriptions utilize a broad spectrum of customer metadata, sending various metadata to the model as input. For example, to create a description of a table Alation may send the table name, title, column names, column types, the text of relevant queries, and the text of relevant documents to the model. The underlying data of a catalog object is never sent. The model then replies with a natural language description of the table which is displayed to users for evaluation.

Alation may send all or a portion of the following metadata or catalog data to the model for a given catalog object:

  • Name

  • Title

  • Column names and types

  • Linked queries

    • Titles

    • Comments

    • SQL content

  • @-mentioned objects (for example, documents)

  • Source comments

  • Tags

  • Custom rich text fields

  • Search results

    • Alation may run an internal search for keywords related to a table to find terms or document objects in the catalog and use the information in those documents to support the generation of a description. The sending of search results is limited to the document object types only.

  • All of the above for the parent and child objects and the domain

  • Conversations

  • Lineage information

  • Collections of objects
    • Catalog sets

    • Generic document collection

Frequently Asked Questions

Does Alation pass any actual data to the large language model (LLM)?

It doesn’t. Only metadata is passed to the LLM, and never the actual data.

Are the Suggested Descriptions generated based on the data in the table or the table name?

The Suggested Descriptions are generated based solely on the table metadata.

Is the LLM trained using the metadata?

It isn’t. Alation doesn’t tune LLMs, and therefore, no customer data or metadata is stored in any of the models in use by Alation.

Does the LLM use metadata from all objects in the catalog or just from the specified object?

It uses metadata from the specific object as well as related objects. However, future enhancements might include a Retrieval-Augmented Generation (RAG) solution that could incorporate additional metadata or documentation from the catalog.

What data protocols are used between the LLM and Alation Cloud Service instances to ensure data security at rest and in transit?

The communication uses TLS and AWS PrivateLink where possible, leveraging Amazon Bedrock services local to the customer’s instance to ensure data security both at rest and in transit.

Can a customer use their own LLM with Alation’s prompts?

Alation doesn’t currently provide the capability for customers to use their own LLM in conjunction with Alation’s prompts. Alation’s prompts have been carefully crafted to work specifically with the LLM we use.