Virtual NoSQL Data Sources

Applies to Alation Cloud Service and customer-managed instances of Alation.

Important

You are viewing documentation for Classic Alation.

You can catalog a NoSQL database in Alation using a virtual data source and the dedicated NoSQL API. Alation does not provide automated metadata extraction (MDE) for virtual data sources, so you upload the metadata through the API.

To catalog your NoSQL data source:

  1. Understand NoSQL Structure in Alation

  2. Enable NoSQL Data Source Support

  3. Add a NoSQL Virtual Data Source

  4. Load Metadata with the NoSQL API

  5. View NoSQL Metadata in the Catalog

Understand NoSQL Structure in Alation

A NoSQL database is modeled as a set of top-level folders, each containing one or more collections. Each collection is a set of documents. A virtual NoSQL data source in Alation follows the same structure, except that each collection contains a list of schemas instead of documents. These schemas describe the structure of the documents in your NoSQL database.

Some NoSQL data sources, like MongoDB, have databases at the top level. In Alation, you can model a database as a folder and collections as collections under this folder.
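For example, a MongoDB deployment could be mapped onto this structure as in the Python sketch below. It is a hypothetical illustration, not part of the Alation API. The `Folder`, `Collection`, and `Schema` classes, the `mongo_to_folders` helper, and the `<collection>_schema` naming are assumptions. For simplicity, each collection gets a single JSON Schema.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    name: str
    definition: dict  # JSON Schema or Avro schema describing the documents


@dataclass
class Collection:
    name: str
    schemas: list[Schema] = field(default_factory=list)


@dataclass
class Folder:
    name: str
    collections: list[Collection] = field(default_factory=list)


def mongo_to_folders(databases: dict) -> list[Folder]:
    """Map {database: {collection: json_schema}} onto folders and collections.

    Each MongoDB database becomes a folder; each MongoDB collection becomes
    a collection holding one schema that describes its documents.
    """
    folders = []
    for db_name, collections in databases.items():
        folder = Folder(db_name)
        for coll_name, json_schema in collections.items():
            schema = Schema(f"{coll_name}_schema", json_schema)  # hypothetical naming
            folder.collections.append(Collection(coll_name, [schema]))
        folders.append(folder)
    return folders


mongo = {
    "sales": {
        "orders": {"type": "object", "properties": {"id": {"type": "string"}}},
        "customers": {"type": "object", "properties": {"email": {"type": "string"}}},
    }
}
for folder in mongo_to_folders(mongo):
    for coll in folder.collections:
        print(folder.name, "/", coll.name, "/", coll.schemas[0].name)
# sales / orders / orders_schema
# sales / customers / customers_schema
```

Keeping one schema per collection is only a convenience here. A collection in Alation can hold several schemas if your documents come in more than one shape.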


Before you can upload any metadata, you need to enable NoSQL data source support in your Alation catalog.

Enable NoSQL Data Source Support

You can enable NoSQL support with a feature flag in alation_conf.

Note

Alation Cloud Service customers can request server configuration changes through Alation Support.

To enable support of NoSQL data sources:

  1. Use SSH to connect to the Alation server.

  2. Enter the Alation shell using the following command:

    sudo /etc/init.d/alation shell
    
  3. Run the following command to set the feature flag to true:

    alation_conf alation.feature_flags.enable_generic_nosql_support -s true
    
  4. Restart uWSGI and Celery:

    alation_supervisor restart web:uwsgi celery:*
    

After you have enabled NoSQL support, you can add a NoSQL virtual data source to your catalog.

Add a NoSQL Virtual Data Source

To add a NoSQL database as a virtual data source, follow the steps for the navigation your catalog uses.

If your catalog has a left navigation panel:

  1. If the left navigation isn’t showing, click the left navigation icon to open it.

  2. Click Data Sources to open the data sources page.

  3. On the upper right, click + Add and in the menu that opens, click Virtual Data Source.

  4. In the dialog that opens, provide the required information and click Continue Setup.

  5. From the Database Type list, select Generic NoSQL.

  6. Provide a description (optional).

  7. Select the desired privacy setting (Public or Private).

  8. Click Save and Continue. The new virtual data source is created and its catalog page opens.

  9. Click the three dots in the top right corner, then click Settings. The settings page opens, where you can change the settings and upload metadata.

If your catalog uses the top toolbar:

  1. In the top toolbar, click Apps, then Sources. The Sources page opens.

  2. On the upper right, click + Add and in the menu that opens, click Virtual Data Source.

  3. In the dialog that opens, provide the required information and click Continue Setup.

  4. From the Database Type list, select Generic NoSQL.

  5. Provide a description (optional).

  6. Select the desired privacy setting (Public or Private).

  7. Click Save and Continue. The new virtual data source is created and its settings page opens, where you can change the settings and upload metadata.

If you’re going to upload metadata in Avro format, decide whether to enable schema versioning before the first upload. If you’re not using Avro or don’t need schema versioning, you can go straight to loading metadata with the NoSQL API.

Enable Schema Versioning

The metadata you load into a virtual NoSQL data source can be versioned to track changes to the data objects or data types.

We recommend that you enable schema versioning for a NoSQL source that you know will have metadata in Avro format. Version history will reflect changes to data types in the uploaded Avro schemas.

Note

Calculating changes for the schema version history may take time and slow down the NoSQL API when you use it to update your data source. Enable this feature only if version history is a must-have for your NoSQL virtual data source, that is, when you are loading schemas in Avro format.

Turn on schema versioning right after you create the NoSQL data source but before you upload the metadata for the first time. This way, the first schema upload will become version one of the metadata and all subsequent changes will be calculated off this first version.
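To make this rule concrete, here is a toy Python model of a single schema's version counter. The first upload becomes version one. A new version is recorded only when an upload actually changes that schema, as described under Version History. `SchemaVersionLog` is hypothetical and does not reflect how Alation stores versions internally.

```python
import copy


class SchemaVersionLog:
    """Toy, HYPOTHETICAL model of version history for one Avro schema."""

    def __init__(self) -> None:
        self.versions: list[dict] = []

    def upload(self, avro_schema: dict) -> int:
        """Record an upload and return the schema's current version number.

        The first upload becomes version 1; later uploads add a version
        only if the schema differs from the latest recorded version.
        """
        if not self.versions or self.versions[-1] != avro_schema:
            # Store a copy so later edits to the caller's dict don't alter history.
            self.versions.append(copy.deepcopy(avro_schema))
        return len(self.versions)


order_v1 = {"type": "record", "name": "Order",
            "fields": [{"name": "id", "type": "string"}]}
order_v2 = {"type": "record", "name": "Order",
            "fields": [{"name": "id", "type": "string"},
                       {"name": "total", "type": "double"}]}

log = SchemaVersionLog()
print(log.upload(order_v1))  # 1: first upload is version one
print(log.upload(order_v1))  # 1: unchanged schema, no new version
print(log.upload(order_v2))  # 2: field added, new version
```

In this model, anything uploaded before the log exists never appears in its history. That is the reason to turn versioning on before the first upload.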

Schema versioning can be enabled for each specific virtual NoSQL data source that you create in Alation. By default, it is turned off.

Important

After being enabled, schema versioning cannot be disabled.

To enable schema versioning, follow the steps for the navigation your catalog uses.

If your catalog uses the top toolbar:

  1. On the catalog page of the NoSQL data source, on the upper right, click Settings to open the data source settings page.

  2. Click the General Settings tab.

  3. Turn on the Enable Versioning for the DataSource toggle. This enables schema versioning for this virtual data source. When it is enabled, catalog pages for schemas you upload to this data source show a Version History button.

If your catalog has a left navigation panel:

  1. On the catalog page of the NoSQL data source, click the three dots in the top right corner, then click Settings. The settings page opens.

  2. Click the General Settings tab.

  3. Turn on the Enable Versioning for the DataSource toggle. This enables schema versioning for this virtual data source. When it is enabled, catalog pages for schemas you upload to this data source show a Version History button.

Now you can start to load metadata with the NoSQL API.

Load Metadata with the NoSQL API

Alation does not extract metadata into virtual data sources automatically. You will need to upload the metadata using the NoSQL API.

To upload the metadata:

  1. Go to the catalog page of the NoSQL data source. The URL of this page will include the ID of the virtual data source. Take note of the ID, as you will need it to create the API calls. For example, if the URL is https://company.alationcloud.com/data/1441/overview, the ID is 1441.

  2. Use the NoSQL API to upload the metadata.
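These steps can be scripted. The Python sketch below extracts the data source ID from the catalog page URL and prepares a POST request with JSON metadata, without sending it. The endpoint path, the payload shape, and the `TOKEN` authentication header are assumptions made for illustration. Take the actual path, request body format, and authentication details from the NoSQL API reference.

```python
import json
import re
import urllib.request
from urllib.parse import urlparse


def data_source_id_from_url(url: str) -> int:
    """Return the data source ID from a catalog URL such as .../data/1441/overview."""
    match = re.search(r"/data/(\d+)(?:/|$)", urlparse(url).path)
    if match is None:
        raise ValueError(f"no data source ID in {url!r}")
    return int(match.group(1))


def build_upload_request(base_url: str, endpoint_template: str, ds_id: int,
                         api_token: str, payload: dict) -> urllib.request.Request:
    """Prepare, but do not send, a POST request that carries NoSQL metadata.

    endpoint_template and the payload shape are HYPOTHETICAL placeholders;
    use the path and body format documented in the NoSQL API reference.
    """
    url = base_url.rstrip("/") + endpoint_template.format(ds_id=ds_id)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Assumed: the Alation API access token is passed in a TOKEN header.
        headers={"TOKEN": api_token, "Content-Type": "application/json"},
        method="POST",
    )


ds_id = data_source_id_from_url("https://company.alationcloud.com/data/1441/overview")
request = build_upload_request(
    "https://company.alationcloud.com",
    "/hypothetical/nosql/{ds_id}/",  # placeholder path, not a real endpoint
    ds_id,
    "<your-api-access-token>",
    {"folders": []},  # placeholder payload
)
print(ds_id, request.get_method(), request.full_url)
# 1441 POST https://company.alationcloud.com/hypothetical/nosql/1441/
# Once the real path and payload are filled in, send it with urllib.request.urlopen(request).
```

Keeping ID extraction separate from request construction isolates the unverified parts (path, payload, header) in one function that is easy to adjust.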

After the upload, your virtual data source in Alation will have the metadata you have pushed using the API, and you can view each NoSQL object’s dedicated catalog page.

View NoSQL Metadata in the Catalog

The virtual NoSQL data source has a catalog page that shows the list of folders you have uploaded to this data source. The data source has a title and description and uses the data source template.

After you have uploaded your NoSQL metadata to Alation, each NoSQL object will have a dedicated catalog page in Alation under the virtual NoSQL data source.

The NoSQL data source structure has the following object hierarchy:

  • Virtual NoSQL Data Source

    • Folder

      • Collection

        • Schema

          • JSON Schema Properties or Avro Data Types

Folders

The folder level is the top level that appears under Contents on your virtual data source page. You can upload multiple folders to your NoSQL data source. Folders are also called NoSQL databases in Alation.

The folder catalog page will show the list of collections under this folder. Each folder can have multiple collections. Folders have a title and description that can be curated through the UI or by using the Custom Field Values API.

The NoSQL folder object type has a template that can be used to add custom fields to folders.

Collections

The next level of the hierarchy below folders is the collection. Collections are used to group schemas that describe the structure of the documents in your NoSQL database. Each collection can have multiple schemas.

The collection catalog page will show the list of schemas under this collection. Schemas with multiple levels of properties or data types will be shown as expandable objects.

Collections have a title and description that can be curated through the UI or by using the Custom Field Values API.

The NoSQL collection object type has a template that can be used to add custom fields to collections.

Schemas

Schemas are found under collections. A schema is the metadata that describes the structure of the documents in your NoSQL database. Schemas are also called NoSQL attributes in Alation.

Complex properties and data types are expandable and include nested data types on lower levels. You can click the arrow next to a property or data type to expand it and see its children, or you can click on the property or data type key to open its own dedicated catalog page.
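The way nested properties expand level by level can be illustrated with a short recursive walk over a JSON Schema. This Python sketch is not Alation code, and the `walk_properties` helper is hypothetical. It follows only `properties` and array `items` and ignores constructs such as `$ref` or `oneOf`.

```python
def walk_properties(schema: dict, depth: int = 0):
    """Yield (depth, key, type) for each property of a JSON Schema, depth-first."""
    for key, prop in schema.get("properties", {}).items():
        yield depth, key, prop.get("type")
        if "properties" in prop:
            # Nested object: its properties are children one level down.
            yield from walk_properties(prop, depth + 1)
        elif isinstance(prop.get("items"), dict):
            # Array: descend into the schema that describes its elements.
            yield from walk_properties(prop["items"], depth + 1)


order_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "customer": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
        },
        "items": {
            "type": "array",
            "items": {"type": "object", "properties": {"sku": {"type": "string"}}},
        },
    },
}

for depth, key, prop_type in walk_properties(order_schema):
    print("  " * depth + f"{key} ({prop_type})")
# id (string)
# customer (object)
#   name (string)
#   email (string)
# items (array)
#   sku (string)
```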

Schemas have a title and description that can be curated through the UI or by using the Custom Field Values API.

The NoSQL schema object type has a template that can be used to add custom fields to schemas.

Other Metadata

The Other Metadata table lists any additional properties that were loaded for this object. These are properties that are not part of Alation’s logical model for NoSQL and are not required by the NoSQL API, but were loaded along with the required properties.
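To illustrate this split, the Python sketch below separates the keys of an uploaded property into modeled keys and leftovers that would appear under Other Metadata. The `MODELED_KEYS` set is a hypothetical stand-in. The real list of required and modeled properties is defined by the NoSQL API reference.

```python
# HYPOTHETICAL: stand-in for the keys Alation's NoSQL model recognizes.
MODELED_KEYS = {"type", "description", "properties", "items"}


def split_other_metadata(obj: dict) -> tuple[dict, dict]:
    """Return (modeled, other): keys in MODELED_KEYS vs. everything else."""
    modeled = {k: v for k, v in obj.items() if k in MODELED_KEYS}
    other = {k: v for k, v in obj.items() if k not in MODELED_KEYS}
    return modeled, other


order_id = {"type": "string", "description": "Order ID",
            "format": "uuid", "x-owner": "sales-team"}
modeled, other = split_other_metadata(order_id)
print(other)  # {'format': 'uuid', 'x-owner': 'sales-team'}
```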

Version History

If you have enabled schema versioning for this virtual data source, you will see a Version History button on the catalog page of each schema. Click this button to open a dialog that shows the changes in the schema over time.

The version history dialog will show:

  • Schema versions

    • A version is an object that is created in the catalog after each API call that loads new metadata into this data source or updates its existing metadata. A version for a specific Avro schema is created only if this schema is affected by the API call.

  • Changes in the newer version compared to the previous version:

    • Type created

    • Type updated

    • Deleted

Note

The Other Metadata field history is not captured. Version history only reflects changes to data types and properties required by the API.
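The three change types can be illustrated by comparing two versions of an Avro record. The Python sketch below is a hypothetical approximation, not Alation's algorithm. It compares only top-level field names and types, so a changed `doc` attribute does not count as a change. This loosely mirrors the note above that Other Metadata history is not tracked.

```python
def avro_field_types(schema: dict) -> dict:
    """Map each top-level field of an Avro record schema to its declared type."""
    return {f["name"]: f["type"] for f in schema.get("fields", [])}


def diff_versions(old: dict, new: dict) -> list:
    """Classify field changes between two versions of an Avro record schema."""
    before, after = avro_field_types(old), avro_field_types(new)
    changes = []
    for name, avro_type in after.items():
        if name not in before:
            changes.append((name, "Type created"))
        elif before[name] != avro_type:
            changes.append((name, "Type updated"))
    for name in before:
        if name not in after:
            changes.append((name, "Deleted"))
    return changes


v1 = {"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "string"},
    {"name": "total", "type": "int"},
    {"name": "note", "type": "string", "doc": "free text"},
]}
v2 = {"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "string"},
    {"name": "total", "type": "double"},
    {"name": "status", "type": "string"},
]}
print(diff_versions(v1, v2))
# [('total', 'Type updated'), ('status', 'Type created'), ('note', 'Deleted')]
```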

Filter the List of Changes

You can filter the list of changes in the version history using the quick filter at the top of the table. You can filter by name or type.

You can also sort the list of changes by version.