
Data Product creation

Data Product definition

  1. Scaffold a new Data Product. witboost will create a new repository containing just the definition of the Data Product, without any code related to components or business logic. To do so, head to Builder > Templates. The templates page will be displayed. Each template represents a ready-to-use repository that can be created just by inserting its fundamental details. Let's select the Data Product template by clicking on Choose.

Data Product template

  2. A wizard opens, asking you for the fundamental information needed to create the Data Product.
tip

The set of information requested by the wizard changes from template to template, since different components require different data. Templates are developed by the platform team. To get more insight into how templates work, you can refer to the Templates section.

caution

When performing actions that act on a repository (creating a new repository, committing/pushing files, etc.), witboost requires you to specify the token you will use to interact with the Git repository. If the token is not set, an error message will notify you to configure it in your settings.

Refer to the Configuring Git Credentials page to set up your own GitLab token.

  3. The newly created Data Product will be listed in the Builder > My Data Products page; select it to view its details.

My Data Products page

  4. Go back to the templates and create the desired workload and output port.

Templates output ports

tip

In this phase, when inside the wizard, it is important to associate the components with the Data Product that was just created.

Choose the right Data Product in the template wizard

This way, when the repository is created and registered inside witboost, you will see the components associated with the Data Product directly on its details page.

Data Product details and relations

Refer to the Update Data Product Metadata page to understand how to update the metadata of a data product.

tip

If you need to import metadata from an infrastructure service like AWS into your output port or workload, the Reverse Provisioning feature provides a convenient solution. It enables you to effortlessly integrate the desired metadata, potentially saving you a significant amount of time. You can find more information about it in the Reverse Provisioning documentation.

Data Product validation

The Data Product Descriptor is the main definition of the Data Product, and it is used to send information to the other modules. For this reason, it is important that it respects some fundamental rules and complies with the platform requirements. To perform these checks, the Control Panel lets you test the descriptor against platform policies on the target environment. When a descriptor is declared compliant by the platform, you can deploy it. To get more details on this workflow, check the Deploy section.

Obtaining a preview descriptor

The preview descriptor panel, on the right side of the control panel as shown below, is used to obtain a complete overview of the data product descriptor. It shows in advance the descriptor that will be sent to the Provisioning Coordinator and the Marketplace Plugin, so that you can be sure everything is fine before deploying or releasing.

preview box

To get a preview for a different release, simply select a different one from the drop-down menu; the preview will load automatically. If instead you want to obtain a preview descriptor for a different target environment, select a different environment from the drop-down menu on top of the preview box. The Preview descriptor window refreshes automatically after each of these operations: commit, release, or new snapshot.

note

The HEAD release is the latest version of a data product found in the repository. It can diverge from the latest released version if new commits are performed after the release.

tip

The descriptor contains all the metadata about the data product and its components. It is sent to the Provisioning Coordinator to generate a provisioning plan and to the Marketplace Plugin to update the Data Products Marketplace.
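
As an illustration, here is a minimal sketch of inspecting a descriptor with Python. The structure shown (id, components, kind, specific) is inferred from the examples on this page and is not an authoritative schema; refer to your platform team's templates for the real field layout.

```python
# Minimal sketch: load a (hypothetical) descriptor and list its components.
# The field names below are assumptions based on the examples in this page.
import yaml  # PyYAML

descriptor_text = """
id: urn:dmb:dp:finance:cashflow:0
name: Cashflow
domain: finance
version: 0.1.0
components:
  - id: urn:dmb:cmp:finance:cashflow:0:cashflows-calculation
    kind: workload
    specific:
      bucket: my-bucket
      cdpEnvironment: dev
      folder: cashflows
"""

descriptor = yaml.safe_load(descriptor_text)
for component in descriptor["components"]:
    print(component["kind"], "->", component["id"])
```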

Testing Compliance with Governance Policies

The policy tests box, on the left side of the control panel as shown below, lists all previously performed policy checks and their detailed results. The policy check history remains available as long as the user's browser data is left untouched.

policies box

After selecting a release, policy tests can be launched through the Launch test button.

tip

You can define different policies for each environment, so be sure to select the right environment before launching tests!

Inspecting failed policies checks

After you click on Launch Test, the Preview | Test panel will show a list of all policy checks performed. If any of them failed during a test, the status of the test will be KO, as shown below.

policies test fail

To see which check is failing, expand the test result by clicking on the arrow. You will see a list of all checks that were performed during the test:

policies fail

Each failing check contains detailed information about the error. To open the details of the error, click on the failed check:

policies error detail

In this view, you will see three panels:

  • Policy panel: contains the policy itself, highlighting which specific rule has not been fulfilled
  • Descriptor panel: contains the descriptor of the data product as a reference
  • Error Info panel: contains the error coming directly from the CUE runtime environment

In the example above, the error is caused by some missing fields in one of the components of the data product (i.e. urn:dmb:cmp:finance:cashflow:0:cashflows-calculation). Those fields are specified in the Error Info panel: specific.bucket, specific.cdpEnvironment, specific.folder.
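
If you want to catch this kind of error before launching the policy tests, a small local pre-flight check can help. The sketch below assumes the descriptor component has been loaded as a Python dictionary; the required field names come from the Error Info panel above, while the helper itself is purely illustrative and not part of witboost.

```python
# Illustrative pre-flight check mirroring the failed policy above: verify that
# a component declares specific.bucket, specific.cdpEnvironment and
# specific.folder before launching the platform policy tests.
REQUIRED_SPECIFIC_FIELDS = ("bucket", "cdpEnvironment", "folder")

def missing_specific_fields(component: dict) -> list:
    """Return the required 'specific' fields that a component does not declare."""
    specific = component.get("specific") or {}
    return [field for field in REQUIRED_SPECIFIC_FIELDS if field not in specific]

component = {
    "id": "urn:dmb:cmp:finance:cashflow:0:cashflows-calculation",
    "specific": {"bucket": "my-bucket"},  # cdpEnvironment and folder are missing
}
print(missing_specific_fields(component))  # -> ['cdpEnvironment', 'folder']
```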

Async validation

You can also validate a data product asynchronously. This is useful when a technology adapter performs long-running validation tasks, i.e. validation tasks that require a lot of time to execute: running them asynchronously lets you avoid timeout errors.

The async validation performs the same operations as discussed above, but calls the /v2/validate API of the provisioning coordinator (and, in turn, the /v2/validate API of each technology adapter).
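
As a rough sketch of what an asynchronous validation call could look like from a script: only the /v2/validate path is documented on this page. The base URL, payload shape, response fields, and polling endpoint below are all assumptions and must be checked against your deployment's API reference.

```python
import time
import requests

# Hypothetical base URL: replace with your deployment's coordinator address.
BASE_URL = "https://witboost.example.com/provisioning-coordinator"

with open("descriptor.yaml") as f:
    descriptor_yaml = f.read()

# Submit the descriptor for asynchronous validation.
# Only the /v2/validate path comes from this page; the payload shape is assumed.
response = requests.post(f"{BASE_URL}/v2/validate", json={"descriptor": descriptor_yaml})
response.raise_for_status()
operation_id = response.json()["id"]  # hypothetical response field

# Poll until the long-running validation completes (polling endpoint assumed).
while True:
    status = requests.get(f"{BASE_URL}/v2/validate/{operation_id}/status").json()
    if status.get("status") != "RUNNING":
        break
    time.sleep(5)

print(status)
```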

See the Async validation section to enable it.