Data Product creation
Data Product definition
- Scaffold a new Data Product. witboost will create a new repository containing just the definition of the Data Product, without any code related to components or business logic.
To do so, head to Builder > Templates. The templates page is displayed. Each template represents a ready-to-use repository that can be created just by entering its fundamental details. Select the Data Product template by clicking Choose.
- A wizard opens, asking you for the fundamental information needed to create the Data Product.
The information requested by the wizard changes from template to template, since different components require different data. Templates are developed by the platform team. For more insight into how templates work, refer to the Templates section.
When performing actions that act on a repository (creating a new repository, committing or pushing files, etc.), witboost requires you to specify the token you will use to interact with the Git repository. If the token is not set, an error message will prompt you to configure it in your settings.
Refer to the Configuring Git Credentials page to set up your own GitLab token.
- The newly created Data Product is listed in the Builder > My Data Products page, and you can view its details by selecting it.
- Go back to the templates and create the desired workload and output port.
While in the wizard, make sure to associate the components with the Data Product you just created.
This way, once the repository is created and registered in witboost, you will see the components associated with the Data Product directly on its details page.
Refer to the Update Data Product Metadata page to learn how to update the metadata of a Data Product.
If you need to import metadata from an infrastructure service like AWS into your output port or workload, use the Reverse Provisioning feature. It lets you import the desired metadata instead of entering it by hand, which can save a significant amount of time. You can find more information in the Reverse Provisioning documentation.
Data Product validation
The Data Product Descriptor is the main definition of the Data Product, and it is used to send information to the other modules. For this reason, it must respect some fundamental rules and comply with the platform requirements.
To run these checks, use the Control Panel, where you can test the descriptor against the platform policies on the target environment. Once the platform declares a descriptor compliant, you can deploy it.
For more details on this workflow, see the Deploy section.
Obtaining a preview descriptor
The preview descriptor panel, on the right side of the Control Panel as shown below, gives you a complete overview of the Data Product descriptor. You see in advance the descriptor that is sent to the Provisioning Coordinator and the Marketplace Plugin, so you can make sure everything is correct before deploying or releasing.
To get a preview for a different release, simply select a different one from the drop-down menu. The preview will load automatically.
If instead you want to obtain a preview descriptor for a different target environment, select a different environment from the drop-down menu at the top of the preview box. The preview descriptor window refreshes automatically after each of the following operations: commit, release, or new snapshot.
The HEAD release is the latest version of a Data Product found in the repository. It can diverge from the latest released version when changes have been committed but not yet released.
The descriptor contains all the metadata about the data product and its components. It is sent to the Provisioning Coordinator to generate a provisioning plan and to the Marketplace Plugin to update the Data Products Marketplace.
Testing Compliance with Governance Policies
The policy tests box, on the left side of the Control Panel as shown below, lists all previously performed policy checks and their detailed results. The policy check history remains available as long as the user's browser data is not cleared.
After selecting a release, you can launch the policy tests with the Launch test button.
Different policies can be defined for each environment, so make sure you select the right environment before launching tests!
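To make the environment dependency concrete, here is a minimal Python sketch. It is not how witboost evaluates policies (the platform reports policy errors from a CUE runtime); the policy functions, the descriptor fields they read, the environment names, and the `launch_test` helper are all hypothetical. It only illustrates why the same descriptor can pass in one environment and fail in another: each environment has its own set of policies.

```python
from typing import Callable, Dict, List

# A policy takes a descriptor and returns a list of error messages (empty list = passed).
Policy = Callable[[dict], List[str]]


def has_description(descriptor: dict) -> List[str]:
    """Hypothetical policy: the Data Product must have a non-empty description."""
    return [] if descriptor.get("description") else ["description is missing"]


def has_owner(descriptor: dict) -> List[str]:
    """Hypothetical policy: the Data Product must declare an owner."""
    return [] if descriptor.get("owner") else ["owner is missing"]


# Hypothetical mapping: stricter rules apply to production than to development.
POLICIES_BY_ENVIRONMENT: Dict[str, List[Policy]] = {
    "development": [has_description],
    "production": [has_description, has_owner],
}


def launch_test(descriptor: dict, environment: str) -> dict:
    """Run every policy registered for the selected environment and report OK or KO."""
    checks = {policy.__name__: policy(descriptor) for policy in POLICIES_BY_ENVIRONMENT[environment]}
    status = "OK" if all(not errors for errors in checks.values()) else "KO"
    return {"environment": environment, "status": status, "checks": checks}


if __name__ == "__main__":
    descriptor = {"name": "cashflow", "description": "Cash flow data"}
    print(launch_test(descriptor, "development")["status"])  # OK
    print(launch_test(descriptor, "production")["status"])   # KO
```

The same descriptor is OK for development and KO for production only because production registers an extra policy, which is why selecting the environment before launching tests matters.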
Inspecting failed policies checks
After you click Launch test, the Preview | Test panel shows a list of all the policy checks performed.
If any of them fails, the status of the test is KO, as shown below.
To see which check is failing, expand the test result by clicking the arrow. This shows the list of all checks performed during the test.
Each failed check contains detailed information about the error. To open the error details, click the failed check.
In this view, you will see three panels:
- Policy panel: contains the policy itself, highlighting which specific rule has not been fulfilled
- Descriptor panel: contains the descriptor of the Data Product as a reference
- Error Info panel: contains the error coming directly from the CUE runtime environment
In the example above, the error is caused by some missing fields in one of the components of the Data Product (namely urn:dmb:cmp:finance:cashflow:0:cashflows-calculation).
The missing fields are listed in the Error Info panel: specific.bucket, specific.cdpEnvironment, and specific.folder.
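The failing check in this example can be approximated with a short Python sketch. The real rule is a CUE policy whose text is not shown here; this version only mirrors its effect by looking for the three fields reported in the Error Info panel. The descriptor layout (a `components` list whose entries carry an `id` and a `specific` section), the `missing_specific_fields` name, and the sample values are assumptions made for illustration.

```python
from typing import Dict, List

# Fields reported as missing in the Error Info panel of the example.
REQUIRED_SPECIFIC_FIELDS = ["bucket", "cdpEnvironment", "folder"]


def missing_specific_fields(descriptor: dict) -> Dict[str, List[str]]:
    """Return, per component id, the required `specific.*` fields that are absent.

    Assumes a hypothetical layout: descriptor["components"] is a list of dicts,
    each with an "id" and an optional "specific" dict.
    """
    failures: Dict[str, List[str]] = {}
    for component in descriptor.get("components", []):
        specific = component.get("specific") or {}
        missing = [f"specific.{name}" for name in REQUIRED_SPECIFIC_FIELDS if name not in specific]
        if missing:
            failures[component["id"]] = missing
    return failures


if __name__ == "__main__":
    # Hypothetical descriptor fragment: the workload has an empty `specific` section.
    descriptor = {
        "components": [
            {"id": "urn:dmb:cmp:finance:cashflow:0:cashflows-calculation", "specific": {}},
        ]
    }
    # Prints the component id mapped to the three missing specific.* fields.
    print(missing_specific_fields(descriptor))
```

Filling in the three fields under the component's `specific` section makes the function return an empty dict, which corresponds to the check passing.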
Async validation
You can also validate a Data Product asynchronously. This is useful if you have a specific provisioner that performs long-running validation tasks, i.e. tasks that take a long time to execute: running them asynchronously avoids timeout errors.
Async validation performs the same operations described above, but it calls the /v2/validate API of the Provisioning Coordinator (and also the /v2/validate API of each specific provisioner).
See the Async validation section to learn how to enable it.
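The practical difference from synchronous validation is that the caller no longer waits on a single long request: it submits the validation and then checks back until a result is available. The Python sketch below shows that submit-and-poll pattern in isolation. The request and response shapes of `/v2/validate` are not described in this section, so the `submit` and `get_status` callables, the status values (`RUNNING`, `COMPLETED`, `FAILED`), and the result fields are hypothetical placeholders; the fake coordinator exists only so the example runs without a network.

```python
import time
from typing import Callable, Dict


def validate_async(
    submit: Callable[[dict], str],
    get_status: Callable[[str], Dict],
    descriptor: dict,
    poll_interval_s: float = 1.0,
    timeout_s: float = 600.0,
) -> Dict:
    """Submit a validation request, then poll its status until it finishes.

    `submit` and `get_status` are hypothetical stand-ins for calls to the
    Provisioning Coordinator; their contract is assumed, not documented.
    """
    token = submit(descriptor)  # returns immediately with a task identifier
    deadline = time.monotonic() + timeout_s  # client-side limit, chosen by the caller
    while time.monotonic() < deadline:
        status = get_status(token)
        if status["status"] in ("COMPLETED", "FAILED"):
            return status
        time.sleep(poll_interval_s)  # task still running: wait, then check again
    raise TimeoutError(f"validation {token} did not finish within {timeout_s}s")


class FakeCoordinator:
    """In-memory fake that reports RUNNING twice before completing."""

    def __init__(self) -> None:
        self.polls = 0

    def submit(self, descriptor: dict) -> str:
        return "task-1"

    def get_status(self, token: str) -> Dict:
        self.polls += 1
        if self.polls < 3:
            return {"status": "RUNNING"}
        return {"status": "COMPLETED", "result": {"valid": True}}


if __name__ == "__main__":
    fake = FakeCoordinator()
    outcome = validate_async(fake.submit, fake.get_status, {"name": "cashflow"}, poll_interval_s=0.01)
    print(outcome["status"], fake.polls)  # COMPLETED 3
```

Keeping the transport behind two callables keeps the polling logic independent of how the coordinator is actually reached, so the same loop could wrap real HTTP calls once their contract is known.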