Data Product Release Lifecycle
Each new data product release can be broken down into a series of steps, and the control panel is designed to support this workflow. The flow presented here is the suggested one; any deviation from it should be carefully thought through. Typically, improving the quality of a data product involves the following steps:
- Start of a new release cycle
- Making changes, either to metadata or to environment-specific configurations
- Commit changes
- (optional) Manually review the preview descriptor
- (optional) Check policy compliance
- Deploy and see if everything works as expected
- Release and publish to the marketplace
Steps 4 and 5 (reviewing the preview descriptor and checking policy compliance) are marked as optional because the provisioning coordinator performs them automatically during the deployment phase. However, if policy validation fails during deployment, the errors are easier to understand if you use the "Launch Test" button.
Every new release cycle for a data product starts with a New Snapshot. A snapshot is an intermediate state between the latest data product release and the new work-in-progress release. It is designed to make improvement iterations faster, without the need to increase the data product version for each single delta.
Specifically, this operation creates a new folder under releases/, named after the new version number (e.g. 0.3), and populates it with all the descriptors needed:
- Data product descriptor
- Environment-specific configurations for each environment
Once this operation is completed, a new data product release cycle is said to be started.
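As a rough illustration of what the New Snapshot operation produces, the sketch below builds the release folder on a local filesystem. It is not witboost's implementation: the function name `create_snapshot`, the file names, and the environment names are assumptions made for the example, since the actual layout inside the release folder is not specified here.

```python
from pathlib import Path


def create_snapshot(repo_root: Path, version: str, environments: list[str]) -> Path:
    """Create releases/<version>/ and seed it with placeholder descriptors."""
    release_dir = repo_root / "releases" / version
    # Fail loudly if a release cycle for this version was already started.
    release_dir.mkdir(parents=True, exist_ok=False)

    # Hypothetical file names: the real names are not given in this document.
    (release_dir / "data-product-descriptor.yaml").write_text(
        "# data product descriptor\n"
    )
    for env in environments:
        (release_dir / f"{env}-configuration.yaml").write_text(
            f"# configuration for {env}\n"
        )
    return release_dir
```

Calling `create_snapshot(Path("."), "0.3", ["development", "production"])` would create one descriptor file plus one configuration file per environment under releases/0.3.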
The commit operation ensures that the release snapshot of a data product only moves between consistent states. Use it when you want your changes to be reflected in the release files managed by witboost: changes made to descriptors and configurations in the repository are applied to the release snapshot (i.e. to the files inside the release folder) only when a commit is performed.
You can still preview the descriptor that includes your uncommitted changes by selecting the HEAD release. HEAD can therefore diverge from the snapshot until a commit operation is performed.
Modifying files inside the release folder is still allowed. However, this practice is highly discouraged, since all changes there will be overwritten by any further commit operation.
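The relationship between HEAD, the snapshot, and manual edits can be modeled with a small in-memory sketch. This is only a conceptual model, assuming that a commit replaces the release folder contents wholesale; the class `ReleaseCycle` and its fields are hypothetical and do not correspond to any witboost API.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseCycle:
    # Files in the repository (what the HEAD preview is built from).
    head: dict[str, str] = field(default_factory=dict)
    # Files inside the release folder (the snapshot).
    snapshot: dict[str, str] = field(default_factory=dict)

    def commit(self) -> None:
        # The snapshot is replaced by a copy of HEAD, so any manual
        # edit made directly in the release folder is overwritten.
        self.snapshot = dict(self.head)

    def diverged(self) -> bool:
        return self.head != self.snapshot


cycle = ReleaseCycle(head={"descriptor": "v1"})
cycle.commit()
cycle.head["descriptor"] = "v2"              # edit in the repository
cycle.snapshot["descriptor"] = "hand-edit"   # discouraged manual edit
cycle.commit()                               # the manual edit is lost
```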
Deploy takes the data product descriptor and deploys the infrastructure to the selected environment. Before doing so, it checks that the descriptor is well-formed and compliant with the policies defined for the given environment (to test its validity against the data governance policies beforehand, refer to the DP Validation section). If all checks pass and the deployment is successful, the data product is published to the marketplace, so that consumers can start using it.
By performing a commit before deploying, you ensure that the descriptor sent to the provisioning coordinator contains the latest changes.
The Data Product deployment will start the underlying deployment of all of its components, in a sequence defined by the "dependsOn" relations among them. After the deploy operation is invoked, it can happen that:
- the deployment fails with a validation error. In this case, the deployment did not start at all and the environment is untouched. You can run the test again to see what the error was and, after solving it, try to deploy everything again.
- one of the deployment steps fails. In this case, the Data Product is in a corrupted state, since some of its components may already be deployed and available while others are not. witboost does not automatically roll back the components already deployed, since their deployment could have required a long time or a significant amount of resources. From here, the user can either solve the deployment problems and trigger another deployment operation, or undeploy the whole data product (which removes the components already deployed).
- the deployment completes with success. One of the steps that the Provisioner module performs is the registration of the deployed data product in the Marketplace. So from now on the data product can be visible to all the consumers that access the Marketplace.
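The ordering by "dependsOn" relations and the partial-failure outcome described above can be sketched as a topological sort followed by a loop that stops at the first failing component. This is an illustration under stated assumptions, not the provisioning coordinator's actual logic: the component names and the `provision` callback are hypothetical, and Python's standard `graphlib` module stands in for whatever ordering mechanism witboost uses.

```python
from graphlib import TopologicalSorter
from typing import Callable, Optional


def deployment_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return components so that each one follows everything it depends on."""
    # TopologicalSorter expects a mapping of node -> predecessors.
    return list(TopologicalSorter(depends_on).static_order())


def deploy(
    depends_on: dict[str, list[str]],
    provision: Callable[[str], bool],
) -> tuple[list[str], Optional[str]]:
    """Deploy in dependency order; stop at the first failure without rollback."""
    deployed: list[str] = []
    for component in deployment_order(depends_on):
        if not provision(component):
            # Components already deployed are left in place.
            return deployed, component
        deployed.append(component)
    return deployed, None


# Hypothetical components of a data product.
components = {
    "storage": [],
    "workload": ["storage"],
    "output-port": ["workload"],
}
deployed, failed = deploy(components, lambda name: name != "workload")
```

In this run `deployed` contains only `storage` and `failed` is `workload`, which mirrors the second outcome: part of the data product is live and the user must choose between fixing and redeploying, or undeploying.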
Remember that the Marketplace is updated only by deploy operations, so you will need to perform another deploy operation whenever you want to change the metadata displayed in the Marketplace.
Since all Specific Provisioners must be idempotent, deploy operations are idempotent as well, so you can always re-deploy an already deployed data product to update its metadata.
A data product is considered "deployed" in a target environment when all of its components are deployed, and it is registered in the Marketplace.
When changes are required to improve the data product or to fix some problems, the components' repositories (or the data product repository itself) can be edited accordingly, and a new data product release can be generated and deployed. Remember that only one deployment per data product and environment should be active at the same time, so if you are making some structural changes, you could decide to undeploy the current data product before deploying it again.
To delete a component, you can simply remove it from witboost as described in the previous section. Remember that this will not affect any deployed instance of the data product, so to make this deletion effective you should undeploy the current version and deploy it again. Similarly, when you want to completely delete a data product, you should first undeploy it, and then start removing each component entity, and then the data product entity itself.
When you are about to introduce changes that break backward compatibility of the contract that the data product exposes to its consumers (e.g. the output ports it exposes), a new version of the data product must be created. You can do that by opening the upper-right menu of the data product and selecting New version. This creates a completely new data product and components with a different major version. These can be considered completely new components belonging to a brand-new data product.
Release promotes the data product snapshot to a concrete release by assigning the data product its final version name.
After that, the release cycle is closed and the release gets frozen. To make further changes a new snapshot is needed.
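The freeze after release can be pictured as a two-state model in which commits are accepted only while the release is still a snapshot. The sketch below is a conceptual aid: the names `State` and `Release` and the error raised are assumptions for illustration, and the document does not describe how witboost enforces the freeze internally.

```python
from enum import Enum


class State(Enum):
    SNAPSHOT = "snapshot"
    RELEASED = "released"


class Release:
    def __init__(self, version: str) -> None:
        self.version = version
        self.state = State.SNAPSHOT
        self.files: dict[str, str] = {}

    def commit(self, files: dict[str, str]) -> None:
        # A released version is frozen; further changes need a new snapshot.
        if self.state is State.RELEASED:
            raise RuntimeError(
                f"release {self.version} is frozen; start a new snapshot"
            )
        self.files = dict(files)

    def release(self) -> None:
        self.state = State.RELEASED
```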