Getting started
Let's take a look at how you can start your Data Mesh journey by creating a Data Product in witboost.
Prerequisites
As this is only an introductory overview, you do not need to take any action during this walkthrough. But in general, to start using the platform, you should already:
- Be able to log in to witboost through your credentials or authentication provider
- Have configured Git credentials in your settings, or have a global access token configured for all witboost users
- Have at least one Domain defined in the platform
Creating my first Data Product
To create a Data Product, we start with the Builder module:
- Go to Builder > Templates to get the list of available Builder templates.
- Choose the Data Product template. This is the most important template in witboost, and you will always find it in the list.
- Fill in the form with the required information by following the wizard. If you have doubts about how to fill in the Choose a location section, you can take a look here
Domain names are always transformed to lowercase to keep the same format and ensure proper linking between dependent entities.
- At the end of the wizard, you can review some of the information filled in the fields. Let's click on Create. This will start the scaffolding process that will create a new repository with all needed files.
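The note above about lowercase domain names can be illustrated with a small sketch. It is not witboost's implementation: the `normalize_domain` helper and the sample names are hypothetical, and the snippet only shows why keeping names in a single case lets dependent entities link to the right domain.

```python
# Hypothetical illustration, not witboost code.
def normalize_domain(name: str) -> str:
    """Return the domain name in lowercase."""
    return name.lower()


# Domains are stored under their normalized name...
domains = {normalize_domain("Finance"): {"description": "Finance domain"}}

# ...so an entity that refers to the domain with different casing still resolves.
data_product = {"name": "customer", "domain": "FINANCE"}
linked_domain = domains[normalize_domain(data_product["domain"])]
print(linked_domain["description"])
```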
Detailed information about each field can be found on the Data Product Specification page.
Remember that you do not have to fill in all the metadata right away, since you can always access the catalog-info.yaml file directly from the repository and edit it there.
Details can be found on the Update Data Product Metadata page (/docs/p1_user/p6_advanced/p6_4_update_dp_meta).
The data product is now listed on the My Data Products page. When opening its details page, you should be able to see its generic information and the dependencies between that entity and the other ones (e.g. domain, user, etc.).
Creating Output Ports for my Data Product
Let's make the Data Product consumable by other domains and users. We will do so by creating an output port.
- Get back to the list of templates by clicking on Builder > Templates. From there, select any of the listed templates. For the sake of the example, we will select Internal Storage S3 Template.
- Again, let's fill in all information needed, as we did before. This time, make sure to link the Output Port to the right Data Product:
- After following all steps in the wizard, click on Create.
If you now go back to your Data Product (Builder > My Data Products > <Your Data Product>), you will see in the Relations panel that there is an Output Port attached to it. In our case, this Output Port is an S3 Storage:
In the example shown in the picture, there is more than one output port attached to the data product.
If you need to import metadata from an infrastructure service like AWS into your output port, the Reverse Provisioning feature provides a convenient solution. It enables you to effortlessly integrate the desired metadata, potentially saving you a significant amount of time. You can find more information about it in the Reverse Provisioning documentation.
Deploying my first Data Product
After implementing all necessary logic, configurations, and documentation, it's now time to deploy our first Data Product.
We may also want to get a full descriptor preview and check whether our Data Product is compliant with all governance policies defined in the platform.
Let's start by taking a look at the full descriptor preview. To do so:
- Go to the Builder > My Data Products > <Your data product> > Control Panel tab
- Select a target environment. For this example, we want to deploy our Data Product in our development environment
- After selecting an environment, you will see the full preview descriptor in the Preview | Test box:
If you didn't notice any typos or errors in the descriptor, it's time to test it against governance policies!
To do so, from the Preview | Test panel:
- Click on Launch test and let's see the results:
In our case, that's all fine! Great. This means our Data Product is compliant with our organization's Governance Policies and we can proceed to the Deploy step.
If, instead, any check fails, you will notice a red mark describing the status of the test.
To inspect the error, click on the test and you will get a list of the policy checks performed. The policy checks marked in red as KO are the ones that failed.
For each one, click on the failing check to get the details of the error, then fix it.
You can find more details about how to inspect policy errors in the Managing Policies section.
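As a rough sketch of the inspection step described above, the snippet below filters a test result for the checks marked KO. The result structure, the field names (`policy`, `status`, `details`), and the sample policies are assumptions made for illustration; they do not reflect the platform's actual output format.

```python
# Hypothetical test result: field names and values are illustrative only.
test_result = [
    {"policy": "naming-convention", "status": "OK", "details": ""},
    {"policy": "owner-is-defined", "status": "KO", "details": "missing owner field"},
    {"policy": "has-description", "status": "OK", "details": ""},
]


def failing_checks(checks):
    """Return only the policy checks whose status is KO."""
    return [check for check in checks if check["status"] == "KO"]


for check in failing_checks(test_result):
    # Each failing check carries the details needed to fix the descriptor.
    print(f"{check['policy']}: {check['details']}")
```

In this sketch, an empty list from `failing_checks` corresponds to the case where the Data Product is compliant and you can move on to the Deploy step.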
Deploying my first Data Product
To deploy a Data Product, you will first need a deployment unit: a Snapshot.
Let's see how we can do so and finally deploy our Data Product:
- Still from the Control Panel (Builder > My Data Products > <Your data product> > Control Panel tab), select the target environment where you want to deploy the Data Product
- In the Release | Deploy panel, click on New Snapshot. This will create a new snapshot right away:
- Now you can try to deploy it and see if everything works: click on Deploy. This will start the deployment operation. Each circle indicates the deployment status of one component, including the data product itself.
- If everything went right, you will see all circles turning green, with a checkmark.
If the deployment is successful, the data product gets published to the marketplace, and it's available to be consumed by other domains.
If you are not happy with the current version of the Data Product, you deployed something misconfigured, or you simply want to change some logic, first make the changes to the components or to the data product itself. Then update the deployment unit before deploying it again. To update the deployment unit, just click on Commit.
If you want more details about commit/release/deploy/snapshot operations, take a look at the deploy guide.
Consuming Data Products
We have briefly seen how to build and work with a Data Product from scratch. We also saw that, after a deployment is performed, our data product is published in the marketplace to be consumed by other data teams within the organization. How can we consume a Data Product in witboost?
To do so, we will use the Marketplace module:
- Go to Marketplace > DP Catalog. This page contains a list of all published Data Products within the organization. You can filter them by environment or by typing some search keywords in the top-right corner.
OR
- Go to Marketplace > DP Graph. The graph visualization shows all relationships between the data products published within your organization.
- Click on the data product you are interested in. In this example, we will consider the Finance Customer data product for the production environment.
This page provides useful information such as general information, dependencies, reviews, questions and answers, and available output ports.
We are interested in consuming this data product. Let's get some information about the available output ports. To do so:
- Scroll down to the Output Ports panel. Here you will see the list of available output ports for the data product.
- To get some details on any output port, hover the mouse over it and click on the View Details button
You will land on the output port page, with some insightful information.
- We decided to consume this S3 storage output port. To do so, let's request access to it from the Output Ports panel:
- You will be prompted to enter some data so that the Data Product owner can grant access to you or your team if the request is accepted:
- Click Send to finalize the request!
Visualizing our Data Mesh
Graph visualization
The Marketplace module lets you visualize, in the form of a graph, all relationships between data products published in the Marketplace.
This graph visualization is available in Marketplace > DP Graph.
Each node of the graph represents a data product. Each circle, instead, is a domain. A circle can then contain other circles (i.e. sub-domains) or nodes (i.e. data products).
As you can notice, no relationship is represented in the graph yet, hence there is no edge connecting any nodes.
Now, click on any node:
In the example image, Customer Invoice shows that it consumes an output port from Power Generation, while it is consumed by Customer Touchpoint and Power Generation. This underlines a possible circular dependency between the two that should be investigated further.
Meanwhile, the navigation panel on the right side helps us not to lose track of where we are when navigating the graph visualization.
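A circular dependency like the one suggested by the Customer Invoice example can also be looked for programmatically. The following is a minimal sketch that assumes the consumption relationships have been exported as a plain mapping from each data product to the data products it consumes from. The mapping and the `find_cycle` helper are hypothetical and are not part of witboost; the edges simply mirror the example in the text.

```python
# Hypothetical export of the graph: each data product maps to the
# data products whose output ports it consumes.
consumes = {
    "Customer Invoice": ["Power Generation"],
    "Customer Touchpoint": ["Customer Invoice"],
    "Power Generation": ["Customer Invoice"],
}


def find_cycle(consumes):
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    visiting, done = set(), set()

    def visit(node, path):
        if node in done:
            return None
        if node in visiting:
            # Reached a node already on the current path: the cycle starts there.
            return path[path.index(node):] + [node]
        visiting.add(node)
        for upstream in consumes.get(node, []):
            cycle = visit(upstream, path + [node])
            if cycle:
                return cycle
        visiting.discard(node)
        done.add(node)
        return None

    for start in consumes:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


cycle = find_cycle(consumes)
print(" -> ".join(cycle) if cycle else "no circular dependency")
```

With the sample mapping, the depth-first search reports the loop between Customer Invoice and Power Generation, which is the pair the graph view flags for further investigation.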
Mesh Supervision
Mesh Supervision is another tool to inspect how Data Mesh is going inside our organization.
It provides a dashboard that helps the board of directors make informed decisions through the visualization of KPIs, and that informs the Platform Team about the quality that Data Teams are bringing into the Data Mesh.
Mesh Supervision can be found in Marketplace > Mesh Supervision.