Getting started

Let's take a look at how you can start your Data Mesh journey by creating a Data Product in witboost.

Prerequisites

As this is only an introductory overview, you do not need to take any action during this walkthrough. But in general, to start using the platform, you should already:

  • Be able to log in to witboost through your credentials or authentication provider
  • Have Git credentials configured in your settings, or have a global access token configured for all witboost users
  • Have at least one Domain defined in the platform

Creating my first Data Product

To create a Data Product, we start with the Builder module:

  1. Go to Builder > Templates to get the list of available Builder templates.

witboost Menu

  2. Choose the Data Product template. This is the most important template in witboost, and you will always find it in the list.

Templates page

  3. Fill in the form with the required information by following the wizard. If you have doubts about how to fill in the Choose a location section, you can take a look here
info

Domain names are always transformed to lowercase to keep the same format and ensure proper linking between dependent entities.

Template Data Product wizard

  4. At the end of the wizard, you can review some of the information entered in the fields. Click on Create. This will start the scaffolding process, which creates a new repository with all the needed files.
tip

Detailed information about each field can be found on the Data Product Specification page.

Remember that you do not have to fill in all the metadata right away, since you can always access the catalog-info.yaml file directly from the repository and edit it there.

Details can be found on the Update Data Product Metadata page (./p6_advanced/p6_4_update_dp_meta.md).

The data product is now listed on the My Data Products page. When you open its details page, you should see its general information and the dependencies between that entity and the other ones (e.g. domain, user).
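The note in the wizard says that domain names are lowercased so that dependent entities link correctly. As a minimal sketch of why that helps (the `domains` registry and both function names are hypothetical and not part of witboost), normalizing the name on both write and lookup means that casing can never break a link:

```python
def normalize_domain(name: str) -> str:
    """Lowercase a domain name (the only normalization the note mentions)."""
    return name.lower()


# Hypothetical registry keyed by normalized domain name.
domains = {normalize_domain("Finance"): {"owner": "finance-team"}}


def find_domain(name: str):
    """Look up a domain regardless of the casing that was typed."""
    return domains.get(normalize_domain(name))
```

With this in place, `find_domain("FINANCE")` and `find_domain("finance")` resolve to the same entry, while an unknown name returns `None`.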

Creating Output Ports for my Data Product

Let's make the Data Product consumable by other domains and users. We will do so by creating an output port.

  1. Get back to the list of templates by clicking on Builder > Templates. From there, select any of the listed templates. For the sake of the example, we will select Internal Storage S3 Template.

S3 template

  2. Again, fill in all the required information, as we did before. This time, make sure to link the Output Port to the right Data Product:

Link component

  3. After following all steps in the wizard, click on Create.

If you now go back to your Data Product (Builder > My Data Products > <Your Data Product>), you will see in the Relations panel that an Output Port is now attached to it. In our case, this Output Port is an S3 Storage:

Relations panel

In the example shown in the picture, there is more than one output port attached to the data product.

tip

If you need to import metadata from an infrastructure service like AWS into your output port, the Reverse Provisioning feature provides a convenient solution. It enables you to effortlessly integrate the desired metadata, potentially saving you a significant amount of time. You can find more information about it in the Reverse Provisioning documentation.

Deploying my first Data Product

After implementing all necessary logic, configurations, and documentation, it's now time to deploy our first Data Product.

Before that, we may also want to get a full descriptor preview and possibly check whether our Data Product is compliant with all governance policies defined in the platform.

Let's start by taking a look at the full preview descriptor. To do so:

  1. Go to Builder > My Data Products > <Your data product> > Control Panel tab
  2. Select a target environment. For this example, we want to deploy our Data Product in our development environment.
  3. After selecting an environment, you will see the full preview descriptor in the Preview | Test box:

Control Panel preview descriptor

If you didn't notice any typos or errors in the descriptor, it's time to test it against governance policies!

To do so, from the Preview | Test panel:

  1. Click on Launch test and check the results:

policies all ok

In our case, that's all fine! Great. This means our Data Product is compliant with our organization's Governance Policies and we can proceed to the Deploy step.

If, instead, any check fails, you will notice a red mark describing the status of the test. To inspect the error, click on the test and you will get the list of policy checks that were performed. The policy checks marked in red as KO are the ones that failed. For each one, click on the failing check to get the details of the error, then fix it.
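The triage described above can be pictured with a short sketch. This is only an illustration: the `PolicyCheck` class, the policy names, and the error text are hypothetical, and only the OK/KO labels mirror what the UI shows.

```python
from dataclasses import dataclass


@dataclass
class PolicyCheck:
    name: str          # hypothetical policy name
    status: str        # "OK" or "KO", mirroring the labels shown in the UI
    details: str = ""  # error description, filled in when the check fails


def failing_checks(checks):
    """Return only the checks marked KO, i.e. the ones that need a fix."""
    return [check for check in checks if check.status == "KO"]


def is_compliant(checks):
    """A descriptor is compliant when no policy check is marked KO."""
    return not failing_checks(checks)


# Made-up results, standing in for one test run.
results = [
    PolicyCheck("naming-convention", "OK"),
    PolicyCheck("owner-is-defined", "KO", "field 'owner' is missing"),
]

for check in failing_checks(results):
    print(f"{check.name}: {check.details}")
```

In this sketch a single KO check is enough to make the whole test non-compliant, which matches the idea that every failing check has to be fixed before moving on to the Deploy step.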

Deploying my first Data Product

To deploy a Data Product, you will first need a deployment unit: a Snapshot.

Let's see how we can do so and finally deploy our Data Product:

  1. Stay in the Control Panel (Builder > My Data Products > <Your data product> > Control Panel tab)

  2. Select the target environment where you want to deploy the Data Product

  3. In the Release | Deploy panel, click on New Snapshot. This will create a new snapshot right away:

new snapshot

  4. Now you can try to deploy it and see if everything works: click on Deploy. This will start the deployment operation. Each circle indicates the deployment status of each component, including the data product itself.

deploy

  5. If everything went right, you will see all circles turn green, with a checkmark.

deploy success

If the deployment is successful, the data product gets published to the marketplace, and it's available to be consumed by other domains.

If you are not happy with the current version of the Data Product, if you deployed something misconfigured, or if you simply want to change some logic, first make the changes to the components or to the data product itself. Then update the deployment unit before deploying it again. To update the deployment unit, just click on Commit.

If you want more details about commit/release/deploy/snapshot operations, take a look at the deploy guide.

Consuming Data Products

We have briefly seen how to build and work with a Data Product from scratch. Then we understood that after a deployment is performed, our data product is published in the marketplace to be consumed by other data teams within the organization. How can we consume a Data Product in witboost?

To do so, we will use the Marketplace module:

  1. Go to Marketplace > DP Catalog. This page contains a list of all published Data Products within the organization. You can filter them by environment or by typing some search keywords in the top-right corner.

Marketplace catalog

OR

  1. Go to Marketplace > DP Graph. The graph visualization shows all relationships between the data products published within your organization.

Marketplace graph

  2. Click on the data product you are interested in. In this example, we will consider the Finance Customer data product for the production environment.

marketplace dp page

This page provides useful information such as general information, dependencies, reviews, questions and answers, and the available output ports.

We are interested in consuming this data product. Let's get some information about the available output ports. To do so:

  1. Scroll down to the Output Ports panel. Here you will see the list of available output ports for the data product.

list of output ports

  2. To get details on any output port, hover over it and click on the View Details button.

view details button

You will land on the output port page, which shows more detailed information about it.

details page

  3. Suppose we decide to consume this S3 storage output port. To do so, request access to it from the Output Ports panel:

access request

  4. You will be prompted to enter some data so that the Data Product owner can grant access to you or your team if the request is accepted:

access request modal

  5. Click Send to finalize the request!

Visualizing our Data Mesh

Graph visualization

The Marketplace module lets you visualize, in the form of a graph, all relationships between data products published in the Marketplace. This graph visualization is available in Marketplace > DP Graph.

Marketplace graph

Each node of the graph represents a data product. Each circle, in turn, represents a domain. A circle can contain other circles (i.e. sub-domains) or nodes (i.e. data products).

As you can see, no relationships are shown in the graph yet, so there are no edges connecting the nodes.

Now, click on any node:

nodes relationships

In the example image, Customer Invoice shows that it consumes an output port from Power Generation, while it is consumed by Customer Touchpoint and Power Generation. This points to a possible circular dependency between Customer Invoice and Power Generation, which should be investigated further.

Meanwhile, the navigation panel on the right side helps us not to lose track of where we are while navigating the graph visualization.
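To make the circular dependency concrete, here is a minimal sketch that models the relationships from the example as a plain dictionary and looks for a cycle with a depth-first search. The `consumes` mapping and the `find_cycle` helper are illustrative only and are not part of witboost.

```python
# consumer -> data products it consumes from (taken from the example image)
consumes = {
    "Customer Invoice": ["Power Generation"],
    "Customer Touchpoint": ["Customer Invoice"],
    "Power Generation": ["Customer Invoice"],
}


def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: the cycle closes here
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in graph:
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


print(find_cycle(consumes))
# prints ['Customer Invoice', 'Power Generation', 'Customer Invoice']
```

The returned list starts and ends with the same data product, which makes the loop between Customer Invoice and Power Generation easy to spot.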

Mesh Supervision

Mesh Supervision is another tool to inspect how the Data Mesh is progressing inside our organization. It provides a dashboard that helps the board of directors make informed decisions through the visualization of KPIs, and that informs the Platform Team about the quality that Data Teams are bringing into the Data Mesh. Mesh Supervision can be found in Marketplace > Mesh Supervision.

mesh supervision