Overview

witboost is a modular product, which means that the single modules are decoupled, and they interact only with a specific set of contracts between them.

To enforce this, all the modules are deployed as containers in a Kubernetes cluster, and a UI is exposed to the users to let them interact more easily with all the modules' functionalities.

Requirements

witboost is a highly customizable platform that allows our customers to shape their Data Landscape on top of their cloud providers and technologies.

When installing witboost, the customer should provide:

  • a Kubernetes cluster with a dedicated namespace. This can be self-hosted or provided by a cloud provider, and this will be the cluster where all the product's containers will be deployed.
  • a Postgres Database.
  • a target versioning system where all the repositories will be hosted (e.g. GitLab). Right now GitLab, Bitbucket Server, Azure DevOps, and GitHub are supported.
  • a technical integration with the company organization's structure, like Active Directory or LDAP. This is needed to get users' and groups' details, so witboost can correctly identify the logged user.
  • (optionally) an Elastic Search cluster.

Setup

When installing the desired components by deploying them in a Kubernetes cluster, there are some additional steps to take into account. Here is a list of the main setup operations required:

Database

There are three database instances serving witboost for persistence purposes:

  • The witboost database (i.e. the Builder database): the core database of the Builder module and of other internal witboost modules (RBAC, catalog, scaffolder, search, etc.)
  • The marketplace database: serves the Marketplace module
  • The provisioning_coordinator database: serves the Provisioning Coordinator microservice

You need to specify a retention policy for the audit table, which is found in the Builder database. This table keeps growing, since it records every operation performed by logged-in users (REST API invocations and Hasura mutations). Check with the customer which retention period should apply to this data.
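As a rough illustration of what a retention policy could look like in practice, the sketch below computes a cutoff date and builds a parameterized purge statement. The table name `audit`, the column name `created_at`, and the 90-day period are assumptions made for this example, not names taken from the witboost schema; check the actual Builder database before adapting it.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now: datetime, retention_days: int) -> datetime:
    """Return the timestamp before which audit rows are considered expired."""
    if retention_days <= 0:
        raise ValueError("retention_days must be a positive number of days")
    return now - timedelta(days=retention_days)


def purge_statement(table: str = "audit", timestamp_column: str = "created_at") -> str:
    """Build a parameterized DELETE statement.

    The default table and column names are hypothetical. Identifiers are
    interpolated directly, so only pass trusted constants here; the cutoff
    value itself is left as a driver placeholder (%s).
    """
    return f"DELETE FROM {table} WHERE {timestamp_column} < %s"


if __name__ == "__main__":
    cutoff = retention_cutoff(datetime.now(timezone.utc), retention_days=90)
    print(purge_statement(), cutoff.isoformat())
```

The statement and the cutoff would then be passed together to whatever PostgreSQL client or scheduled job the customer prefers.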

Environments

Environments must be configured in the Builder module. After defining the set of desired environments, add them to the Builder configuration under mesh.builder.environments in one of the following forms:

mesh:
  builder:
    environments:
      - name: production
        priority: 0
      - name: staging
        priority: 1
      - name: development
        priority: 2

or

mesh:
  builder:
    environments:
      - production
      - staging
      - development

The priority number is needed to understand which environment should be considered the "production" one. The production environment should always have priority 0, whereas other environments should be assigned values greater than 0 for sorting purposes in the UI.

If only a list of environments as strings is provided, the priority will be assigned automatically based on the order in which they are listed (in the example, production will have priority 0, staging will have priority 1, and development will have priority 2).
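The priority rule described above can be sketched in a few lines. This is only an illustration of the documented behavior, not witboost's actual implementation; the function names `normalize_environments` and `production_environment` are made up for this example.

```python
from typing import Union

EnvSpec = Union[str, dict]


def normalize_environments(envs: list[EnvSpec]) -> list[dict]:
    """Return every environment as {'name': ..., 'priority': ...}, sorted by priority."""
    normalized = []
    for index, env in enumerate(envs):
        if isinstance(env, str):
            # Plain string form: the priority is the position in the list.
            normalized.append({"name": env, "priority": index})
        else:
            # Explicit form: keep the configured priority.
            normalized.append({"name": env["name"], "priority": env["priority"]})
    return sorted(normalized, key=lambda e: e["priority"])


def production_environment(envs: list[EnvSpec]) -> str:
    """Return the name of the environment with priority 0."""
    for env in normalize_environments(envs):
        if env["priority"] == 0:
            return env["name"]
    raise ValueError("no environment with priority 0 is configured")


if __name__ == "__main__":
    print(normalize_environments(["production", "staging", "development"]))
    print(production_environment(["production", "staging", "development"]))
```

Both configuration forms shown earlier produce the same normalized list, which is why the string-only form is safe as long as production is listed first.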

note

Please note that external modules (e.g. marketplace, cgp) will pull the environments from the Builder module at startup.

Domains

Members of the governance team can define which domains belong to the Data Mesh. To do so, a new repository can be created to keep track of all the domains (you can also create multiple repositories, one per domain, or group them however you prefer). Since domains are Witboost entities as well, they are simply represented by catalog-info.yaml files.

One simple way of defining domains is to first create a new repository with the following structure:

catalog-info.yaml
domains/
├──────── finance.yaml
├──────── marketing.yaml
└──────── sales.yaml

Where the catalog-info.yaml file contains the list of files (one per domain) that should be imported:

apiVersion: backstage.io/v1alpha1
kind: Location
metadata:
  name: all-domains
  description: A collection of all Data Mesh domains
spec:
  targets:
    - ./domains/finance.yaml
    - ./domains/marketing.yaml
    - ./domains/sales.yaml

and all the single domain files simply contain the basic metadata, like for finance.yaml:

apiVersion: backstage.io/v1alpha1
kind: Domain
metadata:
  name: finance
  description: Everything related to finance
  links:
    - url: http://example.com/domain/finance/
      title: Finance Domain
spec:
  owner: 'group:datameshplatform'
  mesh:
    name: 'Finance'

When defining such files, remember to use a unique value for the metadata.name field, a human-readable display name in the spec.mesh.name field, and an existing group for the owner. The metadata.links field can be used to add relevant links, such as documentation or domain-related resources.
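These three rules are easy to check before importing the files. The sketch below assumes the domain files have already been parsed into dictionaries and that the set of existing groups is known; `validate_domains` and `known_groups` are hypothetical names for this example, and witboost itself may apply different or additional checks.

```python
def validate_domains(domains: list[dict], known_groups: set[str]) -> list[str]:
    """Return a list of human-readable problems found in the domain entities."""
    errors = []
    seen_names = set()
    for entity in domains:
        metadata = entity.get("metadata", {})
        spec = entity.get("spec", {})
        name = metadata.get("name")
        if not name:
            errors.append("missing metadata.name")
            continue
        # metadata.name must be unique across all domains.
        if name in seen_names:
            errors.append(f"{name}: duplicate metadata.name")
        seen_names.add(name)
        # spec.mesh.name carries the human-readable display name.
        if not spec.get("mesh", {}).get("name"):
            errors.append(f"{name}: missing spec.mesh.name")
        # The owner must reference a group that already exists.
        if spec.get("owner") not in known_groups:
            errors.append(f"{name}: owner is not an existing group")
    return errors


if __name__ == "__main__":
    finance = {
        "metadata": {"name": "finance"},
        "spec": {"owner": "group:datameshplatform", "mesh": {"name": "Finance"}},
    }
    print(validate_domains([finance], {"group:datameshplatform"}))
```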

Subdomains

You can also specify a parent domain when registering a domain. To do so, add the following configuration:

spec:
  subDomainOf: superDomain

The registered domain will now be considered a subdomain of "superDomain".

caution

If you register a domain catalog-info.yaml with a name that already exists, witboost will tell you that the domain is already registered. For this reason, if you edit the domain in a feature branch, and you want to add the changes to test them, remember to change the metadata name, or you will not see any change at all.

caution

If you register a new domain that is not a subdomain, you will not have permission to view it. In such cases, please refer to the RBAC section for instructions on how to create a new permission.

Once the repository is set up, go to the import page (select "register existing component" from the Templates page of the Builder menu) and add the URL of the catalog-info.yaml file. After that, you can change the repository without importing it again: witboost periodically scans all repositories and imports any new domains it finds.

warning

Sometimes changes to the domain definitions are not reflected in witboost right away. Remember that you can always go to Builder > Software Catalog, unregister the location, and register it again from Builder > Templates. After this procedure you should be able to see all the domains defined in the catalog-info.yaml file.

Unregistering a domain

Since domains are also witboost entities, they can be unregistered from the platform by removing them from the builder. When unregistering a domain, the corresponding RBAC grants for users and groups on that domain will be automatically removed as well. If you later decide to register the domain back in witboost, you will have to grant users and groups all the needed permissions for that domain from scratch.

The builder entities registered in that domain will remain unaffected, but trying to deploy them will fail.

If the marketplace contains entities belonging to the domain you unregistered, they will remain in the marketplace, but the 'Domain' column will display the string "Unassigned". Registering the domain again will restore the correct value.

To prevent all of the above, you can decide to block domain unregistration when there are entities in the builder belonging to the domain you are trying to delete. To do so, you can set the configuration:

catalog:
  disableNonEmptyDomainUnregister: true

Resources

As for environments, external resources must be configured on both the Builder and the Marketplace modules. External resources are anything that Data Products could use (read from) that is not handled inside witboost, e.g. external databases, remote APIs or services, etc.

Resources are added to witboost as entities, in the same way as domains. Register them with a structure like:

apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: marketing.external-postgres-db
  description: Remote Postgresql database
  links:
    - url: http://example.com/postgres/db/
      title: Postgres DB documentation
spec:
  owner: 'group:datameshplatform'
  domain: domain:marketing
  mesh:
    name: 'External Postgres DB'

Note that the resource's name under metadata.name should be structured as {domain}.{name} (e.g. marketing.external-postgres-db).

Then you must add the same resource also in the marketplace database, by adding a new row in the Resource table. The fields that you should fill are:

  • id: a progressive numeric identifier
  • name: the resource's name. You can copy it from the metadata.name of the entity
  • display_name: the resource's display name. You can copy it from the spec.mesh.name of the entity
  • fully_qualified_name: the resource's fully qualified name. You can copy it from the spec.mesh.name of the entity or add more details to it
  • description: the resource's description. You can copy it from the metadata.description of the entity
  • domain: display-only metadata for the resource's domain (this and the following display-only fields are purely informative and completely optional)
  • kind: display-only metadata for the resource's kind
  • technology: display-only metadata for the resource's technology
  • platform: display-only metadata for the resource's platform
  • descriptor: you can just put an empty JSON object {} since this field is not leveraged yet
  • created_at: you can use the default value (now)
  • updated_at: you can use the default value (now)
  • external_id: this is the core field of the resource, and it should be defined as urn:dmb:rsr:<domain>:<name>, where <domain> and <name> are taken from the metadata.name; e.g. if the metadata.name is finance.external-postgres-db, the external_id field will be urn:dmb:rsr:finance:external-postgres-db
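The external_id rule can be expressed as a small helper. This is a minimal sketch that assumes the domain is everything before the first dot in metadata.name; `resource_external_id` is a name chosen for this example, not a witboost function.

```python
def resource_external_id(metadata_name: str) -> str:
    """Derive urn:dmb:rsr:<domain>:<name> from a '{domain}.{name}' entity name."""
    # Assumption: the domain is the part before the first dot.
    domain, separator, name = metadata_name.partition(".")
    if not separator or not domain or not name:
        raise ValueError(
            f"expected metadata.name in the form '{{domain}}.{{name}}', got {metadata_name!r}"
        )
    return f"urn:dmb:rsr:{domain}:{name}"


if __name__ == "__main__":
    print(resource_external_id("finance.external-postgres-db"))
```

The same URN is what an RBAC association on the resource uses as its scope, so deriving it from metadata.name in one place helps keep the catalog entity and the marketplace row aligned.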

note

Please note that, as for environments, these two sets must always be kept aligned: when new resources are used, they must be added to both modules.

By default, resources are not visible to anyone, except if the user is already allowed to read all entities within a given domain and the resource is part of that domain.

To restrict visibility of the resource to a specific domain (e.g. the domain it belongs to), fill or add the spec.domain field in the resource's descriptor, using the following format: domain:<domain> (e.g. domain:marketing). That is, if a group of users is allowed to read entities within that domain, they will now be able to see the resource entity in the catalog.

If more granular access to resources is needed, it can be configured with Role-Based Access Control. To do so, add an association between the subject (user or group) and the resource's URN (for the example above, the association will have the scope urn:dmb:rsr:marketing:external-postgres-db). Head to the RBAC section of the documentation for more details on how to create permissions.

caution

While it is possible to restrict the visibility of resources in the Software Catalog of the Builder module, this is not possible in the Marketplace module.

Configurations

witboost configurations are stored in the app-config.yaml file. More info here

Git providers

witboost supports the following git providers to host and manipulate data products and components in their Git repositories:

  • GitLab
  • Bitbucket Server
  • Azure DevOps
  • GitHub

info

Can't find your git provider in the list? Contact us to request support for a different git provider.

To authorize read/write operations on repositories, witboost requires some configuration parameters (e.g. access tokens). There are two ways to pass git providers configurations:

  • Through the app-config: each user of witboost will use the same credentials to authorize operations
  • Through personal user settings (recommended): each user of witboost will use their own credentials to authorize operations

Configuring Git providers with app-config

Add the following lines to the app-config.yaml file in the root folder of the project:

integrations:
  usePersonalToken: false
  gitlab:
    - host: gitlab.com
      token: <access token>

warning

Authentication parameters supplied by app-config are global. This means that all users of witboost use this token. Hence, every operation performed on the Git provider through witboost is attributed to the owner of the configured token.

Configuring git providers through user settings requires the following steps:

  1. Define a technical user token with read-only permissions in your Git provider. The user generating this token should have the rights to view all repositories that you intend to use in witboost.
  2. In the app-config.yaml in the root folder of the project place your read-only token.
     integrations:
       usePersonalToken: true
       gitlab:
         - host: gitlab.com
           token: <read-only access token>

info

The read-only token is needed because some witboost tasks running in background are non-user triggered operations, but they still require to read repositories to perform some refreshes.

  3. Inform witboost users to set up their own Access Token. Head to the user manual (Configuring Git Credentials) for more details.
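To make the difference between the two modes concrete, the sketch below models which token ends up authorizing an operation. It is a simplified reading of the behavior described in this section, not witboost's code: `select_token` and its parameters are hypothetical, and raising an error when a user has no personal token configured is an assumption of this example.

```python
from typing import Optional


def select_token(
    use_personal_token: bool,
    config_token: str,
    user_token: Optional[str],
    user_triggered: bool,
) -> str:
    """Pick the token used to authorize a Git provider operation."""
    if not use_personal_token:
        # Global mode: every operation uses the app-config token.
        return config_token
    if not user_triggered:
        # Background refreshes have no user, so they use the read-only token.
        return config_token
    if user_token is None:
        # Assumption: a user without a personal token cannot perform the operation.
        raise PermissionError("user has not configured a personal access token")
    return user_token


if __name__ == "__main__":
    print(select_token(True, "read-only-token", "alice-token", user_triggered=True))
```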

Platform Settings

Platform settings contain configurations that the platform team needs to enable in order to unlock some functionalities of witboost. To enable platform settings, you must set mesh.platformSettings.enabled to true. Platform settings are currently used to register Access Control Request Templates.