
Anatomy of a Data Product Repository

When you create a new data product from a template, Witboost creates a repository on your configured Git platform and initializes it with some files. An example repository is shown below:

[Figure: dp repository]

For the sake of the example, our data product does not expose any output ports, which keeps its descriptor short.

info

A data product is often composed of several output ports. These can either be stored in the same (mono)repository as the data product or each be hosted in a completely different location. As a result, the files and folders in your repository may differ from those shown above.

Our repository contains:

  • docs/ folder: as the name suggests, contains documentation files
  • environments/ folder: hosts environment-specific configurations for the data product
  • releases/ folder: fully managed by Witboost; stores metadata and descriptors of completed and ongoing releases
  • README.md: left to you; as a best practice, use it to describe the contents of the repository
  • catalog-info.yaml: general metadata about your data product. This is the main file of the repository and the one Witboost uses to keep track of the whole entity
  • mkdocs.yml: Witboost's metadata about the documentation under the docs/ folder, such as page listings
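
As an illustration of the layout listed above, here is a minimal Python sketch that reports which of those entries are missing from a local clone. It is not part of Witboost: the helper name `missing_entries` and the `EXPECTED_ENTRIES` list are assumptions made for this example, and a repository whose output ports live elsewhere may legitimately differ.

```python
from pathlib import Path

# Entries named in the list above; a trailing slash marks a folder.
# This list is an assumption for the example, not a Witboost contract.
EXPECTED_ENTRIES = [
    "docs/",
    "environments/",
    "releases/",
    "README.md",
    "catalog-info.yaml",
    "mkdocs.yml",
]


def missing_entries(repo_root):
    """Return the expected entries that are absent from repo_root."""
    root = Path(repo_root)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry.rstrip("/")
        # Folders must exist as directories, everything else as regular files.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    import sys

    # Check the path given on the command line, or the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for entry in missing_entries(root):
        print(f"missing: {entry}")
```

Running the script from the root of a cloned data product repository prints one line per missing entry and prints nothing when every entry is present.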

Later sections will show how each operation in the release lifecycle interacts with a repository.

caution

To avoid problems such as overwriting your data, we encourage you not to modify files and folders under the releases/ folder.
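
One way to follow this advice is to check a list of changed paths before committing. The Python sketch below flags any path that sits under the top-level releases/ folder. The function name `touches_managed_folder` and the file paths in the usage note are hypothetical; how you obtain the list of changed paths (for example from your Git tooling) is left to you.

```python
from pathlib import PurePosixPath

# Top-level folder that Witboost manages, as described above.
MANAGED_FOLDER = "releases"


def touches_managed_folder(changed_paths):
    """Return the changed paths that live under the top-level releases/ folder."""
    flagged = []
    for raw in changed_paths:
        # Paths are expected relative to the repository root, with "/" separators.
        parts = PurePosixPath(raw).parts
        if parts and parts[0] == MANAGED_FOLDER:
            flagged.append(raw)
    return flagged
```

For instance, given the hypothetical paths `docs/index.md` and `releases/example/descriptor.yaml`, only the second one is returned, since only it falls under the managed folder.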