
Overview

witboost exposes several integration points that developers can customize. These integration points make it possible to bring technology-specific integrations, company-specific workflows, and internal processes into the platform. A customer can implement the following components and plug them into the platform:

  1. Technology Adapter
  2. Marketplace
  3. Data Catalog
note

Although the Provisioning Coordinator should not be replaced by a custom implementation, it is included here for the sake of clarity and completeness.

The following picture shows the component diagram:

[Figure: API overview component diagram]

Further details about component interactions can be found in the Architecture section.

API Definitions

Are you looking for all available backend API definitions?

You can find them listed here: Full API definition (swagger)

PROVISIONING COORDINATOR API

The Provisioning Coordinator exposes five main groups of endpoints:

  • POST /validate: validates the input YAML Data Product Descriptor against all defined policies. The response is either OK or a list of validation errors. Validation starts by creating the Provisioning Plan for the input descriptor. The whole Descriptor is then sent to the Validator, which checks it against the Federated Governance policies. Finally, each Descriptor section that relates to a specific component is sent to the corresponding Technology Adapter, which checks syntactic and semantic rules. N.B. each time a Technology Adapter is invoked, the URL of the service is looked up in the Provisioner Registry, which acts as a dictionary from components to URLs.

  • POST /deploy and POST /undeploy: the first step of a deployment is to run the same validation performed by the validate endpoint. If validation succeeds, the plan is submitted to the internal scheduler. The scheduler resolves the dependencies between components by building an execution DAG from the explicit dependencies declared among components in the descriptor, and then invokes each Technology Adapter in the resulting order. Each invocation of a Technology Adapter can be synchronous (the scheduler receives a 200 with the provisioning result) or asynchronous (it receives a 202 with a token). For an asynchronous response, the scheduler polls the Technology Adapter with the received token to get updates on the provisioning status. Once all the Specific Provisioners' operations have completed, or at least one has failed, the status is updated in the internal database; on success, the Marketplace and the Data Catalog are updated with the details of the deployed Data Product. The endpoint as a whole is asynchronous: after validation, control returns to the caller with a token that can be used to check the request status.

  • GET /status and GET /provisioningplan: these endpoints return the provisioning status for a request token. The first returns only generic details, such as the status and the global result. The second returns the whole provisioning plan, with a status for every single task. The provisioning plan and the statuses are read from the Coordinator's internal structures or, for old requests, from the database.

  • /reverse-provisioning*: endpoints to start and monitor the Reverse Provisioning workflow.

  • /execution-plans*: endpoints to interact with the Coordinator's task scheduler and provisioning and reverse provisioning execution plans.

Complete API description can be found here
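The synchronous and asynchronous handling described for /deploy can be sketched as follows. This is a minimal illustration under stated assumptions, not the Coordinator's actual implementation: `AdapterResponse`, `wait_for_result`, the `fetch_status` callable, and the status strings `RUNNING`, `COMPLETED`, and `FAILED` are hypothetical names chosen for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AdapterResponse:
    """Hypothetical, simplified view of a Technology Adapter HTTP response."""
    http_code: int                 # 200 = synchronous result, 202 = accepted
    status: Optional[str] = None   # assumed values: RUNNING, COMPLETED, FAILED
    token: Optional[str] = None    # expected only on a 202 response


def wait_for_result(
    first: AdapterResponse,
    fetch_status: Callable[[str], str],
    poll_interval_s: float = 1.0,
    max_polls: int = 10,
) -> str:
    """Return the final status of one provisioning task."""
    if first.http_code == 200:
        # Synchronous case: the result is already in the first response.
        return first.status or "FAILED"
    if first.http_code != 202 or first.token is None:
        return "FAILED"
    # Asynchronous case: poll with the token until the task leaves RUNNING.
    for _ in range(max_polls):
        status = fetch_status(first.token)
        if status != "RUNNING":
            return status
        time.sleep(poll_interval_s)
    return "FAILED"  # assumption: a polling timeout is treated as a failure
```

In a real scheduler, `fetch_status` would wrap a call to the adapter's GET /provision/{token}/status endpoint; here it is injected as a plain callable so the polling logic can be exercised without a network.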

SPECIFIC PROVISIONER API

Once the Provisioning Coordinator has validated a Data Product Descriptor and compiled it into a Provisioning Plan, it schedules the plan by calling the appropriate Specific Provisioners in the right sequence. Each Specific Provisioner involved receives a Provisioning Task for creating or deleting the specified Data Product Component. The platform's capabilities can be extended by implementing custom Specific Provisioners (Technology Adapters). A custom Technology Adapter must implement the following set of APIs to interact correctly with the Provisioning Coordinator:

  • POST /provision: This API is responsible for actually deploying a Data Product descriptor to a remote environment. Its request body is a simple object that wraps, as a string, the YAML representing the Data Product descriptor.
  • POST /unprovision: This API is responsible for undeploying an already-deployed resource. The request body is the same as for the /provision API.
  • GET /provision/{token}/status: This API is responsible for fetching the current status of a running provision or unprovision plan.
  • POST /validate: This API is responsible for validating a Data Product descriptor. The request body is the same as for the /provision and /unprovision APIs.
  • POST /updateACL: This API is responsible for granting users and groups access to specific components.
  • /reverse-provisioning*: endpoints to start and monitor the Reverse Provisioning workflow.

Detailed API description can be found here
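To make the token-based contract above more concrete, here is a minimal in-memory sketch of the operations a custom adapter has to cover. It is only an illustration: the class name `InMemoryAdapter`, the response bodies (`valid`, `token`, `status`, `error`), the status strings, and the `complete` helper are all assumptions made for this example and do not reproduce the real request or response schemas, which are defined in the detailed API description.

```python
import uuid
from typing import Dict, Tuple

Response = Tuple[int, dict]  # (HTTP status code, response body)


class InMemoryAdapter:
    """Hypothetical adapter that tracks asynchronous tasks in a dictionary."""

    def __init__(self) -> None:
        self._tasks: Dict[str, str] = {}  # token -> assumed status string

    def validate(self, descriptor: str) -> Response:
        # POST /validate: a real adapter would apply syntactic and semantic
        # rules; this sketch only rejects an empty descriptor.
        return 200, {"valid": bool(descriptor.strip())}

    def provision(self, descriptor: str) -> Response:
        # POST /provision: accept the request and answer asynchronously (202).
        return self._start()

    def unprovision(self, descriptor: str) -> Response:
        # POST /unprovision: same request body and same asynchronous answer.
        return self._start()

    def status(self, token: str) -> Response:
        # GET /provision/{token}/status
        if token not in self._tasks:
            return 404, {"error": "unknown token"}
        return 200, {"status": self._tasks[token]}

    def complete(self, token: str) -> None:
        # Stands in for background work finishing; not part of the HTTP API.
        self._tasks[token] = "COMPLETED"

    def _start(self) -> Response:
        token = str(uuid.uuid4())
        self._tasks[token] = "RUNNING"
        return 202, {"token": token}
```

A real adapter would expose these methods through an HTTP server and could also answer synchronously with a 200 when the operation is fast enough.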

MARKETPLACE PLUGIN API

Once the Provisioning Plan compiled by the Provisioning Coordinator has been successfully executed by the Specific Provisioners involved, the Coordinator invokes a Marketplace Plugin endpoint to update the Marketplace database. A custom Marketplace can be plugged in by implementing the following APIs:

  • POST /insert-provisioning-results: updates the Marketplace with new data product info
  • POST /delete: removes every reference of a data product from the Marketplace
  • PATCH /update-acl: updates the ACL info of a data product in the Marketplace

Detailed API description can be found here

DATA CATALOG PLUGIN API

  • POST /validate: Formally validates the provisioning request for the output ports metadata. In detail, it validates the format and the existence of glossary terms and classification tags
  • POST /provision: Provisioning of metadata in Purview Data Catalog from input descriptor
  • POST /unprovision: Unprovisioning of metadata in Purview Data Catalog from input descriptor
  • GET /provision/{token}/status: Get the status for a provisioning request
  • GET /entity/reference: Returns references (id, links, etc) to the data catalog entity that refers to the provided output port id

Detailed API description can be found here

The Data Product Descriptor flow from the Provisioning Coordinator to a Technology Adapter

The Provisioning Coordinator takes as input a YAML document that represents a Data Product Descriptor (Figure 1) and creates a plan object that contains multiple tasks, in the following way:

  • The infrastructureTemplateId parameter is optional in the Descriptor header. If it appears there, a Data Product level provisioning (or unprovisioning) task is added to the plan
  • The infrastructureTemplateId parameter is mandatory in each component specification. For each component listed in the descriptor, a component provisioning (or unprovisioning) task is added to the plan

Each component can depend on other components. If a component A has a dependsOn constraint on another component B, then component B must be deployed first, followed by component A. The undeploy case uses the reverse order of execution. Also note that, when a Data Product level provisioning task is required, it must be performed:

  • [provisioning case] before all the provisioning tasks of the components
  • [unprovisioning case] after all the unprovisioning tasks of the components

Therefore, in that case, some implicit dependencies arise in the plan between the Data Product level task and each component task.
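The ordering rules above amount to a topological sort of the task graph. The sketch below illustrates the idea with Python's standard `graphlib` module; it is not the Coordinator's scheduler. The function name `execution_order`, its parameters, and the `DP_TASK` identifier are hypothetical, and the input is assumed to be a mapping from each component id to the ids listed in its dependsOn field.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

DP_TASK = "__data_product__"  # hypothetical id for the Data Product level task


def execution_order(
    depends_on: Dict[str, List[str]],
    dp_level_task: bool,
    undeploy: bool = False,
) -> List[str]:
    """Return task ids in a valid execution order."""
    # Node -> set of nodes that must run before it (provisioning direction).
    graph: Dict[str, Set[str]] = {}
    for component_id, deps in depends_on.items():
        graph.setdefault(component_id, set()).update(deps)
        for dep in deps:
            graph.setdefault(dep, set())
    if dp_level_task:
        # Implicit dependency: every component waits for the Data Product task.
        for deps in graph.values():
            deps.add(DP_TASK)
        graph[DP_TASK] = set()
    order = list(TopologicalSorter(graph).static_order())
    # Unprovisioning runs the same plan in reverse.
    return order[::-1] if undeploy else order
```

For example, with `{"A": ["B"], "B": []}` and a Data Product level task, provisioning yields the Data Product task, then B, then A, and unprovisioning yields A, then B, then the Data Product task. `TopologicalSorter` raises `graphlib.CycleError` if the declared dependencies contain a cycle.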

Let's define a basic Data Product Descriptor:

name: Marketing-Invoice-1
environment: dev
description: Dataproduct invoice
owner: group:bigdata
domain: domain:marketing
type: dataproduct
email: contact@example.com
version: 1.0.0
fullyQualifiedName: InvoiceDataProduct
displayName: Invoice
informationSLA: Info
maturity: Tactical
billing: {}
tags: []
id: urn:dmb:dp:marketing:marketing-invoice:1
infrastructureTemplateId: urn:dmb:itm:my-cdp-resource:1.0.0
useCaseTemplateId: urn:dmb:utm:my-dp-prov-template-id:1.0.0
specific:
  workspace: marketing
components:
  - name: marketing.Marketing-Invoice-S3-Ingestion-Data-A-1.1
    id: urn:dmb:cmp:marketing:marketing-invoice:1:marketing-invoice-s3-ingestion-data-a
    description: Marketing Invoice S3 Ingestion Data
    owner: group:bigdata
    domain: domain:marketing
    type: outputport
    fullyQualifiedName: Marketing Invoice S3 Ingestion Data
    displayName: Invoice S3 Input
    resourceType: raw
    platform: CDP_AWS
    technology: CDP_S3
    version: 1.0.2
    creationDate:
    startDate:
    processDescription: Underlying process
    billingPolicy: Billing
    securityPolicy: Sensible information
    consumerPolicy: Effectivly consumption of the data
    slo: {}
    intervalOfChange: {}
    timeliness: {}
    endpoint: endpoint
    allow:
      - group:test
    dependsOn:
      - urn:dmb:cmp:marketing:marketing-invoice:1:marketing-invoice-s3-ingestion-data-b
    tags: []
    sampleData: {}
    semanticLinking: {}
    specific:
      cdpEnvironment: CDPenv
      schema: {}
      location: DirectoryName
    infrastructureTemplateId: urn:dmb:itm:cdp-aws-s3:1.0.0
    useCaseTemplateId: urn:dmb:utm:template-id-1:1.0.0

The coordinator takes the input descriptor and builds a provisioning plan for it, which consists of a directed acyclic graph of tasks. Each task is responsible for a single operation; in particular, there is at least one task for each component that needs to be provisioned. In addition, if configured, the plan contains a task that invokes a technology adapter for the provisioning of the data product itself; the technology adapter that receives this request performs operations at data product level (e.g. the company needs to store an entry in a database for each data product created).

Each provision (or unprovision) task built and planned by the coordinator is responsible for preparing a well-defined YAML document that is sent when invoking the corresponding technology adapter API. The provisioning request is composed of:

  • descriptorKind: an enumeration that can have three different values:
    • DATAPRODUCT_DESCRIPTOR - It is used in the data product level provisioning workflow.
    • COMPONENT_DESCRIPTOR - Provisioning descriptor for a single data product component.
    • DATAPRODUCT_DESCRIPTOR_WITH_RESULTS - This value is not currently used in the scope of a technology adapter.
  • descriptor: the descriptor specification in yaml format. Its structure changes according to descriptorKind:
    • when the kind is DATAPRODUCT_DESCRIPTOR - This is just the complete descriptor of the data product.
    • when the kind is COMPONENT_DESCRIPTOR - An object that includes both the complete data product descriptor (in the dataProduct object field) and the id of the component to be provisioned (in the componentIdToProvision string field).
  • latestEnrichedDescriptor: the complete data product descriptor (YAML format), enriched with provisioning info provided by the specific provisioners during the latest (successful or failed) provisioning/unprovisioning operation for each component.
  • removeData: if true, it means that when a component is undeployed, its underlying data will also be deleted

The following request shows how the Data Product descriptor above is sent for provisioning to the technology adapter associated with the component whose id is urn:dmb:cmp:marketing:marketing-invoice:1:marketing-invoice-s3-ingestion-data-a:

descriptorKind: COMPONENT_DESCRIPTOR
descriptor:
  componentIdToProvision: urn:dmb:cmp:marketing:marketing-invoice:1:marketing-invoice-s3-ingestion-data-a
  dataProduct:
    name: Marketing-Invoice-1
    environment: dev
    description: Dataproduct invoice
    owner: group:bigdata
    domain: domain:marketing
    type: dataproduct
    email: contact@example.com
    version: 1.0.0
    fullyQualifiedName: InvoiceDataProduct
    displayName: Invoice
    informationSLA: Info
    maturity: Tactical
    billing: {}
    tags: []
    id: urn:dmb:dp:marketing:marketing-invoice:1
    infrastructureTemplateId: urn:dmb:itm:my-cdp-resource:1.0.0
    useCaseTemplateId: urn:dmb:utm:my-dp-prov-template-id:1.0.0
    specific:
      workspace: marketing
    components:
      - name: marketing.Marketing-Invoice-S3-Ingestion-Data-A-1.1
        id: urn:dmb:cmp:marketing:marketing-invoice:1:marketing-invoice-s3-ingestion-data-a
        description: Marketing Invoice S3 Ingestion Data
        owner: group:bigdata
        domain: domain:marketing
        type: outputport
        fullyQualifiedName: Marketing Invoice S3 Ingestion Data
        displayName: Invoice S3 Input
        resourceType: raw
        platform: CDP_AWS
        technology: CDP_S3
        version: 1.0.2
        creationDate:
        startDate:
        processDescription: Underlying process
        billingPolicy: Billing
        securityPolicy: Sensible information
        consumerPolicy: Effectivly consumption of the data
        slo: {}
        intervalOfChange: {}
        timeliness: {}
        endpoint: endpoint
        allow:
          - group:test
        dependsOn:
          - urn:dmb:cmp:marketing:marketing-invoice:1:marketing-invoice-s3-ingestion-data-b
        tags: []
        sampleData: {}
        semanticLinking: {}
        specific:
          cdpEnvironment: CDPenv
          schema: {}
          location: DirectoryName
        infrastructureTemplateId: urn:dmb:itm:cdp-aws-s3:1.0.0
        useCaseTemplateId: urn:dmb:utm:template-id-1:1.0.0
latestEnrichedDescriptor:
  name: Marketing-Invoice-1
  environment: dev
  description: Dataproduct invoice
  owner: group:bigdata
  domain: domain:marketing
  type: dataproduct
  email: contact@example.com
  version: 1.0.0
  fullyQualifiedName: InvoiceDataProduct
  displayName: Invoice
  informationSLA: Info
  maturity: Tactical
  billing: {}
  tags: []
  id: urn:dmb:dp:marketing:marketing-invoice:1
  infrastructureTemplateId: urn:dmb:itm:my-cdp-resource:1.0.0
  useCaseTemplateId: urn:dmb:utm:my-dp-prov-template-id:1.0.0
  specific:
    workspace: marketing
  components:
    - name: marketing.Marketing-Invoice-S3-Ingestion-Data-A-1.1
      id: urn:dmb:cmp:marketing:marketing-invoice:1:marketing-invoice-s3-ingestion-data-a
      description: Marketing Invoice S3 Ingestion Data
      owner: group:bigdata
      domain: domain:marketing
      type: outputport
      fullyQualifiedName: Marketing Invoice S3 Ingestion Data
      displayName: Invoice S3 Input
      resourceType: raw
      platform: CDP_AWS
      technology: CDP_S3
      version: 1.0.2
      creationDate:
      startDate:
      processDescription: Underlying process
      billingPolicy: Billing
      securityPolicy: Sensible information
      consumerPolicy: Effectivly consumption of the data
      slo: {}
      intervalOfChange: {}
      timeliness: {}
      endpoint: endpoint
      allow:
        - group:test
      dependsOn:
        - urn:dmb:cmp:marketing:marketing-invoice:1:marketing-invoice-s3-ingestion-data-b
      tags: []
      sampleData: {}
      semanticLinking: {}
      specific:
        cdpEnvironment: CDPenv
        schema: {}
        location: DirectoryName
      infrastructureTemplateId: urn:dmb:itm:cdp-aws-s3:1.0.0
      useCaseTemplateId: urn:dmb:utm:template-id-1:1.0.0
removeData: false
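Given a request shaped like the example above, a technology adapter typically needs to locate the component it has been asked to provision inside the complete data product descriptor. The following is a minimal sketch of that lookup. It assumes the request has already been parsed from YAML into nested Python dictionaries (the parsing step is omitted), and the function name `select_component` is hypothetical; the field names come from the request structure described above.

```python
from typing import Any, Dict


def select_component(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return the component named by componentIdToProvision."""
    kind = request["descriptorKind"]
    if kind != "COMPONENT_DESCRIPTOR":
        # DATAPRODUCT_DESCRIPTOR requests carry the whole descriptor instead
        # and are meant for the Data Product level adapter.
        raise ValueError(f"unsupported descriptorKind: {kind}")
    descriptor = request["descriptor"]
    wanted = descriptor["componentIdToProvision"]
    for component in descriptor["dataProduct"].get("components", []):
        if component.get("id") == wanted:
            return component
    raise LookupError(f"component {wanted} not found in descriptor")
```

The returned dictionary gives the adapter access to the component's own fields, such as `specific` and `dependsOn`, while the surrounding `dataProduct` object remains available for data product level attributes such as `environment`.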
note

The technology adapter for Data Product level provisioning must be different from the "ordinary" component-specific provisioners, although it exposes the same contract described above. In particular, it must expect the whole original DP descriptor and perform the operations that are shared across the whole data product. Operations at component level are handled by the other specific provisioners.

HTTP headers forwarding

When the Builder module calls the Provisioning Coordinator, it includes the following HTTP headers:

  • Authorization header: Bearer JWT token that identifies the user on the Builder module
  • (optional) User headers: HTTP headers defined by the user (see User headers and User header configurations)

Depending on the Provisioning Coordinator's headers forwarding configuration settings, these headers can be further propagated to the other components (Technology Adapter, Marketplace, Data Catalog).
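The forwarding decision can be pictured as a filter applied to the incoming headers. The sketch below is purely illustrative: the function `headers_to_forward` and its two parameters are hypothetical stand-ins for the Coordinator's forwarding configuration, whose real setting names are not shown here.

```python
from typing import Dict, Iterable


def headers_to_forward(
    incoming: Dict[str, str],
    forward_authorization: bool,
    user_header_names: Iterable[str],
) -> Dict[str, str]:
    """Keep only the headers that the (assumed) configuration allows."""
    # HTTP header names are case-insensitive, so compare in lower case.
    allowed = {name.lower() for name in user_header_names}
    if forward_authorization:
        allowed.add("authorization")
    return {
        name: value
        for name, value in incoming.items()
        if name.lower() in allowed
    }
```

With forwarding disabled for both kinds of headers, the function returns an empty dictionary, which corresponds to downstream components receiving none of the Builder's headers.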