Coordinator Errors

The Provisioning Coordinator is a fairly complex service: it handles user inputs, schedules tasks, and communicates with several other microservices to deploy the resources you request. When a malfunction occurs, we must be prepared for it: we need to know which actions to take and to help users understand whether they can solve the problem by themselves. The Provisioning Coordinator handles its errors under this philosophy.

Currently, the Provisioning Coordinator manages API requests that return either an OK result or a failure result with a 4xx or 5xx HTTP status code. Each returned error follows a nested structure that gives both users and platform team members as much information as possible to understand what went wrong and to reduce the time needed to fix the issue. This avoids returning unformatted strings, stack traces, or JSON dumps that a platform team member would have to analyze and that might be incomprehensible to the user who first sees them. The full error structure is shown below:

Error:
  required:
    - code
    - userMessage
    - moreInfo
  type: object
  properties:
    code:
      type: string
      description: Internal error code for the thrown error
    userMessage:
      type: string
      description: User-readable message to be displayed
    input:
      type: string
      description: Optional field to include the file or descriptor that raised the error
    inputErrorField:
      type: string
      description: Optional field to include the field path (in dot format) that raised the error
    moreInfo:
      $ref: "#/components/schemas/ErrorMoreInfo"

ErrorMoreInfo:
  required:
    - problems
    - solutions
  type: object
  description: Object that will include the more in-depth, specific information about the error
  properties:
    problems:
      type: array
      description: "Array of possible multiple problems: i.e. multiple validations failed"
      items:
        type: string
    solutions:
      type: array
      description: Array of possible solutions that the developer gives to the user to solve the issue
      items:
        type: string
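As an illustration of how a client might consume this structure, the following is a minimal sketch that maps a decoded JSON error body onto typed Python objects. The class and function names (`CoordinatorError`, `ErrorMoreInfo`, `parse_error`) are assumptions made for this example and are not part of the Provisioning Coordinator API; only the field names come from the schema above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ErrorMoreInfo:
    # Both arrays are required by the schema but may be empty.
    problems: list[str] = field(default_factory=list)
    solutions: list[str] = field(default_factory=list)


@dataclass
class CoordinatorError:
    # Hypothetical client-side model of the Error schema.
    code: str
    user_message: str
    more_info: ErrorMoreInfo
    input: Optional[str] = None
    input_error_field: Optional[str] = None


def parse_error(body: dict[str, Any]) -> CoordinatorError:
    """Build a CoordinatorError from a decoded JSON body.

    Raises KeyError if a field marked as required in the schema is missing.
    """
    more_info = body["moreInfo"]
    return CoordinatorError(
        code=body["code"],
        user_message=body["userMessage"],
        more_info=ErrorMoreInfo(
            problems=list(more_info["problems"]),
            solutions=list(more_info["solutions"]),
        ),
        input=body.get("input"),
        input_error_field=body.get("inputErrorField"),
    )
```

Optional fields are read with `dict.get`, so a body that omits `input` or `inputErrorField` still parses, while a body that omits a required field fails early.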

However, if you check the API endpoints of the Provisioning Coordinator, you will find a similar but not identical structure, designed to maintain backward compatibility with old versions of the other services. At the API level there are two kinds of errors, ValidationError and SystemError. If you look closely, both share the same Error structure, with one exception: the deprecated error and errors fields. The current version of the Provisioning Coordinator is backward compatible with older versions of the errors used by the platform, so these fields are kept to ensure the API contract doesn't break.

API error body
ValidationError:
  required:
    - errors
  type: object
  properties:
    errors:
      type: array
      deprecated: true
      items:
        type: string
    code:
      type: string
      description: Internal error code for the thrown error
    userMessage:
      type: string
      description: User-readable message to be displayed
    input:
      type: string
      description: Optional field to include the file or descriptor that raised the error
    inputErrorField:
      type: string
      description: Optional field to include the field path (in dot format) that raised the error
    moreInfo:
      $ref: "#/components/schemas/ErrorMoreInfo"

SystemError:
  required:
    - error
  type: object
  properties:
    error:
      type: string
      deprecated: true
    code:
      type: string
      description: Internal error code for the thrown error
    userMessage:
      type: string
      description: User-readable message to be displayed
    input:
      type: string
      description: Optional field to include the file or descriptor that raised the error
    inputErrorField:
      type: string
      description: Optional field to include the field path (in dot format) that raised the error
    moreInfo:
      $ref: "#/components/schemas/ErrorMoreInfo"

ErrorMoreInfo:
  required:
    - problems
    - solutions
  type: object
  description: Object that will include the more in-depth, specific information about the error
  properties:
    problems:
      type: array
      description: "Array of possible multiple problems: i.e. multiple validations failed"
      items:
        type: string
    solutions:
      type: array
      description: Array of possible solutions that the developer gives to the user to solve the issue
      items:
        type: string
Caution

Because the error and errors fields come from older versions of the Provisioning Coordinator, they will no longer be supported in the future. Avoid using them to display error information, and migrate any services you have so that they read the information from the new fields. All the information contained in these fields is already available in other sections of the error.
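For services that are being migrated, one possible approach is to read the new fields first and fall back to the deprecated ones only when the new ones are missing. The sketch below assumes the body has already been decoded from JSON into a dictionary; the function name `extract_problems` and the fallback policy are assumptions for illustration, not behavior prescribed by the Provisioning Coordinator.

```python
from typing import Any


def extract_problems(body: dict[str, Any]) -> list[str]:
    """Return the problem descriptions, preferring the new fields.

    Falls back to the deprecated `errors` (ValidationError) or `error`
    (SystemError) fields only when `moreInfo.problems` is absent or empty.
    """
    more_info = body.get("moreInfo") or {}
    problems = more_info.get("problems") or []
    if problems:
        return list(problems)
    if body.get("errors"):  # deprecated ValidationError field
        return list(body["errors"])
    if body.get("error"):  # deprecated SystemError field
        return [body["error"]]
    return []
```

Once every upstream service returns the new structure, the two fallback branches can be deleted without touching the callers.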

The other fields are designed to contain the following information:

| field | required | type | description | example |
|---|---|---|---|---|
| code | * | string | Internal code for the error definition | "COR_PARSE_DESCR_1" |
| userMessage | * | string | User-readable message to be displayed | "Error while parsing descriptor. DataProduct descriptor couldn't be parsed" |
| input | | string | Optional field to include the file or descriptor that raised the error | (YAML file of a data product descriptor) |
| inputErrorField | | string | Optional field to include the field path (i.e. specific.name) that raised the error | "kind" |
| moreInfo | * | object | Object that will include the more in-depth, specific information about the error | |
| moreInfo.problems | * | [string] | Array of possible multiple problems: i.e. multiple validations failed. The array can be empty | ["Deployment unit kind not supported"] |
| moreInfo.solutions | * | [string] | Array of hints or possible solutions that the Provisioning Coordinator developer gives to the user to solve the issue. The array can be empty | ["Check that the descriptor is well defined with all the required values", "Check that the input field 'kind' exists and its format and value are correct"] |
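To make the field descriptions concrete, the sketch below combines the example values from the table into a single payload and renders it as plain text, roughly as a UI or CLI might. Treating those example values as belonging to one error is an assumption, and `render_error` is a hypothetical helper, not part of the Provisioning Coordinator.

```python
from typing import Any

# Example payload assembled from the example column of the table.
example_error: dict[str, Any] = {
    "code": "COR_PARSE_DESCR_1",
    "userMessage": "Error while parsing descriptor. "
                   "DataProduct descriptor couldn't be parsed",
    "inputErrorField": "kind",
    "moreInfo": {
        "problems": ["Deployment unit kind not supported"],
        "solutions": [
            "Check that the descriptor is well defined with all the required values",
            "Check that the input field 'kind' exists and its format and value are correct",
        ],
    },
}


def render_error(body: dict[str, Any]) -> str:
    """Format an error body as human-readable text, skipping empty sections."""
    lines = [f"[{body['code']}] {body['userMessage']}"]
    if body.get("inputErrorField"):
        lines.append(f"Field: {body['inputErrorField']}")
    more_info = body["moreInfo"]
    if more_info["problems"]:
        lines.append("Problems:")
        lines.extend(f"  - {p}" for p in more_info["problems"])
    if more_info["solutions"]:
        lines.append("Suggested solutions:")
        lines.extend(f"  - {s}" for s in more_info["solutions"])
    return "\n".join(lines)


print(render_error(example_error))
```

Because `problems` and `solutions` can be empty arrays, the renderer omits their headings when there is nothing to list.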

As you can see, the error is designed to be user-friendly, because users are the primary recipients of the errors, especially in the UI. This is even more important considering that several of the errors thrown by the coordinator are caused by badly formatted inputs or by an essential step of the process being skipped. These are all things that users can fix themselves, without intervention from the platform team.

When the error was not caused directly by the user, but by an internal error or a misconfiguration of the service or request, the error guides the user to contact the platform team. The team member, with all the error information plus the tools they have to manage the platform, will be able to help the user quickly and unblock them as soon as possible.

Error Categories

For the Provisioning Coordinator, errors are divided into categories based on the processes it performs. The following categories have been identified, and they are reflected in the error codes you will receive:

Coordinator Error Categories (COR)

COR_TEMPL: Template errors

Errors related to registering infrastructure templates and use case templates on the Provisioning Coordinator.

  • COR_TEMPL_VAL_#: Template validation errors. Raised when the request is malformed or the values do not comply with the formats required for the IDs and the versions. When calling the template registration endpoints, make sure that the values comply with the format defined in the Provisioning Coordinator configuration by the platform team, and that the version of the template is equal to the version present in the template ID.

COR_PROV: Provisioning errors

Errors that occur during the deployment lifecycle of a resource. Since the deployment validates the resource to be deployed, this category can also include validation errors.

  • COR_PROV_VAL_#: Validation errors. Raised when the validation of the resource to be deployed fails. These validations can come from the Provisioning Coordinator, the Specific Provisioner, or the Computational Governance Platform. Users can often fix these errors themselves by checking which validation failed and making the resource compliant. However, these errors can also be caused by a misconfiguration of the services involved in validating a resource, so it's important to check the details of the error to understand what went wrong and how to fix it.
  • COR_PROV_DEPL_#: Deployment errors. Errors that occur when deploying because of invalid tasks or actions performed on invalid or nonexistent components, such as granting access permissions on an invalid component.
  • COR_PROV_UNDEPL_#: Undeployment errors. Errors that occur when undeploying a resource. Different errors can occur, so it's important to check the details of the error to understand what went wrong and how to fix it.
  • COR_PROV_GEN_#: General errors. Other errors not directly tied to a step of the provisioning lifecycle but that can occur on several occasions. These include not found errors, URI resolution errors, and platform configuration errors. It's important to check the details of the error to understand what went wrong, who can fix it (user or platform team member), and how to fix it.

COR_PARSE: Parsing errors

Validation errors raised when parsing user inputs. They usually contain the input being parsed and explicit information on what couldn't be parsed correctly, at either the input level or the field level.

  • COR_PARSE_DESCR_#: Descriptor parsing errors. Parsing a component or data product descriptor resulted in an error, because the input is not valid YAML/JSON, a field is missing, or a field contains an invalid value. The error details contain the information the user needs to understand what went wrong and, where possible, how to fix it.
  • COR_PARSE_URL_#: URL parsing errors. Parsing a URL used for calling an external service failed, either because it's empty or because it has an invalid format. This usually happens when a wrong URL was introduced while registering an infrastructure template. It's important to check the details of the error to understand what went wrong, who can fix it (user or platform team member), and how to fix it.

COR_SCHED: Scheduler errors

Errors raised when scheduling the execution plan for validation, deployment, or undeployment. These errors are directly related to the tasks built to execute the plan and the dependencies defined among them.

  • COR_SCHED_DEP_#: Task dependency errors. Dependency errors occurred while scheduling the execution plan.
  • COR_SCHED_SYS_#: DAG and scheduler errors. Errors while scheduling the execution plan. This can happen, for example, if the execution plan has dependencies that form a cycle, so that it is no longer a Directed Acyclic Graph. It's important to check the details of the error to understand what went wrong, who can fix it (user or platform team member), and how to fix it.
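To illustrate what a non-acyclic execution plan looks like, the sketch below uses Python's standard `graphlib` module to order tasks by their dependencies and to report a cycle when one exists. This is not the Provisioning Coordinator's scheduler; the `schedule` function and the task names are hypothetical and only show the general technique of cycle detection through topological sorting.

```python
from graphlib import CycleError, TopologicalSorter


def schedule(plan: dict[str, set[str]]) -> list[str]:
    """Return task names in an order that respects their dependencies.

    `plan` maps each task to the set of tasks it depends on.
    Raises ValueError when the dependencies contain a cycle.
    """
    try:
        return list(TopologicalSorter(plan).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes forming the detected cycle.
        cycle = " -> ".join(exc.args[1])
        raise ValueError(f"Execution plan is not acyclic: {cycle}") from exc


# A valid plan: "deploy" depends on "validate".
print(schedule({"deploy": {"validate"}, "validate": set()}))

# An invalid plan: "a" and "b" depend on each other.
try:
    schedule({"a": {"b"}, "b": {"a"}})
except ValueError as err:
    print(err)
```

A plan like the second one can never be executed, because no task in the cycle can be the first to run.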

COR_GEN: Generic errors

Some errors don't fit into a specific category, either because they cut across the whole Provisioning Coordinator, they're too specific for a category, or they are unknown errors not handled by the other categories. There are two categories of generic errors, based on the cause of the error: errors raised when validating what the user sent as input, and internal system errors. For all of these errors it's important to check the details to understand the specific error instance, what went wrong, who can fix it (user or platform team member), and how to fix it.

  • COR_GEN_SYS_#: Generic system errors
  • COR_GEN_VAL_#: Generic validation errors
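Since the category is encoded in the error code, a client could split a code into its parts to group or route errors. The sketch below assumes that the `#` placeholder is a decimal number (as in the "COR_PARSE_DESCR_1" example) and that each code has exactly the shape `COR_<CATEGORY>_<SUBCATEGORY>_<number>`; the `parse_code` helper and the `ParsedCode` type are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Category prefixes and descriptions taken from the lists above.
CATEGORIES = {
    "COR_TEMPL": "Template errors",
    "COR_PROV": "Provisioning errors",
    "COR_PARSE": "Parsing errors",
    "COR_SCHED": "Scheduler errors",
    "COR_GEN": "Generic errors",
}

# Assumed shape: COR_<CATEGORY>_<SUBCATEGORY>_<number>
CODE_PATTERN = re.compile(r"^(COR_[A-Z]+)_([A-Z]+)_(\d+)$")


class ParsedCode(NamedTuple):
    category: str
    subcategory: str
    number: int
    description: str


def parse_code(code: str) -> Optional[ParsedCode]:
    """Split a coordinator error code into its parts, or return None."""
    match = CODE_PATTERN.match(code)
    if match is None or match.group(1) not in CATEGORIES:
        return None
    category, subcategory, number = match.groups()
    return ParsedCode(category, subcategory, int(number), CATEGORIES[category])


print(parse_code("COR_PARSE_DESCR_1"))
```

Codes that do not match the assumed shape, or whose prefix is not one of the listed categories, yield `None` so the caller can treat them as unknown.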

Others - External services

The Provisioning Coordinator contacts other services during the provisioning lifecycle of the resource. These external services can also return errors that the Provisioning Coordinator should handle. However, since they're outside the direct control of the platform, it is not possible to enforce a set of error categories on them, nor to define an internal error code categorization for the received errors, as these can change much more easily than the internal coordinator errors.

Because of this, we introduce a set of default error codes for the services called by the Provisioning Coordinator. The code is added to the returned error so that the end user understands the source of the error. The only exception is the Computational Governance Platform (CGP). Being a service under the control of the platform team, it has its own error code categorization, and thus the Provisioning Coordinator propagates its errors as-is.

  • SP_ERROR: Default code for responses received from Specific Provisioner calls.
  • EXT_ERROR: Default code for responses received from other external services except CGP.
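A client that wants to show where an error came from could inspect the code as in the minimal sketch below. The returned labels and the `error_source` function are hypothetical. The format of Computational Governance Platform codes is not described here, so any code that is not recognized falls into a catch-all branch.

```python
def error_source(code: str) -> str:
    """Return a label describing which service produced the error code."""
    if code == "SP_ERROR":
        return "specific-provisioner"
    if code == "EXT_ERROR":
        return "external-service"
    if code.startswith("COR_"):
        return "provisioning-coordinator"
    # Errors propagated as-is (for example from the CGP) use their own
    # categorization, which this sketch does not attempt to interpret.
    return "other"


for code in ("COR_PROV_VAL_1", "SP_ERROR", "EXT_ERROR"):
    print(code, "->", error_source(code))
```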