Appirio uses a hosted instance of GitLab to which we don’t have direct admin access. Our primary CI job runner ran out of disk space, causing CI jobs to fail.
We responded by creating another CI runner on the same hosted service, but this runner hit errors indicating a misconfiguration that we couldn’t fix without admin access.
Our hosted service provider has reduced its SLA to a response time of 1 business day, and it did not respond to this High Severity incident as quickly as it has in the past.
After the incident had persisted for several hours without a response from the service provider, we decided to create a CI runner on a separate service that we had admin access to. That resolved the outage.
Our plan for some time has been to move to a fully self-hosted GitLab instance, and this outage gives us additional motivation for that. In addition, we are now running multiple redundant CI runners to provide failover support.
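Since the outage began with a runner running out of disk space, a small scheduled check on each runner host could flag the condition before jobs start failing. The following is only a minimal sketch under stated assumptions: the 85% threshold, the checked path `/`, and the function names are illustrative and are not part of our actual runner configuration.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of disk space used on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def is_disk_low(used_percent, threshold_percent=85.0):
    """Return True when usage meets or exceeds the (assumed) alert threshold."""
    return used_percent >= threshold_percent


if __name__ == "__main__":
    # Hypothetical path; a real runner may keep builds and caches elsewhere.
    pct = disk_usage_percent("/")
    if is_disk_low(pct):
        print(f"WARNING: runner disk is {pct:.1f}% full")
    else:
        print(f"OK: runner disk is {pct:.1f}% full")
```

Run from a scheduler such as cron on every runner, a check like this would give advance warning on hosts we administer. It would not help on a hosted runner where we lack the access needed to install it.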