The outage started at 2:11 AM on 19 August 2017 and ended at 9:30 AM the same day.
We recovered access to all of the data at 3:00 PM on Saturday.
While we were handling the outage the week before, a manual configuration error was made. Our self-healing system auto-corrected this error a week later, and the auto-correction caused the system to stop responding.
Confusing error messages sent our investigation teams down the wrong path. Because the same component had failed a week earlier, we were hypersensitive to it, and time was spent trying to solve a problem that did not exist.
Once the teams realised what the issue was, it was fixed almost immediately.
Damage Assessment
No transactions could be processed for the duration of the incident.
Every call to a financial action (authorize, capture, charge, refund, void) received an HTTP 500 error from our API.
Investigation and Findings
A configuration value in an internal framework was set incorrectly through human error while values were being changed manually during last week's incident. Configuration should never be changed manually.
A backup we attempted to restore from was corrupted, and the corruption had gone undetected. Trying to use it lengthened our investigation.
Action Plan
Create further tests to validate the component backup procedure. - COMPLETED 20/08/2017
Create tests that check the integrity of every known backup we have taken. - COMPLETED 20/08/2017
Automate the deployment of fixes/changes to core frameworks. - COMPLETED 20/08/2017
Review the investigation and deployment processes used during investigations. - COMPLETED 21/08/2017
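A backup integrity test of the kind listed in the action plan could look roughly like the following. This is a minimal sketch, not our actual tooling: it assumes each backup is a single file and that a SHA-256 digest was recorded in a manifest when the backup was taken. The names `sha256_of`, `verify_backups`, and the manifest format are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks
    so that large backups do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backups(manifest: dict[str, str], backup_dir: Path) -> list[str]:
    """Return the sorted names of backups that are missing, or whose current
    digest no longer matches the digest recorded at backup time.

    `manifest` is a hypothetical mapping of backup file name -> hex SHA-256.
    """
    failures = []
    for name, expected in sorted(manifest.items()):
        path = backup_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(name)
    return failures
```

Running a check like this on a schedule, rather than only at restore time, is what would surface a silently corrupted backup before anyone depends on it during an incident.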
Posted Aug 22, 2017 - 06:46 UTC
Resolved
Our Payments API was unavailable due to an internal issue with processor account credentials.