Payments API Unavailability
Incident Report for Zooz
Postmortem

Background:

  • A first incident occurred one week earlier.

Incident Details:

  • The outage started on 8/19/2017 at 2:11 AM and ended at 9:30 AM the same day.
  • We recovered access to all of the data at 3:00 PM on Saturday.
  • During the outage the week before, a manual configuration change introduced an error, which our self-healing system auto-corrected a week later.
  • This auto-correction caused the system to stop responding.
  • Confusing error messages sent our investigation teams down the wrong path. Because this component had failed a week earlier, we were hypersensitive to it and spent time trying to solve a problem that didn’t exist.
  • Once the teams realised what the issue was, it was fixed almost immediately.
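
The failure described above began with a hand-edited value diverging from what automation expected. A minimal sketch of one way to surface such drift early is to compare the declared configuration against the live one and report every difference. This is an illustration only, not our self-healing system; the function name, the MISSING marker, and the sample keys are all hypothetical.

```python
# Hypothetical sketch: report keys whose live value differs from the declared one.
MISSING = "<missing>"  # assumed marker for a key absent on one side


def find_drift(declared: dict, live: dict) -> dict:
    """Return {key: (declared_value, live_value)} for every key that differs."""
    drift = {}
    for key in sorted(declared.keys() | live.keys()):
        declared_value = declared.get(key, MISSING)
        live_value = live.get(key, MISSING)
        if declared_value != live_value:
            drift[key] = (declared_value, live_value)
    return drift


if __name__ == "__main__":
    # Sample values are invented for illustration.
    declared = {"pool_size": 20, "timeout_s": 5}
    live = {"pool_size": 20, "timeout_s": 50, "debug": True}
    for key, (want, got) in find_drift(declared, live).items():
        print(f"{key}: declared={want!r} live={got!r}")
```

Reporting drift for human review, rather than correcting it automatically, is a design choice in this sketch: it avoids an unattended change taking effect at an unexpected time.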

Damage Assessment

  • No transactions could be processed for the duration of the incident.
  • Every call to a financial action (authorize, capture, charge, refund, void) received a “500” error from our API.

Investigation and Findings

  • An internal framework was misconfigured through human error when values were changed manually during last week's incident. We should never configure things manually.
  • One of the backups we attempted to use was corrupted, and the corruption had gone undetected, which lengthened our investigation.
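
Undetected corruption of the kind described above can often be caught by recording a checksum when a backup is taken and verifying it before the backup is relied on. The following is a minimal sketch under that assumption; the function names are hypothetical and it does not describe our actual backup tooling.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path, expected_sha256):
    """Return True if the file's digest matches the one recorded at backup time."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

A matching checksum only shows that the file has not changed since the digest was recorded. It does not prove the backup can be restored, so a periodic restore test remains the stronger check.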

Action Plan

  • Create further tests to validate the component backup procedure. - COMPLETED 20/08/2017
  • Create tests for all known backups we have taken and check the integrity of each backup. - COMPLETED 20/08/2017
  • Automate the deployment of fixes/changes to core frameworks. - COMPLETED 20/08/2017
  • Review the investigation and deployment processes used during an investigation. - COMPLETED 21/08/2017
Posted Aug 22, 2017 - 06:46 UTC

Resolved
Our Payments API was unavailable due to an internal issue with processor account credentials.

Service has been restored.
Posted Aug 19, 2017 - 02:21 UTC