No Bib/Item Platform Updates For 3 Days
Incident Date
March 30, 2018
Incident Summary
On March 27, 2018 at ~2:55pm, bib/item updates from Sierra stopped propagating to our Platform.
It was discovered on March 30, 2018 at ~11:00am (almost 3 days later) that the primary Sierra API key used by our Platform services had been accidentally deleted.
Author(s)
Kevin Friedman
Timeline
On March 30, 2018 at ~11:00am, it was noticed that there were no bib/item updates reported on our dashboard (:+1: for dashboards).
Upon checking CloudWatch, it was discovered that virtually no updates had been posted since March 27, 2018 at ~2:55pm.
After checking the CloudWatch logs from the bib/item pollers (:+1: for good logging), it was discovered that there was an authentication error retrieving an access token from the Sierra API.
It was confirmed that the Sierra API key used by our Platform services had been deleted.
A new Sierra API key was issued and installed on appropriate services.
Normal service resumed by March 30, 2018 at ~12:30pm.
Root Causes
The Sierra API key used by our Platform was accidentally deleted.
Alarms were improperly configured.
The metric filter was being generated from the wrong log group, web-1.error.log.
It should have been generated from web-1.log.
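To make the log group mix-up concrete, here is a minimal sketch of how a metric filter request could be built against the correct log group with boto3's CloudWatch Logs `put_metric_filter` call. Only the two log group names come from this report; the filter name, filter pattern, metric name, and namespace are hypothetical placeholders, and the report does not say how the original filter was created.

```python
# Log group names are taken from the incident report.
WRONG_LOG_GROUP = "web-1.error.log"
CORRECT_LOG_GROUP = "web-1.log"


def build_metric_filter_request(log_group_name: str) -> dict:
    """Build keyword arguments for CloudWatch Logs put_metric_filter."""
    return {
        "logGroupName": log_group_name,
        "filterName": "bib-item-updates",        # hypothetical filter name
        "filterPattern": '"update posted"',      # hypothetical log phrase to match
        "metricTransformations": [
            {
                "metricName": "BibItemUpdates",          # hypothetical metric name
                "metricNamespace": "Platform/Pollers",   # hypothetical namespace
                "metricValue": "1",                      # count one per matching log event
            }
        ],
    }


def apply_metric_filter(request: dict) -> None:
    """Send the request to AWS. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without AWS access

    boto3.client("logs").put_metric_filter(**request)


request = build_metric_filter_request(CORRECT_LOG_GROUP)
print(request["logGroupName"])
```

Keeping the request builder separate from the AWS call means the log group choice can be checked in a unit test before anything is deployed.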
Resolution and Recovery
A new Sierra API Key was issued and saved in Parameter Store.
Alarms were re-configured based on the correct metric filter.
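The report does not describe the alarm definition itself, so the following is only a sketch of one way an alarm for stalled updates might be expressed with boto3's CloudWatch `put_metric_alarm` parameters. The alarm name, metric name, namespace, period, and threshold are all assumptions. One detail worth noting: a metric filter publishes no data points when nothing matches, so an alarm meant to catch "no updates" generally needs missing data treated as breaching.

```python
def build_stalled_updates_alarm(metric_name: str, namespace: str) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": "bib-item-updates-stalled",  # hypothetical alarm name
        "MetricName": metric_name,
        "Namespace": namespace,
        "Statistic": "Sum",
        "Period": 3600,                 # assumed: evaluate one-hour windows
        "EvaluationPeriods": 1,
        "Threshold": 1.0,               # assumed: alarm on fewer than one update per window
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # no data points at all also triggers the alarm
    }


def apply_alarm(request: dict) -> None:
    """Send the request to AWS. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without AWS access

    boto3.client("cloudwatch").put_metric_alarm(**request)


# Hypothetical metric name and namespace, for illustration only.
alarm = build_stalled_updates_alarm("BibItemUpdates", "Platform/Pollers")
print(alarm["TreatMissingData"])
```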
Preventative Measures
Sierra API keys used by services should be generated for, and documented as belonging to, services/applications rather than individual people.
Communicate that metric filters should be configured to watch the correct log group.
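Beyond communicating the expectation, the log group check could be automated. The sketch below compares each metric filter's actual log group against an expected mapping. The filter name and the `expected` mapping are hypothetical; the dictionary keys `filterName` and `logGroupName` follow the shape returned by boto3's CloudWatch Logs `describe_metric_filters`.

```python
from __future__ import annotations


def find_misplaced_filters(metric_filters: list[dict], expected: dict[str, str]) -> list[str]:
    """Report filters that are missing or attached to an unexpected log group."""
    actual = {f["filterName"]: f["logGroupName"] for f in metric_filters}
    problems = []
    for name, group in expected.items():
        found = actual.get(name)  # None when the filter does not exist at all
        if found != group:
            problems.append(f"{name}: expected {group}, found {found}")
    return problems


def fetch_metric_filters() -> list[dict]:
    """Fetch all metric filters from AWS. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the check itself works without AWS access

    paginator = boto3.client("logs").get_paginator("describe_metric_filters")
    filters = []
    for page in paginator.paginate():
        filters.extend(page["metricFilters"])
    return filters


# Sample data mirroring the misconfiguration described in this report.
sample_filters = [{"filterName": "bib-item-updates", "logGroupName": "web-1.error.log"}]
expected = {"bib-item-updates": "web-1.log"}  # hypothetical filter name
problems = find_misplaced_filters(sample_filters, expected)
print(problems)
```

A check like this could run on a schedule or in CI, with a non-empty `problems` list treated as a failure.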