Monitoring & Alarms
All services, applications, and consumers SHOULD have a CloudWatch Alarm set up.
Multiple alarms MAY be set up.
Owners SHOULD identify and monitor other key success metrics.
Alarm Standards
Alarms MUST be named consistently:
- The name MUST be in upper camel case and MUST be prefixed with the component name. For example: BibServiceErrorAlarm.
Alarms SHOULD generally use the sum of a metric to trigger an alarm.
Alarms SHOULD notify an SNS topic when triggered.
Alarms SHOULD be added to an alarms dashboard.
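The naming rule above can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `is_valid_alarm_name` is hypothetical, and the upper camel case test is deliberately loose (leading uppercase letter, then letters and digits only).

```python
import re

# Loose upper camel case check: leading uppercase letter, then letters/digits only.
UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def is_valid_alarm_name(name: str, component: str) -> bool:
    """Return True if `name` is upper camel case and is prefixed with `component`."""
    has_suffix = len(name) > len(component)  # the prefix alone is not a full name
    return bool(UPPER_CAMEL.match(name)) and name.startswith(component) and has_suffix
```

For example, `is_valid_alarm_name("BibServiceErrorAlarm", "BibService")` passes, while a name with underscores or a different component prefix does not.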
Testing
Alarms SHOULD be tested as part of the Production Readiness process:
- This can often be done during a CHAOS session. For example:
  - Turn off an upstream service, or lower or raise the alarm threshold, to trigger the alarm artificially.
  - Verify that everyone who should receive an email does, and that the dashboard updates appropriately.
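Besides the approaches listed above, CloudWatch also exposes a `SetAlarmState` API that temporarily forces an alarm into a given state, which may be a less invasive way to exercise the notification path. This is an alternative technique, not the procedure this document prescribes. The sketch below assumes `boto3` is installed and credentials are configured; the helper names and the reason text are hypothetical.

```python
def forced_alarm_state_params(alarm_name: str) -> dict:
    """Build the arguments for CloudWatch's set_alarm_state call."""
    return {
        "AlarmName": alarm_name,
        "StateValue": "ALARM",
        # Free-text reason recorded in the alarm history (illustrative wording).
        "StateReason": "Manual test during Production Readiness review",
    }


def force_alarm(alarm_name: str) -> None:
    """Temporarily put the alarm into ALARM so its actions fire."""
    import boto3  # imported lazily so building the parameters needs no AWS SDK

    boto3.client("cloudwatch").set_alarm_state(**forced_alarm_state_params(alarm_name))
```

The forced state only lasts until the alarm's next evaluation, so the dashboard and email checks should be done promptly after calling `force_alarm("BibServiceErrorAlarm")`.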
Alarm Triggers
Log Messages
All services, applications, and consumers MUST create a metric filter for their log messages of severity ERROR and greater.
The metric filter MUST be named and namespaced consistently:
- The namespace MUST be LogMetrics.
- The name MUST be in upper camel case and MUST be prefixed with the component name. For example: BibServiceError.
The RECOMMENDED search for a metric filter is { $.levelCode <= 3 }.
The metric created from the metric filter MUST be used to trigger an alarm.
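To make the recommended pattern concrete, here is a rough local emulation of what `{ $.levelCode <= 3 }` selects. It assumes log events are JSON objects with a numeric `levelCode` field and that lower codes are more severe, as the pattern implies. The function name and the sample events are hypothetical, and real CloudWatch matching has more rules than this sketch covers.

```python
import json


def matches_error_filter(log_line: str) -> bool:
    """Approximate the filter { $.levelCode <= 3 } for a single log event."""
    try:
        event = json.loads(log_line)
    except json.JSONDecodeError:
        return False  # non-JSON events are not matched by a JSON pattern
    if not isinstance(event, dict):
        return False
    level = event.get("levelCode")
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(level, (int, float)) and not isinstance(level, bool)
    return is_number and level <= 3


sample_events = [
    '{"levelCode": 3, "message": "upstream request failed"}',
    '{"levelCode": 6, "message": "request served"}',
    "plain text line",
]
matched = [line for line in sample_events if matches_error_filter(line)]
```

Only the first sample event ends up in `matched`, so only that event would increment the metric.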
How to Set up a Metric
- Log in to AWS with your credentials.
- Go to CloudWatch by opening the Services dropdown menu in the top navigation and choosing CloudWatch.
- Go to Logs in the left navigation and find the log group you want to build the metric on in the Log Groups list.
  - Warning: Be careful to choose the correct log group, especially for Elastic Beanstalk log groups (see a previous post-mortem that had metric filters improperly configured).
- Select the circle to the left of the log group, then go to the top of the list and click Create Metric Filter.
- On the Define Logs Metric Filter page, enter your filter in the Filter Pattern field, such as { $.levelCode <= 3 }. For more details on Filter Pattern, see here. Then click Assign Metric.
- On the Create Metric Filter and Assign a Metric page, enter your filter name and metric name following the conventions above. Note that all custom metrics SHOULD be assigned to the Metric Namespace LogMetrics. Click Create Filter to finish creating the metric.
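The console steps above can also be scripted. The following is a minimal sketch, assuming `boto3` is installed and credentials are configured. The helper names are hypothetical, and emitting a metric value of 1 per matching event is an assumption chosen so that the Sum statistic counts errors.

```python
def metric_filter_params(log_group: str, component: str) -> dict:
    """Build put_metric_filter arguments that follow the naming conventions."""
    name = f"{component}Error"  # e.g. BibServiceError
    return {
        "logGroupName": log_group,
        "filterName": name,
        "filterPattern": "{ $.levelCode <= 3 }",
        "metricTransformations": [
            {
                "metricName": name,
                "metricNamespace": "LogMetrics",
                "metricValue": "1",  # assumption: count one per matching event
            }
        ],
    }


def create_metric_filter(log_group: str, component: str) -> None:
    """Create (or overwrite) the metric filter on the given log group."""
    import boto3  # imported lazily so building the parameters needs no AWS SDK

    boto3.client("logs").put_metric_filter(**metric_filter_params(log_group, component))
```

As with the console flow, double-check the log group name before calling `create_metric_filter`, since the filter is attached to whichever group is passed in.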
Other Alarms
Additional alarms MAY be set up for other metrics. For example:
- HTTP errors (5xx, 4xx status codes)
- Lambda errors
- Kinesis errors
Alarm Configuration
How to Set up an Alarm
- You SHOULD set up an SNS topic before you set up the alarm, so that the alarm has somewhere to send notifications.
- Go to CloudWatch by opening the Services dropdown menu in the top navigation and choosing CloudWatch.
- If this is the first alarm on the metric, the metric has probably not received any data yet, so you will not be able to find it by searching. To set up the alarm anyway, go to Logs in the left navigation and find the log group that has your metric filter in Log Groups.
- The Metric Filters column shows how many filters the log group has. Click the link, such as 1 filter.
- You will now be on the page listing all of the log group's filters. On the filter you want, click the Create Alarm link in the top right corner of the filter block.
- In the pop-up Create Alarm window, first enter the name and the threshold of the alarm in the Alarm Threshold section. The name SHOULD follow the naming conventions. In the Whenever section, is: is usually set to greater than or equal to 1.
- In the Additional settings section, set Treat missing data as: to good.
- In the Actions section, set Whenever this alarm: to State is ALARM, and Send notification to: to the SNS topic you have already created.
- In the Alarm Preview section, choose your preferred period; the Statistic SHOULD be Standard and Sum.
- Click Create Alarm to finish.
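The same alarm settings can be expressed as arguments to CloudWatch's `put_metric_alarm` call. This is a sketch under stated assumptions, not a prescribed implementation: the helper names are hypothetical, the 300-second period and single evaluation period are illustrative defaults, and "treat missing data as good" is mapped to the API value `notBreaching`.

```python
def error_alarm_params(component: str, topic_arn: str, period_seconds: int = 300) -> dict:
    """Build put_metric_alarm arguments mirroring the console settings."""
    return {
        "AlarmName": f"{component}ErrorAlarm",
        "Namespace": "LogMetrics",
        "MetricName": f"{component}Error",
        "Statistic": "Sum",
        "Period": period_seconds,  # assumption: choose your preferred period
        "EvaluationPeriods": 1,    # assumption: alarm on a single breaching period
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # missing data treated as good
        "AlarmActions": [topic_arn],         # notify the SNS topic on ALARM
    }


def create_error_alarm(component: str, topic_arn: str) -> None:
    """Create (or update) the error alarm for a component."""
    import boto3  # imported lazily so building the parameters needs no AWS SDK

    boto3.client("cloudwatch").put_metric_alarm(**error_alarm_params(component, topic_arn))
```

Because the alarm is defined by namespace and metric name, this route does not require the metric to have received data first.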
SNS Topic
The SNS topic SHOULD be set up before creating an alarm.
The SNS topic MUST be named consistently and SHOULD be reused for all the component alarms.
- The name MUST be in upper camel case and MUST be prefixed with the component name. For example: BibServiceErrorAlarm.
The SNS topic SHOULD notify the component owner(s) by email or other method.
How to Set up an SNS Topic
- Open the Services dropdown menu in the top navigation and choose Simple Notification Service under the Messaging category.
- Click Create topic on the page, name your topic following the naming conventions above, and then create the topic.
- You should now be on the Topic details page. In the Subscription section, click Create subscription. In the Protocol dropdown menu, choose how you want to receive the notifications; generally we use Email. Then, in the Endpoint field, enter your email address. Finally, click Create subscription.
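For teams that prefer scripting, the topic and email subscription can be created with the SNS API instead of the console. This is a minimal sketch, assuming `boto3` is installed and credentials are configured; the helper names are hypothetical.

```python
def email_subscription_params(topic_arn: str, email: str) -> dict:
    """Build the arguments for an SNS email subscription."""
    return {"TopicArn": topic_arn, "Protocol": "email", "Endpoint": email}


def create_topic_with_email(topic_name: str, email: str) -> str:
    """Create the topic, subscribe one email address, and return the topic ARN."""
    import boto3  # imported lazily so building the parameters needs no AWS SDK

    sns = boto3.client("sns")
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    sns.subscribe(**email_subscription_params(topic_arn, email))
    return topic_arn
```

Email subscriptions stay pending until the recipient confirms them from the message SNS sends, so confirm the subscription before relying on the alarm.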