The use of alarms is an essential requirement when working with various resources in the cloud. It is one of the most efficient ways to monitor and understand the behavior of an application, especially when its metrics differ from what we expect.
In this post, we're going to create an alarm from scratch using AWS CloudWatch based on a specific scenario. There are several other tools that let us set up alarms, but when working with AWS, setting alarms using CloudWatch is very simple and fast.
Use Case Scenario
For a better understanding, suppose we create a resiliency mechanism in an architecture to prevent data loss. This mechanism acts whenever something goes wrong, for example when components do not work as expected and send failure messages to an SQS queue.
CloudWatch allows us to set an alarm so that when a message is sent to this queue, the alarm is triggered.
First of all, we need to create a queue and send some messages just to generate the metrics that we're going to use in our alarm. This is a way to simulate a production environment. After creating the queue and the alarm, we'll send more messages to test the alarm.
Creating an SQS Queue
Let's create a simple SQS queue and choose some metrics that we can use in our alarm. To do so, access the AWS console, type "sqs" in the search bar as shown in the image below, and then open the service.
After accessing the service, click Create queue.
Let's create a Standard queue for this example and name it sqs-messages. You can keep the default values for the other settings; just click the Create queue button to finish.
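If you prefer to script this step instead of clicking through the console, the following is a minimal Python sketch. It assumes a client object that behaves like boto3's SQS client (for example `boto3.client("sqs")`), whose `create_queue` call returns a dict containing `QueueUrl`; the helper name `create_standard_queue` is hypothetical.

```python
from typing import Any


def create_standard_queue(sqs_client: Any, queue_name: str = "sqs-messages") -> str:
    """Create a Standard SQS queue with default settings and return its URL.

    Assumption: `sqs_client` behaves like boto3.client("sqs"); any object with
    the same create_queue(QueueName=...) method works, which keeps this testable.
    """
    # No FifoQueue attribute and no ".fifo" suffix, so SQS creates a Standard queue.
    response = sqs_client.create_queue(QueueName=queue_name)
    return response["QueueUrl"]
```

With AWS credentials configured, you could call it as `create_standard_queue(boto3.client("sqs"))`. Passing the client in, rather than creating it inside the function, keeps the helper easy to test with a stand-in object.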
Now that the queue has been created, the next step is to send a few messages just to generate metrics.
Sending messages
Let's send a few messages to the queue we just created; feel free to change the message content if you want to.
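Sending the test messages can also be scripted. This is a minimal sketch, assuming a client that behaves like boto3's SQS client, whose `send_message(QueueUrl=..., MessageBody=...)` call returns a dict with a `MessageId`; the helper name and the message bodies are hypothetical. Each message successfully added to the queue counts toward the queue's NumberOfMessagesSent metric.

```python
from typing import Any, Iterable, List


def send_test_messages(sqs_client: Any, queue_url: str, bodies: Iterable[str]) -> List[str]:
    """Send each body as a separate message and return the MessageIds.

    Assumption: `sqs_client` behaves like boto3.client("sqs").
    """
    message_ids: List[str] = []
    for body in bodies:
        # One SendMessage call per body, so each body becomes one message.
        response = sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)
        message_ids.append(response["MessageId"])
    return message_ids
```

For example, `send_test_messages(boto3.client("sqs"), queue_url, ["test 1", "test 2", "test 3"])` would send three messages, where `queue_url` is the URL returned when the queue was created.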
After these messages are sent, SQS automatically publishes metrics related to this activity to CloudWatch. In this case, a metric called NumberOfMessagesSent is now available in CloudWatch, and we can use it to create the alarm.
Creating an Alarm
For our example, let's choose the metric based on the number of messages sent (NumberOfMessagesSent).
Access the AWS console and search for CloudWatch in the search bar, as shown in the image below.
After accessing the service, click the Alarms / In alarm option in the menu on the left side of the screen, and then click the Create alarm button.
Click Select metric, as shown in the screen below.
Choose SQS
Then click Queue Metrics
Search for the queue name, select the item labeled NumberOfMessagesSent in the Metric name column, and then click Select metric.
Setting metrics
Metric name: the metric chosen in the previous steps. This metric measures the number of messages sent to the SQS queue (NumberOfMessagesSent).
QueueName: the name of the SQS queue for which the alarm will be configured.
Statistic: here we can choose options such as Average, Sum, Minimum, and more. The right choice depends on the metric and on what the alarm needs to detect. For this example we choose Sum, because we want the total number of messages sent in a given period.
Period: the length of time over which the statistic is aggregated into a single datapoint. Each datapoint is then compared with the threshold condition that will be defined in the next steps. For this example we use 5 minutes.
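To make Statistic and Period concrete, here is a small pure-Python sketch of how individual messages turn into datapoints: send times are grouped into 5-minute periods and counted, which is what the Sum of NumberOfMessagesSent amounts to when each message contributes 1. Aligning periods to multiples of 300 seconds since the epoch is a simplifying assumption, and the function is illustrative, not CloudWatch's implementation.

```python
from collections import Counter
from typing import Dict, Iterable

PERIOD_SECONDS = 300  # 5-minute period


def sum_per_period(send_times: Iterable[float], period: int = PERIOD_SECONDS) -> Dict[int, int]:
    """Group message send times (Unix seconds) into periods and count them.

    Returns {period_start: count}; each count plays the role of one Sum
    datapoint of NumberOfMessagesSent. Periods are assumed to be aligned
    to multiples of `period`. Periods with no messages are simply absent.
    """
    counts = Counter(int(t // period) * period for t in send_times)
    return dict(sorted(counts.items()))


# Three messages in the first period, one in each of the next two.
print(sum_per_period([0, 10, 299, 300, 650]))  # {0: 3, 300: 1, 600: 1}
```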
Setting conditions
Threshold type: for this example we will use Static.
Whenever NumberOfMessagesSent is...: let's select the Greater option.
Than...: in this field we set the threshold value for NumberOfMessagesSent that triggers the alarm. Let's enter 5.
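The Greater option corresponds to CloudWatch's `GreaterThanThreshold` comparison operator, which is strict: a Sum of exactly 5 does not breach a threshold of 5. This sketch (the helper name is hypothetical) marks which datapoints breach the condition configured above.

```python
from typing import List, Sequence

THRESHOLD = 5.0  # the "Than..." value configured above


def breaching_flags(datapoints: Sequence[float], threshold: float = THRESHOLD) -> List[bool]:
    """Return True for each datapoint that breaches a GreaterThanThreshold condition."""
    # Strict comparison: a value equal to the threshold is not a breach.
    return [value > threshold for value in datapoints]


print(breaching_flags([3, 5, 6]))  # [False, False, True]
```

If a Sum of exactly 5 should also count as a breach, the console's "Greater/Equal" option (`GreaterThanOrEqualToThreshold`) would be the one to pick.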
Additional configuration
In the additional configuration, there is the Datapoints to alarm field, whose behavior I'd like to explain in a bit more detail.
Datapoints to alarm
This additional option makes the alarm configuration more flexible when combined with the conditions defined previously.
By default, this setting is 1 of 1.
How does it work?
The first field is the number of datapoints that must breach the threshold, and the second is the number of most recent periods that are evaluated. With the default 1 of 1 combined with the previous settings, the alarm is triggered if the Sum of NumberOfMessagesSent is greater than 5 within a single 5-minute period. In other words, the default additional configuration does not change the previously defined settings.
Now, to understand it better, let's change this setting from 1 of 1 to 2 of 2.
This means that the condition (the Sum of NumberOfMessagesSent greater than 5) must be met in 2 datapoints out of the last 2 evaluated periods before the alarm is triggered. Since each period is 5 minutes, the evaluation window becomes 10 minutes; note that the period was multiplied by the second field, whose value is 2.
In summary, even if the condition is met once, the alarm is only triggered if there are 2 datapoints above the threshold within a 10-minute window. This will become even clearer when we run some alarm activation tests.
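The "M of N" rule can be sketched in a few lines of Python. This is a simplified model, assuming every evaluated period produced a datapoint; it ignores CloudWatch's missing-data handling (the TreatMissingData setting), and the function name is hypothetical.

```python
from typing import Sequence


def should_alarm(
    datapoints: Sequence[float],
    threshold: float,
    datapoints_to_alarm: int,
    evaluation_periods: int,
) -> bool:
    """Simplified "M of N" check: alarm if at least M of the last N
    datapoints are greater than the threshold."""
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("expected 1 <= datapoints_to_alarm <= evaluation_periods")
    recent = datapoints[-evaluation_periods:]  # the N most recent periods
    breaches = sum(1 for value in recent if value > threshold)
    return breaches >= datapoints_to_alarm


# 1 of 1: a single 5-minute Sum above 5 is enough.
print(should_alarm([8], 5, 1, 1))     # True
# 2 of 2: only one of the last two periods breached, so no alarm yet.
print(should_alarm([2, 8], 5, 2, 2))  # False
# 2 of 2: both periods in the 10-minute window breached.
print(should_alarm([7, 8], 5, 2, 2))  # True
```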
Let's keep the following settings and click Next
Configuring actions
On the next screen, we're going to configure the actions responsible for notifying a destination if an alarm is triggered.
On this screen, we're going to keep the In alarm setting, create a new topic, and finally add the email address at which we want to receive error notifications.
Select the option Create new topic, enter a name for the topic, and then enter a valid email address in the field Email endpoints that will receive notification ...
Once completed, click Create topic; an email will then be sent asking you to confirm the subscription to the created topic. Make sure you receive this email and click the confirmation link in it, since SNS only delivers notifications to confirmed subscriptions. Then click Next on the alarm screen to proceed with the creation.
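The topic and email subscription can also be created from code. This sketch assumes a client that behaves like boto3's SNS client (`boto3.client("sns")`), where `create_topic(Name=...)` returns a `TopicArn` and `subscribe(TopicArn=..., Protocol="email", Endpoint=...)` registers the address; the helper name, topic name, and address are placeholders. As in the console, the subscription still has to be confirmed from the email.

```python
from typing import Any


def create_email_alarm_topic(sns_client: Any, topic_name: str, email: str) -> str:
    """Create an SNS topic, subscribe an email address to it, return the topic ARN.

    Assumption: `sns_client` behaves like boto3.client("sns").
    """
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    # The email subscription stays pending until the recipient clicks the
    # link in the confirmation email sent by SNS.
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn
```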
Now, enter the name of the alarm on the screen below and click Next. The next screen is the review screen; click Create alarm to finish.
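For reference, the choices made in the wizard map onto the parameters of CloudWatch's PutMetricAlarm API. The sketch below only builds that parameter set; with boto3 you could pass it as `boto3.client("cloudwatch").put_metric_alarm(**params)`. The alarm name and topic ARN are hypothetical placeholders, and the 300-second period assumes the 5-minute period used in this post.

```python
from typing import Any, Dict


def build_alarm_params(alarm_name: str, queue_name: str, topic_arn: str) -> Dict[str, Any]:
    """Mirror the console settings from this post as PutMetricAlarm parameters."""
    return {
        "AlarmName": alarm_name,
        # Metric selected in the console: SQS > Queue Metrics > NumberOfMessagesSent
        "Namespace": "AWS/SQS",
        "MetricName": "NumberOfMessagesSent",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Sum",
        "Period": 300,  # seconds (5 minutes)
        # Conditions: Static threshold, "Greater" than 5
        "Threshold": 5.0,
        "ComparisonOperator": "GreaterThanThreshold",
        # Datapoints to alarm: 2 of 2
        "DatapointsToAlarm": 2,
        "EvaluationPeriods": 2,
        # Notify the SNS topic when the alarm enters the ALARM ("In alarm") state
        "AlarmActions": [topic_arn],
    }


params = build_alarm_params(
    "sqs-messages-sent-alarm",  # hypothetical alarm name
    "sqs-messages",
    "arn:aws:sns:us-east-1:123456789012:sqs-alarm-topic",  # placeholder ARN
)
```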
Okay, now we have an alarm created and it's time to test it.
Alarm Testing
In the beginning we sent a few messages just to generate the NumberOfMessagesSent metric, but at this point we need to send enough messages to trigger the alarm. So let's send more messages and see what happens.
After sending the messages, notice that even though the threshold was exceeded, the alarm was not triggered. This is because the threshold was breached in only 1 datapoint within the 10-minute window.
Now, let's keep sending messages so that the threshold is exceeded in both 5-minute periods within the 10-minute window.
Note that in the image above the alarm was triggered because, in addition to meeting the condition specified in the settings, it also reached the 2 datapoints.
Check the email address added in the notification settings; an email with the alarm details should have been sent.
The alarm status will return to OK when the number of messages no longer exceeds the threshold.
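To tie the test together, here is a simplified simulation of the alarm state after each 5-minute period, using the 2 of 2 setting and a threshold of 5. It assumes a datapoint exists for every period and, unlike CloudWatch, never reports INSUFFICIENT_DATA; the function name and the Sum values are made up for illustration.

```python
from typing import List, Sequence


def alarm_states(
    period_sums: Sequence[float],
    threshold: float = 5.0,
    datapoints_to_alarm: int = 2,
    evaluation_periods: int = 2,
) -> List[str]:
    """Return the simplified alarm state ("OK" or "ALARM") after each period."""
    states: List[str] = []
    for i in range(len(period_sums)):
        # Evaluate the most recent N periods ending at period i.
        window = period_sums[max(0, i - evaluation_periods + 1): i + 1]
        breaches = sum(1 for value in window if value > threshold)
        states.append("ALARM" if breaches >= datapoints_to_alarm else "OK")
    return states


# One breaching period alone does not trigger the alarm; two consecutive ones do,
# and the state returns to OK once the window no longer holds 2 breaching datapoints.
print(alarm_states([8, 3, 7, 9, 2]))  # ['OK', 'OK', 'OK', 'ALARM', 'OK']
```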
Books to study and read
If you want to learn more about the subject and reach a high level of knowledge, I strongly recommend reading the following book(s):
AWS Cookbook is a practical guide containing 70 familiar recipes about AWS resources and how to solve different challenges. It's a well-written, easy-to-understand book covering key AWS services through practical examples. AWS (Amazon Web Services) is the most widely used cloud service in the world today; if you want to understand more about the subject and be well positioned in the market, I strongly recommend studying it.
Well that’s it, I hope you enjoyed it!