Basic Amazon CloudWatch Alarms

In this lab we will look at the simple monitoring available through the Amazon MSK service.

  1. Log in to the console of the account where your cluster is running and go to the Amazon MSK service console.

  2. Click on the name of the Amazon MSK Cluster you are interested in monitoring

  1. Click on the Monitoring tab

  2. You will see a simple dashboard that shows metrics from your cluster.

  1. In the top right corner, you will see a button that says Create CloudWatch alarm - let’s click that!

  1. You will get to the first step of a wizard that guides you through setting up an Amazon CloudWatch alarm - we will alarm when broker disk usage is above 85%

  1. Click on Select Metric

  2. Click on ‘AWS/Kafka’ to explore the metrics available from your Amazon MSK Cluster

  3. Select Broker ID, Cluster Name to get to the Disk space metrics.

  • For more details on the other metric categories see the advanced CloudWatch section of this lab.
  1. Optional: If you have more than one cluster, you can filter the metrics to show only your cluster by typing the cluster name (ex: MSKCluster) into the search bar. Alternatively, you can click the down arrow next to your Cluster Name and select Add to search

  1. In the Search bar, enter KafkaDataLogsDiskUsed to filter to the metric we want to alarm on. You can also find the metric in the list, select the down arrow and Add to search

  2. Check the box beside Broker ID 1 - a graph will appear above, showing the values of the metric for that broker. We will use this metric to create your disk space alarm

  1. Click on Select Metric in the bottom right corner.
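For reference, the metric selected in the console can be described as a small data structure. The sketch below is illustrative only: it assumes the CloudWatch dimension names match the console category label ("Cluster Name" and "Broker ID"), and the cluster name MSKCluster is the example value used earlier. The resulting keys follow the shape that boto3's CloudWatch `list_metrics` call accepts as keyword arguments, but nothing here contacts AWS.

```python
# Describe the per-broker disk metric chosen in the console.
# Assumption: dimension names mirror the console label "Broker ID, Cluster Name".
NAMESPACE = "AWS/Kafka"
METRIC_NAME = "KafkaDataLogsDiskUsed"


def broker_disk_metric(cluster_name: str, broker_id: int) -> dict:
    """Return namespace, metric name, and dimensions for one broker."""
    return {
        "Namespace": NAMESPACE,
        "MetricName": METRIC_NAME,
        "Dimensions": [
            {"Name": "Cluster Name", "Value": cluster_name},
            # CloudWatch dimension values are strings, so convert the ID.
            {"Name": "Broker ID", "Value": str(broker_id)},
        ],
    }


metric = broker_disk_metric("MSKCluster", 1)
print(metric)
```

With AWS credentials configured, something like `boto3.client("cloudwatch").list_metrics(**metric)` could be used to confirm that the metric exists for your cluster.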

This will take you back to the Create Alarm wizard where you will now configure the rest of the alarm - thresholds, actions, and metadata.

  1. Scroll down to Conditions and select Static and Greater because we want to alarm when our metric (KafkaDataLogsDiskUsed, which reports the percentage of disk space used for data logs) is greater than 85%

  2. In Threshold Value enter 85

  3. Select the down arrow beside Additional Configuration

  4. Under Datapoints to Alarm enter 2 in the first box, and 3 in the second - this indicates that we want 2 out of 3 polls (every 5 min) to be above 85% before alarming - this helps make sure the cluster is consistently above 85% and didn’t just briefly spike over.

  5. Leave Missing Data Treatment as Treat missing data as missing - this will not impact the behavior of the alarm
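To make the "2 out of 3" setting concrete, here is a simplified model of the rule in Python. It is only a sketch of the idea, not CloudWatch's actual evaluation logic (which handles missing data in a more involved way). The function names are hypothetical, and a missing datapoint is represented as `None` and simply not counted as breaching.

```python
from typing import Optional, Sequence


def breaches(value: Optional[float], threshold: float) -> bool:
    """A datapoint breaches only if it is present and strictly above the threshold."""
    return value is not None and value > threshold


def in_alarm(
    datapoints: Sequence[Optional[float]],
    threshold: float = 85.0,
    datapoints_to_alarm: int = 2,
    evaluation_periods: int = 3,
) -> bool:
    """Return True if enough of the most recent datapoints breach the threshold."""
    window = list(datapoints)[-evaluation_periods:]
    breaching = sum(1 for value in window if breaches(value, threshold))
    return breaching >= datapoints_to_alarm


print(in_alarm([80.0, 90.0, 82.0]))  # False: only one of three polls is above 85
print(in_alarm([86.0, 84.0, 88.0]))  # True: two of three polls are above 85
```

A single short spike (the first case) does not trigger the alarm, while sustained usage above the threshold (the second case) does.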

  1. Select Next to configure the action to take

  2. In the Configure Actions section, select In Alarm

  3. Select Create new topic and enter msk_cloudwatch_alarm as the name below the radio button

  4. Enter your email address in the Email endpoints that will receive the notification field

  1. Click Create Topic. This step will create an SNS topic with your email address as the target, and it will send a confirmation email to your address. Please confirm your email by clicking on the link.

  2. Click the ‘Next’ button in the bottom right to fill out the metadata for the alarm

  3. In Alarm Name enter MSK Broker Disk Utilization - Broker 1

  4. In Alarm description enter MSK broker data volume is over 85% utilized. Investigate and add capacity if required.

  5. Click the ‘Next’ button in the bottom right to review your alarm and activate it

  6. Scroll through the summary of your alarm, and if everything looks correct, click Create Alarm
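The wizard steps above (create the SNS topic, subscribe an email address, create the alarm) could also be scripted. The following is a minimal sketch rather than the lab's official method. It assumes boto3-style `sns` and `cloudwatch` client objects are passed in, that the dimension names are "Cluster Name" and "Broker ID", and that the statistic is Average (the lab does not state which statistic the wizard uses). The helper names are hypothetical.

```python
def build_alarm_params(cluster_name: str, broker_id: int, topic_arn: str) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"MSK Broker Disk Utilization - Broker {broker_id}",
        "AlarmDescription": (
            "MSK broker data volume is over 85% utilized. "
            "Investigate and add capacity if required."
        ),
        "Namespace": "AWS/Kafka",
        "MetricName": "KafkaDataLogsDiskUsed",
        "Dimensions": [
            {"Name": "Cluster Name", "Value": cluster_name},
            {"Name": "Broker ID", "Value": str(broker_id)},
        ],
        "Statistic": "Average",  # assumption: not specified in the lab
        "Period": 300,  # one poll every 5 minutes
        "EvaluationPeriods": 3,  # look at the last 3 polls
        "DatapointsToAlarm": 2,  # alarm when 2 of them breach
        "Threshold": 85.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
        "AlarmActions": [topic_arn],  # notify the SNS topic when in alarm
    }


def create_topic_and_alarm(sns, cloudwatch, cluster_name, broker_id, email):
    """Create the SNS topic, subscribe an email address, then create the alarm."""
    topic_arn = sns.create_topic(Name="msk_cloudwatch_alarm")["TopicArn"]
    # The recipient still has to confirm the subscription from the email link.
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    params = build_alarm_params(cluster_name, broker_id, topic_arn)
    cloudwatch.put_metric_alarm(**params)
    return params


# Example call (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# create_topic_and_alarm(
#     boto3.client("sns"), boto3.client("cloudwatch"),
#     "MSKCluster", 1, "you@example.com",
# )
```

Keeping the parameter building separate from the API calls makes it easy to review the alarm definition before anything is created.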

Congrats - you have an alarm for broker 1 disk space! But we want the other brokers in your cluster to be monitored too! So we will duplicate this alarm and tweak it to monitor the other brokers.

You will notice that the alarm starts in the Insufficient data state - this is because the alarm is still waiting for enough polls to pass to establish a known state - then it will be marked OK

  1. Click the checkbox beside MSK Broker Disk Utilization - Broker 1, then click Actions and Copy
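If you prefer to check the alarm state from a script rather than the console, a sketch along these lines may help. It assumes a boto3-style CloudWatch client is passed in; the function name is hypothetical. CloudWatch reports the state as one of `INSUFFICIENT_DATA`, `OK`, or `ALARM`.

```python
def alarm_state(cloudwatch, alarm_name: str):
    """Return the alarm's StateValue, or None if no alarm has that name."""
    response = cloudwatch.describe_alarms(AlarmNames=[alarm_name])
    alarms = response.get("MetricAlarms", [])
    return alarms[0]["StateValue"] if alarms else None


# Example call (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# print(alarm_state(boto3.client("cloudwatch"),
#                   "MSK Broker Disk Utilization - Broker 1"))
```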

  2. Under Broker ID change the value to 2 then scroll down and hit Next and Next

  3. In Name and Description change Broker 1 to Broker 2 and hit Next

  4. Click Create Alarm

Repeat the previous 4 steps for the other brokers in the cluster. Otherwise, you can move on to the next exercise, or you can try creating additional alarms and explore other features.
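The copy-and-tweak procedure can be pictured as cloning one alarm definition and changing only the broker-specific fields. The sketch below is illustrative: `BROKER_1_ALARM` is a hypothetical, abbreviated stand-in for the alarm created earlier, the dimension names are assumed to be "Cluster Name" and "Broker ID", and the three-broker cluster is an assumption. It only builds the definitions and does not create anything in AWS.

```python
import copy

# Hypothetical, abbreviated definition of the Broker 1 alarm.
BROKER_1_ALARM = {
    "AlarmName": "MSK Broker Disk Utilization - Broker 1",
    "Namespace": "AWS/Kafka",
    "MetricName": "KafkaDataLogsDiskUsed",
    "Dimensions": [
        {"Name": "Cluster Name", "Value": "MSKCluster"},
        {"Name": "Broker ID", "Value": "1"},
    ],
    "Threshold": 85.0,
}


def copy_alarm_for_broker(template: dict, broker_id: int) -> dict:
    """Clone an alarm definition and point it at a different broker."""
    alarm = copy.deepcopy(template)  # leave the original definition untouched
    alarm["AlarmName"] = f"MSK Broker Disk Utilization - Broker {broker_id}"
    for dimension in alarm["Dimensions"]:
        if dimension["Name"] == "Broker ID":
            dimension["Value"] = str(broker_id)
    return alarm


# Assumption: a three-broker cluster, so brokers 2 and 3 remain.
other_brokers = [copy_alarm_for_broker(BROKER_1_ALARM, b) for b in (2, 3)]
for alarm in other_brokers:
    print(alarm["AlarmName"])
```

Each resulting dictionary would still need the remaining alarm settings (period, evaluation periods, actions) before being passed to an API call such as `put_metric_alarm`.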