In this lab we will look at the more numerous metrics available directly via the CloudWatch console.
Log in to the Console of the account where your cluster is running and go to the CloudWatch console.
Select Metrics in the left-hand navigation pane.
This screen shows you all the different services emitting metrics - we want
AWS/Kafka for the Amazon MSK cluster
Notice that there are 3 categories of metrics - **Broker ID, Cluster Name, Topic**, **Broker ID, Cluster Name**, and **Cluster Name**. These are generally all different metrics at different levels of granularity or scope.
Broker ID, Cluster Name, Topic - These are at the per topic level, on each broker, for each cluster. They will include many individual metrics per topic and are useful if you need to investigate performance or balance issues for particular topics.
An example of this would be looking at the MessagesInPerSec for a topic (ExampleTopic) in a particular cluster, broken down by broker:
Note that you can use the search bar to filter - you can type directly in to the bar, or you can select the down arrow beside any field and filter from that menu
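The same kind of filter can also be expressed outside the console. Here is a minimal sketch, assuming the boto3 `list_metrics` call and the dimension names as they appear in the console (`Cluster Name`, `Broker ID`, `Topic`). The helper name `kafka_metric_filter` and the cluster name `MSKCluster` are placeholders for illustration. The code only builds the request parameters, so nothing is sent to AWS.

```python
def kafka_metric_filter(metric_name, cluster_name, topic=None):
    """Build keyword arguments for a CloudWatch list_metrics call.

    Hypothetical helper: it mirrors typing a metric name, cluster and
    topic into the console search bar.
    """
    dimensions = [
        {"Name": "Cluster Name", "Value": cluster_name},
        # No "Value" here, so the filter matches every broker ID.
        {"Name": "Broker ID"},
    ]
    if topic is not None:
        dimensions.append({"Name": "Topic", "Value": topic})
    return {
        "Namespace": "AWS/Kafka",
        "MetricName": metric_name,
        "Dimensions": dimensions,
    }


# Placeholder cluster name; ExampleTopic is the topic used in this lab.
params = kafka_metric_filter("MessagesInPerSec", "MSKCluster", topic="ExampleTopic")

# With credentials configured, the parameters could be passed on like this:
#   import boto3
#   boto3.client("cloudwatch").list_metrics(**params)
```

Leaving out the `topic` argument gives the filter for the **Broker ID, Cluster Name** category instead.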
Broker ID, Cluster Name - These metrics are more focused on the broker and topics, looking at things like memory, CPU, networking, and higher level monitoring such as counts of UnderReplicatedPartitions.
Note here that the Statistic field is Maximum - I’ve switched to the
Graphed Metrics tab, and set this to show the maximum value for each period. It is generally a bad idea to use ‘average’ as it can give skewed results, hide spikes/dips, and make identifying problems harder. Evaluate the metric and your goal, then select one of the many options available for the statistic.
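To see why the choice of statistic matters, here is a small illustration using invented numbers, not data from a real cluster. Five one-minute CPU samples fall inside a single five-minute period, and one of them is a spike.

```python
from statistics import mean

# Synthetic one-minute CPU samples (percent) within one 5-minute period.
# These values are made up purely for illustration.
samples = [12.0, 11.0, 13.0, 95.0, 14.0]

average = mean(samples)
maximum = max(samples)

print(f"Average: {average:.1f}")  # Average: 29.0
print(f"Maximum: {maximum:.1f}")  # Maximum: 95.0
```

The average suggests a lightly loaded broker, while the maximum shows that it briefly ran close to saturation.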
Cluster Name - the highest level metrics, looking at the overall state and health of a cluster.
In this graph we can see 2 important metrics, but the scale of their values is vastly different - one is large and growing integers (GlobalTopicCount), the other is measuring ms of latency (ZooKeeperRequestLatencyMsMean). To accommodate these differing metrics and values, I’ve moved the ZooKeeperRequestLatencyMsMean to the 2nd Y Axis, allowing us to have the different scales graphed on the same graph easily.
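If you later want to keep a graph like this on a dashboard, the second Y axis can be expressed in the widget definition. The following is a sketch only, assuming the CloudWatch dashboard body format, where a metric entry may end with a rendering options object such as `{"yAxis": "right"}`. The function name, widget title, cluster name and region are placeholders.

```python
import json


def two_axis_widget(cluster_name, region):
    """Return a dashboard widget definition (hypothetical helper)."""
    return {
        "type": "metric",
        "properties": {
            "view": "timeSeries",
            "region": region,
            "stat": "Maximum",
            "period": 300,
            "title": "Topic count vs ZooKeeper latency",
            "metrics": [
                # Left Y axis (the default): a count of topics.
                ["AWS/Kafka", "GlobalTopicCount", "Cluster Name", cluster_name],
                # Right Y axis: latency in milliseconds.
                [
                    "AWS/Kafka",
                    "ZooKeeperRequestLatencyMsMean",
                    "Cluster Name",
                    cluster_name,
                    {"yAxis": "right"},
                ],
            ],
        },
    }


# Placeholder cluster name and region.
widget = two_axis_widget("MSKCluster", "us-east-1")
dashboard_body = json.dumps({"widgets": [widget]}, indent=2)
print(dashboard_body)
```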
For this exercise, let's select an important metric to watch -
MessagesInPerSec in Broker ID, Cluster Name. This will be the total messages in to Amazon MSK per broker.
Note - If you want MessagesInPerSec at a topic level, then you need to use the **Broker ID, Cluster Name, Topic** metric source.
MSKCluster) then enter the metric you want to graph -
You will see the messages in to each broker.
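The per-broker graph can also be described as a set of queries, one per broker. This is a minimal sketch, assuming the `MetricDataQueries` structure accepted by boto3's `get_metric_data`. The broker IDs 1 to 3, the cluster name, and the helper name `messages_in_queries` are assumptions for illustration, and the Maximum statistic is just one reasonable choice.

```python
def messages_in_queries(cluster_name, broker_ids, period=300, stat="Maximum"):
    """Build one MessagesInPerSec query per broker (hypothetical helper)."""
    queries = []
    for broker_id in broker_ids:
        queries.append(
            {
                # Query IDs must start with a lowercase letter.
                "Id": f"broker{broker_id}",
                "Label": f"Broker {broker_id}",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/Kafka",
                        "MetricName": "MessagesInPerSec",
                        "Dimensions": [
                            {"Name": "Cluster Name", "Value": cluster_name},
                            {"Name": "Broker ID", "Value": str(broker_id)},
                        ],
                    },
                    "Period": period,  # seconds per data point
                    "Stat": stat,
                },
            }
        )
    return queries


# Assumed three-broker cluster with a placeholder name.
queries = messages_in_queries("MSKCluster", [1, 2, 3])
for query in queries:
    print(query["Id"], query["MetricStat"]["Metric"]["Dimensions"])
```

Adding a `Topic` dimension to each query would give the per-topic view from the **Broker ID, Cluster Name, Topic** source instead.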
If you were investigating a problem you might start here - look at the messages coming in to the brokers and compare them to normal values. To do this, you would use the time controls in the top right corner.
Other options would be:
We can zoom out to
1w to look at historical trends, or zoom in to look at current data with greater granularity.
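When pulling the same data programmatically, the zoom level becomes a start and end time. A small sketch, assuming relative ranges named like the console shortcuts; only `1w` comes from this lab, and the other entries and the helper name `time_window` are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Relative ranges keyed by a console-style label (illustrative subset).
PRESETS = {
    "1h": timedelta(hours=1),
    "1d": timedelta(days=1),
    "1w": timedelta(weeks=1),
}


def time_window(preset, now=None):
    """Return StartTime/EndTime for a relative range (hypothetical helper)."""
    end = now if now is not None else datetime.now(timezone.utc)
    return {"StartTime": end - PRESETS[preset], "EndTime": end}


# A fixed "now" keeps the example reproducible.
fixed_now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
window = time_window("1w", now=fixed_now)
print(window["StartTime"].isoformat())  # 2024-01-01T12:00:00+00:00
print(window["EndTime"].isoformat())    # 2024-01-08T12:00:00+00:00
```

The resulting `StartTime` and `EndTime` have the shape expected by calls such as boto3's `get_metric_data`.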