In this exercise you will create an Amazon MSK cluster using the AWS CLI.
We need to get the subnets to deploy the brokers into. For that, we need to know the VPC ID for the lab.
Use the CLI to get a list of the VPCs in your account:
aws ec2 describe-vpcs --output table
Look in the table for the VPC you’re using for the exercise. If you created a new VPC as per the prep step, then you want the VPC called AWSKafkaTutorialVPC. If you are using the VPC created as part of the workshop event, it will be named MMVPC.
Copy the VpcId (example: vpc-001ed0757f999e2b5) to your notepad.
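As an alternative to reading the table by eye, the lookup can be sketched as a small shell helper. This assumes the VPC carries a Name tag with one of the names mentioned above; `get_vpc_id` is a hypothetical helper name, not part of the AWS CLI.

```shell
# Sketch: look up a VPC ID by its Name tag.
# get_vpc_id is a hypothetical helper, not an AWS CLI command.
get_vpc_id() {
  # $1 = value of the VPC's Name tag, e.g. AWSKafkaTutorialVPC or MMVPC
  aws ec2 describe-vpcs \
    --filters "Name=tag:Name,Values=$1" \
    --query 'Vpcs[0].VpcId' \
    --output text
}

# Usage (run on the client instance):
#   VPC_ID=$(get_vpc_id AWSKafkaTutorialVPC)
#   echo "$VPC_ID"
```

If no VPC matches the name, the CLI prints None rather than an ID, so check the value before using it.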
Use the CLI to get a list of subnets in that VPC, replacing the example VPC ID with your own:
aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpc-001ed0757fbb9e2b5" --output table | egrep "Name|AvailabilityZone|SubnetId"
This lists the subnets in the selected VPC and keeps only the AZ, SubnetId, and Name fields, making it easier to pick out the SubnetIds of the 3 private subnets in 3 different AZs. Add these to your notepad for later use.
Example:
[ec2-user@ip-10-0-0-245 ~]$ aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpc-XYZ" --output table | egrep "Name|AvailabilityZone|SubnetId"
|| AvailabilityZone | us-east-1a ||
|| AvailabilityZoneId | use1-az1 ||
|| SubnetId | subnet-001155c2fc9f8e42d ||
||| Name | PrivateSubnetMSKOne |||
|| AvailabilityZone | us-east-1a ||
|| AvailabilityZoneId | use1-az1 ||
|| SubnetId | subnet-005c7f3995a2834f9 ||
||| Name | PublicSubnet |||
|| AvailabilityZone | us-east-1b ||
|| AvailabilityZoneId | use1-az2 ||
|| SubnetId | subnet-005f628db0b37d3b9 ||
||| Name | PrivateSubnetMSKTwo |||
|| AvailabilityZone | us-east-1c ||
|| AvailabilityZoneId | use1-az4 ||
|| SubnetId | subnet-01b91b7d7f3a7d4e4 ||
||| Name | PrivateSubnetMSKThree |||
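The same listing can also be produced with the CLI's own --query option instead of egrep. The sketch below assumes the private subnets have Name tags beginning with PrivateSubnetMSK, as in the example output above; `list_private_subnets` is a hypothetical helper name.

```shell
# Sketch: print "AZ SubnetId Name" for the private MSK subnets of a VPC.
# Assumes the subnet Name tags start with "PrivateSubnetMSK" (as in the example).
list_private_subnets() {
  # $1 = VPC ID
  aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$1" \
    --query "Subnets[].[AvailabilityZone,SubnetId,Tags[?Key=='Name']|[0].Value]" \
    --output text |
    awk '$3 ~ /^PrivateSubnetMSK/ { print $1, $2, $3 }' |
    sort
}

# Usage:
#   list_private_subnets "$VPC_ID"
```

Each output line holds one subnet, so the three SubnetIds can be copied straight from the second column.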
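We are going to configure your new Amazon MSK cluster with the following settings:
auto.create.topics.enable = true, so topics are created automatically on first use
delete.topic.enable = true, so topics can be deleted
log.retention.hours = 8, so logs are kept for 8 hours for the lab. Note that this is the default configuration; it can still be overridden at the topic level
On your KafkaClientInstance, create a file called ‘cluster_config.txt’ with the following content: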
vi ~/cluster_config.txt
Put in the contents (hit i to enter insert mode):
auto.create.topics.enable = true
delete.topic.enable = true
log.retention.hours = 8
Press Esc, then type :wq and press Enter to save and exit.
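If you would rather not use vi, the same file can be written with a heredoc. This is a minimal sketch; `write_cluster_config` is a hypothetical helper name, and the default path mirrors the one used above.

```shell
# Sketch: write the three server properties without opening an editor.
write_cluster_config() {
  # $1 = destination path (defaults to ~/cluster_config.txt)
  cat > "${1:-$HOME/cluster_config.txt}" <<'EOF'
auto.create.topics.enable = true
delete.topic.enable = true
log.retention.hours = 8
EOF
}

# Usage:
#   write_cluster_config
#   cat ~/cluster_config.txt
```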
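The next command pushes the configuration into the Amazon MSK service for use at cluster creation time.
Run the command from the directory that contains cluster_config.txt (your home directory if you followed the step above):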
aws kafka create-configuration --name "WorkshopMSKConfig" --description "Configuration used for MSK workshop - Auto topic creation; topic deletion; 8hrs retention" --kafka-versions "2.3.1" "2.2.1" --server-properties file://cluster_config.txt
The --kafka-versions option tells Amazon MSK which versions of Apache Kafka this configuration is allowed to be used with.
If you see an error like this:
An error occurred (BadRequestException) when calling the CreateConfiguration operation: Unsupported KafkaVersion [2.1.1]. Valid values: [1.1.1, 2.1.0, 2.2.1, 2.3.1]
then ensure that you’ve typed in the Kafka version string correctly (including quotes).
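To see which version strings are accepted before creating the configuration, you can ask the service directly. The sketch below assumes your installed AWS CLI is recent enough to include the list-kafka-versions command; `kafka_version_supported` is a hypothetical helper name.

```shell
# Sketch: succeed (exit 0) if the given Kafka version is offered by Amazon MSK.
# Assumes the installed AWS CLI provides "aws kafka list-kafka-versions".
kafka_version_supported() {
  # $1 = version string to check, e.g. 2.3.1
  aws kafka list-kafka-versions \
    --query 'KafkaVersions[].Version' \
    --output text |
    tr '\t' '\n' |
    grep -qxF "$1"
}

# Usage:
#   kafka_version_supported 2.3.1 && echo "2.3.1 is available"
```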
When the command is run, it will return a JSON object, including the ARN for the configuration object. You should copy and paste this into your text editor for use later, or assign it to an environment variable (export CLUSTER_ARN="arn:...").
Example:
{
"Arn": "arn:aws:kafka:us-east-1:xyz:configuration/WorkshopMSKConfig/53481d97-3d6f-4abe-94e4-233ce39e3332-6",
"CreationTime": "2020-02-15T23:02:11.571Z",
"LatestRevision": {
"CreationTime": "2020-02-15T23:02:11.571Z",
"Description": "Configuration used for MSK workshop - Auto topic creation; topic deletion; 8hrs retention",
"Revision": 1
},
"Name": "WorkshopMSKConfig"
}
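Rather than copying the ARN out of the JSON by hand, the create step can capture it directly. This is a sketch using the CLI's global --query and --output options; `create_workshop_config` is a hypothetical helper name, and the variable name CLUSTER_ARN simply follows the text above even though it holds the configuration ARN.

```shell
# Sketch: create the configuration and print only its ARN.
create_workshop_config() {
  # $1 = path to the properties file, e.g. cluster_config.txt
  aws kafka create-configuration \
    --name "WorkshopMSKConfig" \
    --description "Configuration used for MSK workshop - Auto topic creation; topic deletion; 8hrs retention" \
    --kafka-versions "2.3.1" "2.2.1" \
    --server-properties "file://$1" \
    --query 'Arn' \
    --output text
}

# Usage:
#   export CLUSTER_ARN=$(create_workshop_config cluster_config.txt)
#   echo "$CLUSTER_ARN"
```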
You can review the configuration using the CLI.
Using the ARN provided in the output step above (or retrieved from aws kafka list-configurations), you can query for your saved configuration:
aws kafka describe-configuration --arn "$CLUSTER_ARN"
The output:
{
"Arn": "arn:aws:kafka:us-east-1:xyz:configuration/WorkshopMSKConfig/2d99aad1-a420-4f62-83c1-0e2473aea998-6",
"CreationTime": "2020-02-15T22:52:19.563Z",
"Description": "Configuration used for MSK workshop - Auto topic creation; topic deletion; 8hrs retention",
"KafkaVersions": [
"2.3.1",
"2.2.1"
],
"LatestRevision": {
"CreationTime": "2020-02-15T22:52:19.563Z",
"Description": "Configuration used for MSK workshop - Auto topic creation; topic deletion; 8hrs retention",
"Revision": 1
},
"Name": "WorkshopMSKConfig"
}
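For more details on creating and managing Amazon MSK cluster configurations, see the MSK Configuration Operations document.
To complete this step, you need the following: the SubnetIds of the three private subnets, the ID of the security group for the brokers, and the ARN and revision number of the configuration you created above.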
You will now combine the data above into a cluster definition file (clusterinfo.json). It will look something like this, where you will replace the values with the values from above:
Example of a complete file:
{
"BrokerNodeGroupInfo": {
"BrokerAZDistribution": "DEFAULT",
"InstanceType": "kafka.m5.large",
"ClientSubnets": [
"subnet-05e9ee503739783a4", "subnet-054990841567825d4", "subnet-07af438a99fe7f508"
],
"SecurityGroups": [
"sg-008cf48ee78b732ef"
],
"StorageInfo": {
"EbsStorageInfo": {
"VolumeSize": 100
}
}
},
"ClusterName": "MSKWorkshopCluster",
"ConfigurationInfo": {
"Arn": "arn:aws:kafka:us-east-1:xyz:configuration/WorkshopMSKConfig/2d99aad1-a420-4f62-83c1-0e2473aea998-6",
"Revision": 1
},
"EncryptionInfo": {
"EncryptionAtRest": {
"DataVolumeKMSKeyId": ""
},
"EncryptionInTransit": {
"InCluster": true,
"ClientBroker": "TLS_PLAINTEXT"
}
},
"EnhancedMonitoring": "PER_TOPIC_PER_BROKER",
"KafkaVersion": "2.3.1",
"NumberOfBrokerNodes": 3,
"OpenMonitoring": {
"Prometheus": {
"JmxExporter": {
"EnabledInBroker": true
},
"NodeExporter": {
"EnabledInBroker": true
}
}
}
}
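To avoid hand-editing IDs into the JSON, the file can be generated from shell arguments. The sketch below reproduces the example definition above and substitutes only the subnet IDs, security group ID, and configuration ARN; `write_clusterinfo` is a hypothetical helper name, and every other value is copied from the example rather than being a recommendation.

```shell
# Sketch: generate clusterinfo.json from values gathered earlier.
write_clusterinfo() {
  # $1..$3 = private subnet IDs, $4 = security group ID,
  # $5 = configuration ARN, $6 = output path (default: clusterinfo.json)
  cat > "${6:-clusterinfo.json}" <<EOF
{
  "BrokerNodeGroupInfo": {
    "BrokerAZDistribution": "DEFAULT",
    "InstanceType": "kafka.m5.large",
    "ClientSubnets": [
      "$1", "$2", "$3"
    ],
    "SecurityGroups": [
      "$4"
    ],
    "StorageInfo": {
      "EbsStorageInfo": {
        "VolumeSize": 100
      }
    }
  },
  "ClusterName": "MSKWorkshopCluster",
  "ConfigurationInfo": {
    "Arn": "$5",
    "Revision": 1
  },
  "EncryptionInfo": {
    "EncryptionAtRest": {
      "DataVolumeKMSKeyId": ""
    },
    "EncryptionInTransit": {
      "InCluster": true,
      "ClientBroker": "TLS_PLAINTEXT"
    }
  },
  "EnhancedMonitoring": "PER_TOPIC_PER_BROKER",
  "KafkaVersion": "2.3.1",
  "NumberOfBrokerNodes": 3,
  "OpenMonitoring": {
    "Prometheus": {
      "JmxExporter": {
        "EnabledInBroker": true
      },
      "NodeExporter": {
        "EnabledInBroker": true
      }
    }
  }
}
EOF
}

# Usage (all IDs below are placeholders for your own values):
#   write_clusterinfo subnet-aaa subnet-bbb subnet-ccc sg-ddd "$CLUSTER_ARN"
```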
We can now use the command line tool and the cluster definition to create the cluster:
aws kafka create-cluster --cli-input-json file://clusterinfo.json
The command will return a JSON object that contains your cluster ARN, name and state. Grab the ARN.
This step will take some time. You can move on to the next step to see how to monitor progress and review the cluster deployment.
You can check on your cluster configuration and status by using the CLI and the describe-cluster command. You will need the cluster ARN for this, which you got from the last step.
Use the ARN and get the cluster configuration and state, changing the example ARN to the one from the command above:
aws kafka describe-cluster --cluster-arn arn:aws:kafka:us-east-1:xyz:cluster/MSKWorkshop/20a94343-552f-4298-9076-99673162e023-6 | grep -i state
"State": "CREATING",
When the cluster is ready, the state will be “ACTIVE”.
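Instead of re-running the command by hand, the check can be wrapped in a polling loop. This is a sketch only: `wait_for_cluster` is a hypothetical helper name, the 30-second default interval is an arbitrary choice, and it treats ACTIVE as success and FAILED as an error.

```shell
# Sketch: poll the cluster state until it is ACTIVE (or FAILED).
wait_for_cluster() {
  # $1 = cluster ARN, $2 = seconds between checks (default 30)
  while :; do
    state=$(aws kafka describe-cluster --cluster-arn "$1" \
              --query 'ClusterInfo.State' --output text) || return 1
    echo "State: $state"
    case "$state" in
      ACTIVE) return 0 ;;
      FAILED) return 1 ;;
    esac
    sleep "${2:-30}"
  done
}

# Usage (replace the ARN with the one returned by create-cluster):
#   wait_for_cluster "arn:aws:kafka:us-east-1:xyz:cluster/..." 60
```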