Cluster creation with the CLI

In this exercise you will create an Amazon MSK cluster using the AWS CLI.

Step 1 - Get Subnet Information

We need to find the subnets to deploy the brokers into. For that, we need to know the VPC ID for the lab.

  1. Use the CLI to get a list of VPCs in your account:

     aws ec2 describe-vpcs --output table
    
  2. Look in the table for the VPC you’re using for the exercise. If you created a new VPC as per the prep step, then you want the VPC called AWSKafkaTutorialVPC. If you are using the VPC created as part of the workshop event, it will be named MMVPC.

  3. Copy the VPC ID (example: vpc-001ed0757f999e2b5) to your notepad.

  4. Use the CLI to get a list of subnets in that VPC, replacing the example VPC ID with your own:

     aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpc-001ed0757f999e2b5" --output table | egrep "Name|AvailabilityZone|SubnetId"
    

This lists the subnets in the selected VPC and keeps only the AvailabilityZone, SubnetId, and Name lines, making it easier to pick out the SubnetIds of the 3 private subnets in 3 different AZs. Add these to your notepad for later use.

Example:

[ec2-user@ip-10-0-0-245 ~]$ aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpc-XYZ" --output table | egrep "Name|AvailabilityZone|SubnetId"
||  AvailabilityZone                       |  us-east-1a                                                                                   ||
||  AvailabilityZoneId                     |  use1-az1                                                                                     ||
||  SubnetId                               |  subnet-001155c2fc9f8e42d                                                                     ||
|||  Name                         |  PrivateSubnetMSKOne                                                                                  |||
||  AvailabilityZone                       |  us-east-1a                                                                                   ||
||  AvailabilityZoneId                     |  use1-az1                                                                                     ||
||  SubnetId                               |  subnet-005c7f3995a2834f9                                                                     ||
|||  Name                         |  PublicSubnet                                                                                         |||
||  AvailabilityZone                       |  us-east-1b                                                                                   ||
||  AvailabilityZoneId                     |  use1-az2                                                                                     ||
||  SubnetId                               |  subnet-005f628db0b37d3b9                                                                     ||
|||  Name                         |  PrivateSubnetMSKTwo                                                                                  |||
||  AvailabilityZone                       |  us-east-1c                                                                                   ||
||  AvailabilityZoneId                     |  use1-az4                                                                                     ||
||  SubnetId                               |  subnet-01b91b7d7f3a7d4e4                                                                     ||
|||  Name                         |  PrivateSubnetMSKThree                                                                                |||
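
If you would rather not read the IDs out of a table, the same lookups can be sketched with the CLI's --query option. This is only a sketch: the function names get_vpc_id and list_subnets are made up for illustration, and it assumes the VPC carries a Name tag matching the names mentioned above.

```shell
# Look up a VPC ID by its Name tag (the tag value is passed as $1).
# Assumption: the lab VPC has a Name tag such as AWSKafkaTutorialVPC or MMVPC.
get_vpc_id() {
  aws ec2 describe-vpcs \
    --filters "Name=tag:Name,Values=$1" \
    --query "Vpcs[0].VpcId" \
    --output text
}

# Print one line per subnet in the VPC given as $1:
# AvailabilityZone, SubnetId, and the value of the Name tag.
list_subnets() {
  aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$1" \
    --query "Subnets[].[AvailabilityZone,SubnetId,Tags[?Key=='Name']|[0].Value]" \
    --output text
}

# Example usage (uncomment to run against your account):
# VPC_ID=$(get_vpc_id AWSKafkaTutorialVPC)
# list_subnets "$VPC_ID"
```

Each output line then holds one subnet, so the three private subnet IDs can be copied from a single column.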

Step 2 - Create a custom cluster configuration

We are going to give your new Amazon MSK cluster the following settings:

  • auto.create.topics.enable - allows producers and consumers to create topics automatically. This is not typically enabled on a production cluster, but it is handy for development and testing because it lowers operational overhead.
  • delete.topic.enable - enables topic deletion on the server. If topic deletion is not enabled, you cannot delete topics. You likely want to turn this on for every cluster you build unless you have a specific need not to.
  • log.retention.hours - we will set this to 8 hours for the lab. This is the cluster-wide default; it can still be overridden at the topic level.

  1. On your KafkaClientInstance, create a file called ‘cluster_config.txt’ with the following content:

    vi ~/cluster_config.txt

Press i to enter insert mode, then enter the following contents:

    auto.create.topics.enable = true
    delete.topic.enable = true
    log.retention.hours = 8

Press Esc to exit insert mode, then type :wq to save the file and exit.
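
If you prefer not to use vi, the same file can be written with a here-document. This is a minimal sketch; the function name write_cluster_config is hypothetical and is only there so the target path can be passed in.

```shell
# Write the three broker settings used in this lab to the file given as $1.
write_cluster_config() {
  cat > "$1" <<'EOF'
auto.create.topics.enable = true
delete.topic.enable = true
log.retention.hours = 8
EOF
}

# Example usage:
# write_cluster_config ~/cluster_config.txt
# cat ~/cluster_config.txt
```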

Step 3 - Create the configuration object

This will push the configuration into the Amazon MSK service for use at cluster creation time.

Run the command:

aws kafka create-configuration --name "WorkshopMSKConfig" --description "Configuration used for MSK workshop - Auto topic creation; topic deletion; 8hrs retention" --kafka-versions "2.3.1" "2.2.1" --server-properties file://cluster_config.txt

The --kafka-versions option tells Amazon MSK which versions of Apache Kafka this configuration is allowed to be used with.

If you see an error like this:

An error occurred (BadRequestException) when calling the CreateConfiguration operation: Unsupported KafkaVersion [2.1.1]. Valid values: [1.1.1, 2.1.0, 2.2.1, 2.3.1]

then ensure that you’ve typed in the Kafka version string correctly (including quotes).
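
To check which version strings are accepted before creating the configuration, something like the following may help. It is a sketch that assumes your installed AWS CLI includes the list-kafka-versions command (older CLI releases may not); the function name is made up for illustration.

```shell
# Print the Apache Kafka versions that Amazon MSK currently reports.
# Assumption: the installed AWS CLI supports "aws kafka list-kafka-versions".
list_msk_kafka_versions() {
  aws kafka list-kafka-versions \
    --query "KafkaVersions[].Version" \
    --output text
}

# Example usage:
# list_msk_kafka_versions
```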

When the command runs, it returns a JSON object that includes the ARN for the configuration object. Copy and paste this ARN into your text editor for later use, or assign it to an environment variable (export CLUSTER_ARN="arn:...").

Example:

{
    "Arn": "arn:aws:kafka:us-east-1:xyz:configuration/WorkshopMSKConfig/53481d97-3d6f-4abe-94e4-233ce39e3332-6",
    "CreationTime": "2020-02-15T23:02:11.571Z",
    "LatestRevision": {
        "CreationTime": "2020-02-15T23:02:11.571Z",
        "Description": "Configuration used for MSK workshop - Auto topic creation; topic deletion; 8hrs retention",
        "Revision": 1
    },
    "Name": "WorkshopMSKConfig"
}
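
If you did not save the ARN, it can be looked up again by configuration name. The sketch below uses aws kafka list-configurations with a --query filter; the function name find_config_arn is hypothetical, and the variable is called CLUSTER_ARN only to match the name used elsewhere in this exercise (it holds the configuration ARN, not a cluster ARN).

```shell
# Print the ARN of the MSK configuration whose Name equals $1.
# Prints "None" if no configuration with that name exists.
find_config_arn() {
  aws kafka list-configurations \
    --query "Configurations[?Name=='$1'].Arn | [0]" \
    --output text
}

# Example usage:
# export CLUSTER_ARN=$(find_config_arn WorkshopMSKConfig)
# echo "$CLUSTER_ARN"
```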

Review the configuration created

You can review the configuration using the CLI.

Using the ARN from the output above (or retrieved from aws kafka list-configurations), you can query for your saved configuration:

    aws kafka describe-configuration --arn "$CLUSTER_ARN"

The output:

{
    "Arn": "arn:aws:kafka:us-east-1:xyz:configuration/WorkshopMSKConfig/2d99aad1-a420-4f62-83c1-0e2473aea998-6",
    "CreationTime": "2020-02-15T22:52:19.563Z",
    "Description": "Configuration used for MSK workshop - Auto topic creation; topic deletion; 8hrs retention",
    "KafkaVersions": [
        "2.3.1",
        "2.2.1"
    ],
    "LatestRevision": {
        "CreationTime": "2020-02-15T22:52:19.563Z",
        "Description": "Configuration used for MSK workshop - Auto topic creation; topic deletion; 8hrs retention",
        "Revision": 1
    },
    "Name": "WorkshopMSKConfig"
}
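
The describe-configuration output does not show the property values themselves. To confirm what was stored, you can fetch a specific revision and decode its ServerProperties field. This sketch assumes the CLI returns that field base64-encoded; the function name show_server_properties is made up for illustration.

```shell
# Print the server properties stored in revision $2 (default 1)
# of the MSK configuration whose ARN is $1.
# Assumption: the CLI returns ServerProperties as a base64-encoded string.
show_server_properties() {
  aws kafka describe-configuration-revision \
    --arn "$1" \
    --revision "${2:-1}" \
    --query "ServerProperties" \
    --output text | base64 --decode
}

# Example usage:
# show_server_properties "$CLUSTER_ARN" 1
```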

For more details on creating and managing Amazon MSK cluster configurations, see the MSK Configuration Operations document.


Step 4 - Create the cluster definition file

To complete this step, you need the following:

  • Private subnet ID in us-east-1a (from step 1 above)
  • Private subnet ID in us-east-1b (from step 1 above)
  • Private subnet ID in us-east-1c (from step 1 above)
  • Security group ID for the SG “MSKWorkshopKafkaService” (from preparation stage)
  • Cluster configuration ARN (from the create-configuration output above)
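
If you no longer have the security group ID to hand, a lookup along these lines may work. It is a sketch that assumes MSKWorkshopKafkaService is the security group's group name; if it is only a Name tag in your account, the filter would need to be tag:Name instead. The function name is hypothetical.

```shell
# Print the ID of the security group whose group name equals $1.
# Assumption: the name used in the preparation stage is the group name.
get_security_group_id() {
  aws ec2 describe-security-groups \
    --filters "Name=group-name,Values=$1" \
    --query "SecurityGroups[0].GroupId" \
    --output text
}

# Example usage:
# SG_ID=$(get_security_group_id MSKWorkshopKafkaService)
# echo "$SG_ID"
```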

You will now combine the values above into a cluster definition file (clusterinfo.json). It will look something like the example below; replace the subnet IDs, security group ID, and configuration ARN with your own values.

Example of a complete file:

{
    "BrokerNodeGroupInfo": {
        "BrokerAZDistribution": "DEFAULT",
        "InstanceType": "kafka.m5.large",
        "ClientSubnets": [
            "subnet-05e9ee503739783a4", "subnet-054990841567825d4", "subnet-07af438a99fe7f508"
        ],
        "SecurityGroups": [
            "sg-008cf48ee78b732ef"
        ],
        "StorageInfo": {
            "EbsStorageInfo": {
                "VolumeSize": 100
            }
        }
    },
    "ClusterName": "MSKWorkshopCluster",
    "ConfigurationInfo": {
        "Arn": "arn:aws:kafka:us-east-1:xyz:configuration/WorkshopMSKConfig/2d99aad1-a420-4f62-83c1-0e2473aea998-6",
        "Revision": 1
    },
    "EncryptionInfo": {
        "EncryptionAtRest": {
            "DataVolumeKMSKeyId": ""
        },
        "EncryptionInTransit": {
            "InCluster": true,
            "ClientBroker": "TLS_PLAINTEXT"
        }
    },
    "EnhancedMonitoring": "PER_TOPIC_PER_BROKER",
    "KafkaVersion": "2.3.1",
    "NumberOfBrokerNodes": 3,
    "OpenMonitoring": {
        "Prometheus": {
            "JmxExporter": {
                "EnabledInBroker": true
            },
            "NodeExporter": {
                "EnabledInBroker": true
            }
        }
    }
}
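
Rather than editing the JSON by hand, you could generate the file from the values you collected. The sketch below writes the same definition as the example above, substituting only the subnet IDs, security group ID, and configuration ARN; the function name write_cluster_definition and its argument order are made up for illustration.

```shell
# Write an MSK cluster definition to the file given as $1.
# Arguments: output file, three subnet IDs, security group ID, configuration ARN.
write_cluster_definition() {
  out_file=$1
  subnet_a=$2
  subnet_b=$3
  subnet_c=$4
  security_group=$5
  config_arn=$6

  # Unquoted here-document so the shell substitutes the variables above.
  cat > "$out_file" <<EOF
{
    "BrokerNodeGroupInfo": {
        "BrokerAZDistribution": "DEFAULT",
        "InstanceType": "kafka.m5.large",
        "ClientSubnets": [
            "$subnet_a", "$subnet_b", "$subnet_c"
        ],
        "SecurityGroups": [
            "$security_group"
        ],
        "StorageInfo": {
            "EbsStorageInfo": {
                "VolumeSize": 100
            }
        }
    },
    "ClusterName": "MSKWorkshopCluster",
    "ConfigurationInfo": {
        "Arn": "$config_arn",
        "Revision": 1
    },
    "EncryptionInfo": {
        "EncryptionAtRest": {
            "DataVolumeKMSKeyId": ""
        },
        "EncryptionInTransit": {
            "InCluster": true,
            "ClientBroker": "TLS_PLAINTEXT"
        }
    },
    "EnhancedMonitoring": "PER_TOPIC_PER_BROKER",
    "KafkaVersion": "2.3.1",
    "NumberOfBrokerNodes": 3,
    "OpenMonitoring": {
        "Prometheus": {
            "JmxExporter": {
                "EnabledInBroker": true
            },
            "NodeExporter": {
                "EnabledInBroker": true
            }
        }
    }
}
EOF
}

# Example usage (replace the placeholders with your own values):
# write_cluster_definition clusterinfo.json subnet-aaa subnet-bbb subnet-ccc sg-xxx "$CLUSTER_ARN"
```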

Step 5 - Create the cluster

We can now use the command line tool and the cluster definition to create the cluster:

    aws kafka create-cluster --cli-input-json file://clusterinfo.json

The command will return a JSON object that contains your cluster ARN, name, and state. Grab the ARN.
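
To capture the ARN directly instead of copying it from the JSON, the create call can be wrapped with --query. This is a sketch; the function name create_cluster and the variable name MSK_CLUSTER_ARN are hypothetical.

```shell
# Create the cluster from the definition file given as $1
# and print only the new cluster's ARN.
create_cluster() {
  aws kafka create-cluster \
    --cli-input-json "file://$1" \
    --query "ClusterArn" \
    --output text
}

# Example usage:
# MSK_CLUSTER_ARN=$(create_cluster clusterinfo.json)
# echo "$MSK_CLUSTER_ARN"
```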

This step will take some time. You can move on to the next step to see how to monitor progress and review the cluster deployment.

Step 6 - Review the cluster deployed

You can check on your cluster configuration and status by using the CLI and the describe-cluster command. You will need the cluster ARN for this, which you got from the last step.

Use the ARN and get the cluster configuration and state, changing the example ARN to the one from the command above:

aws kafka describe-cluster --cluster-arn arn:aws:kafka:us-east-1:xyz:cluster/MSKWorkshop/20a94343-552f-4298-9076-99673162e023-6 | grep -i state
    "State": "CREATING",

When the cluster is ready, the state will be “ACTIVE”.
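
Instead of re-running the command by hand, you could poll until the state changes. The loop below is a minimal sketch: the function name wait_for_cluster and the 60-second default interval are assumptions, and it stops with an error on any state other than CREATING or ACTIVE.

```shell
# Poll the cluster whose ARN is $1 every $2 seconds (default 60)
# until it reports ACTIVE. Returns non-zero on any unexpected state.
wait_for_cluster() {
  cluster_arn=$1
  interval=${2:-60}
  while :; do
    state=$(aws kafka describe-cluster \
      --cluster-arn "$cluster_arn" \
      --query "ClusterInfo.State" \
      --output text) || return 1
    echo "Cluster state: $state"
    case "$state" in
      ACTIVE) return 0 ;;
      CREATING) sleep "$interval" ;;
      *) echo "Unexpected state: $state" >&2; return 1 ;;
    esac
  done
}

# Example usage (replace the placeholder with your cluster ARN):
# wait_for_cluster "arn:aws:kafka:..." 60
```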