Eunice js

A Complete Guide to Setting Up and Troubleshooting AWS MSK Connect in Private Subnets

Amazon Managed Streaming for Apache Kafka (MSK) simplifies running Kafka on AWS. MSK Connect extends this by allowing data to flow between Kafka topics and external systems such as Amazon S3, Elasticsearch, or databases. While powerful, the setup process often runs into networking, authentication, and plugin issues, especially when the MSK cluster is placed in private subnets.

This article provides a step-by-step walkthrough for setting up MSK Connect in private subnets, explains why errors occur, and details how to fix them. It also covers both scenarios: when you are creating a new MSK cluster from scratch, and when you already have an MSK cluster running.

Scenario 1: Setting Up MSK Connect from Scratch in Private Subnets

If you don’t yet have an MSK cluster, you must first provision one inside a VPC. Because we are focusing on private subnets, all network communication will rely on correct security group rules and VPC endpoints.

Step A: Create the MSK Cluster

VPC and Subnets:

  1. Use private subnets for your brokers.
  2. Ensure these subnets have appropriate routing (e.g., a NAT gateway or VPC endpoints) to reach AWS services like S3 and CloudWatch; the sketch below shows one way to inspect a subnet's routes.
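
If you prefer to verify the routing from the API rather than the console, here is a minimal sketch using boto3 (the Python AWS SDK); the region and subnet ID are placeholders for your own values:

   import boto3

   ec2 = boto3.client("ec2", region_name="us-east-1")

   # List the routes attached to one of the private broker subnets.
   tables = ec2.describe_route_tables(
       Filters=[{"Name": "association.subnet-id", "Values": ["subnet-0123456789abcdef0"]}]
   )["RouteTables"]

   for table in tables:
       for route in table["Routes"]:
           destination = route.get("DestinationCidrBlock") or route.get("DestinationPrefixListId")
           target = route.get("NatGatewayId") or route.get("GatewayId")
           print(destination, "->", target)  # expect a NAT gateway or VPC endpoint target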

Security Group (SG) Setup:

  1. Create a dedicated security group for MSK.
  2. Inbound rules: Allow traffic from the private subnets where clients or Kafka Connect will run.
  3. Outbound rules: This is critical. MSK needs outbound access to reach services. If you skip this, you may see errors like:

     org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
    

    The cluster may still provision, but your connectors will not be able to communicate with the brokers. The fix is simple: ensure the outbound rules allow traffic, either 0.0.0.0/0 on the required ports or at least the AWS service endpoints you depend on (see the sketch after this list).
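
As a sketch of that fix, the egress rule can also be added from code. This assumes boto3 and a placeholder security group ID; note that a freshly created security group already allows all outbound traffic, so this only applies if the default egress rule was removed:

   import boto3

   ec2 = boto3.client("ec2", region_name="us-east-1")

   # Re-open outbound TCP traffic on a locked-down MSK security group (placeholder ID).
   ec2.authorize_security_group_egress(
       GroupId="sg-0123456789abcdef0",
       IpPermissions=[
           {
               "IpProtocol": "tcp",
               "FromPort": 0,
               "ToPort": 65535,
               "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "MSK/Connect outbound"}],
           }
       ],
   )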

Authentication and Broker Ports:

MSK supports different authentication mechanisms. The chosen method determines which broker endpoint and port you should use later:

  1. IAM authentication → bootstrap_brokers_sasl_iam on port 9098
  2. SASL/SCRAM authentication → bootstrap_brokers_sasl_scram on port 9096
  3. TLS only → bootstrap_brokers_tls on port 9094

Common mistakes here include using the wrong broker endpoint for the authentication method you selected, which will result in connectivity errors. For example, if you provision the cluster with SASL but later try to connect using the IAM bootstrap brokers, you’ll face timeouts.
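
A quick way to confirm you are pairing the right endpoint with the right port is to fetch all of the bootstrap-broker strings from the API. A minimal sketch with boto3; the cluster ARN is a placeholder:

   import boto3

   kafka = boto3.client("kafka", region_name="us-east-1")

   brokers = kafka.get_bootstrap_brokers(
       ClusterArn="arn:aws:kafka:us-east-1:111122223333:cluster/example-cluster/abc123"
   )

   # Each key is only present when the matching authentication mode is enabled.
   print(brokers.get("BootstrapBrokerStringSaslIam"))    # IAM, port 9098
   print(brokers.get("BootstrapBrokerStringSaslScram"))  # SASL/SCRAM, port 9096
   print(brokers.get("BootstrapBrokerStringTls"))        # TLS, port 9094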

Another consideration:

  • If IAM = false and SASL/SCRAM = true, you must explicitly create usernames and passwords for your MSK cluster. MSK reads these credentials from AWS Secrets Manager secrets associated with the cluster, as sketched below.
  • If you choose IAM only, no manual credentials are required.
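
For the SASL/SCRAM case, here is a minimal sketch of creating and attaching credentials with boto3. The AmazonMSK_ name prefix and the customer-managed KMS key are MSK requirements; the ARNs, username, and password are placeholders:

   import json

   import boto3

   secrets = boto3.client("secretsmanager", region_name="us-east-1")
   kafka = boto3.client("kafka", region_name="us-east-1")

   # SCRAM secrets must be named "AmazonMSK_..." and encrypted with a customer-managed KMS key.
   secret = secrets.create_secret(
       Name="AmazonMSK_connect_user",
       KmsKeyId="arn:aws:kms:us-east-1:111122223333:key/11111111-2222-3333-4444-555555555555",
       SecretString=json.dumps({"username": "connect_user", "password": "change-me"}),
   )

   # Associate the secret with the cluster so the brokers accept these credentials.
   kafka.batch_associate_scram_secret(
       ClusterArn="arn:aws:kafka:us-east-1:111122223333:cluster/example-cluster/abc123",
       SecretArnList=[secret["ARN"]],
   )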

Step B: Create the Kafka Connect Cluster

Once the MSK cluster is ready, you can provision Kafka Connect in the same VPC.

Authentication Choice in Connect

  • Kafka Connect only allows two options: NONE or IAM.
  • If your MSK cluster was created with SASL, you must select NONE.
  • If your MSK cluster was created with IAM, then configure Connect to use IAM and point it to bootstrap_brokers_sasl_iam (port 9098).

Choosing incorrectly will result in connection failures or metadata fetch errors.
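
To make the pairing concrete, here is a sketch of the authentication-related arguments you would pass to the create_connector API (boto3 kafkaconnect client) for the IAM case; the broker addresses, subnet, and security group IDs are placeholders:

   # Fragment of the keyword arguments for kafkaconnect.create_connector(...)
   iam_connector_args = {
       "kafkaCluster": {
           "apacheKafkaCluster": {
               # IAM bootstrap brokers, i.e. the :9098 endpoints
               "bootstrapServers": (
                   "b-1.example.kafka.us-east-1.amazonaws.com:9098,"
                   "b-2.example.kafka.us-east-1.amazonaws.com:9098"
               ),
               "vpc": {
                   "subnets": ["subnet-0123456789abcdef0"],
                   "securityGroups": ["sg-0123456789abcdef0"],
               },
           }
       },
       "kafkaClusterClientAuthentication": {"authenticationType": "IAM"},  # or "NONE" for SASL clusters
       "kafkaClusterEncryptionInTransit": {"encryptionType": "TLS"},
   }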

Executor Role Permissions

Kafka Connect tasks run under an IAM execution role. If you plan to use S3 as a sink or source, this role must include at least:

  • s3:GetObject
  • s3:ListBucket

Without these, connectors fail when trying to write or read from S3.
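
A minimal sketch of attaching such a policy to the execution role with boto3. The role name, policy name, and bucket are placeholders, and s3:PutObject is included on the assumption that the connector also writes to the bucket (an S3 sink):

   import json

   import boto3

   iam = boto3.client("iam")

   policy = {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
               "Resource": [
                   "arn:aws:s3:::example-connect-bucket",
                   "arn:aws:s3:::example-connect-bucket/*",
               ],
           }
       ],
   }

   iam.put_role_policy(
       RoleName="msk-connect-executor-role",   # hypothetical execution role name
       PolicyName="msk-connect-s3-access",
       PolicyDocument=json.dumps(policy),
   )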

VPC Endpoint for S3

Since your MSK Connect cluster sits in private subnets with no route to the public internet, it cannot reach the S3 API on its own. You need to create a Gateway VPC Endpoint for S3:

   com.amazonaws.<region>.s3

If this is missing, you will encounter errors such as:

   org.apache.kafka.connect.errors.ConnectException: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to s3.us-east-1.amazonaws.com:443 failed: connect timed out

The fix is to create the VPC endpoint and associate it with the private route tables.
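
A minimal sketch of creating the endpoint with boto3; the region, VPC ID, and route table ID are placeholders:

   import boto3

   ec2 = boto3.client("ec2", region_name="us-east-1")

   # Gateway endpoint for S3, attached to the private route tables used by MSK Connect.
   ec2.create_vpc_endpoint(
       VpcEndpointType="Gateway",
       VpcId="vpc-0123456789abcdef0",
       ServiceName="com.amazonaws.us-east-1.s3",
       RouteTableIds=["rtb-0123456789abcdef0"],
   )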

CloudWatch Logging

Always create a CloudWatch log group for Kafka Connect. This allows you to see detailed error messages from tasks, which are invaluable during troubleshooting.
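
A short sketch of setting this up with boto3; the log group name is a placeholder, and the log_delivery fragment shows the shape of the argument later passed to create_connector:

   import boto3

   logs = boto3.client("logs", region_name="us-east-1")
   logs.create_log_group(logGroupName="/msk-connect/example-connector")

   # Shape of the logDelivery argument for kafkaconnect.create_connector(...)
   log_delivery = {
       "workerLogDelivery": {
           "cloudWatchLogs": {"enabled": True, "logGroup": "/msk-connect/example-connector"}
       }
   }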

Custom Plugins

Many real-world connectors (such as the S3 Sink Connector or Protobuf Converter) are not built-in.

  • Download or build the connector JAR files.
  • Package them as a ZIP file.
  • Upload the ZIP to an S3 bucket.
  • Reference the S3 path when creating the Kafka Connect cluster.

If the plugin is missing or not zipped correctly, your connector creation will fail.
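
A minimal sketch of the upload-and-register flow with boto3; the local file name, bucket, and key are placeholders:

   import boto3

   s3 = boto3.client("s3", region_name="us-east-1")
   kafkaconnect = boto3.client("kafkaconnect", region_name="us-east-1")

   # Upload the zipped connector JARs to a bucket the service can read.
   s3.upload_file("s3-sink-connector.zip", "example-connect-bucket", "plugins/s3-sink-connector.zip")

   # Register the ZIP as a custom plugin; the returned ARN is referenced when creating the connector.
   plugin = kafkaconnect.create_custom_plugin(
       name="s3-sink-connector",
       contentType="ZIP",
       location={
           "s3Location": {
               "bucketArn": "arn:aws:s3:::example-connect-bucket",
               "fileKey": "plugins/s3-sink-connector.zip",
           }
       },
   )
   print(plugin["customPluginArn"])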

Common Errors and Fixes (Scenario 1)

  • Error: TimeoutException: Timed out waiting to send the call
    Cause: Wrong broker port used, or security group outbound rules blocked.
    Resolution: Confirm the broker endpoint matches your authentication type and check SG outbound rules.

  • Error: ConnectException: Unable to execute HTTP request
    Cause: MSK Connect in a private subnet cannot reach S3.
    Resolution: Create a Gateway VPC Endpoint for com.amazonaws.<region>.s3.

  • Error: Connector cannot access S3
    Cause: Missing IAM permissions on the executor role.
    Resolution: Add s3:GetObject and s3:ListBucket to the role.

  • Error: Plugin not found
    Cause: Plugin not uploaded, or wrong format.
    Resolution: Upload the plugin ZIP to S3 and specify the correct path in the Connect configuration.

Scenario 2: Setting Up MSK Connect with an Existing Cluster

If you already have an MSK cluster in a private subnet, the process is simpler but still requires validation.

Check Cluster Configuration

  • Which authentication method is enabled (IAM, SASL, or TLS)?
  • Which broker endpoint corresponds to that method?
  • Are security group outbound rules configured?
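
The first two questions can be answered from the API. A minimal sketch with boto3; the cluster ARN is a placeholder:

   import boto3

   kafka = boto3.client("kafka", region_name="us-east-1")

   info = kafka.describe_cluster(
       ClusterArn="arn:aws:kafka:us-east-1:111122223333:cluster/example-cluster/abc123"
   )["ClusterInfo"]

   auth = info.get("ClientAuthentication", {})
   print("IAM enabled:   ", auth.get("Sasl", {}).get("Iam", {}).get("Enabled"))
   print("SCRAM enabled: ", auth.get("Sasl", {}).get("Scram", {}).get("Enabled"))
   print("TLS auth:      ", auth.get("Tls", {}).get("Enabled"))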

Kafka Connect Setup

  • Deploy Kafka Connect in the same VPC and private subnets as the cluster.
  • Match authentication correctly:

    • If cluster uses SASL → select NONE.
    • If cluster uses IAM → select IAM and use the IAM bootstrap brokers.

Networking and Permissions

  • Ensure the VPC endpoint for S3 is present.
  • Confirm the executor role has S3 permissions.
  • Verify CloudWatch log group exists.
  • Confirm plugins are available in S3 in ZIP format.
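
A small verification sketch for the first item, using boto3; the VPC ID and region are placeholders:

   import boto3

   ec2 = boto3.client("ec2", region_name="us-east-1")

   # Check that an S3 gateway endpoint exists in the VPC hosting MSK Connect.
   endpoints = ec2.describe_vpc_endpoints(
       Filters=[
           {"Name": "vpc-id", "Values": ["vpc-0123456789abcdef0"]},
           {"Name": "service-name", "Values": ["com.amazonaws.us-east-1.s3"]},
       ]
   )["VpcEndpoints"]

   print("S3 endpoint present:", bool(endpoints))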

Troubleshooting

If connectors still fail, check the CloudWatch logs (a query sketch follows the list below). Typical issues point back to:

  • Incorrect broker endpoints
  • Missing S3 permissions
  • Absent VPC endpoint
  • Plugin packaging errors
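
A minimal sketch for pulling recent errors out of the worker log group with boto3; the log group name is a placeholder:

   import boto3

   logs = boto3.client("logs", region_name="us-east-1")

   # Fetch the most relevant lines instead of scrolling the console.
   events = logs.filter_log_events(
       logGroupName="/msk-connect/example-connector",
       filterPattern="ERROR",
       limit=50,
   )

   for event in events["events"]:
       print(event["message"])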

Best Practices

  • Always create MSK clusters in private subnets with the necessary VPC endpoints for dependent services.
  • Double-check which broker endpoint you should use. Many timeouts come from mixing IAM/SASL/TLS endpoints.
  • Use least-privilege IAM policies, but don’t forget that Kafka Connect executor roles need explicit S3 permissions.
  • Package connectors properly in ZIP format before uploading to S3.
  • Monitor Kafka Connect logs in CloudWatch for faster troubleshooting.

Conclusion

Running MSK Connect in private subnets requires more than just clicking through the AWS console. You must carefully manage VPC design, security groups, authentication settings, and service endpoints. Most errors arise from either networking misconfigurations (outbound rules, missing VPC endpoints) or mismatched broker authentication. By validating each step and following the error–resolution table, you can avoid the most common pitfalls and deploy a stable Kafka-to-S3 pipeline.
