Amazon Managed Streaming for Apache Kafka (MSK) simplifies running Kafka on AWS. MSK Connect extends this by allowing data to flow between Kafka topics and external systems such as Amazon S3, Elasticsearch, or databases. While powerful, the setup process often runs into networking, authentication, and plugin issues, especially when the MSK cluster is placed in private subnets.
This article provides a step-by-step walkthrough for setting up MSK Connect in private subnets, explains why errors occur, and details how to fix them. It also covers both scenarios: when you are creating a new MSK cluster from scratch, and when you already have an MSK cluster running.
Scenario 1: Setting Up MSK Connect from Scratch in Private Subnets
If you don’t yet have an MSK cluster, you must first provision one inside a VPC. Because we are focusing on private subnets, all network communication will rely on correct security group rules and VPC endpoints.
Step A: Create the MSK Cluster
VPC and Subnets:
- Use private subnets for your brokers.
- Ensure these subnets have appropriate routing (e.g., NAT gateway or VPC endpoints) to reach AWS services like S3 and CloudWatch.
Security Group (SG) Setup:
- Create a dedicated security group for MSK.
- Inbound rules: Allow traffic from the private subnets where clients or Kafka Connect will run.
- Outbound rules: This is critical. MSK needs outbound access to reach AWS services. If you skip this, you may see errors like:

`org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata`

The cluster may still provision, but your connectors will not be able to communicate with the brokers. The fix is simple: ensure outbound rules allow traffic (0.0.0.0/0 on the required ports, or at least to the AWS service endpoints).
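As a rough sketch of what those outbound rules look like, the helper below builds the egress-rule payload for the MSK security group. The port list and CIDR are assumptions for illustration; the security group ID is a placeholder.

```python
# Sketch: egress rule payload for the MSK security group. Covers the three
# authenticated broker ports plus HTTPS for AWS service endpoints.
def msk_egress_rules(cidr="0.0.0.0/0"):
    ports = [9094, 9096, 9098, 443]  # TLS, SASL/SCRAM, IAM brokers + AWS APIs
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": p,
            "ToPort": p,
            "IpRanges": [{"CidrIp": cidr, "Description": "MSK outbound"}],
        }
        for p in ports
    ]

# With boto3 this payload could be applied as:
# ec2.authorize_security_group_egress(GroupId="sg-0123...",
#                                     IpPermissions=msk_egress_rules())
```

In a locked-down VPC you would replace `0.0.0.0/0` with the CIDR ranges of your private subnets and endpoint prefix lists.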
Authentication and Broker Ports:
MSK supports different authentication mechanisms. The chosen method determines which broker endpoint and port you should use later:
- IAM authentication → `bootstrap_brokers_sasl_iam` on port 9098
- SASL/SCRAM authentication → `bootstrap_brokers_sasl_scram` on port 9096
- TLS only → `bootstrap_brokers_tls` on port 9094
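The mapping above can be captured in a small lookup table, which is handy in provisioning scripts (the attribute names match the Terraform `aws_msk_cluster` outputs quoted above):

```python
# Authentication mode -> (bootstrap-broker attribute, port)
BROKER_ENDPOINTS = {
    "IAM": ("bootstrap_brokers_sasl_iam", 9098),
    "SASL/SCRAM": ("bootstrap_brokers_sasl_scram", 9096),
    "TLS": ("bootstrap_brokers_tls", 9094),
}

def broker_port(auth_mode: str) -> int:
    """Return the broker port that pairs with the given authentication mode."""
    return BROKER_ENDPOINTS[auth_mode][1]
```

Using the port from this table instead of hard-coding one avoids the endpoint/auth mismatch described next.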
Common mistakes here include using the wrong broker endpoint for the authentication method you selected, which will result in connectivity errors. For example, if you provision the cluster with SASL but later try to connect using the IAM bootstrap brokers, you’ll face timeouts.
Another consideration:
- If IAM = false and SASL = true, you must explicitly create usernames and passwords for your MSK cluster (stored as secrets in AWS Secrets Manager).
- If you choose IAM only, no manual credentials are required.
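For the SASL/SCRAM case, a minimal sketch of the credential payload is shown below. MSK expects the secret to live in AWS Secrets Manager with a name prefixed `AmazonMSK_` and encrypted with a customer-managed KMS key; the names and boto3 calls in the comments are placeholders, not a tested setup.

```python
import json

def scram_secret(username: str, password: str) -> str:
    """Build the JSON secret body MSK expects for a SASL/SCRAM user."""
    return json.dumps({"username": username, "password": password})

# Hypothetical application with boto3 (IDs are placeholders):
# secretsmanager.create_secret(Name="AmazonMSK_demo_user",
#                              SecretString=scram_secret("demo", "..."),
#                              KmsKeyId="<customer-managed-key-id>")
# kafka.batch_associate_scram_secret(ClusterArn="<cluster-arn>",
#                                    SecretArnList=["<secret-arn>"])
```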
Step B: Create the Kafka Connect Cluster
Once the MSK cluster is ready, you can provision Kafka Connect in the same VPC.
Authentication Choice in Connect
- Kafka Connect only allows two options: `NONE` or `IAM`.
- If your MSK cluster was created with SASL, you must select `NONE`.
- If your MSK cluster was created with IAM, then configure Connect to use IAM and point it to `bootstrap_brokers_sasl_iam` (port 9098).
Choosing incorrectly will result in connection failures or metadata fetch errors.
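For reference, the Kafka client properties that pair with IAM authentication look like the sketch below. The property values come from the aws-msk-iam-auth library; the bootstrap string is a placeholder you would replace with your cluster's IAM endpoints.

```python
# Kafka client properties for IAM auth against MSK (values from the
# aws-msk-iam-auth library); useful for any client you test the cluster with.
IAM_CLIENT_PROPS = {
    "bootstrap.servers": "<bootstrap_brokers_sasl_iam>",  # port 9098 endpoints
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "AWS_MSK_IAM",
    "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
    "sasl.client.callback.handler.class":
        "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
}
```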
Executor Role Permissions
Kafka Connect tasks run under an IAM execution role. If you plan to use S3 as a sink or source, this role must include at least:
- `s3:GetObject`
- `s3:PutObject`
- `s3:ListBucket`
Without these, connectors fail when trying to read from or write to S3.
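A minimal policy sketch covering those permissions is shown below; the bucket name is a placeholder, and `s3:GetBucketLocation` is included because the S3 sink connector commonly requires it.

```python
def s3_connector_policy(bucket: str) -> dict:
    """Minimal S3 policy document for the Connect executor role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # object-level access (read for sources, write for sinks)
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {   # bucket-level access
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }
```

Note that object-level actions target `arn:aws:s3:::bucket/*` while bucket-level actions target the bucket ARN itself; mixing these up is a common cause of access-denied errors.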
VPC Endpoint for S3
Since your MSK Connect cluster is in a private subnet, it cannot reach S3's public endpoint directly. You need to create a Gateway VPC Endpoint for S3:
`com.amazonaws.<region>.s3`
If this is missing, you will encounter errors such as:
`org.apache.kafka.connect.errors.ConnectException: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to s3.us-east-1.amazonaws.com:443 failed: connect timed out`
The fix is to create the VPC endpoint and associate it with the private route tables.
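A sketch of the request parameters for that endpoint is below; the VPC and route-table IDs are placeholders.

```python
def s3_gateway_endpoint(region: str, vpc_id: str, route_table_ids: list) -> dict:
    """Request parameters for a Gateway VPC Endpoint for S3."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Must be the private route tables used by the MSK Connect subnets,
        # or traffic to S3 will still time out.
        "RouteTableIds": route_table_ids,
    }

# Hypothetical application with boto3:
# ec2.create_vpc_endpoint(**s3_gateway_endpoint("us-east-1", "vpc-0abc...",
#                                               ["rtb-0def..."]))
```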
CloudWatch Logging
Always create a CloudWatch log group for Kafka Connect. This allows you to see detailed error messages from tasks, which are invaluable during troubleshooting.
Custom Plugins
Many real-world connectors (such as the S3 Sink Connector or Protobuf Converter) are not built-in.
- Download or build the connector JAR files.
- Package them as a ZIP file.
- Upload the ZIP to an S3 bucket.
- Reference the S3 path when creating the Kafka Connect cluster.
If the plugin is missing or not zipped correctly, your connector creation will fail.
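The packaging steps above can be sketched in a few lines. This assumes the JARs sit flat at the top level of the ZIP, which MSK Connect accepts for custom plugins; paths are placeholders, and the S3 upload is left as a comment.

```python
import zipfile
from pathlib import Path

def package_plugin(jar_dir: str, zip_path: str) -> list:
    """Zip all connector JARs in jar_dir into the plugin archive."""
    jars = sorted(Path(jar_dir).glob("*.jar"))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for jar in jars:
            zf.write(jar, arcname=jar.name)  # flat layout inside the ZIP
    return [j.name for j in jars]

# Hypothetical upload, then reference this S3 path when creating the
# custom plugin in MSK Connect:
# boto3.client("s3").upload_file(zip_path, "<plugin-bucket>",
#                                "plugins/s3-sink.zip")
```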
Common Errors and Fixes (Scenario 1)
| Error | Cause | Resolution |
|---|---|---|
| `TimeoutException: Timed out waiting to send the call` | Wrong broker port used or SG outbound blocked | Confirm the broker endpoint matches your authentication type. Check SG outbound rules. |
| `ConnectException: Unable to execute HTTP request` | MSK Connect in a private subnet cannot reach S3 | Create a Gateway VPC Endpoint for `com.amazonaws.<region>.s3`. |
| Connector cannot access S3 | Missing IAM permissions on executor role | Add `s3:GetObject`, `s3:PutObject`, and `s3:ListBucket` to the role. |
| Plugin not found error | Plugin not uploaded or wrong format | Upload the plugin ZIP to S3 and specify the correct path in the Connect configuration. |
Scenario 2: Setting Up MSK Connect with an Existing Cluster
If you already have an MSK cluster in a private subnet, the process is simpler but still requires validation.
Check Cluster Configuration
- Which authentication method is enabled (IAM, SASL, or TLS)?
- Which broker endpoint corresponds to that method?
- Are security group outbound rules configured?
Kafka Connect Setup
- Deploy Kafka Connect in the same VPC and private subnets as the cluster.
- Match authentication correctly:
  - If your cluster uses SASL → select `NONE`.
  - If your cluster uses IAM → select `IAM` and use the IAM bootstrap brokers (port 9098).
Networking and Permissions
- Ensure the VPC endpoint for S3 is present.
- Confirm the executor role has S3 permissions.
- Verify CloudWatch log group exists.
- Confirm plugins are available in S3 in ZIP format.
Troubleshooting
If connectors still fail, check CloudWatch logs. Typical issues point back to:
- Incorrect broker endpoints
- Missing S3 permissions
- Absent VPC endpoint
- Plugin packaging errors
Best Practices
- Always create MSK clusters in private subnets with the necessary VPC endpoints for dependent services.
- Double-check which broker endpoint you should use. Many timeouts come from mixing IAM/SASL/TLS endpoints.
- Use least-privilege IAM policies, but don’t forget that Kafka Connect executor roles need explicit S3 permissions.
- Package connectors properly in ZIP format before uploading to S3.
- Monitor Kafka Connect logs in CloudWatch for faster troubleshooting.
Conclusion
Running MSK Connect in private subnets requires more than just clicking through the AWS console. You must carefully manage VPC design, security groups, authentication settings, and service endpoints. Most errors arise from either networking misconfigurations (outbound rules, missing VPC endpoints) or mismatched broker authentication. By validating each step and following the error–resolution table, you can avoid the most common pitfalls and deploy a stable Kafka-to-S3 pipeline.