Modernizing legacy systems is rarely just about rewriting code. It’s about breaking old assumptions, aligning with cloud-native principles, and ensuring that delivery stays fast, safe, and scalable.
In one of our recent cloud modernization efforts, we migrated a legacy message consumer written in Java and hosted on on-prem servers to a Python-based AWS Lambda integrated with Confluent Kafka. This move wasn't just about changing languages or platforms; it was about simplifying our architecture, improving observability, and embedding quality deeply into the pipeline.
Let's walk through how we rebuilt the system using AWS Lambda + Event Source Mapping (ESM), handled Avro schema parsing with caching, and set up a robust GitLab CI/CD testing strategy: the architecture, the trade-offs, and the pipeline that gave us the confidence to ship faster.
Legacy systems can quietly anchor your business—and hold it back at the same time. For years, one of our core backend services ran on Tibco EMS, consuming messages from queues and topics inside a tightly coupled, on-prem Java stack. It worked, until it didn’t scale… or integrate well… or deploy cleanly.
As part of a broader cloud modernization initiative, we reimagined the system as a Python-based AWS Lambda, consuming Confluent Kafka messages via Event Source Mapping (ESM), and used Avro schemas with schema caching for performance. All wrapped in a GitLab CI/CD pipeline that enforced continuous testing.
Here’s how we made the shift—from Tibco to Kafka, from servers to serverless, from manual to automated.
The Starting Point: Tibco EMS + On-Prem Java
Our original system:
- Listened to Tibco EMS topics/queues
- Ran as a Java process on on-prem servers
- Had manual deployments and minimal test automation
- Lacked observability and required restart coordination
- Was fragile under spikes and hard to scale horizontally
It was stable in a pre-cloud world, but unsustainable in a fast-moving product environment.
Our Target Architecture:
We replaced the EMS layer with Confluent Kafka, and the Java service with a Python AWS Lambda. The Lambda consumes messages directly from Kafka via Lambda Event Source Mapping (the self-managed Apache Kafka event source), handles Avro decoding via Schema Registry with in-memory caching, and processes the payload in a stateless way.
Core Components:
- Confluent Kafka (Cloud): Replaces Tibco EMS
- Avro Schemas + Schema Registry: Strongly typed messages
- AWS Lambda (Python): Stateless consumer, no servers to manage
- Lambda Event Source Mapping (self-managed Kafka): Native Kafka → Lambda binding
- GitLab CI/CD: Full test pipeline
From EMS to Kafka: The Real Shift
Migrating from EMS to Kafka wasn't a simple switch. We had to rethink:
- Delivery guarantees: JMS had built-in durability; we now rely on Kafka's offset-based semantics.
- Message formats: We moved from XML/POJO to Avro-encoded messages with schema validation.
- Queue semantics: Kafka topics behave differently; partitioning and replayability introduced new patterns.
- Operational model: From app servers and Tibco brokers to cloud-native managed services.
We built producers upstream to publish Avro messages into Kafka with schema compatibility enforcement. Downstream, our Lambda consumer needed to decode messages with performance in mind.
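On the producer side, the shape is roughly the following. This is a minimal sketch using the confluent-kafka library's AvroSerializer; the topic name, schema, and connection settings are illustrative, and compatibility rules themselves are enforced by the registry on registration:

```python
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# Illustrative schema; the real schemas live in Schema Registry with
# compatibility checks (e.g. BACKWARD) enforced when new versions register.
ORDER_SCHEMA = """
{
  "type": "record",
  "name": "OrderEvent",
  "fields": [{"name": "order_id", "type": "string"}]
}
"""

registry = SchemaRegistryClient({"url": "https://<registry>",
                                 "basic.auth.user.info": "<key>:<secret>"})
serializer = AvroSerializer(registry, ORDER_SCHEMA)
producer = Producer({"bootstrap.servers": "<broker>",
                     "security.protocol": "SASL_SSL",
                     "sasl.mechanisms": "PLAIN",
                     "sasl.username": "<key>",
                     "sasl.password": "<secret>"})

def publish(event: dict, topic: str = "orders.events") -> None:
    # The serializer registers/looks up the schema and prepends the Confluent framing.
    value = serializer(event, SerializationContext(topic, MessageField.VALUE))
    producer.produce(topic, value=value)
    producer.flush()
```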
Schema Handling with Caching
Early on, we realized that decoding Avro messages with live HTTP calls to Schema Registry was a bottleneck. So, we introduced a simple in-memory schema caching layer:
Example:
```python
import io
import json

import requests
from fastavro import parse_schema, schemaless_reader

SCHEMA_CACHE = {}

def get_schema(schema_id):
    # Return the parsed writer schema, hitting Schema Registry only on a cache miss.
    if schema_id in SCHEMA_CACHE:
        return SCHEMA_CACHE[schema_id]
    # user/password are the Schema Registry credentials, supplied via configuration.
    response = requests.get(f"https://<registry>/schemas/ids/{schema_id}", auth=(user, password))
    response.raise_for_status()
    schema = parse_schema(json.loads(response.json()["schema"]))
    SCHEMA_CACHE[schema_id] = schema
    return schema

def decode_message(bytes_msg):
    # Confluent wire format: magic byte (0), 4-byte schema ID, then the Avro body.
    schema_id = int.from_bytes(bytes_msg[1:5], byteorder="big")
    schema = get_schema(schema_id)
    return schemaless_reader(io.BytesIO(bytes_msg[5:]), schema)
```
This cut decode latency on warm invocations, since only the first message for a given schema pays the registry round trip, and it helped keep us within Schema Registry rate limits.
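For context, here is roughly how that decoder plugs into the handler. With a Kafka event source mapping, Lambda receives a batch of records grouped per topic-partition, with values base64-encoded; a simplified sketch, where `process` stands in for the business logic:

```python
import base64

def lambda_handler(event, context):
    # ESM delivers a batch grouped by "topic-partition" keys; each record's
    # value is the base64-encoded, Confluent-framed Avro payload.
    for topic_partition, records in event.get("records", {}).items():
        for record in records:
            payload = decode_message(base64.b64decode(record["value"]))
            process(payload)  # stateless business logic (placeholder)
    # Returning without raising lets ESM advance the consumer group offsets.
```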
GitLab CI/CD Pipeline with Built-In Testing
We didn’t just modernize runtime—we modernized delivery.
Our .gitlab-ci.yml defines a pipeline with testing at every stage:
```yaml
stages:
  - lint
  - unit
  - integration
  - bdd
  - performance
  - security
  - package
  - deploy
```
Kafka and Schema Registry were mocked in test environments using Docker Compose.
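For the unit stage, the decoder itself doesn't even need the Docker stack: pre-seeding the schema cache keeps the test fully in-process. A minimal pytest sketch (the `consumer` module name is hypothetical; it's whatever module holds `SCHEMA_CACHE` and `decode_message`):

```python
import io

from fastavro import schemaless_writer

import consumer  # hypothetical module holding SCHEMA_CACHE and decode_message

TEST_SCHEMA = {
    "type": "record",
    "name": "OrderEvent",
    "fields": [{"name": "order_id", "type": "string"}],
}

def test_decode_message_uses_cached_schema(monkeypatch):
    schema_id = 42
    # Pre-seed the cache so no HTTP call to Schema Registry is attempted.
    monkeypatch.setitem(consumer.SCHEMA_CACHE, schema_id, TEST_SCHEMA)

    # Build a Confluent-framed message: magic byte, 4-byte schema ID, Avro body.
    buf = io.BytesIO()
    schemaless_writer(buf, TEST_SCHEMA, {"order_id": "abc-123"})
    framed = b"\x00" + schema_id.to_bytes(4, "big") + buf.getvalue()

    assert consumer.decode_message(framed) == {"order_id": "abc-123"}
```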
Observability & Error Handling
- Structured logging: every processed message is logged with its partition, offset, and schema ID (see the sketch below)
- Monitoring: CloudWatch metrics and alarms, plus AppDynamics for tracing
- OpenTelemetry Lambda layer for unified distributed tracing
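A minimal sketch of the kind of structured log line we mean; field names here are illustrative, and in practice this hangs off whatever JSON logging setup the function already uses:

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def log_record(record: dict, schema_id: int, status: str) -> None:
    # One JSON object per Kafka record, so CloudWatch Logs Insights can
    # filter and aggregate on partition, offset, and schema ID.
    logger.info(json.dumps({
        "topic": record.get("topic"),
        "partition": record.get("partition"),
        "offset": record.get("offset"),
        "schema_id": schema_id,
        "status": status,
    }))
```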
Lessons Learned:
- Schema caching isn't optional—it's critical at scale.
- Tibco to Kafka is not apples to apples—expect to rethink patterns.
- Lambda concurrency and ESM batch size tuning can drastically improve throughput (see the sketch after this list).
- Security scans and contract tests should block MRs—not just warn.
- Dev parity is essential—Dockerizing Kafka+Registry locally helped our team move faster.
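On the batch-size point above: with a self-managed (Confluent) Kafka event source, batch size and batching window are set on the event source mapping itself. A hedged sketch using boto3, where the function name, topic, brokers, and secret ARN are placeholders (the same knobs exist in IaC tools):

```python
import boto3

lambda_client = boto3.client("lambda")

# All identifiers below are placeholders for illustration.
lambda_client.create_event_source_mapping(
    FunctionName="kafka-consumer",
    Topics=["orders.events"],
    StartingPosition="LATEST",
    BatchSize=200,                      # records handed to one invocation
    MaximumBatchingWindowInSeconds=5,   # wait up to 5s to fill a batch
    SelfManagedEventSource={
        "Endpoints": {"KAFKA_BOOTSTRAP_SERVERS": ["<bootstrap-server>:9092"]}
    },
    SourceAccessConfigurations=[
        {"Type": "BASIC_AUTH", "URI": "<secretsmanager-arn-with-confluent-credentials>"}
    ],
)
```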
Final Thoughts:
We didn’t just refactor a message processor—we replatformed a key system from on-prem, tightly-coupled middleware to a cloud-native, serverless, testable event pipeline.
The result?
- Faster deploys
- Lower ops overhead
- Resilience by default
- Confidence with every change
And the supporting data layer was built on DynamoDB, a NoSQL database.
If you’re still running EMS queues in a dusty datacenter corner, it might be time to step into the stream. Cheers!