Anita Andonoska for AWS Community Builders

Posted on Jan 1, 2024 • Originally published at Medium

Data Streaming Hands-On: Building Kinesis Data Streams App

#aws #lambda #kinesis #stream

This is a step-by-step guide that will walk you through the process of building an AWS Kinesis Data Streams application.

Set Up AWS Kinesis Data Streams service

When creating AWS Kinesis Data Streams service there are two capacity modes to choose from:

On demand, should be used when your data stream’s throughput requirements are unpredictable and variable. With on-demand mode, your data stream’s capacity scales automatically.
Provisioned: should be used when you can reliably estimate throughput requirements of your data stream. With provisioned mode, your data stream’s capacity is fixed.

For this example I am using a Kinesis Data Streams service with On-Demand capacity mode, so the data stream’s capacity will scale automatically. At time of creation I went with default settings for the Kinesis Data Stream service.

Build a Data Producer

We need to create an application that will simulate data production. To send the data we’ll utilize AWS Kinesis SDK so that we can interact with the Kinesis API.

In my example I am using AWS SDK for .NET, and I have created a Lambda function as a producer, but this can be any type of application or script, and of course it can run outside of AWS.

The AWS Kinesis SDK offers two methods for sending the data, PutRecord for sending a single record, and PutRecords for sending a batch of records. For the Lambda function to be able to send data to Kinesis Data Streams, it is needed to have the PutRecord and/or PutRecords permission assigned based on which API method you’ll be using. Here we’ll show the usage of the PutRecords method since sending a batch is more fun.

Each PutRecords request can support up to 500 records. Each record in the request can be as large as 1 MiB, up to a limit of 5 MiB for the entire request, including partition keys. To send a batch of records we’ll need to make sure that a single PutRecords request does not go over the maximum number of records or the maximum size limit, see the code sample below.

	private static List<List<PutRecordsRequestEntry>> SplitEventsIntoBatches(List<UpdateProfileEvent> events)
	{
	int currentBatchSizeBytes = 0;
	var recordsBatch = new List<List<PutRecordsRequestEntry>>();
	var recordBatch = new List<PutRecordsRequestEntry>();
	foreach (var eventData in events)
	{
	try
	{
	byte[] eventDataBytes = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(eventData));

	int eventDataSizeBytes = eventDataBytes.Length;

	var isBatchRequestLimitReached = currentBatchSizeBytes + eventDataSizeBytes > maxBatchSizeBytes \|\| recordsBatch.Count >= maxRecordsPerBatch;
	if (isBatchRequestLimitReached)
	{
	recordsBatch.Add(recordBatch);

	recordBatch.Clear();
	currentBatchSizeBytes = 0;
	}

	using (var memoryStream = new MemoryStream(eventDataBytes))
	{
	recordBatch.Add(new PutRecordsRequestEntry
	{
	Data = memoryStream,
	PartitionKey = Guid.NewGuid().ToString()
	});

	currentBatchSizeBytes += eventDataSizeBytes;
	}
	}
	catch(Exception ex)
	{
	LambdaLogger.Log($"An unexpected error occured on SplitEventsIntoBatches for event {eventData.Id} {ex.Message}");
	}
	}

	if (recordBatch.Any())
	{
	recordsBatch.Add(recordBatch);
	}

	return recordsBatch;
	}

view raw KinesisProducer.cs hosted with ❤ by GitHub

Build a Data Consumer

The Amazon Kinesis Data Streams integrates with many AWS and third party services for consuming the data (more info in my previous post), also a custom application can be created to read the data by utilizing Kinesis Data Stream API.

Here as a data consumer, I have created an AWS Lambda that integrates with the Kinesis Data Stream. The Kinesis Data Stream service is added as a trigger for the function. Also proper permissions (kinesis GetRecords, ListStreams, DescribeStream etc.) are needed to be assigned to the function so that it can be triggered by Kinesis Data Stream service .

In the code below the messages are being read from the stream.

	public void FunctionHandler(KinesisEvent kinesisEvent, ILambdaContext context)
	{
	context.Logger.LogInformation($"Beginning to process {kinesisEvent.Records.Count} records...");

	var shouldLogEvents = Environment.GetEnvironmentVariable("LogEventContent") == true.ToString();

	List<UpdateProfileEvent> events = new List<UpdateProfileEvent>();
	foreach (var record in kinesisEvent.Records)
	{
	try
	{
	string recordData = GetRecordContents(record.Kinesis);
	var evnt = JsonConvert.DeserializeObject<UpdateProfileEvent>(recordData);
	if (evnt != null)
	{
	events.Add(evnt);
	}
	if (shouldLogEvents)
	{
	context.Logger.LogInformation($"Record Data:");
	context.Logger.LogInformation(recordData);
	}
	}
	catch (Exception ex)
	{
	context.Logger.LogError($"An error occured for event {record.EventId}: {ex}.");
	}
	}

	context.Logger.LogInformation("Stream processing complete.");
	}

view raw KinesisConsumerLambda.cs hosted with ❤ by GitHub

Monitor and Troubleshoot

For monitoring and troubleshooting the Data Streams Service offers an automatic dashboard:

The dashboard is based on metrics that automatically are sent to CloudWatch from the Data Streams service. Additionally you can leverage the metrics to create your own dashboards or create alarms.

Beside this I am using CloudWatch logs for the custom logging from the producer and consumer Lambda functions.

Conclusion

In the preceding discussion, we’ve walked through the steps of creating a Data Streaming app, tackling the generation and transmission of streaming data, and finally, the consumption of that data.

You can find the complete code on my GitHub repository.

DEV Community

Data Streaming Hands-On: Building Kinesis Data Streams App

Set Up AWS Kinesis Data Streams service

Build a Data Producer

Build a Data Consumer

Monitor and Troubleshoot

Conclusion

Top comments (0)

Read next

How AWS manages fully managed services: Simplifying the cloud

Simplifying Internal APIs with Direct AWS Lambda Invocation

Document Translation Service using Streamlit & AWS Translator

Amazon S3 Tables: A Game Changer in Analytics and Data Lake Space