DEV Community

Renicius Pagotto

Understanding Azure Event Hubs Capture

Hello everyone, today we are going to talk about data capture in Azure Event Hubs and understand how this feature can empower data processing and storage.

What is Event Hubs Capture?

It's a feature of Azure Event Hubs that automatically delivers the events sent to a hub to Azure Blob Storage or Azure Data Lake Storage (Gen 1 and Gen 2), with the flexibility to specify a time interval or a size threshold.

According to Microsoft, setting up Capture is fast, there are no administrative costs to run it, and it scales automatically with Event Hubs throughput units in the standard tier or processing units in the premium tier.


For more information, please visit Azure Event Hubs Capture.

Important

  • The destination storage account (Azure Storage or Azure Data Lake Storage) must be in the same subscription as the Event Hubs namespace.
  • Azure Event Hubs does not support capturing events in an Azure Premium Storage account.

When to use?

A good scenario for Azure Event Hubs Capture is event retention: once events are captured, we can replay them later.


Another use case would be Event Sourcing, where you can store all the events that happened in your application.


Finally, Azure Event Hubs Capture lets you focus on data processing instead of on the capture itself.

How to create an Azure Event Hubs resource?

Open the Azure portal, type Event Hubs in the search bar, and select the resource.


Select the + Create option.


Now we need to provide some information that will be used to create the resource; feel free to use the names you want. Once completed, select Review + create at the bottom of the page to validate the information, and select Create when validation completes to create the resource.


This operation may take a few minutes, and after that your resource will be available to the application.
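If you prefer scripting over the portal, the same namespace can be created with the Azure CLI. This is a minimal sketch; the resource group, namespace name, and region below are placeholders, and note that Capture requires the Standard tier or above:

```shell
# Create a resource group and a Standard-tier Event Hubs namespace
# (names and region are examples, replace with your own)
az group create --name rg-eventhub-demo --location eastus

az eventhubs namespace create \
  --resource-group rg-eventhub-demo \
  --name evhns-capture-demo \
  --location eastus \
  --sku Standard
```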

Now we need to create a hub that will receive the application's events. From the left menu, select Event Hubs and then select + Event Hub.


We need to provide some information to create the hub; again, feel free to name it as you wish. Once completed, select Review + create at the bottom of the page to validate the information, and select Create when validation completes.


After that, the hub will be available for use.
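As a portal alternative, the hub itself can also be created with the Azure CLI. This is a sketch assuming the placeholder names used for the namespace:

```shell
# Create an event hub inside the namespace (names are examples)
az eventhubs eventhub create \
  --resource-group rg-eventhub-demo \
  --namespace-name evhns-capture-demo \
  --name customer-created \
  --partition-count 2
```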


How to enable Event Hubs Capture?

Now it's time to enable the Capture feature, but first you need to create an Azure Storage account with a blob container, or an Azure Data Lake Storage account, to receive the events. In my demo scenario, I will use Azure Blob Storage.

After creating the resource that will receive the events, open the Event Hub we created above, select the Capture option, enable capture, and select Avro as the serialization format.

Apache Avro is a data serialization system; for more information, visit Apache Avro.


Now we need to define the settings that Azure Event Hubs Capture will use to capture the events and send them to the target resource.


Each setting is explained below:

  • Time window - The interval at which data is captured; in this case, every 5 minutes the accumulated data is captured and sent to the target resource.
  • Size window - The maximum amount of event data, in MB, accumulated before a capture is triggered. The default value is 300 MB. A capture happens whenever either the time window or the size window is reached, whichever comes first.
  • Do not emit empty files when no events occur during the Capture time window - When checked, a file is written only if events arrived during the window; if no events are found, nothing is sent to the destination.
  • Capture Provider - The destination of the events, which can be Azure Blob Storage or Azure Data Lake.
  • Azure Storage Container or Data Lake Store - This depends on the previous setting: if the Capture Provider is Azure Blob Storage, select the container that will receive the events; if it is Azure Data Lake, provide the Data Lake name and path.
  • Sample capture file name formats - The path format used to store the events.
  • Capture file name format - The actual file name, which follows the pattern chosen in Sample capture file name formats.

Below is a simple example of the resulting path for an event in Azure Blob Storage.

e.g. customer-event/customer-created/0/2022/12/20/13/06/53

In my case, I chose Azure Blob Storage as the Capture Provider and defined the container that will receive the events.

After filling in the settings, select Save changes to enable the feature.
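The same settings can also be applied from the Azure CLI. This is a sketch with placeholder resource names; the capture interval is given in seconds and the size limit in bytes:

```shell
# Enable Capture on an existing hub: 5-minute / 300 MB windows,
# skipping empty files, writing to a blob container (names are examples)
az eventhubs eventhub update \
  --resource-group rg-eventhub-demo \
  --namespace-name evhns-capture-demo \
  --name customer-created \
  --enable-capture true \
  --capture-interval 300 \
  --capture-size-limit 314572800 \
  --skip-empty-archives true \
  --destination-name EventHubArchive.AzureBlockBlob \
  --storage-account stcapturedemo \
  --blob-container customer-event \
  --archive-name-format '{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}'
```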


Once capture is enabled, events are automatically captured and sent to Azure Blob Storage according to the configured windows: in this example, every 5 minutes or whenever 300 MB accumulate, whichever comes first.

Show me how it works

To demonstrate how it works, I created an event producer that publishes events to the hub.

Basically, this producer is an API project with an HTTP POST endpoint that, when called, publishes a user-created event to the hub.

// Requires: using Azure.Messaging.EventHubs;
//           using Azure.Messaging.EventHubs.Producer;
//           using System.Text; using System.Text.Json;
[HttpPost]
public async Task<IActionResult> SendEvent()
{
    var eventHubConnectionString = _configuration.GetValue<string>("EventHub:Connection");
    var eventHubName = _configuration.GetValue<string>("EventHub:ProductHubName");

    // Dispose the client when the request completes
    await using var producerClient = new EventHubProducerClient(eventHubConnectionString, eventHubName, new EventHubProducerClientOptions
    {
        RetryOptions = new EventHubsRetryOptions
        {
            MaximumRetries = 5,
            Delay = TimeSpan.FromSeconds(2)
        }
    });

    // Batches are disposable as well
    using var eventBatch = await producerClient.CreateBatchAsync();

    var obj = new
    {
        Name = "Renicius Pagotto",
        Email = "test@test.com",
        Age = 30
    };

    // TryAdd returns false if the event does not fit in the batch
    eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes(JsonSerializer.Serialize(obj))));
    await producerClient.SendAsync(eventBatch);

    return Ok($"The event has been published on {DateTime.Now}");
}

The request can be sent via Postman or any other HTTP client.

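For example, a minimal equivalent with curl, assuming the API runs locally and the controller is routed at a hypothetical /api/events (adjust the port and route to your project):

```shell
# Hypothetical port and route, adjust to your launch profile
curl -X POST http://localhost:5000/api/events
```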

Now it's time to view the event in Azure Blob Storage; to find it, we use the path format defined in the Capture settings.

In my scenario, my time zone is GMT-3, which means I need to add 3 hours to the time the message was posted to get the correct path, since Azure Event Hubs Capture works with UTC.

So, to find the event in Azure Blob Storage, the path is 2022/12/20/13/18/27.


And opening the file, we can see the information we posted.

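Instead of browsing the portal, the captured files can also be located and downloaded with the Azure CLI. This is a sketch with the placeholder names used earlier, assuming you are logged in with access to the storage account:

```shell
# List captured Avro blobs for a given day (names are examples)
az storage blob list \
  --account-name stcapturedemo \
  --container-name customer-event \
  --prefix 'evhns-capture-demo/customer-created/0/2022/12/20' \
  --output table \
  --auth-mode login

# Download one of them for inspection; the event payload is stored
# in the "Body" field of each Avro record
az storage blob download \
  --account-name stcapturedemo \
  --container-name customer-event \
  --name '<blob-path>.avro' \
  --file captured.avro \
  --auth-mode login
```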

Project: https://github.com/reniciuspagotto/eventhub-producer-consumer

Did you like this post? Want to talk a little more about it? Leave a comment with your questions and ideas, or get in touch through LinkedIn.
