
I'm dealing with a high-throughput application of Event Hubs. According to the documentation, in order to achieve very high throughput from a single sender, client-side batching is required (without exceeding the 256 KB limit per event).

Best Practices for performance improvements using Service Bus brokered messaging suggests client-side batching for achieving performance improvements. It describes client-side batching as available for queue or topic clients: it delays the sending of messages for a certain period of time and then transmits them in a single batch.

Is client-side batching available in the EventHub client?

  • I've tried specifying MessagingFactorySettings.AmqpTransportSettings.BatchFlushInterval without any effect. – Attila Cseh Apr 19 '16 at 14:05
  • Is there any problem with use of SendBatch and SendBatchAsync methods? – Alex Belotserkovskiy Apr 20 '16 at 15:39
  • Please post the code used and a detailed description of the problem, including the full text of any exceptions (i.e. use Exception.ToString()). What does "without any effect" mean? Was there an error? Were no messages sent? Or were the messages sent as individual messages instead of a batch? How did you check? – Panagiotis Kanavos Apr 20 '16 at 15:43
  • Also note that the batch can't go beyond 256KB. Did you try to send a larger batch perhaps? – Panagiotis Kanavos Apr 20 '16 at 15:45

Short answer: Event Hubs is designed to support very high-throughput scenarios, and client-side batching is one of the key features that enables this. The API is `EventHubClient.SendBatch(IEnumerable<EventData>)`.

Long Story:

The link that you found, Best Practices for performance improvements using Service Bus brokered messaging, applies to Service Bus Queues & Topics, which use a Microsoft proprietary protocol called SBMP that is not an open standard. We implemented BatchFlushInterval in that protocol a while back (around 2010), before the AMQP protocol was standardized. By the time we started building the Azure Event Hubs service, AMQP had become the new standard protocol for implementing performant messaging solutions, so we used AMQP as the first-class protocol for Event Hubs. BatchFlushInterval doesn't have any effect in Event Hubs (AMQP).

EventHubClient translates every raw event that you send to Event Hubs into an AmqpMessage (refer to the Messaging section of the AMQP protocol specification).

In order to do that, as per the protocol, it adds a few extra bytes to each message. The estimated size of each EventData as serialized to an AmqpMessage can be found using the property `EventData.SerializedSizeInBytes`.

With that background, coming to your scenario: the best way to achieve very high throughput is to use the `EventHubClient.SendBatch(IEnumerable<EventData>)` API. The contract of this API is that, before invoking SendBatch, the caller needs to make sure the serialized size of the batch of messages doesn't exceed 256 KB. Internally, this API converts the IEnumerable<EventData> into one single AmqpMessage and sends it to the Event Hubs service. The limit on a single AmqpMessage imposed by the Event Hubs service as of 2016-04-25 is 256 KB. One more detail: when the list of EventData is translated into a single AmqpMessage, EventHubClient needs to promote some information that is common to all messages in the batch (such as the partition key) into the batch message header. This information is guaranteed to be at most 6 KB.

So, all in all, the caller needs to keep track of the aggregate size of all EventData in the IEnumerable<EventData> and make sure it stays below 250 KB.
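That bookkeeping might look like the following sketch, assuming the `Microsoft.ServiceBus.Messaging` SDK that this answer's API names come from (the `ManualBatcher` class, the `SendAll` helper, and the 250 KB constant are illustrative, not part of the SDK):

```csharp
// Sketch: manual size-tracked batching. MaxBatchSizeBytes is 250 KB,
// leaving ~6 KB of headroom for the batch header described above.
using System.Collections.Generic;
using Microsoft.ServiceBus.Messaging;

static class ManualBatcher
{
    const long MaxBatchSizeBytes = 250 * 1024;

    public static void SendAll(EventHubClient client, IEnumerable<EventData> events)
    {
        var batch = new List<EventData>();
        long batchBytes = 0;

        foreach (var eventData in events)
        {
            // SerializedSizeInBytes estimates the AMQP-serialized size.
            long size = eventData.SerializedSizeInBytes;

            // Flush the current batch before this event would overflow it.
            if (batch.Count > 0 && batchBytes + size > MaxBatchSizeBytes)
            {
                client.SendBatch(batch);
                batch.Clear();
                batchBytes = 0;
            }

            batch.Add(eventData);
            batchBytes += size;
        }

        if (batch.Count > 0)
            client.SendBatch(batch);
    }
}
```

Note this still relies on an estimate of the serialized size, which is why the 250 KB cap (rather than the full 256 KB) is used as a safety margin.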


EDIT ON 09/14/2017

We added the `EventHubClient.CreateBatch` API to support this scenario.

There is no more guesswork involved in constructing a batch of EventData. Get an empty EventDataBatch from the `EventHubClient.CreateBatch` API, then use the `TryAdd(EventData)` API to add events and construct the batch.

And finally, use `EventDataBatch.ToEnumerable()` to get the underlying events to pass to the `EventHubClient.Send()` API.
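Put together, the CreateBatch/TryAdd pattern might look like this sketch (again assuming the `Microsoft.ServiceBus.Messaging` flavor of the client; `TryAddBatcher` and `SendAll` are hypothetical names):

```csharp
// Sketch: the CreateBatch/TryAdd pattern. The EventDataBatch itself
// enforces the size limit negotiated with the service, so no manual
// byte counting is needed here.
using System;
using System.Collections.Generic;
using Microsoft.ServiceBus.Messaging;

static class TryAddBatcher
{
    public static void SendAll(EventHubClient client, IEnumerable<EventData> events)
    {
        EventDataBatch batch = client.CreateBatch();
        int eventsInBatch = 0;

        foreach (var eventData in events)
        {
            if (!batch.TryAdd(eventData))
            {
                // Current batch is full: send it and start a new one.
                client.SendBatch(batch.ToEnumerable());
                batch = client.CreateBatch();
                eventsInBatch = 0;

                // A single event above the negotiated limit can never fit.
                if (!batch.TryAdd(eventData))
                    throw new InvalidOperationException(
                        "Event exceeds the maximum allowed batch size.");
            }
            eventsInBatch++;
        }

        if (eventsInBatch > 0)
            client.SendBatch(batch.ToEnumerable());
    }
}
```

Because the batch is created from the client (a non-static factory method), it knows the maximum message size negotiated with the service for that connection.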

More on Event Hubs...

  • Thank you for the accurate answer I was waiting for and the information for creating a custom implementation. – Attila Cseh Apr 26 '16 at 14:50
  • Why do you say "estimated Size of each Serialized EventData"? How can we get the exact size? If the size of my batch is estimated to be 200KB but it's actually 257KB then I have a problem... – Ohad Schneider Sep 14 '17 at 17:15
  • And if we can't get the exact size, some guarantee about the approximation of SerializedSizeInBytes would be great. For example if I knew the approximation was at most 1KB off per event, and I had 10 events, I could stop when I got to 240KB. – Ohad Schneider Sep 14 '17 at 17:21
  • Great stuff, thanks for updating! Note that I do not receive notifications unless you mention me (@OhadSchneider), I just happened to check back and saw this. BTW what's your @microsoft.com alias (I'm an MS employee too)? – Ohad Schneider Sep 17 '17 at 16:46
  • @OhadSchneider - client negotiates message size with service every time you create an EventHubClient object. This needs to be passed on to EventDataBatch and hence the non-static factory method... – Sreeram Garlapati Sep 18 '17 at 18:02

Your links are accurate. There are SendBatch and SendBatchAsync methods on the Event Hubs client: https://msdn.microsoft.com/library/azure/microsoft.servicebus.messaging.eventhubclient.sendbatch.aspx?f=255&MSPPError=-2147217396

There is a nice article and extension by Paolo Salvatori as well.

  • I require EventHub client functionality similar to that of the ServiceBus Queues & Topics clients, which delay the sending of messages for a certain period of time and then transmit them in a single batch (not exceeding 256 KB). According to Sreeram, this functionality has not been implemented in the EventHub client, so a custom implementation is required. Paolo Salvatori's extension could be used for creating batches not exceeding the 256 KB limit, but the implementation is unfortunately not accurate and MessageSizeExceededException might be thrown. – Attila Cseh Apr 26 '16 at 14:45
