Discussion on: Serverless circuit breakers with Durable Entities

Bryden Oliver

Hi Jeff,
This sounds great for many of our common problems; however, I can see it almost works for our largest problem, but not quite. Take the SQL database in your example above and imagine it's an Azure SQL Database with a fixed number of DTUs assigned, fed by a queue as in your example. We'd want to somehow throttle consumption from the queue so that it uses as many of those DTUs as possible without just causing a flood of throttling errors (HTTP status 429s).
Can you see a nice way to use this pattern to throttle rather than circuit break?
It really feels like there should be an elegant way to use this pattern for that.
This is effectively the situation where the circuit closes again and all the work that queued up while it was open overwhelms the database immediately.
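
Roughly, I'm picturing something like the sketch below: a Durable Entity holding the remaining DTU budget, refilled on a timer, with the queue-triggered worker checking it before touching the database. This is only a sketch against the durable-functions JavaScript library; the entity name, budget, and refill numbers are all invented.

```typescript
// TokenBucket/index.ts - a Durable Entity tracking the remaining "DTU budget".
// A timer-triggered function would signal "refill" on a schedule; workers
// signal "take" before doing database work.
import * as df from "durable-functions";

export default df.entity(function (context) {
    const tokens = context.df.getState(() => 100) as number;
    switch (context.df.operationName) {
        case "refill":
            context.df.setState(Math.min(100, tokens + 25));
            break;
        case "take":
            context.df.setState(Math.max(0, tokens - 1));
            break;
        case "get":
            context.df.return(tokens);
            break;
    }
});
```

```typescript
// QueueWorker/index.ts - queue-triggered, with a "durableClient" input
// binding declared in function.json.
import * as df from "durable-functions";
import { Context } from "@azure/functions";

export default async function (context: Context, message: unknown): Promise<void> {
    const client = df.getClient(context);
    const bucket = new df.EntityId("TokenBucket", "azure-sql");

    // Read-then-signal is not atomic, so this is best-effort shaping rather
    // than a hard guarantee (a hard guarantee would need an orchestration).
    const state = await client.readEntityState<number>(bucket);
    if (!state.entityExists || (state.entityState ?? 0) <= 0) {
        // Budget exhausted: fail the invocation so the queue redelivers later,
        // instead of hammering the database into a wall of 429s.
        throw new Error("No DTU budget left; let the queue retry later");
    }

    await client.signalEntity(bucket, "take");
    // ...do the actual database work for `message` here...
}
```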

Bryden

John Mason

Hi Bryden

"We would want to somehow throttle the message queue to consume as many DTUs as possible"

Isn't this just a matter of limiting the number of items read from the queue in each batch, based on load testing in a perf environment? You could even make the number configurable and set it externally from scaling logic on your Azure SQL DB.
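
For the Storage queue trigger that knob lives in host.json; the numbers below are just placeholders you'd size from your load tests:

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 8,
      "newBatchThreshold": 4
    }
  }
}
```

batchSize caps how many messages each instance pulls per fetch, and newBatchThreshold controls how early it goes back for the next batch.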

Jeff Hollan

Thanks Bryden - I think what you've described above sits more in the rate limiting and throttling space than circuit breaking. It's a feature we've wanted to implement at a more granular level; hopefully in the upcoming months we'll at least have instance-count throttling in all plans (right now it's just a Premium plan feature).
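
In the meantime, one way to approximate an instance cap is the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT app setting; treat this as a sketch, since that setting is documented as preview behavior and isn't a hard guarantee:

```bash
az functionapp config appsettings set \
  --name <app-name> --resource-group <resource-group> \
  --settings WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT=5
```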

Bryden Oliver

Jeff,
While that will be useful, instance throttling is very much a blunt-force instrument. We are currently looking at a resource-based consumption throttler: monitoring the use of each resource that has some sort of rate limit associated with it, and building something that throttles accordingly. I'll sit down and thrash out in detail whether we can get something working that would handle this nicely.
In particular, the reason we are looking at this is that we are effectively partitioning our data across multiple rate-limited pieces of storage, so we'd like to avoid the situation where we are limiting everything based on our lowest-throughput piece of storage.
Rather, we'd like to circuit-break calls to the exhausted partition quickly, return a 429 or similar to the caller, and continue to consume all of our available throughput everywhere else.
At worst, Durable Entities sound like a potentially better store for this state than the Redis cache we are using now.
I guess what I'm saying is that I suspect for most consumers, instance-level throttling won't cut it (not least because a single instance is quite capable of completely overwhelming a downstream resource all on its own), so investigating a more granular level of throttling would be well worthwhile. For now, we are quite happy to continue investigating, but if you guys have some clever thoughts that might improve our direction, that would be great.
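
To make that concrete, the shape we're experimenting with looks roughly like the sketch below: one budget entity per storage partition, so only the exhausted partition answers 429. The entity name, route parameter, and Retry-After value are all invented.

```typescript
// HTTP-triggered front door for one partition of the rate-limited storage.
import * as df from "durable-functions";
import { Context, HttpRequest } from "@azure/functions";

export default async function (context: Context, req: HttpRequest): Promise<void> {
    const client = df.getClient(context);
    // One entity instance per partition, so one exhausted partition never
    // steals throughput from the others.
    const gate = new df.EntityId("PartitionBudget", req.params.partitionId);

    const state = await client.readEntityState<number>(gate);
    if (state.entityExists && (state.entityState ?? 0) <= 0) {
        // Fast-fail just this partition, exactly like an open circuit,
        // but scoped to the one resource that is out of budget.
        context.res = {
            status: 429,
            headers: { "Retry-After": "5" },
            body: "This partition is over its throughput budget",
        };
        return;
    }

    await client.signalEntity(gate, "take");
    // ...perform the storage call for this partition...
    context.res = { status: 200 };
}
```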

Jeff Hollan

Yes, makes sense. I'd be interested to learn more about what you're thinking. Above the "blunt force" instance limiting we have been evaluating execution limiting, but what you're describing sounds even more granular than that. Almost something like "I have 400 locks for SQL, 2000 locks for Azure Storage -- hey functions, do your thing, but before you can run this line of code you need to make sure you have a lock first." Is that accurate?
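
In entity terms that might look roughly like the sketch below; to be clear, this is just how I'd picture it today, not a committed API:

```typescript
// Semaphore/index.ts - one entity instance per resource, e.g.
// EntityId("Semaphore", "sql") seeded with 400 permits and
// EntityId("Semaphore", "storage") seeded with 2000.
import * as df from "durable-functions";

export default df.entity(function (context) {
    const free = context.df.getState(() => 0) as number;
    switch (context.df.operationName) {
        case "seed": // signal once at startup with the permit count
            context.df.setState(context.df.getInput() as number);
            break;
        case "tryAcquire": // answers true/false to a calling orchestration
            context.df.return(free > 0);
            if (free > 0) {
                context.df.setState(free - 1);
            }
            break;
        case "release":
            context.df.setState(free + 1);
            break;
    }
});
```

An orchestration would then yield `context.df.callEntity(semaphoreId, "tryAcquire")` right before the protected line of code, and signal "release" in a finally block once the work is done.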

Mathias

How would you handle rate/throttling limits from a downstream API inside your Azure Function? You can retry, but what if the retrying takes longer than the Azure Functions default timeout? Some downstream APIs provide a Retry-After value; what if it exceeds the 5-minute default timeout?
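
The one pattern I can think of is handing the wait to a Durable Functions orchestration, since a durable timer checkpoints and unloads the orchestrator while it waits, so it isn't bound by the execution timeout. A sketch, with the activity name and response shape invented:

```typescript
// Orchestrator: retry a throttled downstream call across Retry-After waits
// that can be far longer than any single function execution timeout.
import * as df from "durable-functions";

export default df.orchestrator(function* (context) {
    for (let attempt = 0; attempt < 5; attempt++) {
        const result = yield context.df.callActivity("CallDownstreamApi", context.df.getInput());
        if (result.status !== 429) {
            return result;
        }
        // Durable timer: the orchestration is checkpointed here, so honoring
        // even an hour-long Retry-After consumes no execution time.
        const wakeAt = new Date(
            context.df.currentUtcDateTime.getTime() + result.retryAfterSeconds * 1000
        );
        yield context.df.createTimer(wakeAt);
    }
    throw new Error("Still throttled after 5 attempts");
});
```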