
Lowering Latency with Realtime API Orchestration

Today’s software applications run on the big three: cloud, microservices, and APIs. Although APIs are a lightweight, flexible, and easy-to-consume means of interconnecting multiple services and data sources, the complexity of orchestrating multiple API calls adds up quickly. In production, APIs come with a myriad of implementation concerns: how to route traffic, handle request spikes, eliminate cascading failures, and manage execution flow all at once.

Enter stage right: realtime API orchestration.

What is API orchestration?

API orchestration streamlines complex application flows by coordinating multiple API requests and responses into a single consolidated response for the end client, such as a web browser. This approach is indispensable for building responsive, user-first web applications, as it accelerates loading speeds and safeguards against failure scenarios.

Diagram of the API Orchestration layer that interfaces between the client and the backend APIs.
The API orchestration layer enables the centralized flow of multiple APIs, standardized failure handling, granular workflow observability, and more.

The need for speed

Consider loading a web application—your preferred investment platform or streaming service. A single page typically fetches data from multiple services, and those responses must be aggregated before the page is delivered to the end user.

Screen mockup of a booking user interface and the underlying APIs needed.
A single page fetches data across multiple services, which must be aggregated before it is delivered to the end user.

These API requests add up, and longer load times translate into substantial business costs. According to a study by Portent, a B2B site that loads in 1 second has a conversion rate roughly 3x that of a site that loads in 5 seconds, and 5x that of one that loads in 10 seconds. Speed can be the make-or-break criterion for high-intent web pages, like log-in screens, transaction/checkout pages, or demo interfaces.

Bar chart showing decreasing conversion rate as load time increases.
Source: Portent

Strategies for realtime speed in API calls

What are some of the strategies you can use to improve your API performance?

  1. Taming tail latency

Your aggregate API is only as fast as its slowest call. One well-known way to tame tail latency is to hedge your API requests (send the same request more than once and use whichever response completes first), so that an occasional slow server doesn't drag down your overall API performance.
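
Here is a minimal sketch of request hedging in TypeScript; the endpoint is hypothetical, and in practice `hedgeAfterMs` would be tuned to something like the p95 latency of the call:

```typescript
// A minimal request-hedging sketch: send the primary request, and if it
// hasn't settled within `hedgeAfterMs`, send an identical backup request.
// Whichever response settles first wins the race.
async function hedgedFetch(url: string, hedgeAfterMs = 50): Promise<Response> {
  const primary = fetch(url);

  const backup = new Promise<Response>((resolve, reject) => {
    const timer = setTimeout(() => fetch(url).then(resolve, reject), hedgeAfterMs);
    // Cancel the pending backup once the primary settles either way.
    primary.then(() => clearTimeout(timer), () => clearTimeout(timer));
  });

  return Promise.race([primary, backup]);
}

// Usage (hypothetical endpoint):
// const res = await hedgedFetch("https://api.example.com/v1/assets", 50);
```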

  2. Parallel API requests

Send independent requests in parallel wherever possible. Instead of paying the sum of all call latencies, you pay roughly the latency of the slowest call, which improves both response time and throughput.
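
As an illustration, the page-level fetches from earlier can be issued concurrently with `Promise.all` rather than awaited one at a time (the endpoints here are hypothetical):

```typescript
// Awaiting sequentially costs the SUM of the three latencies;
// Promise.all costs roughly the latency of the SLOWEST call.
async function loadPageData(userId: string) {
  const [user, assets, watchlist] = await Promise.all([
    fetch(`/api/users/${userId}`).then((r) => r.json()),
    fetch("/api/assets").then((r) => r.json()),
    fetch(`/api/users/${userId}/watchlist`).then((r) => r.json()),
  ]);
  return { user, assets, watchlist };
}
```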

  3. Caching

A cache serves as a temporary, high-speed storage layer for previously retrieved data, so repeated requests can be served without re-fetching from the source. Caching cuts precious processing time for the most frequent API calls, providing speed at scale.
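
A toy in-process TTL cache shows the idea; a production setup would more likely use a shared store such as Redis, but the pattern is the same:

```typescript
// Minimal in-memory cache with a time-to-live (TTL) per entry.
const cache = new Map<string, { value: unknown; expiresAt: number }>();

async function cachedGet(url: string, ttlMs = 60_000): Promise<unknown> {
  const hit = cache.get(url);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value; // cache hit: no network round trip at all
  }
  const value = await fetch(url).then((r) => r.json());
  cache.set(url, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```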

  4. Circuit breakers

The circuit breaker pattern stops applications from sending traffic to a service that is down until it has recovered. This lets a faulty service recover and stabilize without being inundated with requests, shortening the downtime.
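
A bare-bones sketch of the pattern: after a threshold of consecutive failures the breaker "opens" and fails fast, then lets a trial request through once a cooldown has passed:

```typescript
// Minimal circuit breaker: open after `maxFailures` consecutive failures,
// fail fast while open, and allow a trial call after `cooldownMs`.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private maxFailures = 5, private cooldownMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    const isOpen = this.failures >= this.maxFailures;
    if (isOpen && Date.now() - this.openedAt < this.cooldownMs) {
      throw new Error("circuit open: failing fast"); // spare the sick service
    }
    try {
      const result = await fn();
      this.failures = 0; // a success closes the circuit again
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.maxFailures) this.openedAt = Date.now();
      throw err;
    }
  }
}

// Usage: const breaker = new CircuitBreaker();
// const quotes = await breaker.call(() => fetch("/api/quotes"));
```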

  5. API orchestration

API orchestration provides a high-speed switchboard that executes, monitors, and governs multiple API calls simultaneously. An orchestration platform comes optimized for low latencies and equipped with features that enable teams to easily implement the various design patterns for realtime speeds.

Benefits of API orchestration

API orchestration is a full-scale solution for scaling API performance alongside an application's growth. As application complexity increases, so does the dependency graph of APIs that must work together, and with it the difficulty of debugging, detecting failures, and gaining insight into latencies.

More than just executing an intricate graph of API calls, API orchestration empowers teams to track, debug, and detect failures and performance issues.

  • Performance at every level—Complex API execution at realtime speed.
  • Increased reliability—Automated failure handling and fallback mechanisms.
  • Governance—Visibility into execution graphs for debugging and performance metrics for monitoring.
  • Developer productivity—Build and debug more quickly with version control, reusable configurations, and payload introspection.

API orchestration in practice: Orkes Conductor

Orkes Conductor—based on the open-source Conductor project originally built at Netflix—is a well-known platform for orchestrating microservices. It lets you build distributed applications that are resilient and easily scale with both the volume and complexity of the services.

Let’s explore an example application flow that relies on complex API orchestration, built using Orkes Conductor.

Screenshot of example Conductor workflow that loads a list of financial assets to be added to a watchlist.
Example Conductor workflow.

In this example, we are loading a list of financial assets that can be added to a watchlist. This means retrieving the user data, the list of assets, and the current watchlist; followed by posting additional data when new assets are added to the watchlist.

Using Orkes Conductor, caching behavior can be easily implemented, cutting the request time from tens of milliseconds (10–40 ms) down to near zero.

Diagram demonstrating how caching behavior works in Orkes Conductor.
With caching enabled, subsequent requests for commonly-used APIs take almost no time to execute.
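
In Conductor, caching is configured per task in the workflow definition. The sketch below shows the rough shape as a TypeScript object literal; the field names (`cacheConfig`, `key`, `ttlInSecond`) follow the Orkes documentation as best I recall, so verify them against the Conductor version you run, and the endpoint is hypothetical:

```typescript
// Sketch of an HTTP task with output caching enabled.
// Verify the cacheConfig field names against your Conductor version.
const getAssetsTask = {
  name: "get_assets",
  taskReferenceName: "get_assets_ref",
  type: "HTTP",
  inputParameters: {
    http_request: {
      uri: "https://api.example.com/v1/assets", // hypothetical endpoint
      method: "GET",
    },
  },
  cacheConfig: {
    key: "assets-list", // executions with the same key share the cached output
    ttlInSecond: 300,   // cached output is reused for 5 minutes
  },
};
```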

Rate limits, retries, and timeouts can also be configured natively in Conductor, ensuring that transient failures are handled automatically for every execution.

Diagram demonstrating how failure handling works in Orkes Conductor.
The orchestration layer will retry failed task executions based on the failure handling configuration.
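
These knobs live in the task definition. The sketch below uses the standard Conductor task-definition fields (`retryCount`, `retryLogic`, `timeoutSeconds`, and friends); the specific values are illustrative only:

```typescript
// Sketch of a Conductor task definition with failure handling configured.
const getWatchlistTaskDef = {
  name: "get_watchlist",
  retryCount: 3,                     // re-run a failed execution up to 3 times
  retryLogic: "EXPONENTIAL_BACKOFF", // back off between attempts
  retryDelaySeconds: 1,
  timeoutSeconds: 30,                // hard cap on total task time
  responseTimeoutSeconds: 10,        // re-queue if no response within 10 s
  timeoutPolicy: "RETRY",            // treat a timeout like a failure and retry
  rateLimitPerFrequency: 100,        // at most 100 executions...
  rateLimitFrequencyInSeconds: 1,    // ...per second
};
```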

Last but not least, Conductor supports parallel requests, enabling non-blocking API calls to execute simultaneously. Static fork-joins are useful when the number of API calls is known ahead of runtime, while dynamic fork-joins handle cases where the number of calls is only determined at runtime. In this example, a dynamic fork is used when a user adds assets to the watchlist: all the PUT requests execute in parallel, whether two or twenty assets are added.

Diagram demonstrating how parallel execution works in Orkes Conductor.
With parallel execution, the total execution time only takes as long as the longest request.
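
For a flavor of the dynamic fork from the watchlist example, here is a hedged sketch: a preceding task (called `prepare_tasks_ref` here, a hypothetical name) emits one PUT task per asset the user added, and a `FORK_JOIN_DYNAMIC`/`JOIN` pair runs them in parallel. The parameter names follow the Conductor docs as I recall them, so verify against your version:

```typescript
// Sketch of a dynamic fork: one parallel branch per asset added at runtime.
const addAssetsFork = {
  name: "add_assets_fork",
  taskReferenceName: "add_assets_fork_ref",
  type: "FORK_JOIN_DYNAMIC",
  inputParameters: {
    // Produced at runtime by the preceding (hypothetical) prepare_tasks_ref task:
    dynamicTasks: "${prepare_tasks_ref.output.tasks}",
    dynamicTasksInput: "${prepare_tasks_ref.output.taskInputs}",
  },
  dynamicForkTasksParam: "dynamicTasks",
  dynamicForkTasksInputParamName: "dynamicTasksInput",
};

// Every fork needs a matching JOIN that waits for all branches to finish.
const addAssetsJoin = {
  name: "add_assets_join",
  taskReferenceName: "add_assets_join_ref",
  type: "JOIN",
};
```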

In this demonstration, we have seen how API orchestration works in practice. As shown, API orchestrators yield low latencies through a variety of features and capabilities beyond just coordinating requests.

Core capabilities of realtime API orchestrators

When deciding on an API orchestration platform, here are some key requirements to look out for:

  • Near-zero response time
  • High throughput
  • Customizable caching
  • Payload enforcement
  • Integration with event streams
  • Support for various API protocols and specifications (REST, gRPC, GraphQL)
  • Comprehensive failure handling implementation (circuit breakers, request hedging, rate limits, retries)

Wrap up

With the right tool for API orchestration, developers no longer have to write common API design patterns from scratch. That means more time to focus on core business capabilities while ensuring top-notch execution, speed, and reliability.


Conductor is an open-source orchestration platform for executing durable long-running flows, lightning-speed API calls, or any case in between. Check out the full list of features or try a fully-managed and hosted Conductor service with the 14-day free trial for Orkes Cloud today.
