I think most API design advice is too theoretical. Developers get sidetracked by debates about what "real" REST is, whether HATEOAS is the right thing to do, and so on. In this post, I'll try to cover everything I know about designing good APIs.
When designing an API, it is important to balance clarity and flexibility.
As with software systems in general, and even more so with APIs, boring is good: a good API is a boring API. An interesting API is a bad API (or at least it would be better if it were less interesting). To their developers, APIs are complex products that take a lot of time to design and improve. But to the people who use them, they are just tools for getting some other task done. All the time users spend thinking about the API instead of the task is time wasted. From their perspective, the ideal API is one so familiar that they can more or less use it before they even open the documentation.
However, APIs differ from most software systems in one crucial way: APIs are hard to change. Once an API is published, people start using it, and any change to the interface will break their software. It is possible to make changes, of course. But (as I will discuss below) every change carries a significant cost: every time you force users to upgrade their software, they seriously consider switching to a different, more stable API. This gives API designers a strong incentive to design carefully and get it right the first time.
This pressure creates an interesting dynamic for API engineers. On one hand, they want to build the simplest API possible. On the other hand, they want to leave themselves enough room to stay flexible in the long term. In short, API design is a trade-off between these two incompatible goals.
We don't break user space
What happens when we need to make changes to the API? Additive changes, such as adding a new field to a response, are usually fine. Some consumers will break if they get more fields than they expected, but I'd call that irresponsible behavior on their part. API consumers should be expected to ignore unexpected fields (most JSON parsers in typed languages do this by default).
However, you cannot remove fields or change their types, and you cannot change the structure of existing fields (for example, move user.address to user.details.address in a JSON response). If you do, every piece of code that depends on those fields will immediately break. Users of that code will report it as a bug, and its maintainers will rightfully be angry at you for knowingly breaking their software.
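To make the distinction concrete, here is a minimal sketch of a consumer; the endpoint, fields, and base URL are made up. It survives additive changes but not structural ones:

```python
# A hypothetical consumer of a /users/:id endpoint. It reads only the
# fields it cares about, so purely additive changes never break it.
import requests

resp = requests.get("https://api.example.com/users/1", timeout=10)
user = resp.json()

# Safe: any extra, unexpected fields in the response are simply ignored.
print(user["name"], user["address"])

# Not safe: if the API moves address to details.address, the line above
# raises a KeyError, and every consumer written like this breaks at once.
```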
The principle that applies here is a version of Linus Torvalds' famous slogan: WE DON'T BREAK USER SPACE. If you're an API maintainer, you have a kind of sacred duty: you must avoid hurting downstream consumers. The rule carries so much weight because so many programs depend on so many APIs (which in turn depend on upstream APIs, and so on). One careless API maintainer far enough upstream can break hundreds or thousands of programs downstream.
It's never a good idea to change an API just because the new version would be prettier, or because the old one is a little ugly. A famous example: the "referer" header in the HTTP spec is the word "referrer" with a typo, but it has never been fixed, because we don't break user space.
Making API changes without breaking user space
Frankly, it's hard to think of examples where an API truly requires breaking changes. But sometimes the technical value of a change is so high that you decide to take the risk and implement it anyway. How do you change an API responsibly in these cases? That's where versioning comes in.
Versioning an API means serving both the old and the new version at the same time. Existing consumers can keep using the old version, and new consumers can opt into the new one. The easiest way to do this is to add something like /v1/ to the API URL. OpenAI's chat API lives at /v1/chat/completions, so if the company decides to completely redesign it, they can do so at /v2/chat/completions without breaking anything for existing consumers.
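As a rough sketch of what URL-based versioning looks like on the server (Flask, the routes, and the response shapes here are my own illustration, not anything from OpenAI):

```python
# Minimal sketch of URL-based versioning: old consumers keep calling /v1,
# new consumers opt into /v2, and neither breaks the other.
from flask import Flask, jsonify

app = Flask(__name__)

@app.get("/v1/users/<int:user_id>")
def get_user_v1(user_id):
    # The original response shape, frozen for existing consumers.
    return jsonify({"id": user_id, "name": "Ada", "address": "10 Downing St"})

@app.get("/v2/users/<int:user_id>")
def get_user_v2(user_id):
    # The redesigned shape lives at a new path; /v1 stays untouched.
    return jsonify({"id": user_id, "name": "Ada",
                    "details": {"address": "10 Downing St"}})
```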
Once the old and new versions are running side by side, you can start encouraging users to upgrade to the new one. This will take a long time: months or even years. Even with banners on your website and in your documentation, emails, and warning headers in your API responses, when the old version is finally removed there will still be plenty of angry users complaining that you broke their software. But at least you tried to do something about it.
There are plenty of other ways to implement API versioning. The Stripe API handles versioning via a request header and lets accounts set their default version in the UI. But the principle is the same: every Stripe API consumer can be confident that Stripe won't decide to break their app, and that they can upgrade versions at their own pace.
I don't like API versioning. I think it's a necessary evil at best, but it's still an evil. It confuses users, who can't trust the API documentation without first checking that the version selector matches the version they're actually using. And it's a nightmare for maintainers. If you have thirty API endpoints, each new version adds thirty more endpoints to support. Soon you have hundreds of endpoints to test, debug, and support users on.
Of course, adding a new version doesn't double the size of the codebase. Any reasonable versioned API backend has some translation layer that turns the internal representation into whichever version of the public API the consumer asked for. Stripe does something like this: the business logic itself is shared across all versions, so versioning is only dealt with when serializing and deserializing parameters. Still, such abstractions always leak.
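As a toy illustration of that translation-layer idea (the function and field names are hypothetical, and this is not Stripe's actual design):

```python
# One shared piece of business logic, with a thin per-version serializer
# layered on top. Versioning only appears at the serialization boundary.

def load_user(user_id):
    # Business logic: identical for every API version.
    return {"id": user_id, "name": "Ada", "address": "10 Downing St"}

SERIALIZERS = {
    "v1": lambda u: {"id": u["id"], "name": u["name"], "address": u["address"]},
    "v2": lambda u: {"id": u["id"], "name": u["name"],
                     "details": {"address": u["address"]}},
}

def render_user(user_id, version):
    # Each public version is just a different view of the same data.
    return SERIALIZERS[version](load_user(user_id))

print(render_user(1, "v1"))
print(render_user(1, "v2"))
```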
The success of your API depends entirely on the product
The API itself doesn't do anything. It's a layer between the user and the thing they actually want. In OpenAI's case, that thing is inference with a language model. In Twilio's case, it's sending SMS messages. Nobody uses an API just because the API itself is beautifully designed. They use it to interact with your product. If your product is valuable enough, users will put up with even a terrible API.
This is why some of the most popular APIs are so bad. Facebook and Jira are notorious for having terrible APIs, but it doesn't matter: if you want to integrate with Facebook or Jira (and you do), you'll have to spend the time to understand them anyway. Yes, it would be great if these companies had better APIs. But why invest the time and money when users want the integration regardless? Writing good APIs is hard.
In the rest of this post, I'll give a lot of specific advice on how to write good APIs. But it's worth remembering that most of the time, it doesn't matter. If your product is desirable and popular, a barely working API will do; if it's unloved, a good API won't help. API quality is a non-essential feature: it only matters when a user is choosing between two essentially equivalent products.
However, whether an API exists at all is a completely different story. A product with no API has a serious problem: technical users will demand some way to integrate with the software they buy through code.
Poorly designed products usually have bad APIs
A technically sound API won't save a product that no one wants to use. However, a technically poor product makes it nearly impossible to create a beautiful API. This is because API design is typically based on the product's "core resources" (e.g. Jira's resources are issues, projects, users, etc.). When these resources are poorly implemented, the API becomes ugly.
For example, consider a blogging platform that stores comments internally as a linked list (each comment has a next field pointing to the next comment in the thread). This is a terrible way to store comments. A naive way to bolt a REST API onto this system would look something like this:
GET /comments/1 -> { id: 1, body: "...", next_comment_id: 2 }
Or, worse, like this:
GET /comments -> {body: "...", next_comment: { body: "...", next_comment: {...}}}
This example may seem silly, because in practice we would simply iterate over the linked list and return an array of comments in the API response. But even if we were willing to do that extra work, how far would we iterate? In a thread with thousands of comments, would it simply be impossible to fetch anything past the first few hundred? Would your comment-fetching API be forced to kick off a background job, turning the interface into something like this?
POST /comments/fetch_job/1 -> { job_id: 589 }
GET /comments/fetch_job/589 -> { status: 'complete', comments: [...] }
This is how some of the worst APIs come about. Technical limitations that can be cleverly hidden in the UI end up exposed in the API, forcing API consumers to understand the system architecture much more deeply than they should have to.
Authentication
You should let people use your API with a long-lived API key. Yes, API keys are less secure than short-lived credentials like OAuth tokens (and you should probably support OAuth too). But that doesn't matter. Every integration with your API starts life as a simple script, and an API key is the easiest way to get a simple script working. You should make it as easy as possible for developers to get started with your API.
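For instance, the very first integration someone writes against your API will probably look something like this; the key format, endpoint, and URL are made up:

```python
# A first "simple script" against a hypothetical API: a long-lived key
# pasted into one header is the entire authentication story.
import requests

API_KEY = "sk_live_xxxxxxxxxxxx"  # long-lived key copied from a dashboard

resp = requests.get(
    "https://api.example.com/v1/tickets",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
print(resp.json())
```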
While API consumers will write code, many of your users won’t be professional developers . They might be salespeople, product managers, students, hobbyists, and so on. When you’re an engineer at a tech company building an API, it’s easy to imagine that you’re building it for people like you: competent, professional, full-time software developers. In fact, you’re not. You’re building it for a broad cross-section of people, many of whom have trouble reading or writing code. If your API requires users to do something complicated, like perform an OAuth handshake, many of them will have trouble.
Idempotency and retries
When an API request succeeds, you know what happened. But what if it fails? Some types of failure tell you: a 422 usually means the failure occurred during request validation, before any action was taken. But what about a 500? What about a timeout?
This matters for API operations that perform actions. If you call a Jira API to create a comment on an issue, and the request returns a 500 or times out, should you retry it? You don't know for sure whether the comment was created, because the error may have occurred after the operation. If you retry, you may end up posting two comments. And the stakes can be much higher than a Jira comment. What if you're transferring money? Or prescribing medication?
The solution to this problem is idempotency, the ability to safely repeat a request without creating duplicates. The standard approach is to support an "idempotency key" in the request (say, a user-supplied string in a parameter or header). When the server receives a "create comment" request with an idempotency key, it first checks whether it has seen that key before. If so, it does nothing; otherwise, it creates the comment and then stores the key. This lets the user send as many retries as they want: as long as they all carry the same idempotency key, the operation will only be performed once.
How should the key be stored? I've seen it stored in a durable, resource-scoped way (like a column on the comments table), but I don't think that's strictly necessary. The easiest approach is to store it in Redis or a similar key-value store (with the idempotency key as the key). UUIDs are unique enough that they don't need to be scoped per user, though doing so doesn't hurt. If you're not handling payments, you may even want to expire them after a few hours, since most retries happen right away.
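Here is a minimal sketch of that flow, assuming a Redis-style key-value store; the key format and helper function are hypothetical, and a production version would also need to make the check-and-store step atomic:

```python
# Idempotency-key handling for a "create comment" operation.
import redis

store = redis.Redis()

def save_comment_to_db(body):
    # Stand-in for real persistence; returns a fake comment id.
    return 42

def create_comment(idempotency_key, body):
    # Have we already handled a request with this key? Then do nothing.
    if store.get(f"idem:{idempotency_key}"):
        return {"status": "already_processed"}
    comment_id = save_comment_to_db(body)
    # Remember the key, expiring after a few hours: most retries arrive
    # almost immediately (keep it much longer, or forever, for payments).
    store.set(f"idem:{idempotency_key}", comment_id, ex=6 * 3600)
    return {"status": "created", "id": comment_id}
```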
Do we need idempotency keys on every request? They are not needed for reads, because reading twice does no harm. They are also usually unnecessary for deletes, because the resource ID itself acts as an idempotency key. If we send three DELETE /comments/32 requests in a row, we will not delete three comments: the first successful request deletes the comment with ID 32, and the remaining requests return 404 because the comment is already gone.
In most cases, idempotency should be optional. As discussed above, you want to make sure your API is understandable to non-technical users (who often find idempotency a complex concept). In general, getting more people to use your API is more important than the occasional duplicate comment from users who haven't read the documentation.
Security and Request Rate Limiting
Users interacting with your UI are limited by their typing speed. If a flow is expensive for your backend, a malicious or careless user will be able to run that flow no faster than the speed of clicks. The situation is different with APIs. All operations exposed via APIs can be called at the speed of code.
Be careful with APIs that do a lot of work in a single request. When I worked at Zendesk, we had an API that sent a notification to every user of a particular app. One clever third-party developer exploited this to build an in-app chat system: every message sent a notification to all the other users in the account. Once accounts had enough active users, this hack reliably took down the app's backend servers.
We didn't anticipate that anyone would build a chat app on top of this API. But once it was publicly available, people could do whatever they wanted with it. I've dealt with many incidents where the root cause was some client integration doing stupid things like:
- Creating and deleting the same entries hundreds of times a minute without any benefit
- Polling a large /index endpoint in an infinite loop, with no pauses
- Bulk-importing or exporting data without backing off when errors occur
API requests should be rate-limited, and the more expensive the operation, the stricter the limit should be. It's also wise to have a way to temporarily disable the API for specific clients, to take load off the backend when it's under stress.
Add rate-limit metadata to API responses. Headers like X-RateLimit-Remaining and Retry-After give clients the information they need to use the API respectfully, and let them back off on their own before they hit the limits.
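On the client side, respecting those headers can be as simple as this sketch (the function name and endpoint are my own, and only the Retry-After handling is shown):

```python
# A client that backs off when it gets a 429 instead of hammering the API.
import time
import requests

def get_with_backoff(url, headers, max_attempts=5):
    for _ in range(max_attempts):
        resp = requests.get(url, headers=headers, timeout=10)
        if resp.status_code != 429:
            return resp
        # Wait exactly as long as the server asked us to before retrying.
        time.sleep(int(resp.headers.get("Retry-After", "1")))
    raise RuntimeError("rate limited on every attempt")
```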
Pagination
Almost every API needs to serve a long list of records. Sometimes this list is extremely long (for example, Zendesk's /tickets API can contain millions of tickets). How do you return all these records?
A naive SELECT * FROM tickets WHERE ... would eat all available memory (and if the data isn't in a database, the same thing happens in the application layer, where you try to serialize a list with a million elements). We simply cannot return every ticket in one request. Pagination is required.
The simplest way to implement pagination is with pages (or, more generally, "offsets"). When a consumer hits /tickets, we return the account's first ten tickets. To get more, they request /tickets?page=2 or /tickets?offset=20. This is easy to implement, because the server can simply append OFFSET 20 LIMIT 10 to the database query. But it doesn't scale to very large numbers of records. Relational databases still have to read past every skipped row, so each page is a little slower to serve than the one before. By the time the offset reaches the hundreds of thousands, this becomes a real problem.
The correct solution is "cursor-based pagination". Instead of passing offset=20 to get the second page, we take the last ticket on the first page (say, the one with ID 32) and pass cursor=32. The API then returns the next ten tickets, starting after ticket 32. The query uses no OFFSET; it is WHERE id > cursor ORDER BY id LIMIT 10. This query is equally fast whether you're at the start of the list or hundreds of thousands of tickets in, because the database can jump straight to the (indexed) position of the cursor ticket instead of scanning through the whole offset.
For datasets that are likely to become large, you should always use cursor-based pagination. It's a slightly harder concept for consumers to grasp, but once scaling problems appear you will likely be forced to switch to cursors anyway, and the cost of making that change later is often very high. For everything else, page-based or offset-based pagination is perfectly acceptable.
It's usually a good idea to add a next_page field to API responses that return lists. It saves consumers from having to work out the next page number or cursor themselves.
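Here is a self-contained sketch of cursor-based pagination with a next_page field; sqlite3 is used only to keep the example runnable, and the table and route are hypothetical:

```python
# Cursor-based pagination over a tickets table with an indexed integer id.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, subject TEXT)")
conn.executemany("INSERT INTO tickets (subject) VALUES (?)",
                 [(f"ticket {i}",) for i in range(100)])

def list_tickets(cursor=0, limit=10):
    rows = conn.execute(
        "SELECT id, subject FROM tickets WHERE id > ? ORDER BY id LIMIT ?",
        (cursor, limit),
    ).fetchall()
    # Hand the consumer the next cursor so they never compute it themselves.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return {
        "tickets": [{"id": r[0], "subject": r[1]} for r in rows],
        "next_page": f"/tickets?cursor={next_cursor}" if next_cursor else None,
    }

print(list_tickets())           # first page: ids 1-10
print(list_tickets(cursor=10))  # next page: just as fast at any depth
```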
Optional Fields and GraphQL
If some parts of an API response are expensive to compute, make them optional. For example, if the backend has to call another API to fetch a user's subscription status, you can have the /users/:id endpoint omit the subscription status unless the request passes an include_subscription parameter. More generally, you can support an includes array parameter listing all the optional fields. This is often used for related records (for example, passing includes: [posts] on a user request to get that user's posts in the response).
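A sketch of what that can look like on the server; Flask, the field names, and the fetch helpers are all illustrative assumptions:

```python
# An includes parameter: expensive fields only load when asked for.
from flask import Flask, jsonify, request

app = Flask(__name__)

def fetch_subscription_status(user_id):
    # Stand-in for a slow call to a billing service.
    return "active"

def fetch_posts(user_id):
    # Stand-in for a related-records query.
    return [{"id": 1, "title": "Hello world"}]

@app.get("/users/<int:user_id>")
def get_user(user_id):
    user = {"id": user_id, "name": "Ada"}
    includes = request.args.getlist("includes")
    if "subscription" in includes:
        user["subscription"] = fetch_subscription_status(user_id)
    if "posts" in includes:
        user["posts"] = fetch_posts(user_id)
    return jsonify(user)

# e.g. GET /users/1?includes=subscription&includes=posts
```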
This is one of the ideas behind GraphQL, an API style where, instead of calling different endpoints for different operations, you send a single query describing exactly the data you need and the backend assembles it for you.
I don't particularly like GraphQL, for three reasons. First, it's completely opaque to non-engineers (and to many engineers). Once you've learned it, it's just another tool, but the barrier to entry is much higher than GET /users/1. Second, I don't like giving users the freedom to construct arbitrary queries: it makes caching harder and multiplies the edge cases you have to account for. Third, in my experience, the backend takes a lot more work to set up than a standard REST API.
I don't want to be too hard on GraphQL. I've only been working with it, in various contexts, for about half a year, so I'm by no means an expert. I'm sure that in some cases it provides enough flexibility to be worth the cost. But for now, I'd only reach for it when genuinely necessary.
Internal APIs
Everything I've said so far applies to public APIs. What about internal APIs, used only by colleagues within the company? Some of my assumptions don't hold there. Their consumers are usually professional software developers. Breaking changes are also much cheaper, because (a) there are often orders of magnitude fewer consumers, and (b) you can ship new code to all of those consumers yourself. You can even require a more involved authentication flow if you want.
However, internal APIs can still cause incidents, and their key operations should still be idempotent.
Let's sum it up
- APIs are hard to design because they are hard to change once published, yet they must be easy to learn.
- The primary responsibility of API maintainers is NOT TO BREAK USER SPACE. Never make breaking changes to public APIs.
- API versioning makes change possible, but it imposes serious costs on both maintainers and consumers.
- If your product is valuable enough, the quality of the API doesn't really matter: people will use it anyway.
- However, if your product is poorly designed, then no matter how carefully you design your API, it will likely still be bad.
- Your API should support simple API keys for authentication because many users will not be professional developers.
- Requests that perform actions (especially critical actions like payments) should accept some kind of idempotency key so that retries are safe.
- Your API will always be a source of incidents. Implement rate limits and a kill switch.
- Use pagination with cursors for datasets that have the potential to become very large.
- Make expensive fields optional and disabled by default, but GraphQL is overkill (in my opinion).
- With internal APIs the situation is a bit different (because the consumers are very different).

What didn't I cover? I didn't write much about REST vs. SOAP or JSON vs. XML, because I don't think it matters much. I like REST and JSON, but I can't claim nothing else is worth using. I also didn't talk about the OpenAPI schema: it's a useful tool, but it's perfectly fine to write API documentation in Markdown if you prefer.