
API Sorrows

Adam Nathaniel Davis ・ 8 min read

The longer I write code, the more it feels that my life can essentially be boiled down to the act of connecting APIs. If I'm not connecting to our own internal APIs, I'm fetching data from some sort of external/public API.

For the most part, this process seems to get progressively better every single year. I have horrible war stories to tell about SOAP and deep battle scars from XMLHttpRequest. But I survived those trials and the API world today feels so much more... accommodating.

But that doesn't mean that everything in API Land is Rainbows & Lollipops. There are still headaches lurking out there. And because this blog is my own, personal, unpaid, self-administered therapy, I'm going to spend a few minutes venting my spleen.

If you ever find yourself in a position to actually write endpoints, then I want you to listen up. Because I'm going to lay out a step-by-step guide whereby you can make all the consumers of those endpoints hate your guts.



Show Me The Dang Swagger

If you are publishing REST endpoints, I have one thing, and only one thing, to ask of you:

Where's the Swagger file???


If your answer starts to tail off into some diatribe about the detailed documentation site that your team spent six months building, Ima stop you right there and ask again:

Where's the Swagger file???


If you're bold enough to think that you can mollify me with some promise of Swagger docs that might be available after the endpoints are "officially" released, I'll make sure to show you the rising anger in my face before I say:

I'm sorry. Maybe we're speaking different languages. Because you definitely don't seem to be processing the words that are coming outta my mouth. So I'm going to ask one more time with all the civility I can muster: Where is the got-dang Swagger file???


In the last couple of years, I can't even tell you how many times I've been asked to integrate with some set of vendor/partner REST endpoints. And as soon as I ask, "Where's the Swagger file?" everyone looks at me like I showed up at a civil rights protest in blackface.

I understand that Swagger files are not the "end-all / be-all" of API documentation. But if you're publishing REST endpoints, they should be the starting point and the ending point of all documentation. If you wanna throw up some detailed "How to use our API" website... great! But don't you dare tell me that your DIY site is meant to be a replacement for good ol' fashioned Swagger files.

For REST endpoints, Swagger files are not a "nice to have". They are a basic requirement.
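For illustration, here's roughly what a minimal Swagger/OpenAPI file for a single endpoint looks like. The service name, path, and fields are invented for this sketch, but even this much tells a consumer exactly what to send and exactly what comes back:

```yaml
openapi: "3.0.3"
info:
  title: Example Partner API   # hypothetical service name
  version: "1.0.0"
paths:
  /v1/users/{userId}:
    get:
      summary: Fetch a single user by GUID
      parameters:
        - name: userId
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        "200":
          description: The requested user
          content:
            application/json:
              schema:
                type: object
                properties:
                  firstName: { type: string }
                  lastName: { type: string }
        "404":
          description: No user exists with that GUID
```

A consumer can feed this file straight into code generators, mock servers, and request tooling - which is exactly why no hand-built documentation site can replace it.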



Leave the Writing To Stephen King

As much as I cherish a good Swagger file, it can also lend a terribly false sense of security. (And epic headaches down the road.) Some of my most painful programming experiences in the last several years occurred when I was given a Swagger file - and the Swagger file was an aspirational work of fiction.

I'm frequently required to integrate with vendors/endpoints that haven't yet been deployed. And you know what?? That's OK. Or at least, it should be OK. Because, as long as I have a detailed Swagger file at my disposal, I can crank out vast amounts of functionality designed to interact with your future endpoints.

Here's where that "OK" has fallen apart - badly.

Some vendor/partner sends me a Swagger file and I proceed to write vast quantities of code designed to interact with that API. Then the API goes live, and... my code doesn't work. NONE of my code works.

My bosses, and my clients, and anyone else monitoring my work immediately believe that I've "dropped the ball" and I start scrambling to figure out what went wrong. That's when I realize that the Swagger files I've been given were as accurate as a Trump speech.

But how can this be? Shouldn't Swagger files be a real-time reflection of the actual code?? Well... they are - if your Swagger files are automatically generated.

In other words, I've had too many experiences where someone on the API team was literally writing the Swagger files, by hand, to reflect how their API would presumably behave. Of course, by the time that I could finally hit their endpoints, in real time, the Swagger files I'd been given (and against which I was coding) were a complete fiction. The behavior of the live endpoints bore no resemblance to the responses that were defined in the Swagger files.

This is the 2020s, people. No one should be manually writing API documentation anymore. There are plenty of packages out there that will document your endpoints dynamically, and in real time, based on the actual code that you've promoted to production.

I'd bet good money that the documentation someone manually wrote for your application five years ago is, today, practically useless. Similarly, the act of manually writing API documentation is a complete-and-utter waste of time.

Don't write API documentation. Generate it. Automatically. I fully understand that this can still cause last-minute issues if you make last-minute changes to your code. But at a minimum, I can feel secure in the knowledge that, anytime I hit your dynamic API documentation site, I'm at least seeing an accurate representation of how your endpoints behave at this moment in time. If you're manually writing your API documentation, it may as well be scribbled in chalk on schoolyard pavement for all I care.
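To make the idea concrete, here's a toy sketch of documentation-from-code in Python. The route registry and decorator are invented stand-ins for a real framework (FastAPI, for example, derives its OpenAPI document from route code in roughly this spirit):

```python
import inspect
from typing import Callable, Dict, Tuple

# Toy route registry -- an invented stand-in for a real web framework.
ROUTES: Dict[Tuple[str, str], Callable] = {}

def get(path: str):
    """Register a GET handler under `path`."""
    def wrap(fn: Callable):
        ROUTES[("GET", path)] = fn
        return fn
    return wrap

@get("/v1/users/{user_id}")
def read_user(user_id: str) -> dict:
    """Fetch a single user by GUID."""
    return {"id": user_id}

def generate_openapi() -> dict:
    """Build a minimal OpenAPI 'paths' object by introspecting the
    registered handlers. Because it is derived from the live code, it
    cannot drift from what the endpoints actually accept."""
    paths: dict = {}
    for (method, path), fn in ROUTES.items():
        params = [
            {"name": name, "in": "path", "required": True}
            for name in inspect.signature(fn).parameters
        ]
        paths.setdefault(path, {})[method.lower()] = {
            "summary": inspect.getdoc(fn),
            "parameters": params,
        }
    return {"openapi": "3.0.3", "paths": paths}
```

The point of the sketch: when a parameter is renamed in `read_user`, the generated document changes with it - there is no second, hand-written copy to fall out of date.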



Manufacturing Bottlenecks

Those who write APIs spend a lot of time thinking about issues surrounding performance and usage. After all, an API is, basically, an open invitation for the whole world to bombard your servers. So it makes sense to be hyper-vigilant, right???

Well...

I've seen too many scenarios where the API architects were content to create rote, mindless limitations. These limitations often make sense when you look at individual API calls - in a vacuum. But when you widen the lens - just a little bit - it quickly becomes apparent that these limitations are actually harmful.

Lemme give you a real-world example from Spotify's API. [Note: If you've noticed that I'm picking on Spotify a lot lately, it's only because I've been building some new tools around their API. So it brings all of their shortcomings into sharp focus in my mind.]

All of the Spotify endpoints have limits on the total number of records that can be returned. For example, if you want to retrieve all of the tracks in a given playlist, there's an endpoint for that. But... that endpoint will retrieve no more than 100 tracks at a time.

Maybe that makes sense to you. After all, you don't want someone launching a single API call that returns, say, 10,000 records, right?? Well... think about the alternative.

You see, most people who use Spotify frequently have their music sorted into playlists. And those playlists frequently hold well over 100 tracks each.

But if I'm writing a feature that's designed to interact with a given user's playlist, what are the odds that my feature will only need to grab the first 100 tracks from that playlist? Or the last 100 tracks? In fact, if I'm trying to do playlist management, what are the odds that any operation I execute can suffice to only know about 100 tracks in that playlist??

The far more likely scenario is that, if I'm building functionality that's designed to help you manage a playlist, then I probably need to know about all the tracks in that playlist. If I need to know all the tracks in a 500-track playlist, and the endpoint will only ever let me return 100 tracks at a time, then my application will have no choice but to make five consecutive calls to the same endpoint - probably in fairly rapid succession.

So in this scenario, is the 100-track limit doing anything to aid the performance of your server? If one request must be converted into five rapid-fire requests, are your record limits actually serving their purpose???
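The five-call dance looks something like this in practice. This is just a sketch - `fetch_page` is an invented stand-in for one real HTTP round trip to a Spotify-style offset/limit endpoint, not their actual client:

```python
from typing import Callable, List

def fetch_all_tracks(fetch_page: Callable[[int, int], dict],
                     limit: int = 100) -> List[dict]:
    """Drain a paged endpoint by issuing back-to-back requests.

    `fetch_page(offset, limit)` stands in for one HTTP round trip to an
    endpoint that returns at most `limit` items plus a `total` count.
    A 500-track playlist therefore costs five calls, not one.
    """
    tracks: List[dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        tracks.extend(page["items"])
        offset += limit
        if offset >= page["total"]:
            return tracks

# A fake 500-track playlist, to show the call pattern:
PLAYLIST = [{"track": i} for i in range(500)]
CALLS: List[int] = []

def fake_page(offset: int, limit: int) -> dict:
    CALLS.append(offset)
    return {"items": PLAYLIST[offset:offset + limit], "total": len(PLAYLIST)}
```

Running `fetch_all_tracks(fake_page)` fires requests at offsets 0, 100, 200, 300, and 400 in immediate succession - exactly the burst of traffic the per-call limit was supposedly protecting the server from.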

To be clear, I understand that there are times when record limits make perfect sense. If, say, you have an API that allows people to search all songs recorded over the last century (which would be millions of records), then maybe it makes perfect sense that you would limit any particular search to a given number of records.

But when you're dealing with something like a user's playlist, you typically need to get all of the tracks in that playlist. And limiting the return set only forces the consumer to spawn more (expensive) round-trip calls to your endpoint.



Don't Be A REST Cultist

Look... I could write another long diatribe about what's great - and what sucks - about REST. And I assume that I probably will at some point in the future. But for now, suffice it to say that slavishly following every minute detail of the REST Purists' Bible can make life hell for your consumers.

A perfect example of this is what I call the REST 404 Paradox.

In theory, when a resource isn't available, you're supposed to return a 404. But, depending upon how you read the REST standards, this also means that you should return a 404 when a search returns no results. Quite frankly, that's annoying AF.

I thoroughly understand that this URL:

GET https://myapi.com/v1/users/e91781a4-21e7-427a-b970-d92fca15c556/

will return a 404 if there is no user with that GUID.

But this URL gets extremely confusing if you return a 404:

GET https://myapi.com/v1/users?state=FL&lastName=Davis

Does the 404 from this URL happen because there are no users, in Florida, with the last name of Davis? Or does the 404 happen because there is no endpoint at this address??? There's no way to be absolutely sure.
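A sketch of the distinction, with invented routes and data: the single-resource lookup can 404 unambiguously, while the search returns 200 with an empty list, so a 404 from that URL can only ever mean "no such endpoint":

```python
from typing import List, Optional, Tuple

# Invented in-memory data -- a stand-in for a real user store.
USERS = [
    {"id": "e91781a4", "lastName": "Davis", "state": "TX"},
]

def get_user(user_id: str) -> Tuple[int, Optional[dict]]:
    """GET /v1/users/{id} -- a 404 here is unambiguous: the resource
    either exists at this address or it doesn't."""
    for user in USERS:
        if user["id"] == user_id:
            return 200, user
    return 404, None

def search_users(state: str, last_name: str) -> Tuple[int, List[dict]]:
    """GET /v1/users?state=...&lastName=... -- an empty result set is a
    successful search, so return 200 with an empty list rather than
    overloading 404 to mean both 'no matches' and 'no endpoint'."""
    matches = [u for u in USERS
               if u["state"] == state and u["lastName"] == last_name]
    return 200, matches
```

With this split, a consumer who receives a 404 from the search URL knows they've hit a typo in the route, not an empty result.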

REST also becomes a nightmare when the designers are super-duper anal retentive about ensuring that every single entity can only be returned under its own endpoint.

For example, consider this possible return from the /v1/users endpoint:

{
  "user": {
    "firstName": "Adam",
    "lastName": "Davis",
    "addresses": [
      {
        "street": "101 Main Street",
        "city": "Palookaville",
        "state": "Idaho",
        "postalCode": 32211
      },
      {
        "street": "102 State Street",
        "city": "Mainville",
        "state": "Illinois",
        "postalCode": 42218
      },
      {
        "street": "103 Baluga Street",
        "city": "Fishville",
        "state": "Maine",
        "postalCode": 53319
      }
    ]
  }
}

There are plenty of "REST Acolytes" who would swear that this data model is "wrong" for a REST endpoint. They'll yell you down with the idea that the addresses should absolutely be their own endpoint.

And I'm not telling you that it's "wrong" to create a standalone address endpoint. But if the only reason you're creating that endpoint is to satisfy some nagging inner REST purity-check, then... you should think carefully about what you're doing.

Let me put this another way:

If the addresses only make sense in the context of their users, then creating a separate address endpoint may be an unnecessary headache - and it will force your users to make more calls against your endpoints and further stress your web servers.

Additionally, endpoint designers too often assume that data can only live under one endpoint. For example, in the data set shown above, they'll demand that addresses are only returned under a standalone address endpoint OR they are only returned under the users endpoint.

But life doesn't always have to be so rigid. There's nothing theoretically wrong with the idea that there may be an address endpoint and addresses may be returned whenever you query a particular user.
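One common compromise - sketched here with invented data, not offered as a prescription - is an `include` query parameter: addresses can still live under their own hypothetical endpoint, but a single call to the users endpoint can also pull back the nested copy:

```python
from typing import Dict, List

# Invented in-memory data -- stands in for two related tables.
USERS: Dict[str, dict] = {
    "u1": {"firstName": "Adam", "lastName": "Davis"},
}
ADDRESSES: Dict[str, List[dict]] = {
    "u1": [{"street": "101 Main Street", "city": "Palookaville"}],
}

def get_user(user_id: str, include: str = "") -> dict:
    """GET /v1/users/{id}?include=addresses

    The address data can still have its own standalone endpoint; this
    parameter just lets one round trip return the nested copy too,
    instead of forcing a second call per user.
    """
    user = dict(USERS[user_id])
    if "addresses" in include.split(","):
        user["addresses"] = ADDRESSES.get(user_id, [])
    return user
```

The default response stays small for consumers who don't care about addresses, while playlist-management-style consumers avoid the extra round trips.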


Conclusion

APIs should be a service to your users. They should help technically savvy users leverage and extend your functionality. If you make them jump through an inordinate number of hoops, you undermine the whole reason for the API to exist at all.


Discussion


Thoroughly enjoyed your rant, Adam. LOL.

I work on designing APIs from time to time and funnily enough, I've worked on two hackathon apps that use Spotify's API.

But for anyone who's worked in software long enough, we can always invoke the "law of leaky abstractions" -- in this context, that basically amounts to someone (API designers/architects) making assumptions (which we all must do, to some degree), because there's no possible way to enumerate all the possible ways that end-users will consume an API's resources.

Let's revisit your operations on the playlist resource (from Spotify) -- with the limit being 100 tracks. I still think this is fair, because there may be an internal metric showing that most playlists -- save for the heavily curated and disseminated ones (but even many of those) -- contain fewer than 100 tracks. I think your use case is more an exceptional one than the most common. Letting a few consumers do as you're doing (multiple requests) is actually okay. But I will admit, it could be an oversight as a sane default. One of the toy apps I worked on operated on playlists, but only for removal and appending of tracks and their metadata, like name, description, and # of tracks. I would imagine that there may be some underlying reason, perhaps data-model related, that makes 100 the default...

My thoughts with regard to REST -- I'm pretty pedantic and prefer deterministic rules versus bending them and opening up the window for exceptions, but what you've cited is just poor API design (if it forces consumers to use it in an inefficient manner), as REST doesn't necessarily dictate that. I've worked with a bunch of different specs (and a lack of specs) designed with flexibility in mind. From the API provider's perspective, they'd like to reduce the size of the payload sent over the wire and not send any unnecessary data (which could introduce multiple subroutines). One way that I've dealt with this in the past is to allow an option for eager-loading associations/relations with an includes query parameter against the resource's endpoint. Alternatively, in today's world, GraphQL -- putting the onus on the consumer to specify precisely what they need -- seems to be a considerable advantage here.

Lastly, I'll note that a hand-rolled API introduction and overview should also be requisite, because to my knowledge, back-porting Swagger onto existing APIs won't generate enough documentation for people to understand what a resource is, even if we can read all the fields. It's necessary to discuss authentication, rate limits, elevated provisions (if any) or roles, and which resources are accessible at what access levels. Designing public APIs is hard.

But if it brings you any comfort, I, too, used to rant about the same things until someone asked me these questions, or at least made me think about what a designer/architect was thinking at the time that didn't seem too outside the lines.

 

Love the feedback!

because there's no possible way to enumerate all the possible ways that end-users will consume an API's resources

Absolutely. Many of my rants originate from the fact that I'm forced to deal with the assumptions, made by others, about how I'll use the system. Of course, as a developer myself, I know that it's absolutely necessary to make assumptions. It can't be avoided. But I strongly believe that it's always a good idea to double- and triple-check those assumptions. Because ill-informed assumptions can make life hell for your end-users.

I think your use case is more an exceptional one rather than the most common.

This is an interesting point. On one hand, my (admittedly anecdotal) experience is that most playlists contain well over 100 tracks, because most of the people I've talked to about the headaches they have with Spotify are curating large playlists. Specifically, I've done a lot of tinkering with their broken shuffle functionality.

But I may be in my own demographic here. You see, if you're the type of person who's deeply concerned about your playlists' shuffle capabilities, then you probably have this concern because you have a large playlist. If your playlist contains, say, 30 tracks, then you may not care so much about which order they come up in.

I can confirm that there are a great many people who are deeply frustrated by Spotify's broken shuffle feature. But even that large population is probably a minority of Spotify's userbase. And for the majority of Spotify's userbase, their playlists may all be under 100 tracks.

one way that I've deal with this in the past is to allow an option for eager-loading associations/relations with an includes query parameter against the resource's endpoint. Alternatively, in today's world, GraphQL putting the onus on the consumer to specify precisely what they need, seems to be a considerable advantage here.

100% agree. From my perspective, you can make your API restrictive as hell - as long as you give the occasional consumer the ability to tailor the response. I'm fine with the idea that certain stuff doesn't come back by default. But it's frustrating when the only solution is for me to load up a series of successive API calls to get the "full" data set.

 

But I may be in my own demographic here. You see, if you're the type of person who's deeply concerned about your playlists' shuffle capabilities, then you probably have this concern because you have a large playlist. If your playlist contains, say, 30 tracks, then you may not care so much about which order they come up in.

Fair! I do think there should be a forum or some way to voice such concerns through a developer advocate at the company. One thing that I really wanted, which Spotify likely won't give, is track play count. It's been filed as an issue against their API repo on GitHub, which I chimed in on. Now, I'm wondering if your issue has been raised there, too. The "issue" associated with play count did receive responses from the Spotify team. I'm not a fan of these walled gardens, especially as a paying customer. But going back to your issue: the onus is now put on you to write and maintain code that does extra work -- orchestrating multiple calls (essentially a reduce or concatenation) and thinking through failure scenarios (if at scale), like: what happens if 1 of 3 calls for an entire playlist fails? Nobody wants to write extra code when they don't need to....
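That failure scenario is exactly the orchestration code consumers end up owning. A minimal sketch of the retry bookkeeping (the page fetcher here is an invented stand-in for one HTTP request, not any real client):

```python
import time
from typing import Callable, Set

def fetch_page_with_retry(fetch: Callable[[int], dict], offset: int,
                          attempts: int = 3, backoff: float = 0.0) -> dict:
    """Retry one page of a multi-call playlist fetch.

    If any single page ultimately fails, the whole playlist read fails --
    the consumer inherits this bookkeeping purely because the API capped
    the page size. `fetch` stands in for one HTTP round trip.
    """
    last_error: Exception = RuntimeError("unreachable")
    for attempt in range(attempts):
        try:
            return fetch(offset)
        except Exception as err:  # real code would catch narrower errors
            last_error = err
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise last_error

# A flaky fake endpoint: fails the first request for offset 200, then succeeds.
FAILED_ONCE: Set[int] = set()

def flaky_fetch(offset: int) -> dict:
    if offset == 200 and offset not in FAILED_ONCE:
        FAILED_ONCE.add(offset)
        raise ConnectionError("transient failure")
    return {"items": list(range(offset, offset + 100))}
```

None of this logic has anything to do with the consumer's actual feature - it exists only to stitch one logical read back together from several physical ones.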

It was only yesterday that I tweeted at Spotify and their SpotifyCares accounts about playlist management in the desktop application -- I can access the search bar with a keyboard shortcut, but it doesn't expose search against my own playlists (or any of my playlists, for that matter). I was pretty frustrated, and the work-around, which I discovered on my own, was to create a folder and order the playlists that I always want immediate access to there, as a way to "pin" them.

Now, I'm wondering if your issue has been raised there, too.

FWIW, my issue has absolutely been brought up in their forum. And on their SpotifyCares accounts. The "answers" they give are... dismissive and insulting. (I reference this in the previous post, where I provide a link to one of my forum posts.) There are many forum threads started for this issue. Some of them go on for hundreds of pages. For the most part, Spotify ignores them or just provides a link to their 8-year-old blog article explaining their brilliant methodology. Far worse (IMHO), they've taken to simply marking the issues as "not an issue" or "implemented".

If you tell me, "We hear you, but it's not on our agenda to address this right now." I can actually deal with that. I get it. When you mark my issue as "not an issue" or "implemented" - after you've done nothing to address the issue, admittedly, it pisses me off.

I've actually come to realize that this is a standard feature of Spotify's "support". Even if you find something that's outright broken, good luck getting them to fix it. Or even to acknowledge it.

For another example of this, you used to be able to drag tracks from your History tab onto your playlists. Then, a few years ago, it just stopped working. No answer for why it doesn't work. It just doesn't. You can find references to it in the forums - but there is no fix, nor any acknowledgment of the issue. One day it worked. The next day it didn't. And if you don't like it... oh, well.

I can access the search bar with keyboard shortcut, but it doesn't expose search against my own playlists or my playlists for that matter.

Interesting finding. I might add it to my Spotify Toolz site (spotifytoolz.com). I'm in the process of building it right now, but the first feature (basic, random shuffling) is live. I'm currently working on putting up de-dup capabilities, because I often find that I've managed to get multiple copies of some tracks in my playlists and Spotify doesn't really have any efficient way to deal with that. The site will also have an improved mechanism for finding new music (cuz their track suggestions are another one of my gripes that could fill an entire blog...).

I might add a search-in-playlists feature. I could see that as being helpful.