Vickie Li for ShiftLeft

Posted on Jul 13, 2021 • Originally published at blog.shiftleft.io on Jul 13, 2021

API Security 101: Excessive Data Exposure


Hey, I found your access tokens on your profile page.

You’ve probably heard of the OWASP top ten, the list of the ten most critical vulnerabilities that threaten web applications. OWASP also periodically publishes a list of the top ten vulnerabilities that threaten APIs, called the OWASP API top ten. The current API top ten are Broken Object Level Authorization, Broken User Authentication, Excessive Data Exposure, Lack of Resources & Rate Limiting, Broken Function Level Authorization, Mass Assignment, Security Misconfiguration, Injection, Improper Assets Management, and Insufficient Logging & Monitoring.

Many of these vulnerabilities affect application components besides APIs as well, but they tend to manifest themselves in APIs. Last time, we talked about broken user authentication and how it affects API systems. This time, let’s dive into my favorite vulnerability to find in APIs: OWASP API #3, Excessive Data Exposure.

Why is excessive data exposure my favorite API vulnerability to find? Because I’ve been looking for it throughout my bug hunting and pentesting career, without knowing that it’s one of the top vulnerabilities that affect APIs! Today, let’s talk about what this vulnerability is, how I usually look for it, and how you can prevent it.

OWASP API #3

What is OWASP API #3, Excessive Data Exposure, exactly? It’s when applications reveal more information than necessary to the user via an API response.

Let’s consider a simple use case of APIs. A web application retrieves information from an API service, then uses that information to populate a web page that is displayed in the user’s browser.

          displays data                    requests data
user <----------------- application -------------------> API service
(browser)               (API client)

For many API services, the API client applications do not have the ability to pick and choose which data fields are returned in an API call. Let’s say that an application retrieves user information from the API to populate user profiles. The API call to retrieve user information looks like this:

https://api.example.com/v1.1/users/show?user_id=6253282

The API server will respond with the entire corresponding user object:

{
  "id": 6253282, 
  "username": "vickieli7", 
  "screen_name": "Vickie", 
  "location": "San Francisco, CA", 
  "bio": "Infosec nerd. Hacks and secures. Creates god awful infographics.", 
  "api_token": "8a48c14b04d94d81ca484e8f32daf6dc", 
  "phone_number": "123-456-7890", 
  "address": "1 Main St, San Francisco, CA, USA"
}

You notice that besides basic information about the user, this API call also returns the API token, phone number, and address of that user. Since this call is used to retrieve data to populate the user’s profile page, the application only needs to send the username, screen name, location, and bio to the browser.

Some application developers assume that if they do not display the sensitive information on the webpage, users cannot see it. So they send the entire API response to the user’s browser without filtering out the sensitive info first, and rely on client-side code to hide the private information. When this happens, anyone who visits a profile page can intercept the API response (for example, in the browser’s developer tools or with a proxy) and read sensitive info about that user!

Attackers might also be able to read sensitive data by visiting certain endpoints that leak information, or by performing a man-in-the-middle (MITM) attack to steal API responses sent to the victim.

Preventing excessive data exposure

Excessive data exposure happens when the API client application does not filter the results it gets before returning the data to the user of the application.

When APIs send data that is sensitive, the client application should filter the data before forwarding it to the user. Carefully determine what the application’s user should know and make sure to filter out anything the user should not be allowed to access. Ideally, return the minimum amount of data needed to render the webpage.
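Here is a minimal sketch of that filtering step, assuming the client application is written in Python. The `PUBLIC_PROFILE_FIELDS` allowlist and the `filter_profile` helper are names I made up for illustration; the field names come from the example response above. The idea is to use an allowlist rather than a blocklist, so any new field the API starts returning stays private by default.

```python
# Fields the profile page actually renders (taken from the example above).
PUBLIC_PROFILE_FIELDS = ("username", "screen_name", "location", "bio")


def filter_profile(api_user: dict) -> dict:
    """Return only the allowlisted fields from the API service's user object."""
    return {
        field: api_user[field]
        for field in PUBLIC_PROFILE_FIELDS
        if field in api_user
    }


# The full user object from the example API response.
upstream_response = {
    "id": 6253282,
    "username": "vickieli7",
    "screen_name": "Vickie",
    "location": "San Francisco, CA",
    "bio": "Infosec nerd. Hacks and secures. Creates god awful infographics.",
    "api_token": "8a48c14b04d94d81ca484e8f32daf6dc",
    "phone_number": "123-456-7890",
    "address": "1 Main St, San Francisco, CA, USA",
}

# Only this filtered dictionary should ever be sent to the browser.
profile = filter_profile(upstream_response)
print(profile)
```

With this in place, the API token, phone number, and address never leave the server, no matter what the client-side code does.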

If the API allows it, you could also request the minimum amount of data needed from the API server. For instance, GraphQL APIs allow you to specify the exact object fields you need in an API request.
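As a rough illustration, here is what requesting only the profile fields could look like against a GraphQL API. The schema is an assumption on my part: I am imagining a `user(id: ID!)` query that exposes the same fields as the REST example, and `build_profile_request` is a hypothetical helper that just builds the JSON body of the request.

```python
import json

# Hypothetical GraphQL query: only the four fields the profile page needs.
# The `user` query and its field names are assumed, not a real API.
PROFILE_QUERY = """
query Profile($id: ID!) {
  user(id: $id) {
    username
    screen_name
    location
    bio
  }
}
"""


def build_profile_request(user_id: int) -> str:
    """Serialize the JSON body for a GraphQL POST request."""
    return json.dumps(
        {"query": PROFILE_QUERY, "variables": {"id": str(user_id)}}
    )


body = build_profile_request(6253282)
print(body)
```

Keep in mind that this only limits what the client asks for. The API server still has to enforce which fields each caller is allowed to read.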

Finally, avoid sending sensitive information over unencrypted connections. Serve API traffic over HTTPS so that responses cannot be read in transit.
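One small way to back up that last point in code is to have the client refuse to call the API over plain HTTP. This is only a sketch: `require_https` is a hypothetical helper, and the URL is the example endpoint used earlier in this post.

```python
from urllib.parse import urlsplit


def require_https(url: str) -> str:
    """Return the URL unchanged, or raise if it does not use HTTPS."""
    scheme = urlsplit(url).scheme.lower()
    if scheme != "https":
        raise ValueError(f"refusing to send API traffic over {scheme or 'no scheme'}")
    return url


# Passes: the example endpoint uses HTTPS.
api_url = require_https("https://api.example.com/v1.1/users/show")

# Fails: the same endpoint over plain HTTP is rejected.
try:
    require_https("http://api.example.com/v1.1/users/show")
    rejected = False
except ValueError:
    rejected = True
```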

Hunting for excessive data exposure

I mentioned that I’ve always looked out for these vulnerabilities when I hunt for bugs. As a bug hunter and penetration tester, I got into the habit of grepping every server response for keywords like “key”, “token”, and “secret”. And more often than not, I’d find sensitive info leaks this way.
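If you want to automate that habit, here is a small sketch of the idea in Python. Instead of grepping raw text, it walks a JSON response and reports every key whose name contains one of those keywords. The `find_sensitive_fields` helper is my own illustration, and the nested `sessions` entry in the sample response is made up to show that nested objects are covered too.

```python
import json
import re

# The keywords mentioned above, matched case-insensitively.
SENSITIVE_PATTERN = re.compile(r"key|token|secret", re.IGNORECASE)


def find_sensitive_fields(data, path=""):
    """Recursively collect the paths of JSON keys whose names match a keyword."""
    hits = []
    if isinstance(data, dict):
        for name, value in data.items():
            child = f"{path}.{name}" if path else name
            if SENSITIVE_PATTERN.search(name):
                hits.append(child)
            hits.extend(find_sensitive_fields(value, child))
    elif isinstance(data, list):
        for index, item in enumerate(data):
            hits.extend(find_sensitive_fields(item, f"{path}[{index}]"))
    return hits


# Sample response body; the "sessions" entry is a made-up nested example.
response_body = """
{
  "username": "vickieli7",
  "api_token": "8a48c14b04d94d81ca484e8f32daf6dc",
  "sessions": [{"refresh_token": "example-value"}]
}
"""

leaks = find_sensitive_fields(json.loads(response_body))
print(leaks)  # ['api_token', 'sessions[0].refresh_token']
```

A hit is not automatically a vulnerability, but every one of them is worth a closer look.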

A lot of the time, these sensitive info leaks are caused by precisely the problem I described here: the application being too permissive and forwarding the entire response from the API server to the user instead of filtering it first.

Excessive Data Exposure is, unfortunately, extremely common. And when combined with OWASP API #4, Lack of Resources & Rate Limiting, it could become an even bigger issue. Next time, let’s look at OWASP API top ten #4, Lack of Resources & Rate Limiting: why and when it becomes an issue, and why you should not overlook those bug reports about the lack of rate limiting.

What other security concepts do you want to learn about? I’d love to know. Feel free to connect on Twitter @vickieli7.

Want to learn more about application security? Take our free OWASP top ten courses here: https://www.shiftleft.io/learn/.