A deep dive into how LinkedIn exfiltrates browser extension data from its users' browsers.
LinkedIn is actively spraying its users' browsers with web requests and DOM queries in an attempt to determine whether certain browser extensions are installed. The data is then exfiltrated back to LinkedIn.
It is not clear what the data is used for or how it is used by LinkedIn. However, it is clear LinkedIn is targeting certain sales and recruiting tools. It is also common knowledge in the sales and recruiting communities that LinkedIn restricts or outright bans accounts based on the use of unapproved tools.
LinkedIn is within its rights to detect malicious user behavior and take action. However, the means it is employing are problematic for a number of reasons:
- LinkedIn doesn't verify that you are actually using the extension. It only checks whether the extension is currently installed and/or enabled.
- A number of the tools LinkedIn does not allow have legitimate uses and can be used on public web pages.
- The exfiltrated data could be further used for nefarious purposes such as browser fingerprinting.
Furthermore, the methods employed by LinkedIn to detect extensions take advantage of developer oversights and browser extension limitations. This comes across as shady to me.
I was initially tipped off to LinkedIn's data exfiltration when I noticed a large number of failed web requests while visiting a LinkedIn profile. At first, I thought my ad blocker was blocking network requests. However, upon a closer look I noticed the web requests were not being sent across the web. They were being sent locally, to the browser itself. This can be seen clearly in each request's path: every request used Chrome's own extension protocol and began with chrome-extension://.
If you are curious what the spraying looks like, here's what caught my eye originally:
For the uninitiated, the Chrome extension protocol is used to make web requests directly to an installed browser extension. Typically, this is used by the extension itself to retrieve resources or assets. However, any resources listed in an extension's manifest under the web_accessible_resources key are available to all web page contexts. Ironically, this section of the manifest was designed to minimize browser fingerprinting and protect the privacy of the extension user. Here's an excerpt directly from Google's own documentation regarding web accessible resources:
Prior to manifest version 2 all resources within an extension could be accessed from any page on the web. This allowed a malicious website to fingerprint the extensions that a user has installed or exploit vulnerabilities (for example XSS bugs) within installed extensions. Limiting availability to only resources which are explicitly intended to be web accessible serves to both minimize the available attack surface and protect the privacy of users.
Unfortunately, Google's changes only minimized the attack surface and did not prevent browser fingerprinting or privacy violations. It is this very issue that is leveraged by LinkedIn to identify installed extensions.
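To make the mechanism concrete, here is a minimal sketch of how any web page could probe for a single web accessible resource. This is not LinkedIn's actual code: the function names are made up for illustration, the extension ID and resource path in the usage note are hypothetical placeholders, and fetchFn is injected only so the sketch can be exercised outside a browser.

```javascript
// Build the URL of a resource that lives inside an installed extension.
function extensionResourceUrl(extensionId, resourcePath) {
  // Strip leading slashes so we do not end up with a double slash in the URL.
  return `chrome-extension://${extensionId}/${resourcePath.replace(/^\/+/, "")}`;
}

// Resolve to true if the request succeeds, which suggests the extension is
// installed and exposes this file as a web accessible resource.
// fetchFn defaults to the browser's own fetch.
async function isResourceReachable(url, fetchFn = globalThis.fetch) {
  try {
    const response = await fetchFn(url);
    return response.ok;
  } catch {
    // The request fails when the extension is missing or the resource is
    // not web accessible. These are the failed requests in the network tab.
    return false;
  }
}
```

In a page context, awaiting isResourceReachable(extensionResourceUrl("abcdefghijklmnopabcdefghijklmnop", "images/icon.png")) would be expected to give true only when an extension with that (hypothetical) ID is installed and exposes that file.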
My curiosity got the best of me and I set out to learn more about how LinkedIn was performing the scan and what they were doing with the data. If you want to see how I tracked down LinkedIn's hidden JSON data, check out my original post. The data samples are a bit large to be posting on Dev, but are available on my personal site.
However, I will dive into what LinkedIn is doing below!
The data contained within the hidden JSON is quite revealing. LinkedIn stores basic information for each unapproved extension. The name of the extension is encrypted with a basic Caesar cipher (I wonder who thought that was a good idea). More importantly, each extension entry includes a path field and a dom field. These fields shed light on the two methods used by LinkedIn to identify extensions:
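A Caesar cipher is trivial to undo, which is why it offers no real protection here. The sketch below is a generic Caesar shift, not code taken from LinkedIn. The shift value LinkedIn uses is not stated in this post, so the helper simply lists all 26 candidates and leaves it to the reader to spot the readable one. Both function names are my own.

```javascript
// Shift every ASCII letter by `shift` positions, wrapping around the alphabet.
// Non-letters (spaces, digits, punctuation) are left untouched.
function caesarShift(text, shift) {
  const rotate = (code, base) =>
    String.fromCharCode((((code - base + shift) % 26) + 26) % 26 + base);
  return Array.from(text, (ch) => {
    const code = ch.charCodeAt(0);
    if (code >= 65 && code <= 90) return rotate(code, 65); // A-Z
    if (code >= 97 && code <= 122) return rotate(code, 97); // a-z
    return ch;
  }).join("");
}

// Brute-force helper: return every possible decoding of an encrypted name.
function allCaesarCandidates(text) {
  return Array.from({ length: 26 }, (_, shift) => ({
    shift,
    text: caesarShift(text, shift),
  }));
}
```

With only 26 possible shifts, scanning the output of allCaesarCandidates for a recognizable extension name takes a few seconds.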
- Path Scanning. This is what LinkedIn is doing when they spray your browser with web requests attempting to identify any web accessible resources.
- DOM Scanning. This is what LinkedIn is doing when they search the DOM for known extension side effects such as the presence of certain class names or ids.
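The two methods above can be sketched roughly as follows. This is my own illustration, not LinkedIn's code. The path and dom field names come from the JSON described above, but their exact formats are assumptions on my part: I treat path as a full chrome-extension:// URL and dom as a CSS selector. The root and fetchFn parameters are injected so the sketch also runs outside a browser.

```javascript
// DOM scanning: return the entries whose known side effect is present.
// Assumes entry.dom is a CSS selector. In a page, pass `document` as root.
function domScan(entries, root) {
  return entries.filter(
    (entry) => Boolean(entry.dom) && root.querySelector(entry.dom) !== null
  );
}

// Path scanning: return the entries whose resource request succeeds.
// Assumes entry.path is a full chrome-extension:// URL.
async function pathScan(entries, fetchFn = globalThis.fetch) {
  const reachable = await Promise.all(
    entries.map(async (entry) => {
      if (!entry.path) return false;
      try {
        return (await fetchFn(entry.path)).ok;
      } catch {
        return false; // request failed: extension absent or resource hidden
      }
    })
  );
  return entries.filter((_, index) => reachable[index]);
}
```

Note that neither check says anything about whether the extension was actually used on LinkedIn, only that it is present in the browser.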
- Upon page load, LinkedIn dispatches two setTimeout function calls: one for the DOM scan and another for the path scan. The initial timeouts are set to the interval specified in the JSON file that was read from local storage.
- When the DOM scan timeout is up, LinkedIn scans the browser for known DOM patterns. This appears to happen in all web browsers.
- When the path scan timeout is up, LinkedIn sprays the browser for web accessible resources. This only happens in Google Chrome. This is likely an oversight by LinkedIn, because Chrome extensions are easily ported to Edge and Firefox, and those browsers are also vulnerable to the web accessible resources issue.
- Whenever a scan is complete, the extension's entry in local storage is updated. The date field is updated to the current time to mark the last time it was scanned. The entire local storage blob is then sent to LinkedIn's server via the /platform-telemetry/contentsecurity endpoint.
- If you delete the local storage data or block the update endpoint the scanning still occurs. However, a default JSON file is used. The default JSON blob is marked by version 0.1.0 (which is available in the original post). If you're interested in a non-default local storage blob you can check out one within the original post.
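Put together, the flow described in the steps above might look something like the sketch below. It is a reconstruction from observed behavior, not LinkedIn's source. The config shape (domInterval, pathInterval, extensions), the detected flag, and every function name are hypothetical. Only the date field and the endpoint path come from what I observed. The scan, send, and timer functions are injected so the flow can be followed without a browser.

```javascript
function scheduleScans(config, deps) {
  const {
    domScan, // (entries) => detected entries
    pathScan, // async (entries) => detected entries
    send, // (url, blob) => void, e.g. a POST request
    isChrome, // the path scan only ran in Google Chrome
    setTimeoutFn = setTimeout,
    now = Date.now,
  } = deps;

  // After a scan: stamp each entry, then ship the whole blob to the server.
  const record = (detectedEntries) => {
    for (const entry of config.extensions) {
      entry.date = now(); // last time this entry was scanned
      // Hypothetical field: how a hit is actually recorded is not known.
      entry.detected = Boolean(entry.detected) || detectedEntries.includes(entry);
    }
    send("/platform-telemetry/contentsecurity", config);
  };

  // One timer per scan type, each using the interval from the stored JSON.
  setTimeoutFn(() => record(domScan(config.extensions)), config.domInterval);
  if (isChrome) {
    setTimeoutFn(
      async () => record(await pathScan(config.extensions)),
      config.pathInterval
    );
  }
}
```

The point of the sketch is the ordering: timers first, then a scan, then a timestamp update, then an upload of the entire blob rather than just the hits.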
On a positive note, ad blockers such as uBlock Origin are now blocking web requests to /platform-telemetry/contentsecurity. At the very least, that prevents your extension data from being exfiltrated back to LinkedIn. With that being said, LinkedIn could always update their endpoint to a non-blocked path to resume their exfiltration.
I have put together a browser extension that makes it simple to parse LinkedIn's local storage and determine which extensions they are scanning your browser for. The extension is called "Nefarious LI". It is currently available for Firefox and Chrome. It's also open source. You can find the source code on my source code page.
Are there any takeaways for LinkedIn users? At the very least:
- Use an ad blocker for added privacy (especially if you have any extensions installed that LinkedIn may not approve of).
- Consider using a browser besides Google Chrome.
If you are a developer of browser extensions there are certainly lessons to be learned from LinkedIn's behavior.
- Do not use web accessible resources. There is really little need for them and they are not worth the added security and privacy issues.
- Avoid side effects when using a content script. Do not modify the DOM in a predictable manner. If you must leave side effects on the page, at least get creative with them to hide your tracks.
- Consider using a browser (or page) action for complete isolation.
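As one example of making content script side effects less predictable, here is a small sketch that swaps fixed class names for random ones generated on each page load. This is just one possible approach and the function names are my own. It would not stop a site from fingerprinting an extension by DOM structure or behavior, but it does defeat the simple selector matching described earlier.

```javascript
// Generate a random lowercase token. Math.random is good enough here: the
// goal is names that differ on every page load, not cryptographic strength.
function randomToken(length = 12) {
  const alphabet = "abcdefghijklmnopqrstuvwxyz";
  let token = "";
  for (let i = 0; i < length; i++) {
    token += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return token;
}

// Map the logical names a content script uses internally to class names
// that are regenerated on every page load.
function createClassNames(logicalNames) {
  const classNames = {};
  for (const name of logicalNames) {
    classNames[name] = randomToken();
  }
  return classNames;
}
```

A content script would call createClassNames(["panel", "button"]) once at startup and then use classNames.panel wherever it would otherwise hard-code a recognizable class.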
I hope you all enjoyed this post and were able to take something away from it.
What are your thoughts on LinkedIn's scanning of your browser and the exfiltration of your data? Do you know of any other sites doing this? I would love to hear about it!