Introduction
You probably know link previews from a lot of apps. For example, when you share a link in Microsoft Teams, you can preview the webpage within your chat without actually visiting it. For a while I have been wondering how the behaviour of preview bots and crawlers differs between services and what the security and privacy implications might be. End-to-end encryption, for example, would mean that preview generation has to happen client-side, as the server never has access to the message content and so cannot fetch a preview itself.
Content
Experiment Setup
In order to properly track the requests, I set up an nginx reverse proxy with the following paths:
- / (just a simple HTML page as the standard use case)
- /same-site-redirect (redirects to /)
- /external-redirect (redirects to https://example.com)
- /redirect-loop (redirects to /redirect-loop - this was a good one)
- /ssrf-redirect-magic (redirects to the magic IP)
- /ssrf-redirect-local (redirects to good ol' localhost)
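To make the path list above more concrete, here is a minimal Python sketch of the same idea. It is not the nginx configuration used in the experiment, just a stand-in that maps each path to a redirect target. The 302 status code, the `http://localhost/` target, the `resolve` and `serve` helpers, and the `MAGIC_IP_URL` placeholder are all assumptions made for illustration; the actual magic IP is deliberately left as a placeholder.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder only: fill in the real "magic IP" target yourself.
MAGIC_IP_URL = "http://MAGIC-IP-PLACEHOLDER/"

# Path -> redirect target, mirroring the path list above.
REDIRECTS = {
    "/same-site-redirect": "/",
    "/external-redirect": "https://example.com",
    "/redirect-loop": "/redirect-loop",
    "/ssrf-redirect-magic": MAGIC_IP_URL,
    "/ssrf-redirect-local": "http://localhost/",  # assumed target
}


def resolve(path):
    """Return (status, location) for a request path; location is None if no redirect."""
    path = path.split("?", 1)[0]  # ignore query strings such as ?fbclid=...
    if path in REDIRECTS:
        return 302, REDIRECTS[path]  # 302 is an assumption
    if path == "/":
        return 200, None
    return 404, None


class PreviewTestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, location = resolve(self.path)
        body = b"" if location else (b"<html><body>Hello, preview bot</body></html>"
                                     if status == 200 else b"not found")
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the test server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), PreviewTestHandler).serve_forever()
```

Calling `serve()` starts the test server locally. Keeping the routing in a plain `resolve` function makes it easy to check the redirect targets without opening a socket.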
For easier testing, I also configured nginx to log in JSON format and set up a small web server that reads the JSON and displays it in a nice data table. All the source code is more or less documented and available on GitHub.
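The log-reading step can be sketched roughly as follows. This is not the code from the repository, only an illustration of reading one JSON object per log line and printing it as a table. The key names `remote_addr`, `request_uri` and `http_user_agent` are assumptions borrowed from the nginx variables of the same name; the real keys depend on how the `log_format` is defined. The sample lines use made-up values.

```python
import json


def parse_log_lines(lines):
    """Parse JSON access-log lines, skipping blank or malformed ones."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written line
        if isinstance(record, dict):
            records.append(record)
    return records


def to_table(records, columns):
    """Render the selected columns as a plain-text table."""
    rows = [[str(r.get(c, "")) for c in columns] for r in records]
    widths = [max([len(c)] + [len(row[i]) for row in rows])
              for i, c in enumerate(columns)]

    def fmt(cells):
        return " | ".join(cell.ljust(w) for cell, w in zip(cells, widths)).rstrip()

    return "\n".join([fmt(columns)] + [fmt(row) for row in rows])


# Made-up sample lines; key names are assumptions (see above).
SAMPLE = [
    '{"remote_addr": "192.0.2.10", "request_uri": "/", "http_user_agent": "ExampleBot/1.0"}',
    "not json",
    '{"remote_addr": "192.0.2.11", "request_uri": "/redirect-loop", "http_user_agent": "ExampleBot/1.0"}',
]
COLUMNS = ["remote_addr", "request_uri", "http_user_agent"]

if __name__ == "__main__":
    print(to_table(parse_log_lines(SAMPLE), COLUMNS))
```

Skipping malformed lines rather than failing is a design choice for a live log file, where the last line may still be incomplete when it is read.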
Findings summary
1. All of the apps that issued requests did so from their datacenters, meaning they have access to the message content in some form.
2. Whenever you post a link on Twitter (aka X), the AppleBot will pay you a visit within a matter of seconds. At least they are kind enough to check the robots.txt.
3. When you add a link to your Instagram bio, the Facebook bot will crawl it to get to know you better.
4. The Facebook bot handles the redirect loop the worst (it requested it about 20 times...). Interestingly, it used different IP addresses for handling the redirects.
5. When you click an Instagram bio link, Facebook automatically adds an "fbclid" parameter to track your interaction.
6. Signal was one of the few apps that did not cause any request to the nginx.
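Observations like the redirect-loop count and the changing IP addresses can be read off the logs with a small aggregation. The sketch below is only an illustration under assumptions: the record keys (`remote_addr`, `request_uri`, `http_user_agent`) are assumed names, `summarize` is a hypothetical helper, and the sample records are invented, not data from the experiment.

```python
from collections import defaultdict


def summarize(records, path):
    """Per user agent: request count for `path` and the set of distinct client IPs."""
    summary = defaultdict(lambda: {"requests": 0, "ips": set()})
    for record in records:
        if record.get("request_uri") != path:
            continue
        entry = summary[record.get("http_user_agent", "")]
        entry["requests"] += 1
        entry["ips"].add(record.get("remote_addr"))
    return dict(summary)


# Invented sample records for illustration only.
SAMPLE_RECORDS = [
    {"remote_addr": "192.0.2.10", "request_uri": "/redirect-loop", "http_user_agent": "ExampleBot/1.0"},
    {"remote_addr": "192.0.2.11", "request_uri": "/redirect-loop", "http_user_agent": "ExampleBot/1.0"},
    {"remote_addr": "192.0.2.10", "request_uri": "/redirect-loop", "http_user_agent": "ExampleBot/1.0"},
    {"remote_addr": "198.51.100.5", "request_uri": "/redirect-loop", "http_user_agent": "OtherBot/2.0"},
    {"remote_addr": "198.51.100.5", "request_uri": "/", "http_user_agent": "OtherBot/2.0"},
]

if __name__ == "__main__":
    for agent, stats in summarize(SAMPLE_RECORDS, "/redirect-loop").items():
        print(agent, stats["requests"], "requests from", len(stats["ips"]), "IPs")
```

A bot that stops after one request to /redirect-loop shows up with a count of 1, while one that keeps following the loop shows a higher count, possibly spread over several IPs.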
Results Table
Below are some notes from the observations. Feel free to check out the referenced GitHub repo and try it yourself.
Component | Request Source | Simple Request (http, https) | Follows Same Site Redirect | Follows External Redirect | Redirect Loop | SSRF Vulnerable |
---|---|---|---|---|---|---|
Snapchat | Amazon Server | Resolves http and https | Yes | Yes, resolved in preview | Requested 1x | Not showing magic link response |
Telegram Android | Telegram Servers | Same as Telegram Web | Yes | Unclear, not resolved in preview | Requested 2x | Unclear, not resolved in preview |
TelegramBot (like TwitterBot) | Telegram Servers | Server-side with user agent `TelegramBot (like TwitterBot)`; resolves https only | Yes | Unclear, not resolved in preview | Requested 2x | Unclear, not resolved in preview |
X / Twitter | Twitter Inc Servers | Retrieves the robots.txt in addition (nice!) | Yes | Unclear, not resolved in preview | Requested 1x (smartass!) | Not following |
AppleBot | Apple Network Range | Retrieves http; user agent `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)` | Not following | Not following | Requested 1x | Not following |
Gmail | / | No interaction | / | / | / | / |
Instagram Bio | Facebook Inc | Resolves http and https | Yes, with different IP | Unclear | Requested 20x | |
Signal | / | No interaction | / | / | / | / |
Bitwarden | / | No interaction | / | / | / | / |