DEV Community

Tomoyuki Aota


When we read Stack Overflow, what does Stack Overflow read from us?

(A Japanese translation is available here.)

Before the Internet, there was a famous joke:

In America, you watch television.
In Soviet Russia, television watches you!

Now, in the age of the Internet, we watch the content of web services like Facebook, and the web services watch us.

Companies have to tell users what information they collect. They do this not only because regulations require it but also to earn users' trust.

Many web services tell us what kinds of information they collect, but they do not show us the actual data. Stack Overflow goes further: to gain our trust, it lets users download the Personalized Prediction Data collected from their on-site activities. In this article, I'm going to download and explore my Personalized Prediction Data to see what it looks like.

Downloading and analyzing Personalized Prediction Data

To download your Personalized Prediction Data, you need to log in to Stack Exchange and click the blue "Start download" button. (If you are not logged in, values in personalized sections will be null.)

The downloaded file is a JSON file which looks like the following:

JSON file

There is no documentation explaining what each value means, so I explored the data and made some guesses.
The most interesting part is the { tag: value } style data in root/Data/TagViews. It is a set of key-value pairs in which each key is a Stack Overflow tag. For example, it looks like this.

TagViews : {
  c++: 440,
  javascript: 183,
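
Reading that section programmatically might look like the sketch below. It assumes the parsed file has a top-level Data object containing TagViews, as observed above; the helper name extract_tag_views is made up for this illustration and is not part of any Stack Overflow tooling. Because personalized sections can be null when you are not logged in, the helper falls back to an empty dict in that case.

```python
import json


def extract_tag_views(document):
    """Return the TagViews mapping from a parsed export, or {} if it is absent or null.

    Assumes the layout root -> "Data" -> "TagViews" described in the article.
    """
    data = document.get("Data") or {}
    tag_views = data.get("TagViews") or {}
    return dict(tag_views)


# Inline sample shaped like the excerpt above (only the two values shown there).
sample = json.loads('{"Data": {"TagViews": {"c++": 440, "javascript": 183}}}')
print(extract_tag_views(sample))

# Assumption: a logged-out download carries null in the personalized section.
logged_out = json.loads('{"Data": {"TagViews": null}}')
print(extract_tag_views(logged_out))
```

For a real download, json.load on the opened file would replace the inline json.loads calls.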

The top 20 tags look like this.

Top 20 Stack Overflow tags
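
A ranking like the one in this chart can be produced by sorting the tag values in descending order and keeping the first 20. The following is a minimal sketch rather than the code from my notebook; the function name top_tags is hypothetical, and apart from the c++ and javascript values quoted earlier, the sample numbers are invented for the demo.

```python
def top_tags(tag_views, n=20):
    """Return the n tags with the highest values as (tag, value) pairs, highest first."""
    ranked = sorted(tag_views.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# "c++" and "javascript" come from the excerpt in the article;
# "python" and "git" are invented values for illustration only.
example = {"javascript": 183, "git": 12, "c++": 440, "python": 95}

for tag, value in top_tags(example, n=3):
    print(f"{tag:12} {value}")
```

Sorting the whole dict is fine here because the tag list is small; for very large inputs, heapq.nlargest would avoid a full sort.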

The tags and their values in this chart match my own sense of what I have been working on pretty well. It seems that Stack Overflow has a good estimate of my current skills.

For details of the analysis, please read the Jupyter notebook in my GitHub repository. Clicking the badge below will take you to the notebook running on Binder.


It was interesting to see what Stack Overflow collected from my on-site activities. I hope other companies will do the same.
