swyx

Posted on Feb 18, 2018

Scraping my Twitter Social Graph with Python and Selenium

#datascience #python #selenium #javascript

I have been on Twitter for 9 years, but only just realized this: Twitter is at its best when used like Messenger or WhatsApp, not when it is used like Facebook.

shawn swyx wang 🇸🇬

@swyx

Twitter is both Facebook AND Messenger.

Twitter’s at its worst when you browse the main feed, polluted by the likes, quote-tweets & follows of people you don’t even follow.

Twitter’s at its best when you connect with people on a shared interest, that you know or will know IRL.

13:53 PM - 16 Feb 2018


In other words, I get the most out of Twitter when I use it to connect with real people with shared interests, not to keep up with news or companies or celebrities, and definitely not for arguing with random internet strangers.

Finding Dev Twitter

After 9 years (mostly dormant) on Twitter, I had amassed about 4000 Twitter follows. They reflect my background: Some are finance accounts, some musicians, some product/makers, some joke accounts, some devs. But in line with my realization above, I found myself wanting to decrease the noise and turn my Twitter usage into something that helps improve my new career.

For better or worse, a large proportion of the developer community is on Twitter. I only started getting involved in "Dev Twitter" about midway through my career change from finance to software engineering, but quickly got lost in the wild noise.

The state of Dev Twitter

Dev Twitter is wonderful: You can engage with senior developers, get help when you run into trouble, publicize your work, and even get jobs.

However, Twitter can also be every bit the dumpster fire people make it out to be: A continuous cacophony of confusing context-light criticism-heavy comments covering everything from sports to politics to celebrities to politics to tech to politics to finance to politics to bots. Even outside of cheap jabs at politics, you also get occasional meltdowns on Dev Twitter that nobody really needs. (JavaScript even has Horse_JS, a dedicated but loved troll account that calls things out!) It even prompted SecondCareerDev's Kyle Shevlin to formulate Twitter Rules of Engagement (which I highly recommend).

Now to be clear: I support political involvement. I also believe that people should have a diversity of interests, and should be free to openly disagree with each other. This post isn't about any of that.

Twitter, like many social platforms, has a "trust me I know what's best for you" recommendation algorithm. As you scroll down your main feed, you see tweets from people who are followed by the people you follow. If you head to the Search tab (on the mobile app) and hit Connect, you see a list of people suggested by "Because you follow", "People you may know", and "Based on your activity" algorithms (the latter is the worst since it makes recommendations off a single data point). If you have used Twitter a bit, you will recognize the groupings these algorithms are making: Here's the "Women in Tech" group, here's the "massively popular content creators" group. While technically correct, a lot of the options end up just feeling wrong. I follow the ReactJS twitter account, and it suggests that I follow the Angular and EmberJS accounts. They are great frameworks but are simply not accounts I want to follow at this time. I'm no American football fan but I'd hazard that this same algorithm would suggest the Patriots account to a Seahawks fan too, the way it seems to think.

Anyway.

Twitter users complement this automated recommendation by retweeting others for exposure, and also calling them out in special posts. This even got its own special hashtag, known as #FollowFriday. Because bias happens, there are occasionally special posts like these from prominent community members helping out underrepresented groups. But it is very ad-hoc and manual.

So being a developer, the natural question arises: What if I take the recommendation algorithm into my own hands?

The basic idea

Developers are familiar with the idea that everything is a graph. Twitter is a manually explored, social graph of users with varying (even probabilistic) signal quality and an unclear, varying optimization function. The highest signal is a follow, which is more persistent, whereas likes, retweets, and replies are also signals but are more of a one-off nature. If you follow a bunch of people you consider to be high quality follows, then their follows have a better than random chance of being interesting to you too. There's no real term for "follow-squared" so I've taken to calling them "fofollows".

All this of course has more academic grounding than I am qualified to speak about, but basically you will want to look into Network Centrality Algorithms to see how academics formally define various measures of network centrality.
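To make the "fofollow" idea concrete, here is a minimal sketch using only the standard library. It treats the score as a plain count of how many of your follows also follow a given account (essentially an in-degree count restricted to your own follows). The function name count_fofollows and the toy graph are made up for illustration; they are not real data.

```python
from collections import Counter

def count_fofollows(follows_of, me):
    """Count how many of my follows also follow each account.

    follows_of maps a username to the list of accounts that user follows.
    """
    my_follows = set(follows_of.get(me, []))
    counts = Counter()
    for user in my_follows:
        # de-duplicate so one follow contributes at most one vote per target
        for target in set(follows_of.get(user, [])):
            if target != me:
                counts[target] += 1
    return counts

# hypothetical toy graph, not real data
toy_graph = {
    "me": ["alice", "bob", "carol"],
    "alice": ["dave", "erin", "bob"],
    "bob": ["dave", "me"],
    "carol": ["dave", "erin"],
}
ranking = count_fofollows(toy_graph, "me").most_common()
print(ranking)
```

Accounts you already follow are left in the ranking here; filtering them out is a one-line change if you only want new suggestions.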

To be honest, I don't like the idea of defining a "good follow" by "number of fofollows". Because people (including myself) follow with a herd mentality, this overly biases towards celebrity culture, and disadvantages those who also put out quality content but for whatever reason have not yet gained recognition for it. So for example, this algorithm would favor someone famous who just sets up their Twitter account to crosspost from Instagram: they may get a ton of follows, likes, and retweets even though they don't even use Twitter. I would definitely favor someone who actually gives thoughtful replies to people but has far fewer follows. I have some ideas on how to do this but will only have space for addressing them in a future post. (I just wanted to register upfront that I know this is a very flawed algorithm, and invite constructive suggestions.)

The technical challenges

While I won't be quite able to solve society's ills in this post alone, there are some interesting things that we can do with the info we have:

AUTOMATION: first, we have to scrape our data from Twitter. This will be the majority of the value of this post if you are coding along.
ANALYSIS: second, we have to process the data to surface metrics that we want aka feature engineering
DISPLAY: lastly, we have to show the results in an easily understandable way so I (and interested others) can iterate on it and then finally act on it

These three things are very different skill sets and in a real company would be a bunch of different jobs for different people. But I'm just doing this on my own time to improve my own personal situation. So as ambitious as I'd like to be to produce an authoritative result, I'd frankly be happy with just a 10% better experience (not that that can even be measured).

AUTOMATION - Scraping Twitter

First off: I am no legal expert, so proceed at your own risk. But let's just say Twitter has bigger bots to deal with than you if you write one.

Ok. Although I am a professional JS guy, and there are ways to do scraping in NodeJS, the Python scraping and number crunching ecosystem has simply been around for far, far longer, so that's what I'm going with.

To follow along, make sure you have Jupyter Notebook and the Anaconda distribution of Python. If you are completely new to Python/Jupyter Notebook you will need to find another tutorial to guide you through that; we are not doing introductory stuff here. The code snippets that follow correspond directly to Jupyter Notebook cells.

getting started with selenium and python

Now import all the stuff we are going to need (pip install anything you have missing):

%matplotlib inline
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import NoAlertPresentException
import sys

import unittest, time, re
from bs4 import BeautifulSoup as bs
from dateutil import parser
import pandas as pd
import itertools
import matplotlib.pyplot as plt

Now you can see we are going to use Selenium to do the automation. We will use it to automate Firefox so it can go on running in the background while we carry on in our normal browser (I know well over 60% of you use Chrome).

driver = webdriver.Firefox()
driver.base_url = "https://twitter.com/swyx/following"
driver.get(driver.base_url)

Swap out my username for yours. If you run this bit of code, it opens up Firefox to the twitter login page. If you log in with your own credentials, it then goes to your page of follows. The problem with scraping this page is that it is an "infinite scroll" page, so just scraping whatever loads on the first view isn't enough. You have to scroll down, wait for it to load, and scroll down again, and again and again until you load ALL your follows. You could try to get this from the official Twitter API but they only give you 15 requests every 15 minutes. So we scrape.

Once you're logged in, you can use the Firefox devtools inspector to look at the HTML tags and attributes that are of interest to you. If you're new to HTML/Devtools, that's ok too, but again I don't have the space to teach that here. Check out FreeCodeCamp, CodeCademy or MDN.

a basic infinite scroll strategy

The easiest way to automate the infinite scroll is to do something like this:

for i in range(1,230):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
    print(i)

I have 4000 follows, so I arrived at range(1,230) by just doing a few test runs and then calculating how many loops I needed to cover all follows. Since other people will have fewer or more follows than 4000, we will have to make this a dynamic strategy, and I will cover that below.

I use time.sleep(2) to allow for the page load to happen. This is probably longer than I need based on my high speed connection, but I chose to trade off longer automated execution time for a lower risk of not loading all the data I need. I also print my progress just as a way to indicate how far along I am in my process since it can sometimes be hard to tell how close I am to being done. In this case, this only takes about 8 minutes to run but we will be running future stuff for far longer and I wanted to explain the basic intuition.
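The back-of-the-envelope calculation behind the loop count can be sketched like this. The figure of roughly 18 accounts loaded per scroll is an assumption back-calculated from the numbers above (4000 follows over 229 loops); it is not something Twitter documents, so measure it on your own test runs. Both function names are hypothetical.

```python
import math

def estimate_scroll_loops(n_follows, per_scroll=18):
    """Rough number of scroll-to-bottom steps needed to load n_follows rows.

    per_scroll is an assumed batch size, not a documented Twitter value.
    """
    return math.ceil(n_follows / per_scroll)

def estimate_minutes(loops, pause_seconds=2):
    """Time spent sleeping between scrolls, in minutes."""
    return loops * pause_seconds / 60

loops = estimate_scroll_loops(4000)
print(loops, "loops, about", round(estimate_minutes(loops), 1), "minutes")
```

Padding the estimate by a few loops is cheap insurance, since each extra loop only costs one more sleep.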

saving the data

html_source = driver.page_source
sourcedata = html_source.encode('utf-8')
soup = bs(sourcedata, "lxml")  # name the parser explicitly to avoid a bs4 warning
# every followed account is rendered as a div with data-item-type="user"
users = soup.body.findAll('div', attrs={'data-item-type':'user'})
arr = [x.div['data-screen-name'] for x in users]
bios = [x.p.text for x in users]
fullnames = [x.text.strip() for x in soup.body.findAll('a', 'fullname')][1:] # avoid your own name
d = {'usernames': arr, 'bios': bios, 'fullnames': fullnames}
df = pd.DataFrame(data=d)
df.to_csv('data/BASICDATA.csv')  # the data/ folder must already exist

This gives you a dataframe df that has the usernames, fullnames, and bios of everyone that you follow. Woohoo! You're done! right??

Nope. You're just getting started.

We now have to scale up what you just did for one user (you) to ALL your users.

Some quick automation math - say everything we just did took 10 minutes to do. 10 minutes x 4000 users = 40,000 minutes = 666 hours = 28 days!!! That's not impossible but is too high to be reasonable. How can we do this in reasonable time?
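The arithmetic above is easy to redo for your own follow count. This is a small sketch with a hypothetical helper, scrape_time; the 10 minutes per user is the same rough assumption used in the text, and the workers parameter assumes the list is split evenly.

```python
import math

def scrape_time(users, minutes_per_user=10, workers=1):
    """Return (minutes, hours, days) of wall-clock time for the slowest worker."""
    users_per_worker = math.ceil(users / workers)
    minutes = users_per_worker * minutes_per_user
    return minutes, minutes / 60, minutes / 60 / 24

minutes, hours, days = scrape_time(4000)
print(f"{minutes:,} minutes = {hours:.1f} hours = {days:.1f} days")
```

Passing a different workers value gives the per-worker figure when the same list is divided across several machines or notebooks.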

Parallelizing

The great thing about this scraping process is that the per-user scrapes can all happen concurrently. If we had 4000 machines, we could run one user on each machine and have all 4000 done in ten minutes. But we don't.

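I addressed this by splitting the list into 8 blocks of 500 users and running them side by side. At 10 minutes per user, each block is 5,000 minutes, or about 83 hours (roughly 3.5 days) instead of 28 days. That is a worst case, since the 10-minute figure came from my own account with 4000 follows, and accounts that follow fewer people finish sooner. Not too bad?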

By the end of this section you will be doing total black magic with selenium:

shawn swyx wang 🇸🇬

@swyx

Automating twitter scraping with selenium... was a rough start but is turning out very well. Open Source is amazing!

08:29 AM - 11 Feb 2018


Spin up 8 different Jupyter notebooks and log in to Twitter on each Firefox instance (see driver = webdriver.Firefox() above). Name them clearly so you don't accidentally confuse the notebooks.

Now in each notebook, you can read the data you output from your initial run:

df = pd.read_csv('data/BASICDATA.csv', encoding = "ISO-8859-1")
arr = df.usernames
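Each notebook needs its own slice of arr. If your follow count does not divide as neatly as 4000 into 8, a small helper can work out the boundaries. This is only a sketch; block_ranges is a hypothetical name, and it simply spreads any remainder over the first few blocks.

```python
def block_ranges(total, blocks):
    """Split range(total) into `blocks` contiguous (start, stop) pairs."""
    size, extra = divmod(total, blocks)
    ranges = []
    start = 0
    for i in range(blocks):
        # the first `extra` blocks take one additional item each
        stop = start + size + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

for notebook, (start, stop) in enumerate(block_ranges(4000, 8)):
    print(f"notebook {notebook}: range({start}, {stop})")
```

Each (start, stop) pair drops straight into a for i in range(start, stop) loop over arr.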

a dynamic infinite scroll strategy

Don't execute this code on its own; it is just here to show how to make the basic infinite scroll strategy above more dynamic:

loopCounter = 0
lastHeight = driver.execute_script("return document.body.scrollHeight")
while True:
    if loopCounter > 499:
        break  # if the account follows a ton of people, it's probably a bot, so cut it off
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
    newHeight = driver.execute_script("return document.body.scrollHeight")
    if newHeight == lastHeight:
        break  # the page did not grow, so we have reached the end
    lastHeight = newHeight
    loopCounter = loopCounter + 1

Essentially, store the document height, and if it stops growing after you scroll to the bottom, conclude you have reached the end (lastHeight == newHeight) and break out of the loop.
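If you want to test this stopping logic without opening a browser, one option is to pull it into a function that takes the "read height" and "scroll" steps as callbacks. This is a sketch of that idea, not the code used in this post: scroll_until_stable and FakePage are hypothetical names, and FakePage is a stand-in that pretends to grow a fixed number of times.

```python
import time

def scroll_until_stable(get_height, scroll_to_bottom, max_loops=500, pause=2.0):
    """Scroll until the page height stops growing or max_loops is reached.

    Returns the number of scrolls that loaded new content.
    """
    loops = 0
    last_height = get_height()
    while loops < max_loops:
        scroll_to_bottom()
        time.sleep(pause)  # wait for the next batch to load
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new appeared, assume we hit the end
        last_height = new_height
        loops += 1
    return loops

class FakePage:
    """Stand-in for a browser page that grows a few times, then stops."""
    def __init__(self, growths=3):
        self.height = 1000
        self.remaining = growths

    def get_height(self):
        return self.height

    def scroll(self):
        if self.remaining > 0:
            self.height += 1000
            self.remaining -= 1

page = FakePage()
print(scroll_until_stable(page.get_height, page.scroll, pause=0))
```

With Selenium, the two callbacks would wrap the same driver.execute_script calls shown above: one returning document.body.scrollHeight and one calling window.scrollTo.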

the parallelized code

Then you set the range appropriately for each notebook. So this notebook covers users 500 - 999:

for i in range(500,1000):
    currentUser = arr[i]
    print('now doing user ' + str(i) + ': ' + currentUser)
    driver.base_url = "https://twitter.com/" + currentUser + "/following"
    driver.get(driver.base_url)
    time.sleep(3) # first load
    loopCounter = 0
    lastHeight = driver.execute_script("return document.body.scrollHeight")
    while True:
        if loopCounter > 499:
            break  # if the account follows a ton of people, it's probably a bot, so cut it off
        if loopCounter > 0 and loopCounter % 50 == 0:
            print(loopCounter)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
        newHeight = driver.execute_script("return document.body.scrollHeight")
        if newHeight == lastHeight:
            break
        lastHeight = newHeight
        loopCounter = loopCounter + 1
    print('ended at: ' + str(loopCounter))
    html_source = driver.page_source
    sourcedata = html_source.encode('utf-8')
    soup = bs(sourcedata, "lxml")  # name the parser explicitly to avoid a bs4 warning
    temparr = [x.div['data-screen-name'] for x in soup.body.findAll('div', attrs={'data-item-type':'user'})]
    tempbios = [x.p.text for x in soup.body.findAll('div', attrs={'data-item-type':'user'})]
    fullnames = [x.text.strip() for x in soup.body.findAll('a', 'fullname')][1:] # avoid your own name
    d = {'usernames': temparr, 'bios': tempbios, 'fullnames': fullnames}
    df = pd.DataFrame(data=d)
    df.to_csv('data/' + currentUser + '.csv')

I want to be very clear about what happens when, so I err on the excessive side of logging. Every now and then when developing automation like this you will run into an error, and you don't want to have to go back and restart hours of automation that ran fine, so the ability to pick up where you crashed is a good thing. (You could also implement better error handling, but that would limit your ability to respond when errors happen and fix future errors.)
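Since the loop writes one CSV per user into data/, one simple way to pick up where you crashed is to skip anyone whose file already exists. A minimal sketch, assuming that file layout; pending_users is a hypothetical helper, and the temporary folder below only exists to demonstrate it.

```python
import os
import tempfile

def pending_users(usernames, data_dir="data"):
    """Return the usernames that do not have a saved CSV yet."""
    return [
        user for user in usernames
        if not os.path.isfile(os.path.join(data_dir, user + ".csv"))
    ]

# demonstration in a throwaway folder, with made-up usernames
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "alice.csv"), "w").close()  # pretend alice is done
    print(pending_users(["alice", "bob"], data_dir=tmp))
```

Looping over pending_users(arr[500:1000]) instead of a fixed index range means a restarted notebook only redoes the accounts it has not saved yet.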

Collecting deeper data for first degree follows

The first time I did this, the above was all I did, but I soon found I wanted more data for my first-degree follows. So I fired up another notebook. This time I wanted to visit the "with_replies" page of each user to grab some data from their timeline. With this I can get some idea of "engagement" (total amount of comments, likes, and retweets of original content) and their positivity (sentiment score based on automated parsing of tweets to see if the account is primarily positive or negative).

Do the same Firefox login process as above, and then read in the raw data:

df = pd.read_csv('data/BASICDATA.csv', encoding = "ISO-8859-1")
arr = df.usernames

We are just using this for the list of usernames.

Then we initialize the dataframe:

main = pd.DataFrame(data = {
        'user': ['swyx'],
        'text': ['text'],
        'tweetTimestamps': ['tweetTimestamps'],
        'engagements': ['engagements'],
        'name': ['name'],
        'loc': ['loc'],
        'url': ['url'],
        'stats_tweets': ['stats_tweets'],
        'stats_following': ['stats_following'],
        'stats_followers': ['stats_followers'],
        'stats_favorites': ['stats_favorites'],
    })

And now we go through each user's profile in the arr array:

def getTimestamps(x):
    temp = x.findAll('span', '_timestamp')
    if len(temp) > 0:
        return temp[0].get('data-time')
    else:
        return None
# now get the user's own timeline
for i in range(0,len(arr)):
    currentUser = arr[i]
    print('doing user:' + str(i) + ' ' + currentUser)
    driver.base_url = "https://twitter.com/" + currentUser + '/with_replies'
    driver.get(driver.base_url)
    time.sleep(2)  # give the profile page time to load
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)  # let the extra tweets load before capturing the HTML
    html_source = driver.page_source
    sourcedata = html_source.encode('utf-8')
    soup = bs(sourcedata, "lxml")
    # name (can be missing if the profile did not load, so guard against None)
    temp = soup.find('a', "ProfileHeaderCard-nameLink")
    name = temp.text if temp else ''
    # loc
    temp = soup.find('span', 'ProfileHeaderCard-locationText')
    temp = temp.text if temp else ''
    loc = temp.strip() if temp else ''
    # url
    temp = soup.find('span', 'ProfileHeaderCard-urlText')
    temp = temp.a if temp else None
    temp2 = temp.get('title') if temp else None
    url = temp2 if temp2 else (temp.get('href') if temp else None)
    # stats
    temp = soup.find('a',{'data-nav': 'tweets'})
    stats_tweets = temp.find('span', 'ProfileNav-value')['data-count'] if temp else 0
    temp = soup.find('a',{'data-nav': 'following'})
    stats_following = temp.find('span', 'ProfileNav-value')['data-count'] if temp else 0
    temp = soup.find('a',{'data-nav': 'followers'})
    stats_followers = temp.find('span', 'ProfileNav-value')['data-count'] if temp else 0
    temp = soup.find('a',{'data-nav': 'favorites'})
    stats_favorites = temp.find('span', 'ProfileNav-value')['data-count'] if temp else 0
    # all text
    text = [''.join(x.findAll(text=True)) for x in soup.body.findAll('p', 'tweet-text')]
    # most recent activity
    alltweets = soup.body.findAll('li', attrs={'data-item-type':'tweet'})
    tweetTimestamps = list(map(getTimestamps, alltweets)) if len(alltweets) > 0 else 0
    # engagements
    templist = [x.findAll('span', 'ProfileTweet-actionCount') for x in alltweets if not x.div.get('data-retweet-id')]  # skip retweets
    templist = [item for sublist in templist for item in sublist]
    engagements = sum([int(x.get('data-tweet-stat-count')) for x in templist if x.get('data-tweet-stat-count')])
    main = pd.concat([main, pd.DataFrame(data = {
        'user': [currentUser],
        'text': [text],
        'mostrecentTimestamp': [tweetTimestamps],
        'engagements': [engagements],
        'name': [name],
        'loc': [loc],
        'url': [url],
        'stats_tweets': [stats_tweets],
        'stats_following': [stats_following],
        'stats_followers': [stats_followers],
        'stats_favorites': [stats_favorites],
    })])
    main.to_csv('data/BASICDATA_profiles.csv')

And now our main dataframe has all this more detailed data on each account! It is also exported to the BASICDATA_profiles.csv file.
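The scraped timestamps are raw strings, so to answer "when did this account last tweet?" they need converting. The sketch below assumes the data-time attribute holds Unix epoch seconds; that is an assumption about Twitter's markup at the time, so check it against a tweet whose date you know. most_recent_tweet is a hypothetical helper, and the sample values are made up.

```python
from datetime import datetime, timezone

def most_recent_tweet(timestamps):
    """Return the newest timestamp as an aware UTC datetime, or None."""
    if not timestamps:  # covers the empty list and the 0 used when no tweets load
        return None
    seconds = [int(t) for t in timestamps if t]  # drop None / empty values
    if not seconds:
        return None
    return datetime.fromtimestamp(max(seconds), tz=timezone.utc)

# made-up sample values in the same shape as the scraped list
print(most_recent_tweet(["1518789180", None, "1518338940"]))
```

Accounts whose most recent tweet is very old are good candidates for the "dormant" bucket when deciding who to unfollow.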

ANALYSIS

While all that automation is going on, we can keep going on our main dataset!

Spin up a new Jupyter notebook, this time just for data analysis. Import the usual stuff, but this time we will also use TextBlob for sentiment analysis, so go ahead and import it: from textblob import TextBlob

Note that you will also need to download some corpora for TextBlob to work, but the error prompts when you run the code below will guide you through the download fairly easily (it's a one-liner in Anaconda).

We can do a bit of feature engineering on the meager data we get out of Twitter. In particular, we can try to:

categorize the kind of account (developer, maker, founder, etc)
guess the gender of the account (based on full name of the user) - people want to follow women in tech
rate the positivity of the account's tweets - people want more positivity in their twitter feed.

These are all error prone but still worth a try if they can surface a better signal I can use.

import math
import pandas as pd
from textblob import TextBlob

df1 = pd.read_csv('data/BASICDATA.csv', encoding = "ISO-8859-1")
df2 = pd.read_csv('data/BASICDATA_profiles.csv', encoding = "ISO-8859-1").set_index('user')[1:].drop(['Unnamed: 0'], axis=1).drop(['tweetTimestamps'], axis=1)
df2['bios'] = df1.set_index('usernames')['bios']
arr = df1.usernames
jslist = [ 'react', 'webpack', ' js', 'javascript','frontend', 'front-end', 'underscore','entscheidungsproblem', 'meteor']
osslist = [' oss', 'open source','maintainer']
designlist = ['css', 'designer', 'designing']
devlist = [' dev','web dev', 'webdev', 'code', 'coding',  'eng',  'software', 'full-stack', 'fullstack', 'backend', 'devops', 'graphql', 'programming',  'computer', 'scien']
makerlist = ['entrepreneur', 'hacker', 'maker', 'founder', 'internet', 'web']
def categorize(x):
    bio = str(x).lower()
    if any(s in bio for s in jslist):
        return 'js'
    elif any(s in bio for s in osslist):
        return 'oss'
    elif any(s in bio for s in designlist):
        return 'design'
    elif any(s in bio for s in devlist):
        return 'dev'
    elif any(s in bio for s in makerlist):
        return 'maker'
    else:
        return ''
df2['cat'] = list(map(categorize,df2['bios']))
df2['stats_followers'] = list(map(lambda x: int(x), df2['stats_followers']))
df2['stats_following'] = list(map(lambda x: int(x), df2['stats_following']))
df2['stats-ratio'] = df2.apply(lambda x: x['stats_followers']/x['stats_following'] + math.sqrt(x['stats_followers']) if x['stats_following'] > 1 else math.sqrt(x['stats_followers']), axis=1) 
df2['stats-ratio'] = list(map(lambda x: min(200,x), df2['stats-ratio']))
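# 'text' comes back from the CSV as one string (not a list), so ' '.join(y) would put a space between every character
df2['positivity'] = df2['text'].apply(lambda y: sum([x.sentiment.polarity for x in TextBlob(str(y)).sentences]))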
df2['eng_ratio'] = df2.apply(lambda x: math.log(int(x['engagements']))/math.log(x['stats_followers']) if int(x['engagements']) > 0 and int(x['stats_followers']) > 1 else 0, axis=1)

So if you check out df2 you now have a few fields that you can use. The 'cat' field represents our efforts to bucket our follows into distinct groups based on keywords in their bios. The buckets are checked in order (js, oss, design, dev, maker) and the first match wins. To the extent that no one person can really ever be put in one bucket this is a Sisyphean task, but we can try :) (If we were to apply some machine learning to this, a K nearest neighbors method might work here since we can break down the keywords using TextBlob.)
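Because categorize is a chain of if/elif checks, the order of the lists matters: the first list with a matching keyword wins, which explains some surprising labels. The sketch below restates that behaviour as an ordered rules table. The keyword lists are trimmed versions of the ones above, and categorize_ordered and the sample bios are for illustration only.

```python
RULES = [  # checked in order; the first list with a matching keyword wins
    ("js", ["react", "webpack", " js", "javascript", "frontend", "front-end"]),
    ("oss", [" oss", "open source", "maintainer"]),
    ("design", ["css", "designer", "designing"]),
    ("dev", [" dev", "code", "coding", "eng", "software"]),
    ("maker", ["entrepreneur", "hacker", "maker", "founder", "internet", "web"]),
]

def categorize_ordered(bio):
    """Return the label of the first rule whose keywords appear in the bio."""
    text = str(bio).lower()
    for label, keywords in RULES:
        if any(k in text for k in keywords):
            return label
    return ""

# "css" is matched before " dev", so a developer advocate lands in design
print(categorize_ordered("Sr. Developer Advocate. Writer @Real_CSS_Tricks"))
# "@vuejs" has no leading space before js, so this falls through to oss
print(categorize_ordered("Full-time open source. Creator @vuejs"))
```

Keeping the rules in one list makes it easier to reorder buckets or add keywords without touching the function body.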

Here's how my categories broke out:

print(len(df2[df2['cat'] == 'maker'])) # 573
print(len(df2[df2['cat'] == 'design'])) # 136
print(len(df2[df2['cat'] == 'oss'])) # 53
print(len(df2[df2['cat'] == 'js'])) # 355
print(len(df2[df2['cat'] == 'dev'])) # 758

Ok, now we're getting somewhere.
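The five print statements above can also be collapsed into a single tally. This sketch uses only the standard library on a made-up list of labels; with the real dataframe, df2['cat'].value_counts() gives the same kind of breakdown.

```python
from collections import Counter

cats = ["js", "dev", "", "maker", "dev", "js", "dev"]  # hypothetical sample labels
tally = Counter(c for c in cats if c)  # skip uncategorized ('') accounts
for cat, n in tally.most_common():
    print(cat, n)
```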

We are also engineering a bunch of other metrics, for example the stats-ratio, which is the ratio of followers to following plus the square root of followers, subject to a max of 200. This is an arbitrary formula to allow the influence of high influence people, but to limit the influence of superstars.

eng_ratio is the engagement ratio, which attempts to do something similar for engagement (likes, retweets, and comments on original content) relative to followers. It is computed as log(engagements) / log(followers), since if you have more followers you naturally have more engagement anyway, so it's best to look at a ratio.
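To get a feel for how these two formulas behave, here they are as plain functions mirroring the dataframe code above, with a few made-up inputs. The function names and sample numbers are for illustration only.

```python
import math

def stats_ratio(followers, following, cap=200):
    """followers/following plus sqrt(followers), capped at `cap`."""
    score = math.sqrt(followers)
    if following > 1:
        score += followers / following
    return min(cap, score)

def eng_ratio(engagements, followers):
    """log(engagements) / log(followers), or 0 when either is too small."""
    if engagements > 0 and followers > 1:
        return math.log(engagements) / math.log(followers)
    return 0

print(stats_ratio(400, 200))        # modest account: 2 + 20
print(stats_ratio(1_000_000, 500))  # superstar: hits the cap
print(eng_ratio(100, 10_000))       # roughly 0.5
```

The cap is what keeps a million-follower account from drowning out everyone else, while the square root term still rewards a large audience below the cap.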

We're skipping a lot of work on analysis and feature engineering but that is what I have right now :).

DISPLAY

Ok, this is actually the toughest bit. If I pull up and merge my fofollow data for the 355 Twitter accounts classified as "js" devs, I get over 200,000 edges between source and destination:

import os.path
def getData(x):
    fp = 'data/' + x + '.csv'
    if  os.path.isfile(fp):
        temp = pd.read_csv(fp, encoding = "ISO-8859-1")[['usernames', 'bios']] 
        temp.columns = ['target', 'bios']
        temp['source'] = x
        temp['cat'] = list(map(categorize,temp['bios'])) # categorize the bios of the fofollows
        return temp
temp = list(map(getData, list(df2[df2['cat'] == 'js'].index)))
combined = pd.concat(temp) # all target-source relationships originating from 'js'

I can then display data however I choose:

screened = combined.groupby(by='target').count().sort_values(by='source', ascending=False)[:50][['bios']]
screened.columns = ['fofollow_count'] 
screened_with_combined_info = screened
screened_with_combined_info['bios'] = combined.groupby(by='target').first()[['bios']]
screened_with_combined_info['cat'] = combined.groupby(by='target').first()[['cat']]

formatting for markdown display...


df = screened_with_combined_info.reset_index()[['target','fofollow_count','cat','bios']]
df['target'] = df['target'].apply(lambda x: "[" + x + "](https://twitter.com/" + x + ")")
# Get column names
cols = df.columns

# Create a one-row DataFrame holding the markdown header separator
# (named md_rule so it does not overwrite the df2 profile data from earlier)
md_rule = pd.DataFrame([['---',]*len(cols)], columns=cols)

# Put the separator row above the data rows
md_table = pd.concat([md_rule, df])

# Save as markdown
md_table.to_csv("nor.md", sep="|", index=False)
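One thing to watch with the pipe-separated trick: a bio that itself contains a | character can throw off the columns. As a sketch of an alternative, here is a small standard-library function that builds the table by hand and escapes pipes. to_markdown_table is a hypothetical helper and the sample row is made up.

```python
def to_markdown_table(headers, rows):
    """Render headers and rows as a pipe-delimited markdown table string."""
    def cell(value):
        # a literal pipe would be read as a column break, so escape it
        return str(value).replace("|", "\\|").replace("\n", " ")

    lines = [
        "| " + " | ".join(cell(h) for h in headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)

sample = [["someuser", 42, "js", "JS | CSS | Node"]]  # made-up row
print(to_markdown_table(["target", "fofollow_count", "cat", "bios"], sample))
```

Writing the returned string to a .md file gives the same layout as the dataframe approach, with the separator row generated for you.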

The Top 50 JS Dev Twitter Accounts

target	fofollow_count	cat	bios
dan_abramov	210	js	Working on @reactjs. Co-author of Redux and Create React App. Building tools for humans.
paul_irish	190	maker	The web is awesome, let's make it even better ? I work on web performance, @____lighthouse & @ChromeDevTools. Big fan of rye whiskey, data and whimsy
reactjs	189	js	React is a declarative, efficient, and flexible JavaScript library for building user interfaces.
addyosmani	181	dev	Eng. Manager at Google working on @GoogleChrome & Web DevRel ? Creator of TodoMVC, @Yeoman, Material Design Lite, Critical ? Team @workboxjs ??
sarah_edo	181	design	Award-winning speaker. Sr. Developer Advocate @Microsoft. @vuejs Core Team, Writer @Real_CSS_Tricks, cofounder @webanimworkshop, work: ?
rauchg	173		@zeithq
Vjeux	169	js	Frenchy Front-end Engineer at Facebook. Working on React, React Native, Prettier, Yoga, Nuclide and some other cool stuff...
mjackson	158	js	Thriller, founder @ReactTraining, creator @unpkg, organizer @shape_hq, member @LDSchurch
kentcdodds	157	js	Making software development more accessible · Husband, Father, Mormon, Teacher, OSS, GDE, @TC39 · @PayPalEng @eggheadio @FrontendMasters ?
sebmarkbage	157	js	React JS · TC39 · The Facebook · Tweets are personal
mxstbr	157	js	Cofounder @withspectrum Advisor @educativeinc Makes styled-components, react-boilerplate and micro-analytics Speciality coffee geek,?
ryanflorence	156	js	Owner http://Workshop.me and http://TotalReact.com
TheLarkInn	155	js	Speaker, engineer, #webpack Core Team, Developer Advocate, Farmer. Views are my own. TPM @Microsoft @MSEdgeDev @EdgeDevTools.?
jeresig	149	js	Creator of @jquery, JavaScript programmer, author, Japanese woodblock nerd (http://ukiyo-e.org ), work at @khanacademy.
sebmck	147	js	Australian I write JavaScript Married to @anagobarreto
_developit	145	js	Chrome DevRel at @google. Creator of @preactjs. Do more with less. http://github.com/developit
linclark	144	dev	stuffing my head with code and turning it into @codecartoons. also, tinkering with WebAssembly, @ServoDev, and a little @rustlang at @mozilla
sophiebits	143	js	I like fixing things. eng manager of @reactjs at Facebook. ex-@khanacademy. she/her. kindness, intersectional feminism, music.
floydophone	143	js	Co-founder & CEO @HelloSmyte. Ex-FB and Instagram. Worked on React.js.
jlongster	142	dev	Contracting as Shift Reset LLC. Working on @actualbudget. Created @PrettierCode. Ex-Mozilla. Enjoys functional programming.
ken_wheeler	141	oss	Director of OSS @FormidableLabs ? Professional American ? Manchild ? Dad ? @baconbrix's Dad ? All opinions are the opinions of Miller Lite ? @toddmotto fan
left_pad	140		A volunteer in the community and a steward of @babeljs. @Behance, @Adobe. Soli Deo Gloria
acdlite	140	js	@reactjs core at Facebook. Hi!
nodejs	137	js	The Node.js JavaScript Runtime
jordwalke	135	js	Maker of things: ReactJS. Working on: @reasonml. At: Facebook Engineering.
github	132	dev	"How people build software. Need help? Send us a message at http://git.io/c for support."
leeb	132	js	Making things at Facebook since 2008: React, GraphQL, Immutable.js, Mobile, JavaScript, Nonsense
BrendanEich	130	js	Created JavaScript. Co-founded Mozilla and Firefox. Now founder & CEO @Brave Software (https://brave.com/ ).
cpojer	129	dev	Formerly Pojer · Engineering Manager at Facebook · Metro · Jest · Yarn
rauschma	128	js	"JavaScript: blog @2ality, books @ExploringJS, training, newsletter @ESnextNews. ReasonML: tweets @reasonmlhub, newsletter ?"
wesbos	125	js	Fullstack Dev ? JS CSS Node ? https://ES6.io ? https://LearnNode.com ? http://ReactForBeginners.com ? http://JavaScript30.com ? Tips ? @KaitBos ? @SyntaxFM
wycats	125	oss	Tilde Co-Founder, OSS enthusiast and world traveler.
BenLesh	121	dev	Software engineer at @Google, #RxJS core team. Occasionally I act silly on the @moderndotweb podcast. Views are my own.
sindresorhus	120	oss	Maker of things; macOS apps & CLI tools. Currently into Swift and Node.js. Full-time open sourcerer. Started @AVA__js.
tjholowaychuk	119	dev	Founder & solo developer of https://apex.sh , not a startup. https://github.com/tj https://medium.com/@tjholowaychuk . Asya's.
Una	118	dev	Director of Product Design @bustle, Google Dev Expert, & cohost @toolsday. Prev UI Eng @digitalocean @ibmdesign. Travel life: http://Instagram.com/unakravets
peggyrayzis	117	oss	Exploring the world through code, travel, and music Open Source Engineer @apollographql
elonmusk	117
jaffathecake	115	maker	Googler. I want the web to do what native does best, and fast. No thoughts go unpublished. 'IMO' implicit.
youyuxi	115	js	Design, code & things in between. Full-time open source. Creator @vuejs, previously @meteorjs & @google, @parsonsamt alumnus.
jdalton	113	js	JavaScript tinkerer, bug fixer, & benchmark runner ? Creator of Lodash ? Former Chakra Perf PM ? Current Web Apps & Frameworks PM @Microsoft.
samccone	113		harbourmaster @google
markdalgleish	113	design	CSS Modules co-creator, @MelbJS organiser. Full-stack ECMAScript addict, UI design enthusiast, coffee drinker DesignOps Lead at @seekjobs
thejameskyle	112
tomdale	112	js	JavaScript thinkfluencer
_chenglou	109	js	There's an underscore before my name
mathias	107	js	I work on @v8js at Google and on ECMAScript through TC39. JavaScript, HTML, CSS, HTTP, performance, security, Bash, Unicode, i18n, macOS.
iam_preethi	106	dev	Blockchain Engineer. Building a new company (Schelling). Alum @coinbase @a16z @GoldmanSachs. Passionate about blockchain & crypto. Avid?
threepointone	106	js	Entscheidungsproblem
JavaScriptDaily	105	js	Daily JavaScript / JS community news, links and events. Go to @reactdaily for React news.

These are the top 50 JS devs followed by other devs! Whoo! Not a bad place to get to after 4100 words, eh?

I of course have much more data analysis to do but I will put up the results in a separate post, with more engagement and follow ratio data split by gender, location, and so on. Shameless plug time: follow me if you want to get notified when I put that out!

What else can you do with your data? Post it up somewhere and I'd love to tweet it out!

This is the first post in a possible series on applying data science to scraping and analysing my Twitter network graph, so I might follow up with more of the finer details about doing this. Please let me know your feedback in the comments so I can incorporate it in future posts.