Are You a Connector?

Viet Nguyen — Wed, 16 Sep 2020 09:37:29 +0000

TL;DR: “Connectors” are people who genuinely want to help others by introducing them to their own network.

How did you get that remote job or that awesome apartment in a highly competitive rental market?

Looking back at my career trajectory, I can see two patterns: great opportunities came either by pure luck or through someone outside my immediate circle.

In this post, I want to talk about the latter, the "outsider" network effect.

The Backstory

How I Became a Remote Developer since 2012

I had lived in New York City for seven years. By the end I knew I was ready to move west to Oregon, to a literally greener place where I could live closer to the mountains.

I started my job search and asked people I knew well whether they knew of any openings.

Most of my friends were living and working in the city. They had few to no contacts at companies out west. Nothing.

One day during a water cooler chat, someone mentioned Maria had left for another company where she's now happily working full-time from home.

Maria and I were in the same department, but we worked on different projects under different managers. Besides occasional small talk when we happened to share the elevator, there was little interaction between the two of us. It's safe to say we had very little in common.

Two weeks later, I was able to get a hold of Maria via email, first to congratulate her on the new job, and also to inquire about the work-from-home part. Lo and behold, I learned that her company was still hiring. After several rounds of interviews, I got the job and joined a fully remote team.

The Connector

In his book “The Tipping Point”, Malcolm Gladwell calls people like Maria “connectors”.

Connectors are uniquely important people. Because of their supportive nature, they tend to have cultivated a large or diverse network. The key point is that they are willing to introduce us to that network. According to Gladwell, people within our close circles are generally similar to us in interests and socioeconomic background, so they are likely to be aware of the same opportunities we are.

For that reason, my groups of friends in New York, while they appeared diverse, in fact had a lot in common with me. We enjoyed going to similar restaurants, discussing the same TV series, and sharing similar hobbies on the weekend.

If they knew of a new opportunity, chances were it would be a tech position in the city.

Connectors, on the other hand, are outside of our immediate circles. There is little overlap between their networks and ours, so connectors are exposed to opportunities that we and our friends are not aware of.

Not All Networking Is Equal

Have you ever met someone at a party who said, “Oh, you should definitely talk to X! I'll introduce you two”?

When people offer to make an introduction, they see potential in you, or value in your ideas, and are willing to risk their reputation to connect you to their network, writes Rob Fitzpatrick in his book for startup founders, The Mom Test: How to Talk to Customers and Learn If Your Business is a Good Idea when Everyone is Lying to You.

How to Spot a Connector

Connectors are not unicorns. They can be a friend of a friend or someone in your extended LinkedIn network. You won't run into them very often, but when you do, they are not too difficult to recognize. They are the conduit to new possibilities and resources beyond what is currently available in our immediate network.

Last but not least, what is the best way to thank your connectors? I think besides thanking them directly, one of the best ways is to pay it forward and become a connector yourself.

Mt. Hood, Oregon (photo by adrian)

Cover photo by Noah

Building a Rock Climbing Route Recommendation Engine

Viet Nguyen — Tue, 15 Sep 2020 12:27:05 +0000

“I'm going to Vegas to spend more time in the mountains,” said no one ever, unless they're a rock climber.

A Climber's Guide to Building a Recommendation Engine in Python

Part I - Introduction

Recommendation engines, or recommendation systems, are everywhere you turn, from music, movie and product recommendations to which news and current events are shown to us.

Wouldn't it be cool if we rock climbers could build our own climb recommendation engine to discover similar and interesting climbs?

After reading this tutorial, you will be able to build one yourself in Python.

Note: Jupyter notebook for this tutorial is available on Github.

Before we start, let’s review four common ways to build a recommendation engine:

Popularity-based recommenders: The simplest of all. You make a list of items based on some criteria such as video view count or song play count. Twitter trending topics or Reddit comments are good examples of this approach.
Content-based recommenders: This type of system takes one item as an input and suggests similar items based on a set of characteristics. For example, given a movie, the recommendation engine finds similar movies based on genre, directors and actors.
Collaborative-filtering recommenders: You may have seen “People who did X also did Y” suggestions when browsing Amazon. Based on past behaviors or preferences of other users, the engine can predict similar choices.
Hybrid recommenders: Lastly, there is no reason why we can’t combine two or more of the above recommenders. In fact, hybrid recommenders are commonly used in real-world applications, as research has shown they can produce more accurate suggestions. Spotify’s new-music recommendations are one good example of this approach. For instance, we first take the list of this month’s popular songs (popularity-based recommender) and then suggest only the songs that are similar by genre and artist (content-based recommender).
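
The hybrid example above can be sketched in a few lines of plain Python. This is only an illustration under made-up assumptions: the play log, the song names and the helper functions popular_songs and hybrid_recommend are all hypothetical and have nothing to do with how Spotify actually works.

```python
from collections import Counter

# Hypothetical play log: one (song, genre) entry per play.
plays = [
    ("Song A", "rock"), ("Song A", "rock"), ("Song A", "rock"),
    ("Song B", "jazz"), ("Song B", "jazz"),
    ("Song C", "rock"),
    ("Song D", "rock"), ("Song D", "rock"),
]

def popular_songs(play_log):
    """Popularity-based step: rank songs by play count, most played first."""
    counts = Counter(song for song, _genre in play_log)
    return [song for song, _count in counts.most_common()]

def hybrid_recommend(play_log, liked_genre, top_n=2):
    """Hybrid step: keep only the popular songs that match the liked genre."""
    genre_of = dict(play_log)  # song -> genre lookup (content feature)
    ranked = popular_songs(play_log)
    return [song for song in ranked if genre_of[song] == liked_genre][:top_n]

recommendations = hybrid_recommend(plays, "rock")
print(recommendations)
```

The popularity step decides the order, and the content step decides what is allowed through. Swapping either step for a smarter one leaves the other untouched.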

In this tutorial we are going to build a Collaborative-filtering recommendation engine.

About The Dataset

The dataset contains 84,786 ratings from 7,950 users on 3,161 climbs in Nevada, USA, where the well-known Red Rock Canyon climbing area is located. I chose Nevada simply because I have climbed there for several seasons.

User ratings were extracted from MountainProject.com with user IDs anonymized.

Let’s load our data into a Pandas dataframe.

import pandas as pd 

df = pd.read_csv("./openbeta-ratings-nevada.zip", compression="zip")
df.sample(5)

Output:

Popular Climbs

One basic metric we can look at is how climbs are ranked by popularity by counting the number of ratings they received.

# aggregate climbs and count the number of ratings each one received
(df.groupby(['route_id', 'name'])['ratings']
   .count()
   .reset_index(name="count"))

Output:

Sort the result by count and only show the top 20.

# same aggregation, sorted so the most-rated climbs come first
popular = (df.groupby(['route_id', 'name'])['ratings']
             .count()
             .reset_index(name="count")
             .sort_values(by=['count'], ascending=False))
popular.head(20)

Output:

There we have it! A list of popular climbs in Red Rocks. If you have visited the area before, you would definitely recognize those names.

Understand Collaborative Filtering

In Collaborative Filtering we find “like-minded” climbers based on ratings or preferences they have given to climbing routes.

Let’s consider two multi-pitch climbs, The Nightcrawler and The Dragon, that both received 4-star ratings from a climber named Tamara. If you have also given them high ratings, we can say that you and Tamara have a high similarity score. For the sake of this example, if Tamara gives another climb 3 or 4 stars, chances are you will also like that climb, and vice versa.

This is the principal idea behind collaborative filtering! Intuitive and simple, right? In fact, you can calculate this similarity score by hand or with the help of a Python library. The higher the number, the more “like-minded” the two climbers.

Believe it or not, these ratings can be represented in Python as a 2-D array (matrix), with one row per climber and one column per climb. We’ll use cosine similarity, a common and simple method, to calculate the similarity.

# Be sure to install sklearn library with pip or pipenv 
import sklearn.metrics.pairwise as pw 

tamara = [[4,4]] 
you = [[3,4]] 

pw.cosine_similarity(tamara, you) 
# Result = [[0.98994949]]
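
If you want to check the number by hand, cosine similarity is the dot product of the two rating vectors divided by the product of their lengths. The following is a minimal sketch using only the standard library. The function name cosine_similarity is my own and is not the sklearn implementation.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

tamara = [4, 4]
you = [3, 4]

score = cosine_similarity(tamara, you)
print(round(score, 8))  # 0.98994949, same as the sklearn result
```

Here the dot product is 4×3 + 4×4 = 28, and the lengths are √32 and 5, which gives 28 / 28.284 ≈ 0.99.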

We can also calculate cosine similarity for more than two routes. Let’s say Tamara disliked Cat in the Hat and gave it 1 star, while you gave it 3 stars.

import sklearn.metrics.pairwise as pw

tamara = [[4,4,1]] 
you = [[3,4,3]] 

pw.cosine_similarity(tamara, you)
# Result ≈ [[0.9255]]

Cosine similarity drops from 0.98994949 to about 0.9255.

As you can see, we can extend the above logic to take into account thousands of other climbs and user ratings. The math will, of course, get more complicated as the size of the matrix increases. Take a deep breath as if you’re 1 meter above the last bolt! We are going to use Surprise, a Python machine learning library, to help us build our recommendation engine.

Cosine similarity is just one of many ways to find similar items. The Surprise library supports:

Cosine similarity
Pearson’s correlation coefficient
Mean Squared Difference
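
To get a feel for the other two measures, here is a rough standard-library sketch that applies them to the same three-route ratings. The function names are hypothetical and this is not Surprise's own code. I am assuming the usual textbook definitions, with the mean squared difference turned into a similarity as 1 / (msd + 1) so that identical ratings score 1.

```python
import math

def pearson(a, b):
    """Pearson correlation: cosine similarity of the mean-centered ratings."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return covariance / (spread_a * spread_b)

def msd_similarity(a, b):
    """Similarity derived from the mean squared difference of the ratings."""
    msd = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return 1 / (msd + 1)

tamara = [4, 4, 1]
you = [3, 4, 3]

pearson_score = pearson(tamara, you)
msd_score = msd_similarity(tamara, you)
print(round(pearson_score, 3), round(msd_score, 3))
```

Pearson subtracts each climber's average rating first, so it is less affected by someone who rates everything generously. The mean squared difference looks only at how far apart the ratings are.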

Part II - Building Recommendation Engine

1. Load the data file

This step repeats the one from the previous section for continuity.

import pandas as pd, numpy as np

df = pd.read_csv("./openbeta-ratings-nevada.zip", compression="zip")
df.sample(5)

Output:

2. Create a prediction model

In this example, we use the K-Nearest Neighbors algorithm to build our prediction model. Essentially, the code performs the same cosine similarity calculation we did earlier, but across the entire dataset, and it also ranks climbs by similarity score.

from surprise import Dataset 
from surprise import Reader 
from surprise import accuracy 
from surprise import KNNBasic 
from surprise.model_selection import train_test_split 
from surprise.model_selection import KFold 

reader = Reader(rating_scale=(0, 4)) 

data = Dataset.load_from_df(df[['users', 'route_id', 'ratings']], reader) 

sim_options = {'name': 'cosine', 'user_based': True, 'min_support': 4} 

algo = KNNBasic(sim_options=sim_options) 
kf = KFold(n_splits=5) 

for trainset, testset in kf.split(data): 
   # train and test algorithm 
   algo.fit(trainset) 
   predictions = algo.test(testset) 

   # compute and print Root Mean Squared Error
   accuracy.rmse(predictions, verbose=True)

Output:

Computing the cosine similarity matrix...
Done computing similarity matrix.
RMSE: 0.7172
Computing the cosine similarity matrix...
Done computing similarity matrix.
RMSE: 0.7057
...
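
To make the K-Nearest Neighbors step less of a black box, here is a simplified sketch of how a basic KNN recommender can turn similarity scores into a predicted rating: take the k most similar climbers who rated the climb and average their ratings, weighted by similarity. The names and numbers below are made up, and the function predict_rating is a hypothetical illustration rather than Surprise's implementation.

```python
def predict_rating(similarities, neighbor_ratings, k=2):
    """Similarity-weighted average of the k most similar climbers' ratings."""
    # pair each neighbor's similarity with the rating they gave the climb
    rated = [(similarities[user], rating)
             for user, rating in neighbor_ratings.items()
             if user in similarities]
    # keep the k neighbors with the highest similarity
    nearest = sorted(rated, reverse=True)[:k]
    total_similarity = sum(sim for sim, _rating in nearest)
    return sum(sim * rating for sim, rating in nearest) / total_similarity

# Hypothetical similarity of each climber to you, and their rating of one climb
similarities = {"tamara": 0.99, "alex": 0.90, "sam": 0.40}
neighbor_ratings = {"tamara": 4, "alex": 3, "sam": 1}

predicted = predict_rating(similarities, neighbor_ratings, k=2)
print(round(predicted, 2))
```

With k=2, Sam's low rating is ignored because Sam is the least similar climber, and the prediction lands between Tamara's 4 and Alex's 3, slightly closer to Tamara's.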

Understanding Train Set and Test Set
You may be wondering what the RMSE (root-mean-square error) values mean, why there is a for loop, and why we need to split the dataset into a train set and a test set.

Consider our previous rating example. In a perfect world, the recommendation engine would predict with high accuracy that if Tamara gives another route a high rating, you will also like that route.

To measure prediction accuracy, the algorithm splits the dataset into multiple smaller sets, “pretends” it doesn’t know some of the ratings climbers have given, and compares the actual ratings with the predicted values. RMSE measures the deviation between the two: the lower, the better.

Train set: a subset of the dataset used to calculate similarity and perform prediction or “train” the prediction model.
Test set: a subset of the dataset where you apply the prediction model from the train set and test prediction accuracy.

Examples:
Test set 1 — Compare 3 (actual) with predicted value.

Test set 2 — Compare 4 (actual) with predicted value.
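
The two ideas above, splitting into folds and scoring with RMSE, can be sketched with the standard library alone. This is a simplified illustration: the helper names are hypothetical, the predicted values are invented, and, unlike a real cross-validation helper, this split does not shuffle the data.

```python
import math

def rmse(actual, predicted):
    """Root of the mean of the squared differences between actual and predicted."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def k_fold_indices(n_items, n_splits):
    """Yield (train, test) index lists; every item is tested exactly once."""
    indices = list(range(n_items))
    for fold in range(n_splits):
        test = indices[fold::n_splits]  # every n_splits-th item
        train = [i for i in indices if i % n_splits != fold]
        yield train, test

# Actual ratings from the two examples above vs. made-up predictions
actual = [3, 4]
predicted = [3.5, 3.0]
error = rmse(actual, predicted)
print(round(error, 4))

folds = list(k_fold_indices(n_items=10, n_splits=5))
print(folds[0])
```

Each rating is hidden from the model in exactly one fold, which is why the loop in the tutorial prints one RMSE value per fold.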

3. Make Recommendations

It’s time to answer the great question, climbers who liked Epinephrine also liked …

climb_name = "Epinephrine" 
# look up route_id from human-readable name 
route_id = df[df.name==climb_name]['route_id'].iloc[0]
print("People who climbed '{}' also climbed".format(climb_name)) 

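# get similar climbs (the 50 nearest neighbors)
# note: get_neighbors() returns neighbors of an item only when the model
# was built with 'user_based': False in sim_options
prediction = algo.get_neighbors(trainset.to_inner_iid(route_id), 50)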

print(prediction)

Output:

People who climbed 'Epinephrine' also climbed
[263, 506, 238, 75, 8, 511, 1024, 233, 173, 418, 550, 1050, 478, 2, 379, 596, 1491, 221, 730, 261, 30, 410, 109, 313, 264, 148, 659, 68, 223, 1131, 1283, 428, 272, 354, 496, 143, 737, 1152, 835, 17, 356, 368, 545, 89, 23, 74, 281, 480, 509, 278]

Domain Id vs Surprise Internal Id
For efficiency, the prediction algorithm converts our dataset into another data structure (most likely some sort of matrix) and works with the data by internal IDs.

The Surprise library provides two helper functions to convert from one to the other.

trainset.to_inner_iid(route_id): Convert a domain-specific Id to an internal Id.
trainset.to_raw_iid(inner_id): Convert an internal Id back to a domain-specific Id.
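
One way to picture this mapping is a pair of lookup tables that hand out small consecutive integers as new IDs are seen. The class below is a hypothetical sketch of that idea, not Surprise's actual implementation, and the route IDs are invented.

```python
class IdMapper:
    """Maps arbitrary raw IDs to consecutive internal integers and back."""

    def __init__(self, raw_ids):
        self._raw_to_inner = {}
        self._inner_to_raw = []
        for raw_id in raw_ids:
            if raw_id not in self._raw_to_inner:
                # the next free internal ID is the current table size
                self._raw_to_inner[raw_id] = len(self._inner_to_raw)
                self._inner_to_raw.append(raw_id)

    def to_inner(self, raw_id):
        return self._raw_to_inner[raw_id]

    def to_raw(self, inner_id):
        return self._inner_to_raw[inner_id]

# Hypothetical route IDs as they might appear in a ratings file, with repeats
mapper = IdMapper([900001, 900042, 900001, 900007])

inner = mapper.to_inner(900042)
print(inner, mapper.to_raw(inner))
```

Small consecutive integers can be used directly as row and column indices of a similarity matrix, which is why a library would rather not work with the original IDs.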

Convert the list of recommended climbs to human-readable names:

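# convert Surprise internal IDs back to Mountain Project route IDs
recs = [trainset.to_raw_iid(inner_id) for inner_id in prediction]

# keep only the ratings that belong to the recommended climbs
results = df[df.route_id.isin(recs)]

# summarize the ratings of each recommended climb
r = results.pivot_table(
    index=['name', 'route_id', 'type', 'grade'],
    aggfunc=[np.mean, np.median, np.size],
    values='ratings')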

print(r)

Output:

Off Belay

That’s it. We have just built a simple climb recommendation engine for Red Rock Canyon with the help of the Surprise library for Python. You can further fine-tune the engine by suggesting climbs by difficulty and type (trad vs sport). That’s a topic for a future article. Have fun and be safe!

If you like this tutorial and want to see more like this, make sure to give it a ♥ Thanks!

Jupyter notebook for this tutorial is available on Github.

DEV Community: Viet Nguyen

The author and friends climbing The Dragon (5.11a)