Joseph Lim

Posted on Jun 21, 2023 • Originally published at nubela.co

The VC's Guide to Uncovering Stealth Startups

#stealthstartup #venturecapita #datascience #webscraping

How can you identify a likely founder so that you can be the earliest to invest in their success?

In this article, we'll answer this question and show you several specific queries you can plug into the Proxycurl Person Search Endpoint to search for potential founders. Our intent is to demonstrate how you can build sourcing into your stack rather than having analysts do it manually.

General method

We will assume that founders tend to share certain background traits, such as having:

Attended certain universities.
Worked at certain companies in the past.
Held certain past job titles (for example, having previously been founders).

You can leverage these trends along with a powerful Person Search API like Proxycurl's to make bulk search queries that return rich lists of potential founders whose careers you can then monitor. If such a person moves to start a new company - possibly a stealth startup - you'll be aware of it.

Searching for employees of "Stealth Startup"

A somewhat naive approach to the problem is to search for employees of "Stealth Startup." Fortunately, the naive approach is valid because LinkedIn (both helpfully and somewhat amusingly) recognizes "Stealth Startup" as a legitimate company. It has job listings and everything.

There are a couple of specific variations on this name that have a lot of employees, along with several others. We're going to run a Person Search Endpoint query with a regex on these. We want to pick up a lot of results without grabbing too many false positives; for example, Stealth Management Group LLC is not a stealth startup.

For our regex, let's look for either an exact match of Stealth (^Stealth$) or require that both Stealth and Startup be in the name, possibly with some string in between them. We can accomplish that with the following complete regex:

    'current_company_name': '(^Stealth$)|(.*Stealth.*Startup.*)',

We'll leave case sensitivity on.
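If you'd like to sanity-check the pattern locally first, here is a minimal sketch using Python's built-in re module. Two assumptions to flag: Python's regex engine is only a stand-in for whatever engine the API uses, and apart from Stealth Management Group LLC the sample names are made up for illustration.

```python
import re

# Same pattern as above. Python's `re` is a stand-in for the API's regex
# engine (an assumption), so treat this as a local sanity check only.
STEALTH_PATTERN = r'(^Stealth$)|(.*Stealth.*Startup.*)'


def looks_like_stealth(company_name: str) -> bool:
    """Return True if the company name matches the stealth pattern."""
    return re.search(STEALTH_PATTERN, company_name) is not None


# Illustrative sample names (hypothetical, except the LLC from the text).
samples = [
    'Stealth',
    'Stealth Startup',
    'Stealth Mode Startup',
    'Stealth Management Group LLC',
    'stealth startup',  # lowercase: rejected because matching is case sensitive
]
for name in samples:
    print(f'{name!r}: {looks_like_stealth(name)}')
```

The last sample shows the cost of leaving case sensitivity on: an all-lowercase name slips past the pattern.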

It turns out that there are too many people on LinkedIn who list a stealth startup as their employer for this condition to be restrictive enough to search on its own. That's exciting because it means we're working with a large sample size. So let's move on to the next part before we run any queries.

Segment potential founders based on their background

We can accomplish this task in a few ways, and depending on how selective you want to be in your search, you can use these methods all at once or one at a time. Additionally, you can be more or less selective within each method; for example, when searching for graduates of particular universities, you may decide to look at only a single Ivy League school, whereas here, we'll include them all.

We'll include one constant in every search here: Duration of tenure at the stealth startup. We're primarily interested in the first couple years of a stealth startup's existence, so we'll set current_role_after to 2021-04-01.

Filter by education background

The first method we'll try is filtering for potential founders by their education background. You may have a school or set of schools in mind, but we'll take the set of Ivy League universities as an example. The regex for this is pretty straightforward, if a bit long:

    'education_school_name': '^(Brown University|Columbia University|Cornell University|Dartmouth College|Harvard University|University of Pennsylvania|Princeton University|Yale University)$',

Again, we want case sensitivity here, so we'll omit the optional (?i) flag. And now, with two parameters specified, we are restricting our query enough that we can run some code! Here is an example working Python script. (As usual, with queries that return people, we won't print any results, but you can run it yourself if you like.)

import json, os, requests
api_key = os.environ['PROXYCURL_API_KEY']
headers = {'Authorization': 'Bearer ' + api_key}

api_endpoint = 'https://nubela.co/proxycurl/api/search/person'
params = {
    'current_company_name': '(^Stealth$)|(.*Stealth.*Startup.*)',
    'current_role_after': '2021-04-01',
    'education_school_name': '^(Brown University|Columbia University|Cornell University|Dartmouth College|Harvard University|University of Pennsylvania|Princeton University|Yale University)$',
}
response = requests.get(api_endpoint, params=params, headers=headers)
print(json.dumps(response.json()))
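If your list of schools changes often, you may prefer to build the exact-match pattern from a plain list rather than editing a long string by hand. The sketch below is one way to do that; the helper name exact_match_any is our own invention for this example, not part of any Proxycurl library.

```python
import re

IVY_LEAGUE = [
    'Brown University',
    'Columbia University',
    'Cornell University',
    'Dartmouth College',
    'Harvard University',
    'University of Pennsylvania',
    'Princeton University',
    'Yale University',
]


def exact_match_any(names):
    """Build '^(A|B|C)$' from plain names (hypothetical helper).

    Names containing regex metacharacters are rejected rather than escaped,
    since we don't know how the API's regex engine treats escapes.
    """
    for name in names:
        if re.search(r'[\\^$.|?*+()\[\]{}]', name):
            raise ValueError(f'name contains a regex metacharacter: {name!r}')
    return '^(' + '|'.join(names) + ')$'


school_pattern = exact_match_any(IVY_LEAGUE)
print(school_pattern)
```

The resulting string can be passed as the education_school_name value in the script above.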

Filter by past experience

After their education, people go to work, and you can use people's work histories to identify interesting people to keep track of. Historically, the acronym in tech has been "FAANG," and you might have your own particular set of companies you're interested in (for example, perhaps you want to include Microsoft). Still, we'll stick with the classic version for this demo.

Now, there are two ways to approach this given that we have a list of companies to OR together:

1. Use the past_company_linkedin_profile_url parameter.
- Pros: This field guarantees an exact match.
- Cons: It does not accept a regex value, so we can only search for one company at a time. Therefore, we have to perform multiple searches and combine the results.
2. Use the past_company_name field.
- Pros: This field accepts a regex value, so we can search for multiple companies at a time, using ^ and $ with case sensitivity to ensure the name matches exactly.
- Cons: Even with an exact name match, we might pick up some false positives; for example, there are a couple of other companies named Apple besides the FAANG Apple that we actually mean.

Which method should we choose? It depends on whether there's a reason not to use method 1. Here's why you might want to avoid method 1:

Your search has ORs across several different fields, each of which would force you to run multiple queries.
Your query has lots of parameters, so a "wrong Apple" in the results is more likely to come from a user input error than from a wrong data point.
Your sample size is very small, so the cost of running any query at all is high.

Given this analysis, I think method 2 is preferable for this particular use case because of the second reason above. However, this is not a be-all and end-all decision. Sometimes you will certainly prefer method 1, and you should weigh both options at every decision point. When possible, make a query that returns a hundred or so results and step through them manually before committing.
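For the cases where you do end up running one search per company and combining the results, here is a rough sketch of the combining step. It assumes each search returns a list of dictionaries carrying a linkedin_profile_url key; check the response shape in the docs before relying on that. The profile URLs below are placeholders.

```python
from typing import Dict, Iterable, List


def merge_search_results(batches: Iterable[List[Dict]]) -> List[Dict]:
    """Combine result lists from several searches, dropping duplicate people.

    Assumes every result has a 'linkedin_profile_url' key (an assumption
    about the response shape, not something verified here).
    """
    seen = set()
    merged = []
    for batch in batches:
        for person in batch:
            url = person['linkedin_profile_url']
            if url not in seen:
                seen.add(url)
                merged.append(person)
    return merged


# Placeholder results standing in for two separate single-company searches.
first_search = [
    {'linkedin_profile_url': 'https://www.linkedin.com/in/example-a'},
    {'linkedin_profile_url': 'https://www.linkedin.com/in/example-b'},
]
second_search = [
    {'linkedin_profile_url': 'https://www.linkedin.com/in/example-b'},
    {'linkedin_profile_url': 'https://www.linkedin.com/in/example-c'},
]
merged = merge_search_results([first_search, second_search])
print(len(merged))
```

Someone who worked at two of the companies shows up in two result sets, which is why the merge keys on the profile URL rather than simply concatenating the lists.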

With that decided, we can run the following code:

import json, os, requests
api_key = os.environ['PROXYCURL_API_KEY']
headers = {'Authorization': 'Bearer ' + api_key}

api_endpoint = 'https://nubela.co/proxycurl/api/search/person'
params = {
    'current_company_name': '(^Stealth$)|(.*Stealth.*Startup.*)',
    'current_role_after': '2021-04-01',
    'past_company_name': '^(Facebook|Apple|Amazon|Netflix|Google)$'
}
response = requests.get(api_endpoint, params=params, headers=headers)
print(json.dumps(response.json()))

Notice we're now discarding the segment from the previous section, but if we wanted to we could combine them and specify both an education_school_name and a past_company_name at the same time.
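As a sketch of what combining could look like, the snippet below keeps the constant stealth filter in one dictionary and layers segments on top of it. The names BASE_FILTER and build_params are ours, purely for illustration; the parameter names and values are the ones used in the scripts above.

```python
# Constant part of every search in this article.
BASE_FILTER = {
    'current_company_name': '(^Stealth$)|(.*Stealth.*Startup.*)',
    'current_role_after': '2021-04-01',
}
EDUCATION_SEGMENT = {
    'education_school_name': '^(Brown University|Columbia University|Cornell University|Dartmouth College|Harvard University|University of Pennsylvania|Princeton University|Yale University)$',
}
EXPERIENCE_SEGMENT = {
    'past_company_name': '^(Facebook|Apple|Amazon|Netflix|Google)$',
}


def build_params(*segments):
    """Merge the base filter with any number of segments (hypothetical helper)."""
    params = dict(BASE_FILTER)
    for segment in segments:
        overlap = params.keys() & segment.keys()
        if overlap:
            # Refuse to silently overwrite a filter that is already set.
            raise ValueError(f'duplicate parameters: {sorted(overlap)}')
        params.update(segment)
    return params


combined = build_params(EDUCATION_SEGMENT, EXPERIENCE_SEGMENT)
print(sorted(combined))
```

Keep in mind that every segment you add narrows the search, so a combined query returns fewer people than either segment alone.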

Filter by past role title

Finally, let's look for "serial entrepreneurs." We haven't been requiring that people list themselves as founders of their current stealth startup, just that they work at one. The idea is that anyone currently working at a stealth startup is someone who you might want to keep an eye on - either for their current company or their next one. But we can select for people likely to go on to be a founder based on whether they've been one in the past.

Here's the query, this time using the (?i) flag so that we match "Founder," "Co-Founder," and "co-founder" alike:

import json, os, requests
api_key = os.environ['PROXYCURL_API_KEY']
headers = {'Authorization': 'Bearer ' + api_key}

api_endpoint = 'https://nubela.co/proxycurl/api/search/person'
params = {
    'current_company_name': '(^Stealth$)|(.*Stealth.*Startup.*)',
    'current_role_after': '2021-04-01',
    'past_role_title': '(?i)founder'
}
response = requests.get(api_endpoint, params=params, headers=headers)
print(json.dumps(response.json()))
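To see what the (?i) flag buys us, here is a quick local comparison with Python's re module. As before, Python's engine is only a stand-in for the API's, and the job titles are made-up examples.

```python
import re

# Made-up job titles for illustration.
titles = [
    'Founder & CEO',
    'Co-Founder',
    'co-founder',
    'FOUNDER',
    'Founding Engineer',
    'Software Engineer',
]

# With the flag: any capitalization of "founder" matches.
case_insensitive = [t for t in titles if re.search('(?i)founder', t)]
# Without the flag: only the all-lowercase spelling matches.
case_sensitive = [t for t in titles if re.search('founder', t)]

print(case_insensitive)
print(case_sensitive)
```

Note that "Founding Engineer" is not picked up either way, since the pattern looks for the literal word "founder."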

Finding founders who don't list "Stealth Startup"

That was all great if the founder actually listed their current company under a recognizable name that matches .*Stealth.*Startup.*, but what if they're calling it something else entirely? For example, they could have written "Working on something cool" or "Can't wait to tell you guys."

In the last section, we profiled a few segments of likely founders (at least, this is the hypothesis). We can combine these - and, critically, exclude the "Stealth Startup" regex so that we skip the people we've already found - to find candidates likely to be working at "interesting" companies at some point in their careers. The goal here is to curate a short list of people who are:

Possibly working at an interesting company right now; or
Potential founders of the next wave of startups.

There is one limitation here. Proxycurl's Search API was released pretty recently as a version 1.0 product. Therefore, one field we might want to add to our search doesn't exist yet: current_company_employee_count_max. If you're reading this post a while after its publication date, you may be able to disregard this caveat - check our docs for the Person Search Endpoint to see if this parameter exists. It's on our to-do list. But right now it's unavailable, and we'll have to make do without it.

If we combine all the restrictions we looked at before, we only get nine results total. So we'll have to relax the conditions a bit. We'll only look at either graduates of an Ivy League school or alumni of a FAANG company (rather than both simultaneously).

Because the code snippets are very similar, and the condition for the second was provided above, I'll only demonstrate the first here, and the second is left as an exercise to the reader.

import json, os, requests
api_key = os.environ['PROXYCURL_API_KEY']
headers = {'Authorization': 'Bearer ' + api_key}

api_endpoint = 'https://nubela.co/proxycurl/api/search/person'
params = {
    'current_company_name': '^(?!(^Stealth$)|(.*Stealth.*Startup.*))',
    'current_role_after': '2021-04-01',
    'education_school_name': '^(Brown University|Columbia University|Cornell University|Dartmouth College|Harvard University|University of Pennsylvania|Princeton University|Yale University)$',
    'past_role_title': '(?i)founder'
}
response = requests.get(api_endpoint, params=params, headers=headers)
print(json.dumps(response.json()))
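The negative lookahead in current_company_name is the least obvious part of this query, so here is a small local check of how it behaves. It assumes the API's regex engine treats lookaheads the way Python's re module does, which we have not verified; the company names are illustrative.

```python
import re

# Same exclusion pattern as in the query above.
NOT_STEALTH_PATTERN = r'^(?!(^Stealth$)|(.*Stealth.*Startup.*))'


def passes_exclusion(company_name: str) -> bool:
    """Return True if the name is NOT one of the stealth names we already covered."""
    return re.search(NOT_STEALTH_PATTERN, company_name) is not None


for name in [
    'Working on something cool',
    'Stealth Management Group LLC',
    'Stealth',
    'Stealth Startup',
    'Acme Stealth Mode Startup',
]:
    print(f'{name!r}: {passes_exclusion(name)}')
```

The first two names pass the exclusion and the last three are filtered out, which is the mirror image of the original stealth pattern.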

What next?

Now you have some candidates. That's a great first step, but if you source candidates and do nothing with this data, you still have nothing. So you will need to continue to monitor your list of candidates and eventually reach out to them.

Monitoring

You can monitor candidates using the Person Profile Endpoint. If their company name changes from Stealth Startup to MyCompany BioLabs, then you can reach out. Be sure to include the parameter use_cache=if-recent so that you're always getting recent data; checking each profile once a month is enough to pick up updates. Here's how a sample query might look with just a little bit of data processing:

import os, requests

api_key = os.environ['PROXYCURL_API_KEY']
headers = {'Authorization': 'Bearer ' + api_key}

api_endpoint = 'https://nubela.co/proxycurl/api/v2/linkedin'
params = {
    'url': 'https://www.linkedin.com/in/johnrmarty/',
    'use_cache': 'if-recent',
}
response = requests.get(api_endpoint, params=params, headers=headers)
result = response.json()
current_companies = [exp['company'] for exp in result['experiences'] if exp['ends_at'] is None]
print(current_companies)

And the result:

['Freedom Fund Real Estate', 'Mindset Reset Podcast', 'Project 1B', 'YouTube', 'YouTube']

Here's what we did:

Query the Person Profile Endpoint.
Extract the JSON result.
Zero in on the "experiences" section.
Select only the current experiences.
Select the company name for each of those current experiences.
Print out a list of them.

If you were doing this as part of a pipeline, you'd have past data to compare to, and you'd probably want to keep track of some additional factors as well like date joined and maybe description, to check if this is likely a name change (e.g. from "Stealth Startup") or a brand-new venture. You'd also want to write this to a file or database rather than printing it.
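As a rough sketch of that comparison step, the function below takes the list of current company names from the previous run and from this run and reports what changed. The function name and the sample lists are ours; a real pipeline would load the previous list from wherever you stored it.

```python
def company_changes(previous, current):
    """Compare two lists of current company names (hypothetical helper).

    Duplicates, such as two concurrent roles at the same company, are
    collapsed before comparing.
    """
    prev_set, cur_set = set(previous), set(current)
    return {
        'joined': sorted(cur_set - prev_set),
        'left': sorted(prev_set - cur_set),
    }


# Illustrative data: last month's snapshot versus this month's.
previous = ['Stealth Startup', 'YouTube', 'YouTube']
current = ['MyCompany BioLabs', 'YouTube', 'YouTube']

changes = company_changes(previous, current)
print(changes)
```

A profile with one name in "left" and one in "joined" during the same run is a good candidate for a rename, but you'd still want to compare join dates before treating it as one.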

But as far as a demo goes, not so bad!

Contacting

The final step in your pipeline is going to be contacting the prospect. We have another endpoint dedicated to this purpose, the Personal Email Lookup Endpoint. Here's how you might use it to finally reach out to a lead you've been following:

import json, os, requests
api_key = os.environ['PROXYCURL_API_KEY']
headers = {'Authorization': 'Bearer ' + api_key}

api_endpoint = 'https://nubela.co/proxycurl/api/contact-api/personal-email'
params = {
    'linkedin_profile_url': 'https://sg.linkedin.com/in/williamhgates',
    'email_validation': 'include'
}
response = requests.get(api_endpoint, params=params, headers=headers)
print(json.dumps(response.json()))

Note that we set the email_validation parameter to include, which tells the endpoint to perform email validation. If exclude is used instead (or the parameter is omitted), no email validation is performed.

The endpoint will return personal email addresses of your prospects, and your output will look like this:

{
    "emails": [
        "personal.email1@gmail.com",
        "personal.email2@yahoo.com"
    ],
    "invalid_emails": [
        "personal.email3.doesnotexist@gmail.com"
    ]
}
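Once you have a response like this, you'll probably want to keep only the addresses worth writing to. Here is a minimal sketch that works on the sample output above; the helper name usable_emails is ours, and it assumes the two keys shown in the sample are the ones you get back.

```python
def usable_emails(payload):
    """Return the emails from a response, skipping any flagged as invalid.

    Assumes the 'emails' and 'invalid_emails' keys shown in the sample
    output; missing or null keys are treated as empty lists.
    """
    invalid = set(payload.get('invalid_emails') or [])
    return [email for email in (payload.get('emails') or []) if email not in invalid]


# The sample output from above, as a Python dictionary.
sample_response = {
    'emails': [
        'personal.email1@gmail.com',
        'personal.email2@yahoo.com',
    ],
    'invalid_emails': [
        'personal.email3.doesnotexist@gmail.com',
    ],
}

print(usable_emails(sample_response))
```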

In case you're wondering why we aren't using the Work Email Lookup Endpoint, the answer is: usually we would! But in this case, we're looking at brand-new companies and/or stealth startups. These companies are unlikely to have domains added, much less work emails available. You're better off contacting via personal email in this particular situation.

TL;DR

This article is long enough that it deserves a TL;DR section.

If you're a VC or investment company, you may currently have many manual steps in your pipeline for identifying potential founders and contacting them.

What if you could fully automate this pipeline? With Proxycurl APIs, you can! The Person Search Endpoint is the core of the method. You will use this endpoint to search for current employees of stealth startups, filtered based on various criteria - we suggest some with code samples, but you may have your own that you'd prefer to use.

After that, the rest of the pipeline can be accomplished using the Person Profile Endpoint and the Personal Email Lookup Endpoint. Code samples for these are provided as well.

Conclusion

How else might we use Search or even the other Proxycurl API endpoints to find interesting founders of startups? We have a few tricks left up our sleeves - but maybe we've also inspired you along the way! If you want to be featured in one of our blog posts, reach out at hello@nubela.co. Or if you'd rather keep your discovery to yourself, register an account and make a credit top-up so you can get started right away and get a leg up on the competition.

DEV Community