annielcook for Nylas

Posted on Feb 15, 2018 • Originally published at nylas.com on Feb 14, 2018

Lessons Learned Syncing 800 Million Contacts To Our Database

#api #learning #email #scale

The Nylas APIs allow developers to build email, calendar, and contacts functionality into their applications. Our goal is to support all three fully, but until recently, our Contacts API didn’t have the full functionality that we wanted. Nylas is my first full-time engineering job, and shortly after I joined, the need to upgrade the Contacts API surfaced again. It was a big task, but one that I was excited to take point on. In this blog post, I’ll walk through the problem, the solution we came up with, and a few of the things that I learned as a result.

The Problem

Our Contacts API gave customers (mostly B2B software companies) read-only functionality which allowed them to get a list of contacts with three fields: name, email and phone numbers. We wanted to give them access to more data so they could empower their users to make better connections with peers, colleagues, candidates, and more. To do this, they needed to be able to read, sync, update, and auto-complete data including contact addresses, multiple email addresses, birthdays, job titles and more.

Changing our product to sync and store huge amounts of new data without affecting current customers was a big challenge for the team, on top of all the other features we’re building, but one worth taking on for the sake of our customers.

The Solution

The first component of the solution was to expand our contact model to store more fields. Below is a comparison of the old contact model, Contact v1.0, and the new model, Contact v2.0. Both have the system-assigned fields id, account_id and object. Apart from those, we significantly increased the number and granularity of the contact fields in the new model.

Contact v1.0:

[
    {
        "account_id": "x2x2x2x2x2x2x2x2x2x2x2",
        "email": "john@doe.com",
        "id": "z3z3z3z3z3z3z3z3z3z3z3",
        "name": "John Doe",
        "object": "contact",
        "phone_numbers": [
            {
                "number": "1 800 123 4567",
                "type": "mobile"
            }
        ]
    }
]

Contact v2.0:

[
    {
        "account_id": "x2x2x2x2x2x2x2x2x2x2x2",
        "birthday": "1960-12-31",
        "company_name": "Nylas",
        "emails": [
            {
                "email": "john@doe.com",
                "type": "work"
            }
        ],
        "given_name": "John",
        "id": "z3z3z3z3z3z3z3z3z3z3z3",
        "im_addresses": [
            {
                "im_address": "myaimaddress",
                "type": "aim"
            }
        ],
        "job_title": "Software Engineer",
        "manager_name": "Bill the manager",
        "middle_name": "Jacob",
        "nickname": "JD",
        "notes": "Loves ramen",
        "object": "contact",
        "office_location": "123 North Pole Dr",
        "phone_numbers": [
            {
                "number": "1 800 123 4567",
                "type": "mobile"
            }
        ],
        "physical_addresses": [
            {
                "format": "structured",
                "type": "home",
                "street_address": "200 Santa Clause Ln",
                "city": "North Pole",
                "postal_code": "123123",
                "state": "CA",
                "country": "USA"
            }
        ],
        "picture_url": "https://api.nylas.com/contacts/427abc427abc427abc/picture",
        "suffix": "Jr.",
        "surname": "Doe",
        "web_pages": [
            {
                "url": "johndoeblog.com",
                "type": "blog"
            }
        ]
    }
]
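To make the relationship between the two shapes concrete, here is a minimal sketch that maps a Contact v1.0 record onto the v2.0 field names. It is illustrative only and not how Nylas migrated data: the helper name upgrade_contact_v1 is hypothetical, splitting the name on the first space is a naive assumption, and leaving the email type as None is a placeholder since v1.0 did not record one.

```python
from typing import Any, Dict


def upgrade_contact_v1(v1: Dict[str, Any]) -> Dict[str, Any]:
    """Map a Contact v1.0 dict onto Contact v2.0 field names (hypothetical helper)."""
    # Naive assumption: the first word is the given name, the rest is the surname.
    given_name, _, surname = (v1.get("name") or "").partition(" ")
    return {
        "account_id": v1["account_id"],
        "id": v1["id"],
        "object": "contact",
        "given_name": given_name or None,
        "surname": surname or None,
        # v1.0 held a single untyped address; v2.0 holds a list of typed ones.
        "emails": [{"email": v1["email"], "type": None}] if v1.get("email") else [],
        # The phone_numbers shape is the same in both versions.
        "phone_numbers": list(v1.get("phone_numbers", [])),
    }


v1_contact = {
    "account_id": "x2x2x2x2x2x2x2x2x2x2x2",
    "email": "john@doe.com",
    "id": "z3z3z3z3z3z3z3z3z3z3z3",
    "name": "John Doe",
    "object": "contact",
    "phone_numbers": [{"number": "1 800 123 4567", "type": "mobile"}],
}
v2_contact = upgrade_contact_v1(v1_contact)
```

Fields that only exist in v2.0, such as birthday or job_title, are simply absent here because v1.0 had nothing to derive them from.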

We also expanded the API endpoints and underlying functionality for contacts. With Contacts v2.0, customers have the ability to create, update, and delete contacts from their own applications.

Here are the new endpoints we’ve included in Contacts v2.0:

This new functionality can be used in many different ways, each of which addresses problems that our customers face on a daily basis. For instance, automatically creating a new contact record through the Contacts API reduces the need for sales, marketing and recruiting teams to manually create new contacts in their CRM, since that data syncs seamlessly in the background from their inbox to their CRM.

We also added the ability to filter contacts by a variety of parameters. For example: by postal code ("I want to know all the sales reps working in zip code 94105"), by phone number ("Who called me?"), or by email ("Show me the contact for this email").
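As a rough illustration of these lookups, the sketch below filters a list of v2.0-shaped contact dicts in memory. It is not the Nylas implementation: the function name filter_contacts and its parameter names are assumptions, and a real API would run these as database queries rather than a Python loop.

```python
from typing import Any, Dict, Iterable, List, Optional


def _digits(number: str) -> str:
    """Keep only digits so '1 800 123 4567' matches '18001234567'."""
    return "".join(ch for ch in number if ch.isdigit())


def filter_contacts(
    contacts: Iterable[Dict[str, Any]],
    email: Optional[str] = None,
    phone_number: Optional[str] = None,
    postal_code: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Return the contacts that match every filter that was supplied."""
    matches = []
    for contact in contacts:
        if email is not None and not any(
            e["email"].lower() == email.lower() for e in contact.get("emails", [])
        ):
            continue
        if phone_number is not None and not any(
            _digits(p["number"]) == _digits(phone_number)
            for p in contact.get("phone_numbers", [])
        ):
            continue
        if postal_code is not None and not any(
            a.get("postal_code") == postal_code
            for a in contact.get("physical_addresses", [])
        ):
            continue
        matches.append(contact)
    return matches


contacts = [
    {
        "given_name": "John",
        "emails": [{"email": "john@doe.com", "type": "work"}],
        "phone_numbers": [{"number": "1 800 123 4567", "type": "mobile"}],
        "physical_addresses": [{"postal_code": "94105"}],
    },
    {
        "given_name": "Jane",
        "emails": [],
        "phone_numbers": [],
        "physical_addresses": [{"postal_code": "10001"}],
    },
]
reps_in_94105 = filter_contacts(contacts, postal_code="94105")
who_called = filter_contacts(contacts, phone_number="18001234567")
```

Comparing phone numbers by digits only is a design choice made for this sketch, so that differently formatted versions of the same number still match.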

Check out the Contacts v2.0 docs for more information on the new functionality and endpoints that are available.

My Takeaways

Put your product to the test.

Writing good tests is important, but so is really using your product. When I thought I was finished with my Contacts v2.0 work, I started updating our Nylas NodeJS SDK to support the new functionality. This SDK is a NodeJS wrapper for our API which makes integrating the Nylas API into JavaScript applications much easier. For the SDK, I had to add functions that enabled it to access the new endpoints, write tests for these functions, and expand the example applications in the SDK to include Contacts v2.0 functionality.

The process of updating the SDK meant that I had to use all of the new features and functionality that I had just added to the API for Contacts v2.0. This opened my eyes to previously unknown bugs. For example, I was trying to populate the contact model for the NodeJS SDK with contact data from our API and everything except the middle_name field was working. It turns out I had forgotten to encode and return the middle_name field for contact GET requests.
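A dropped field like this is the kind of bug a small consistency check can catch. The sketch below is one possible approach, not the test Nylas wrote: the Contact dataclass, both encoders, and the missing_fields helper are hypothetical and cover only three fields for brevity. The check compares the model's field names with the keys an encoder actually returns.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, Optional, Set


@dataclass
class Contact:
    given_name: Optional[str] = None
    middle_name: Optional[str] = None
    surname: Optional[str] = None


def encode_contact_buggy(contact: Contact) -> Dict[str, Any]:
    """Hand-written encoder that forgets middle_name, like the bug described above."""
    return {"given_name": contact.given_name, "surname": contact.surname}


def encode_contact(contact: Contact) -> Dict[str, Any]:
    """Fixed encoder that returns every model field."""
    return {
        "given_name": contact.given_name,
        "middle_name": contact.middle_name,
        "surname": contact.surname,
    }


def missing_fields(
    encoder: Callable[[Contact], Dict[str, Any]], contact: Contact
) -> Set[str]:
    """Names of model fields that the encoder did not return."""
    return {f.name for f in fields(contact)} - set(encoder(contact))


john = Contact(given_name="John", middle_name="Jacob", surname="Doe")
dropped_by_buggy = missing_fields(encode_contact_buggy, john)
dropped_by_fixed = missing_fields(encode_contact, john)
```

Because the check is driven by the model's own field list, a field added to the model later would fail it until the encoder is updated too.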

Actually using the product also showed me processes that, while not broken, had a poor user experience. For example, our API used to require that the JSON for a contact’s birthday be in the form "birthday": { "date": "1995-01-13", "object": "date"}. Wrapping the date in this object was unnecessary and just added more work to represent the contact’s birthday as a Date in the SDKs. I simplified the birthday field to be "birthday": "1995-01-13".
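The difference for a client is small but real. As a sketch, here is how a client might turn either form into a date object; the function name parse_birthday is hypothetical, and accepting the older wrapped form is an assumption made only to show the extra unwrapping step it required.

```python
from datetime import date
from typing import Any, Dict, Union


def parse_birthday(value: Union[str, Dict[str, Any]]) -> date:
    """Accept the simplified string form or the older wrapped object."""
    if isinstance(value, dict):
        # Older form: {"date": "1995-01-13", "object": "date"}
        value = value["date"]
    # Simplified form: "1995-01-13"
    return date.fromisoformat(value)


simple = parse_birthday("1995-01-13")
wrapped = parse_birthday({"date": "1995-01-13", "object": "date"})
```

With the simplified field, the dict branch disappears and the whole function reduces to a single date.fromisoformat call.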

Having to develop on top of the Contacts v2.0 API revealed bugs and put me into the customer’s perspective to see design flaws that I would not have noticed otherwise.

You don’t have to, and often shouldn’t, ship big projects all at once.

Contacts v2.0 was a major fullstack project that affected a significant portion of our code base.

We had to expand the previous contacts table in the database and create many related tables to support the huge volumes of new data we would now be storing.
We had to build out two-way data sync between the providers and our system.
We had to build new API endpoints and underlying functionality to support the new create, update, and delete features.

Because of this, we knew we would need to test extensively in production. The only way to encounter the wide variety of edge cases was to sync real customer data in production, which meant we had to deploy possibly faulty code to prod without affecting our existing customers. To accomplish this, we shipped code under a feature flag. This meant that we could deploy code periodically and in manageable pieces without exposing any changes to our customers. Once the project was near completion, we held an open beta to get a few interested customers to try out and test the new features. Before the official release, the code had been running in production for over 4 months, and a handful of customers had tested it and given us feedback.

Shipping code under a feature flag throughout development and letting customers beta test gave me confidence in the code before I officially released it.
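For readers who have not used one, a feature flag can be as simple as a per-account allowlist. The sketch below is a generic illustration under that assumption and does not describe the Nylas codebase: the flag name "contacts_v2", the FeatureFlags class, and the account IDs are all hypothetical.

```python
from typing import Dict, Set


class FeatureFlags:
    """Tracks which accounts may see each in-development feature."""

    def __init__(self) -> None:
        self._allowlists: Dict[str, Set[str]] = {}

    def enable(self, flag: str, account_id: str) -> None:
        """Opt one account (for example, a beta customer) into a feature."""
        self._allowlists.setdefault(flag, set()).add(account_id)

    def is_enabled(self, flag: str, account_id: str) -> bool:
        """Unknown flags and unlisted accounts default to off."""
        return account_id in self._allowlists.get(flag, set())


def contact_fields(flags: FeatureFlags, account_id: str) -> str:
    """Pick the code path for a request; everyone else keeps the old behavior."""
    if flags.is_enabled("contacts_v2", account_id):
        return "v2 fields"
    return "v1 fields"


flags = FeatureFlags()
flags.enable("contacts_v2", "beta-customer-1")
beta_view = contact_fields(flags, "beta-customer-1")
default_view = contact_fields(flags, "existing-customer-9")
```

Defaulting to off is what makes it safe to deploy unfinished code: an account only reaches the new path after it has been explicitly added.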

Tradeoffs of API versioning

The way we handled this feature flag was by versioning our API and keeping the new version we were developing private. The decision to version the API, thus enabling us to make breaking changes, was challenging and interesting.

On one hand, API versioning is intimidating to customers. This is especially true the first time customers have to update versions. They might be wary of the new version and uncertain whether updating is worth the additional engineering effort.

On the other hand, versioning our API lets us evolve it to better meet customers’ needs while keeping the product polished and clean. Getting customers used to updating versions of the API will allow our engineering focus to be exclusively on building out new functionality rather than supporting legacy functionality. Versioning also gives our customers an easy upgrade path and puts them in control of updating their applications.

The pros and cons of API versioning differ based on your product and what benefits the new version will bring to customers. Reasoning through this decision forced me to think about API versioning from a few different perspectives:

What would be best for the future of our product?
What would be best for customer experience?
What would be the cleanest and most sustainable engineering solution?

This experience broadened my understanding of how to make design decisions.

This was a big project to tackle during my first six months at Nylas, but I’m really excited about how it turned out. Working on something that directly impacted our customers taught me a lot about their needs. As I’ll share in a future post, this experience has also helped me grow as an engineer and learn more about myself. Check out Contacts v2.0 and let us know what you think!

This blogpost was originally posted on the Nylas Engineering Blog

Top comments (2)

Frank Carr • Feb 15 '18

Interesting to read about the approach you took to this problem. I was expecting a more database oriented article but you didn't mention SQL or the database backend you're using.

I know how I would handle the database side in Oracle (new schema) and SQL Server (new database) with a new set of stored procedures and views. That would leave the old DB access methods the same while allowing a more or less "green field" to develop the expanded services.

PNS11 • Feb 15 '18

Thanks for sharing, an interesting read.

In picolisp it would be expressed somewhat differently, perhaps similar to this:

(class +Contact +Entity)
(rel account_id (+Ref +Number))
(rel birthday (+Ref +Date))
(rel company (+Sn +IdxFold +String))
(rel emails (+List +Joint) contacts (+Emails))

(class +Emails +Entity)
(rel email (+String))
(rel type (+String))
(rel contacts (+List +Joint) emails (+Contact))

Usually an abstract class would be used instead of +Entity, like +contact and then have +Customer, +Supplier, +CustomerSupport, +ExecutiveOfficer and so on inherit from this more general abstraction. Switching class at runtime is trivial in picolisp, and they could also have abstract parents like +contactavailabletoall, +contactavailableonlytosome or something else entirely.

Typically 'email and 'type in '+Emails would have some index prefix class too, to make them independently indexed and available to search algorithms, not just reachable through the +Contact objects.