DEV Community

Hunter Johnson for Educative

Originally published at educative.io

6 insights from a System Design expert: How Twilio builds for scale

The importance of System Design in today's tech landscape cannot be overstated. Companies need software systems that are resilient, reliable, performant, and scalable to serve customers and achieve business goals. Because of this, System Design has become an essential part of the software development process, and if you're a software engineer, you should learn it. Whether you're new to System Design or have worked on distributed systems and want to master System Design interview questions, learning the fundamentals will help you advance your career and gain an edge in the job market.

To get a tech leader's expert view on System Design and its place in the modern landscape, Educative hosted an October 5 webinar with Mark Gilbert, former VP of Product Management at Twilio and the current co-founder and CEO of Zocks Communications, a venture-backed startup. In a conversation with Steve Yi, Educative’s VP of Growth & Marketing, Mark shared his insights on building for scale and designing performant systems, drawing from over 20 years of industry experience.

Today, we'll revisit the discussion with six critical takeaways about modern System Design.

A brief introduction to Twilio and its systems

Twilio is a Communications Platform as a Service (CPaaS) company that makes it easy for businesses to communicate with customers and others via text messages, email, and voice calls. The company is well known for its programmable communication tools, which developers access across these channels through web service APIs.

During his tenure, Mark oversaw the Twilio Super Network and phone number onboarding, among other responsibilities. With the Super Network, Twilio combined the communication networks of multiple global carriers to reliably support channels like voice, text messaging, email, and video. Getting customers onto the Super Network involved onboarding large numbers of phone numbers and working on international infrastructure.

Regarding Twilio’s tech stack, Mark said different teams worked on different parts of the system. Before Mark joined, teams often gravitated toward their preferred programming languages, but Twilio then started pushing to move most of its systems to Java, which it valued for its type safety. For data storage, Twilio used multiple solutions.

When it started, Twilio had to build much of its own infrastructure, including things that are now available off the shelf from cloud providers like Microsoft, Amazon, and Google. Today, the scale of Twilio's systems is very large. According to Mark, on a big day, Twilio sends single-digit billions of emails and hundreds of millions of text messages. Traffic is spiky: those emails and messages are not spread evenly throughout the day. Because of this and other challenges particular to Twilio's technology, scaling was a continuous challenge and a top concern for everyone at Twilio as it built out its systems. To meet that challenge, Twilio paid close attention to System Design principles and carefully constructed its architectures.

6 takeaways about System Design and architecting for scale

During the webinar, Mark spoke candidly about his experience at Twilio and his observations about working on large systems. Below, we'll look at six top takeaways from what he shared.

1. Add capabilities to existing architectures when possible

Leaders and engineers at Twilio debated daily how to improve their systems. All that discussion was for a good reason: Twilio's customer base was growing quickly, and changing the system while all those customers depended on it was a major challenge. It became essential to figure out the safest and most expedient ways to improve.

To determine system requirements, the teams went back to basics: they looked at expected traffic scenarios, did the math, and penciled out "what's going to get there, what's not going to get there," Mark said. To "get there," the main option was adding role instances (nodes/servers) where they could. Sometimes that worked. Sometimes it worked, but inefficiently, and the teams had to investigate why.
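
The "doing the math" step can be sketched as a back-of-the-envelope capacity estimate. This is a minimal illustration, not Twilio's method: the daily volume, the peak-to-average ratio, the per-instance throughput, and the 30% headroom are hypothetical inputs, and the helper names `peak_rate` and `required_instances` are invented for this example.

```python
import math

SECONDS_PER_DAY = 86_400


def peak_rate(daily_volume: float, peak_to_average: float) -> float:
    """Convert a daily volume into a peak per-second rate.

    Spiky traffic means the peak can be several times the daily average.
    """
    average = daily_volume / SECONDS_PER_DAY
    return average * peak_to_average


def required_instances(peak_per_sec: float, per_instance_per_sec: float,
                       headroom: float = 0.3) -> int:
    """Instances needed to absorb the peak while leaving spare capacity.

    headroom=0.3 plans for each instance to run at most 70% busy.
    Assumes throughput grows linearly with instance count.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = per_instance_per_sec * (1 - headroom)
    return math.ceil(peak_per_sec / usable)


# Hypothetical inputs, not Twilio figures.
peak = peak_rate(daily_volume=200_000_000, peak_to_average=5)
print(round(peak))  # 11574
print(required_instances(peak, per_instance_per_sec=500))  # 34
```

The linear-scaling assumption inside `required_instances` is exactly what breaks down in the "worked, but inefficiently" cases: if per-instance throughput drops as nodes are added, this estimate undercounts the instances you need.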

2. Sometimes adding servers isn't the solution

Certain system challenges at Twilio could not be solved by adding servers. For example, Twilio works with thousands of mobile operators and providers worldwide, and many of them don't have modern systems that scale. This created downstream capacity problems: those providers' systems typically ran into trouble when they received more messages than they could handle. Some providers might drop the messages altogether, while others would deliver them later. And they didn't return clean errors.
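
One common way to protect a fragile downstream provider is to cap the send rate on your own side instead of relying on its error signals, which, as Mark noted, often weren't clean. The sketch below is a generic token bucket, a technique we're using for illustration; the source doesn't say this is how Twilio throttled carriers. The class name, the provider key `"carrier-a"`, the `send_or_defer` helper, and the rates are all hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows roughly `rate` sends per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a short burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per downstream provider; the limits here are hypothetical.
buckets = {"carrier-a": TokenBucket(rate=10, capacity=20)}


def send_or_defer(provider: str, message: str, outbox: list) -> bool:
    """Send if the provider has capacity; otherwise keep the message for retry."""
    if buckets[provider].try_acquire():
        # A real system would call the provider's API here.
        return True
    outbox.append((provider, message))
    return False
```

Deferring rather than dropping keeps the overflow on your side of the boundary, where you can retry on your own schedule instead of discovering later that the provider silently lost or delayed messages.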

"There's all these different kinds of weird behaviors that will happen," Mark said. In these situations, adding servers didn't solve the problem.

In addition, some of Twilio's systems generated a lot of communication between servers, which eventually caused capacity issues. The amount of chatter could become so high that adding role instances no longer helped the system scale. In those cases, the answer involved having programmers examine that communication, the services creating it, and how the servers were partitioned. Changing those things while the system was constantly operating was a heavy lift.
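
Why didn't more role instances help? A simplified model makes the arithmetic visible. Assume, hypothetically, that every instance talks to every other instance: the number of communication paths then grows with the square of the instance count, while capacity grows only linearly. Partitioning so that instances talk only within a fixed-size group keeps growth linear. The all-to-all pattern and the group size of 4 are assumptions for illustration, not a description of Twilio's systems.

```python
def full_mesh_links(n: int) -> int:
    """Directed paths when each of n instances talks to every other instance."""
    return n * (n - 1)


def partitioned_links(n: int, partition_size: int) -> int:
    """Directed paths when instances only talk within their own partition.

    Assumes n is an exact multiple of partition_size.
    """
    if n % partition_size != 0:
        raise ValueError("n must be a multiple of partition_size")
    partitions = n // partition_size
    return partitions * partition_size * (partition_size - 1)


for n in (8, 16, 32):
    print(n, full_mesh_links(n), partitioned_links(n, partition_size=4))
# 8 56 24
# 16 240 48
# 32 992 96
```

Quadrupling the instances from 8 to 32 multiplies full-mesh paths by about 18 but partitioned paths by only 4, which is why this kind of problem calls for reworking communication and partitioning rather than adding hardware.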

When adding servers wouldn't do the trick, the next step was to rearchitect pieces of the system for scale. This became a high priority and a time-intensive effort. Fortunately, Twilio's product was well received in the marketplace, which led to solid agreement among internal stakeholders that keeping the system working well and scaling was critical.

3. Optimize the number and size of your microservices

Twilio used a microservices architecture, an architectural style that structures an application as a collection of loosely coupled services. However, like many large-scale cloud-based services that started in the mid-2000s, Twilio didn't use microservices as extensively as it would have liked.

Start-ups have to experiment and revise their designs heavily in the early days. At that stage, Twilio had to weigh the potential benefits of a microservices architecture against the cost of changing it later.

The trick was finding a sweet spot: not too many microservices, but not too few. If he were talking to someone who wanted to split an app into as many microservices as possible, Mark said, he'd try to talk them out of it. But if someone had everything sitting on one service, "you probably need to rethink that."

There were two main considerations when determining whether a system's microservices were getting too small or too numerous:

  • Diagnostics and debuggability: You need to plan for how you'll investigate errors and failures, which gets harder as a request passes through more services.
  • Performance: If you don't draw tight service boundaries, you will start seeing performance issues, since every extra boundary a request crosses adds a network call.
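
The performance consideration comes down to simple arithmetic: each service boundary a request crosses adds a network hop. The sketch below compares the same amount of work done in one service versus split across six services called in sequence. The 2 ms per-hop overhead and the function name are hypothetical, and the model ignores parallel calls and caching.

```python
def chain_latency_ms(work_ms: list[float], hop_overhead_ms: float) -> float:
    """End-to-end latency for a request that calls services one after another.

    Each hop between consecutive services adds overhead for
    serialization, transport, and deserialization. The client's
    initial call is the same in every layout, so it is left out.
    """
    hops = max(len(work_ms) - 1, 0)
    return sum(work_ms) + hops * hop_overhead_ms


one_service = chain_latency_ms([30], hop_overhead_ms=2)
six_services = chain_latency_ms([5] * 6, hop_overhead_ms=2)
print(one_service, six_services)  # 30 40
```

Ten extra milliseconds may look harmless once, but it applies to every request, and each extra service is also one more place to look when diagnosing a failure.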

Finding the optimal number of microservices up front is challenging. You need to look at the requirements for your system's front end (responsiveness) and back end (reliability) and try to anticipate which services may need to be split up. Then, as your app gets traffic, you can see with more confidence which microservices are likely to have issues.

4. Plan around services' expected lifespans

To scale from prototype to production, you've got to pencil out how long you expect your components to last. To do so:

  1. Break down your app's architecture into services: What are your pieces? How will they communicate with each other? What are the resulting requirements for these services?

  2. Figure out which services you plan to keep and which you expect to replace: Invest time into understanding how you'll scale the services you plan to use consistently, and put appropriate service boundaries around them. Spend less design time on pieces you're less confident in.

Mark and his team applied both steps at Twilio for a large communication service. They determined that the service's underlying components, text messaging and email, would remain consistent, so they spent more design effort making sure those components could scale. At the same time, they anticipated having to rewrite the user interface multiple times, so they put less architectural thinking into it.
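
The two steps can be written down as a small service inventory. This is a minimal sketch under our own assumptions: the service names, the `KEEP`/`REPLACE` labels, and the check that long-lived services shouldn't depend on components you expect to rewrite are illustrative, loosely modeled on the text, email, and UI example, not Twilio's actual design.

```python
from dataclasses import dataclass, field
from enum import Enum


class Lifespan(Enum):
    KEEP = "keep"        # expected to last: invest in scaling and boundaries
    REPLACE = "replace"  # expected to be rewritten: keep design effort light


@dataclass
class Service:
    name: str
    lifespan: Lifespan
    calls: list[str] = field(default_factory=list)  # services this one depends on


def design_priorities(services: list[Service]) -> list[str]:
    """Services worth deeper scaling work, in inventory order."""
    return [s.name for s in services if s.lifespan is Lifespan.KEEP]


def fragile_dependencies(services: list[Service]) -> list[tuple[str, str]]:
    """(caller, callee) pairs where a long-lived service depends on a throwaway one."""
    by_name = {s.name: s for s in services}
    return [(s.name, dep) for s in services if s.lifespan is Lifespan.KEEP
            for dep in s.calls if by_name[dep].lifespan is Lifespan.REPLACE]


inventory = [
    Service("sms", Lifespan.KEEP),
    Service("email", Lifespan.KEEP),
    Service("web-ui", Lifespan.REPLACE, calls=["sms", "email"]),
]
print(design_priorities(inventory))     # ['sms', 'email']
print(fragile_dependencies(inventory))  # []
```

Here the UI depends on the stable messaging services, not the other way around, so it can be rewritten repeatedly without touching the parts designed for scale.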

The hard part can be telling which services are which, because expectations often go awry.

5. Companies will increasingly build more flexible systems

Toward the end of the webinar, Mark spoke about the future of System Design. He anticipated that amid the growing dependence on the cloud and software as a service (SaaS), tech companies would increasingly build systems out of small services that they could deploy dynamically.

To give some context, Mark spoke about the period of 2008-2010. He said many start-ups claimed to be using microservices, but in reality, they were also leaning heavily on cloud infrastructure like AWS and Microsoft Azure. The cloud made it fast to build and fast to reach a reasonable scale, even if companies were also using microservices internally. As a result, they ended up with what Mark called the "start-up ball of wax": systems they continually had to claw pieces out of so those pieces could scale differently.

Today, companies are paying more attention to scaling, deploying into different regions worldwide, and breaking up services to run on different providers. They're designing microservices more thoughtfully and being careful about taking on dependencies. This shift has partly resulted from changes outside the tech industry, such as increased attention to data privacy, data geolocation, and data residency.
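
Data residency turns region selection into an explicit routing decision. The sketch below maps a customer's country to a storage region and fails closed when no rule exists, so data is never silently stored in the wrong place. The country-to-region table, the region names, and the `storage_region` function are hypothetical; real residency requirements depend on specific laws and contracts.

```python
# Hypothetical residency rules and region names, for illustration only.
RESIDENCY_RULES: dict[str, str] = {
    "DE": "eu-central",
    "FR": "eu-central",
    "IE": "eu-west",
    "US": "us-east",
}


def storage_region(country_code: str) -> str:
    """Return the region where this customer's data must be stored.

    Raises LookupError instead of guessing when no rule exists, so a
    missing rule surfaces as an error rather than a quiet violation.
    """
    region = RESIDENCY_RULES.get(country_code.upper())
    if region is None:
        raise LookupError(f"no residency rule for {country_code!r}")
    return region


print(storage_region("de"))  # eu-central
```

Failing closed is the design choice that matters here: a default region would be more convenient, but it would turn a missing rule into data stored somewhere it isn't allowed to be.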

Mark suggested companies would increasingly use a model that combined cloud provider services with services they brought into their own infrastructure and deployed elsewhere. This approach, he said, scales better and gives companies more control.

6. In technical interviews, give real examples and explain your thinking

While Mark's comments focused on design principles and processes, the webinar also elicited observations about interviewing at Twilio.

During Mark's tenure, as in many other engineering organizations, interviews included questions assessing how well candidates understood System Design fundamentals and how well they wrote code. But interviewers also drilled into candidates' experience working on systems, their reasoning behind decisions, and what they'd learned from their choices. Interviewers would keep questioning candidates to determine whether they truly did the work, how well they understood scale limitations, and what they would do differently in the future.

"You can pretty soon tell whether someone understands the higher-level abstraction and whether the underlying 'how this works' is somewhat understood." - Mark Gilbert, former VP of Product Management at Twilio

For more experienced candidates, interviewers would look for experience designing and redesigning for scale. Improving the scalability of existing services is a noticeably harder problem, one that engineers tend to face as they gain experience. Interviewers might then present a similar Twilio problem for candidates to solve, watching how they anticipated and resolved bottlenecks.
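
Anticipating bottlenecks often starts with one observation: in a pipeline where every message passes through every stage, the slowest stage caps overall throughput, so scaling any other stage buys nothing. The stage names and rates below are hypothetical, chosen only to echo the downstream-carrier problem from the second takeaway, and the `bottleneck` helper is invented for this sketch.

```python
def bottleneck(stages: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and the pipeline's max sustained rate (msgs/sec)."""
    name = min(stages, key=stages.__getitem__)
    return name, stages[name]


stages = {"api": 5_000, "queue": 20_000, "carrier_gateway": 800}
print(bottleneck(stages))  # ('carrier_gateway', 800)

stages["api"] *= 2  # scaling a non-bottleneck stage...
print(bottleneck(stages))  # ('carrier_gateway', 800), so nothing changes

stages["carrier_gateway"] = 12_000  # fixing the real bottleneck...
print(bottleneck(stages))  # ('api', 10000), which moves the limit elsewhere
```

Fixing one bottleneck shifts the limit to the next slowest stage, which is part of why scaling large systems in flight tends to be iterative.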

"The black-belt ninja moves are taking large systems in flight and trying to scale those more." - Mark Gilbert

For candidates with fewer years of experience, questions relating to Twilio's values might matter more than seasoned System Design know-how. For example:

  • Are you adding positive value?
  • Are you curious?
  • Are you learning?

Even candidates who were fresh out of school or self-taught would get questions about what they had built and how it had worked. Regardless of their experience, then, candidates who could provide concrete examples and explain their thinking stood to perform better.

The lessons Mark shared about interviews at Twilio apply more generally, too. For instance:

  • Show up to interviews prepared to discuss not only the projects you've worked on but also your thought process and lessons learned from those experiences.
  • Any project you bring up should be one you played a significant role in and understand thoroughly, so you can answer follow-up questions.
  • For the System Design Interview, be prepared to devise solutions to hypothetical problems.
  • Ensure you've studied the company's values beforehand and be ready to demonstrate your alignment with them.

Take the next step to master System Design

From practical lessons about building for scale to interview advice, our System Design webinar with Mark Gilbert covered a lot of ground. But our six takeaways from Mark all come back to a central theme: thinking strategically about System Design has become vital in the modern tech landscape, from small start-ups to FAANG companies.

If you want to dive deeper into this essential area of software engineering, we've created the course Grokking Modern System Design Interview for Engineers & Managers. Its modern perspective on building component-based systems in a microservices architecture will prepare you for on-the-job problem-solving and coding interviews. In this interactive course, you'll review modules on the building blocks of System Design and the RESHADED approach to tackling any System Design problem. Then, you'll apply your new knowledge to solve real-world design problems.

To reserve a spot to tune into future conversations with industry experts, visit Educative's webinar homepage.

Happy learning!

Start a discussion

What topic do you want to see discussed in an upcoming webinar? Was this article helpful? Let us know in the comments below!
