This interview was done for our Microservices for Startups ebook. Be sure to check it out for practical advice on microservices. Thanks to David for his time and input!
For context, how big is your engineering team? Are you using microservices and can you provide a general overview of how you’re using them?
Right now, the engineering department is made up of 10 people. In our company we also count designers and support engineers as part of the engineering department, although they write code in a lower capacity. They help with product design, internal tools and so on.
We deploy three microservices in each node in our CDN network. Then we have what we call the delivery network, which is the combination of API servers and origin servers. We deploy two services in each of those nodes, plus a background worker that performs asynchronous jobs. It's a service extracted from a bigger application. We also have a build and deploy pipeline. There we basically have one service that handles everything, and it's fairly small. So I would say that's a microservice, but it's just one. All those components are what we call the core service. We also have additional services, like our Identity service, that were created from the beginning as microservices. They run independently in a Kubernetes cluster.
Did you start with a monolith and later adopt microservices?
We started with a decoupled architecture. They weren't microservices per se, they were bigger services. For example, the two microservices we now deploy in the delivery network, across the API and origin servers, used to be one service. The same happened in the CDN layer: we started with one service running on the edge nodes and extracted some of its logic into smaller microservices.
How did you approach the topic of Microservices as a team/engineering organization? Was there discussion on aligning around what a Microservice is?
When we try to decide if something should live in a microservice or inside a service that already exists, the first question we ask ourselves is whether it needs to be coupled with a service we already have. If not, then we ask ourselves if it's something we can consider standalone, something that could actually live by itself if we just needed that one service. Those are the main questions. We have learned that from experience. For example, two of the services that we have at the edge node are very complementary. They cannot live without each other. We are currently discussing whether, since they need each other, we should just have one service instead of two. Because, at the end of the day, they talk to each other via HTTP on localhost, but they could also be one service, and we'd avoid a network connection.
Did you change the way your team(s) were organized or operated in response to adopting microservices?
Yeah, definitely. Right now we have some people working full time on what we call platform, which is the CDN infrastructure and the build and deploy pipeline. Other engineers work full time on the product's API and origin servers. Depending on which of the two teams you are on, you will work on one set of microservices or the other. Both teams work a lot together because there is some glue in the middle that actually needs to connect the platform and the product's API. We can see people from platform implementing things in the API layer, and vice versa, engineers working on the product who need to implement something in one of the microservices in the edge node or in the deploy pipeline. They also try to jump into those projects in different capacities, depending on what they are more comfortable with.
How much freedom is there on technology choices? Did you all agree on sticking with one stack or is there flexibility to try new? How did you arrive at that decision?
Even though our stack right now has pieces of Ruby, C++ and Go, we are consolidating new microservices in Go. We have a bootstrap template to start new microservices with Go. However, as I mentioned, we use other programming languages too. The API that started the company uses Ruby on Rails. It was like the monolithic proof of concept for Netlify. We use Apache Traffic Server in the CDN network as a proxy cache. It's a big C++ service that can be extended with plugins. We built a custom plugin in C++ to handle routing and cache invalidation.
Have you broken a monolithic application into smaller microservices? If so, can you take us through that process? How did you approach the task? What were some unforeseen issues and lessons learned?
One of the core principles that we have when we create new microservices, or when we have to change some functionality, is that no matter what we do, it cannot impact the current production traffic, and we cannot have any downtime. The way that we usually do this is by using feature flags, circuit breakers and branch-by-abstraction. Every component knows about several features that we can enable and disable at any time. We can do that at runtime, so we don't have to restart any service if we want to change traffic patterns in any layer of our infrastructure. We also use some libraries that allow us to detect failures early when we do these feature flag splits. Depending on the expected behavior, those libraries can retry the same request against the new service, or fall back to the original service.
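A flag-plus-fallback split like this can be sketched in a few lines of Go. The service functions and the process-local flag below are illustrative assumptions; in production the flag would be backed by runtime configuration rather than an in-process atomic:

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// useNewService is the feature flag. It can be flipped at runtime
// without restarting the process.
var useNewService atomic.Bool

// serveLegacy and serveNew stand in for calls to the original code path
// and the newly extracted service (names are hypothetical).
func serveLegacy(req string) (string, error) { return "legacy:" + req, nil }
func serveNew(req string) (string, error)    { return "", errors.New("new service unavailable") }

// handle routes the request through the flag, falling back to the
// original path if the new service fails, so production traffic is
// never impacted by the split.
func handle(req string) (string, error) {
	if useNewService.Load() {
		if resp, err := serveNew(req); err == nil {
			return resp, nil
		}
		// New path failed: fall back instead of surfacing an error.
	}
	return serveLegacy(req)
}

func main() {
	useNewService.Store(true) // toggled at runtime, no restart needed
	resp, _ := handle("GET /")
	fmt.Println(resp) // legacy:GET / (the stub new service fails, so we fell back)
}
```

The same structure extends naturally to retries and circuit breakers: the wrapper around the new path decides whether to retry, fall back, or stop sending traffic entirely.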
How do you determine service boundaries? What was that discussion like within your team? Can you give some examples?
We try to make sure that what we extract into a microservice is a standalone feature or service. Something that can live by itself and has only one purpose. For instance, in the origin servers, we have a service we call Netlify Server. It has only one purpose, which is serving web content; it doesn't do anything else. In the edge nodes, we have a service called cache control. Its only purpose is to invalidate caches. It receives messages from the network to invalidate website caches. It doesn't do anything else.
We discussed at length different implementations for the cache controller. It makes sense as a microservice, but it could also be embedded into any of the other services that we deploy at the edge. The boundaries between this service and the other two are well defined from the feature point of view, but they work together. By keeping it separated, we can also evaluate different implementations without modifying either of the other two services.
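A single-purpose service like the cache controller can be sketched in Go. The in-memory map, route shape, and type names below are illustrative assumptions, standing in for whatever cache store and message format the edge nodes actually use:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// invalidator sketches a single-purpose cache-control service: it only
// deletes cache entries, nothing else. The map stands in for the real
// cache store on the edge node.
type invalidator struct {
	mu    sync.Mutex
	cache map[string]string
}

// Invalidate drops one site's cached content.
func (inv *invalidator) Invalidate(site string) {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	delete(inv.cache, site)
}

// ServeHTTP receives invalidation messages from the network, e.g.
// POST /invalidate?site=example.com (the route shape is hypothetical).
func (inv *invalidator) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	site := r.URL.Query().Get("site")
	inv.Invalidate(site)
	fmt.Fprintf(w, "invalidated %s\n", site)
}

func main() {
	inv := &invalidator{cache: map[string]string{"example.com": "<html>...</html>"}}
	inv.Invalidate("example.com")
	fmt.Println(len(inv.cache)) // 0
	// In production the service would listen for messages instead:
	// log.Fatal(http.ListenAndServe(":9090", inv))
}
```

Because the whole service is one handler with one verb, swapping in a different invalidation strategy means replacing this component without touching the two services it sits next to.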
How have microservices impacted your development process? Your ops and deployment processes? What were some challenges that came up and how did you solve them? Can you give some examples?
We always try to work on those services in isolation when we develop them. Unless you need to test boundaries between services, you don’t need to have the full platform running on your computer. Internally, every single service has different interfaces to talk with different services. We always provide a mock implementation for those interfaces that ignores network requests, or replies to certain requests with specific responses.
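In Go, the interface-plus-mock pattern described here looks roughly like the sketch below. The `Identity` interface and its method are hypothetical names for illustration, not Netlify's actual API:

```go
package main

import "fmt"

// Identity is the interface one service uses to talk to another.
// Services depend on interfaces like this, never on concrete network clients.
type Identity interface {
	UserName(id string) (string, error)
}

// A real implementation would make HTTP calls (omitted here).
// mockIdentity replies to requests with canned responses and never
// touches the network, which is what we develop and test against.
type mockIdentity struct {
	users map[string]string
}

func (m mockIdentity) UserName(id string) (string, error) {
	if name, ok := m.users[id]; ok {
		return name, nil
	}
	return "", fmt.Errorf("unknown user %q", id)
}

// greet only sees the interface, so it works identically against the
// mock in development and the real client in production.
func greet(ident Identity, id string) string {
	name, err := ident.UserName(id)
	if err != nil {
		return "hello, stranger"
	}
	return "hello, " + name
}

func main() {
	ident := mockIdentity{users: map[string]string{"42": "David"}}
	fmt.Println(greet(ident, "42")) // hello, David
}
```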
Something that we have found very useful is the ability to record requests and responses from real network traffic. We do this, for example, to talk with external services that we use, like NS1 or Braintree. We use an open source project called VCR for this purpose; there are implementations in different programming languages. This allows us to record request and response messages from two services connected together as if they were in production, and replay them for development and testing.
We’re currently centralizing all deployments in different Kubernetes clusters. We still have a handful of services that we deploy with Ansible, but we’re migrating everything to a more CI/CD friendly workflow. The services that we deploy on Kubernetes use a Blue/Green deployment system with container images. This also gives us more visibility into the deployment pipeline. We send messages to a deploy channel in Slack, where people can see when a deploy happens.
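One common way to implement blue/green on Kubernetes is to run two Deployments side by side and flip a Service's label selector between them. A sketch of the idea (the manifest names and labels are hypothetical, not Netlify's actual configuration):

```yaml
# "blue" (live) and "green" (new release) Deployments run side by side;
# the Service decides which one receives traffic.
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
    color: blue   # flip to "green" to cut traffic over to the new release
  ports:
    - port: 80
      targetPort: 8080
```

Because the old Deployment keeps running after the switch, rolling back is just flipping the selector back.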
How have Microservices impacted the way you approach testing? What are lessons learned or advice around this? Can you give some examples?
As I mentioned before, we use VCR to record real interactions between services that we can replay during testing. We record good and bad service interactions, and update those recordings periodically to guarantee nothing changes in the external services we use.
We also test many things in production directly. Our services behave differently depending on traffic patterns, and sometimes we cannot observe those behaviors outside production. We use feature flags, logs and metrics to track what we have to test with isolated traffic patterns or websites.
How have Microservices impacted security and controlling access to data?
Have you run into issues managing data consistency in a microservice architecture?
No. It's not really an issue. Right now, only our API has write access to our main database. If a service needs to write something to that database, it needs to send requests to our API. Some microservices have their own persistent data store, and their use is very localized to the service.
Thanks again to David for his time and input! This interview was done for our Microservices for Startups ebook. Be sure to check it out for practical advice on microservices.