This interview was done for our Microservices for Startups ebook. Be sure to check it out for practical advice on microservices. Thanks to Stephen for his time and input!
Stephen Blum is a founder and the CTO of PubNub. PubNub is a secure global Data Stream Network (DSN) and easy-to-use API that enables its customers to connect, scale, and manage realtime applications and IoT devices.
For context, how big is your engineering team?
We have 80 employees here at PubNub. Most of them are software engineers. We've also got marketing, HR, finance, and sales. Primarily we're providing connectivity for over 350 million mobile devices, which is greater than the population of the United States, with over 2 trillion transactions going over our network every month.
Are you using Microservices and can you give a general overview of how you’re using them?
When PubNub started, microservices didn't exist. It wasn't a thing. This is back in 2012. So what is that? Six years ago.
So enter microservices and our deployment architecture. We are currently routing 22 million transactions per minute, which, if you threw it at a little Node.js server, would instantly destroy it and OOM. To handle that load at scale, we need a way for processes to automatically scale up and down. If we just had a whole bunch of running processes ready on standby to handle that traffic, we'd be paying for idle capacity just for memory. It'd just be so much memory. It ended up being like 6 billion dollars per year in EC2 costs. Impossible. This is where microservices really help you out: you automatically have them spin up or spin down.
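As a rough illustration of the scale-up and scale-down arithmetic described above, the sketch below sizes a pool of workers from the incoming request rate. Only the 22 million transactions per minute comes from the interview; the per-replica capacity of 5,000 requests per second, the replica bounds, and the function name `replicas_needed` are assumptions made for illustration, not PubNub figures.

```python
import math


def replicas_needed(requests_per_second: float,
                    capacity_per_replica: float,
                    min_replicas: int = 2,
                    max_replicas: int = 500) -> int:
    """Return how many replicas are needed to serve the current request rate."""
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    wanted = math.ceil(requests_per_second / capacity_per_replica)
    # Clamp so the pool never drops below a safe floor or grows without bound.
    return max(min_replicas, min(max_replicas, wanted))


peak_rps = 22_000_000 / 60  # about 366,667 requests per second
print(replicas_needed(peak_rps, capacity_per_replica=5_000))  # 74
print(replicas_needed(1_000, capacity_per_replica=5_000))     # 2 (the floor)
```

The point of the clamp is the cost argument in the answer: at quiet times the pool shrinks to the floor instead of holding peak-sized memory in reserve.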
Did you start with a monolith and later adopt Microservices?
Commonly you start with a massive codebase, and you deploy the architecture across servers. You must do this using parallel distributed computing practices for any sort of multi-tenant infrastructure at scale. Then your codebase keeps growing and you start needing more and more teams, because you can't successfully run one big 20-person team. That approach has historically been unsuccessful and does not function well. Things collide. You need the industry-standard approach, with sprint teams of around five people. When you have too many people working in the same code repo, you see challenges in deployment and people step on each other. So we really had to split the infrastructure into separate components. We did that early on at PubNub. Our borders are exposed through a REST API, so every API within PubNub is very RESTful. You're able to interact with the different components within PubNub's network, and the network itself interacts through those REST endpoints. It was great that we built it this way. This approach allowed for a really easy migration of our services straight into Kubernetes.
If so, what was the motivation to adopt Microservices? How did you evaluate the tradeoffs?
We haven't fully adopted Kubernetes yet. I would say we only have 30 percent of our infrastructure in Kubernetes right now, and we are making headway to rush toward... Here's another benefit that we saw: cost savings. A lot of our compute is non-homogeneous. Most of our services are network heavy, and some are CPU bound. Wouldn't it be nice if you could just sort of shove all that stuff into one box and have things load balance a little bit better? You could save potentially millions of dollars per year. Why would you not do that? So that's another motivation for us: the savings opportunity.
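The "shove all that stuff into one box" idea is essentially bin packing across more than one resource. The sketch below is a minimal first-fit-decreasing packer, assuming each service declares CPU and network needs as a percentage of one node. The service names and numbers are hypothetical, and this is not how the Kubernetes scheduler is implemented; it only shows why mixing CPU-bound and network-heavy workloads can need fewer machines.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    cpu: int  # percent of one node's CPU (hypothetical figure)
    net: int  # percent of one node's network bandwidth (hypothetical figure)


@dataclass
class Node:
    services: list = field(default_factory=list)
    cpu_used: int = 0
    net_used: int = 0

    def fits(self, s: Service) -> bool:
        return self.cpu_used + s.cpu <= 100 and self.net_used + s.net <= 100

    def add(self, s: Service) -> None:
        self.services.append(s.name)
        self.cpu_used += s.cpu
        self.net_used += s.net


def pack(services: list) -> list:
    """First-fit decreasing: place the most demanding services first."""
    nodes = []
    for s in sorted(services, key=lambda s: max(s.cpu, s.net), reverse=True):
        for node in nodes:
            if node.fits(s):
                node.add(s)
                break
        else:
            # No existing node has room on both dimensions; open a new one.
            node = Node()
            node.add(s)
            nodes.append(node)
    return nodes


workload = [
    Service("cpu-a", cpu=80, net=10),
    Service("cpu-b", cpu=80, net=10),
    Service("net-a", cpu=10, net=80),
    Service("net-b", cpu=10, net=80),
]
print(len(pack(workload)))  # 2 nodes instead of 4 dedicated ones
```

Each node ends up with one CPU-bound and one network-heavy service, so both resources are used to about 90 percent.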
How did you approach the topic of Microservices as a team/engineering organization? Was there discussion on aligning around what a Microservice is?
We started with a small group, the set of architects here at PubNub. We discussed pros and cons. Then we took it to a bigger team, then opened it up to the floor. We really wanted to highlight the question: why wouldn't we want to go to something like Kubernetes? Everyone, even when you're developing and you love new technology, has a little bit of aversion. You're like, "Eh, maybe, eh." So you're trying to find problems with the migration. Very basic stuff, all logistical really. How do we have one running here and another running here? How do we do the migration? How do we make sure there's zero downtime, because PubNub actually makes phones ring, and if the phone doesn't ring our customers lose revenue? How do we deal with that? Really, at the end of the day, all the cons that were raised are solvable. We can overcome all those problems. Yeah, it's going to take a little bit of time to ensure zero downtime. But we're absolutely on the path to moving to 100 percent Kubernetes at this point.
Did you change the way your team(s) were organized or operated in response to adopting microservices?
A team is responsible for a particular service or particular set of services.
How much freedom is there on technology choices? Did you all agree on sticking with one stack or is there flexibility to try new? How did you arrive at that decision?
We're totally open with it. We want to continue to push forward with new open source technologies that are available, and we only have a couple of constraints for the team that are very fair: it must run in a container environment and it has to be cost effective.
I want to make sure that everyone here at PubNub has the opportunity to use the technologies that they're passionate about, that they've heard a lot about, and they believe based on the communities, "Hey this will work even at scale."
Have you broken a monolithic application into smaller microservices? If so, can you take us through that process?
Luckily we haven't had to. You're thinking of a situation where different endpoints have different codebases. We essentially started out that way. We distributed all the code and all the logic between sprint teams, and they're able to manage it within the team.
How do you determine service boundaries?
It's a choice between two options. One of the options is REST, like we talked about. It's really easy and it's transactional: it happens when something else happens, an event where you just need to pull some data out of some location. Small snippets of information like key configuration and vaulting and those types of things are all RESTful, transactional. The second kind is an event-driven architecture, a streaming-based system where the data is constantly emitting. There's no need to initiate a request-response session. You are able to transmit data fetch requests and events asynchronously, allowing systems to work together and communicate in a fault-tolerant manner. The failover and retry are built into the architecture, rather than that logic being coded into every REST call with retry on failure.
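To make the contrast concrete, here is a minimal sketch of the two styles. In the request-response style every call site wraps its own retry; in the event-driven style the stream redelivers failed events, so handlers carry no retry logic. `call_with_retry` and `EventStream` are hypothetical names, and the in-memory queue only stands in for a real durable stream; none of this is PubNub's implementation.

```python
import time
from collections import deque


def call_with_retry(request, attempts=3, base_delay=0.1):
    """REST style: the caller owns retry, with exponential backoff."""
    for attempt in range(attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the failure
            time.sleep(base_delay * 2 ** attempt)


class EventStream:
    """Event-driven style: redelivery lives in the transport, not the handler."""

    def __init__(self, max_deliveries=3):
        self.pending = deque()
        self.dead_letters = []
        self.max_deliveries = max_deliveries

    def publish(self, event):
        self.pending.append((event, 0))

    def drain(self, handler):
        while self.pending:
            event, deliveries = self.pending.popleft()
            try:
                handler(event)
            except Exception:
                if deliveries + 1 < self.max_deliveries:
                    # Put the event back so it is delivered again later.
                    self.pending.append((event, deliveries + 1))
                else:
                    # Give up and park the event for inspection.
                    self.dead_letters.append(event)
```

With the stream, a handler that fails once simply sees the same event again on a later pass, and an event that keeps failing ends up in `dead_letters` rather than being lost.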
How have Microservices impacted your development, ops, and deployment processes? What were some challenges that came up and how did you solve them?
Deployment should happen in increments, continuously throughout the week, except for Thursday, Friday, Saturday, and Sunday. Avoid those days. If we do decide to deploy on those days, we will stay the weekend to make sure it works. So nobody wants to deploy on those days. But it's all about how convenient it is, and we want to make sure that our developers are happy in terms of how fast they are able to deploy.
[Our deployment process] is auto-build. So CI/CD. All of that's integrated. Once you deploy to, say, the master branch, it depends on the team and how it's set up. But we all follow a standard process. It has to build, of course, and it has to pass all the unit tests and the integration tests. We use Travis. We still have some Jenkins in there that we're trying to get rid of. Everyone else is trying to move off that too. It's got to pass all the tests and run, the product manager has to verify it in staging, and then all the green lights are ready. We put in a change management entry that's on the calendar. Everyone can see the deployment calendar. And then we click the button at the scheduled time, and we make sure everyone's on deck in case any problems happen.
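The release gates described in this answer can be sketched as a simple checklist. The field names, the `blockers` function, and the choice to treat the Monday-to-Wednesday window as a hard rule are assumptions for illustration; the interview describes the day-of-week rule as a strong convention rather than an enforced check.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

ALLOWED_WEEKDAYS = {0, 1, 2}  # date.weekday(): Monday=0, Tuesday=1, Wednesday=2


@dataclass
class ReleaseCandidate:
    build_ok: bool
    unit_tests_ok: bool
    integration_tests_ok: bool
    verified_in_staging: bool       # product manager sign-off
    scheduled_for: Optional[date]   # entry on the shared deployment calendar


def blockers(rc: ReleaseCandidate, today: date) -> list:
    """Return the reasons this release cannot go out today (empty means go)."""
    problems = []
    if not rc.build_ok:
        problems.append("build failed")
    if not rc.unit_tests_ok:
        problems.append("unit tests failed")
    if not rc.integration_tests_ok:
        problems.append("integration tests failed")
    if not rc.verified_in_staging:
        problems.append("not verified in staging")
    if rc.scheduled_for != today:
        problems.append("not on the deployment calendar for today")
    if today.weekday() not in ALLOWED_WEEKDAYS:
        problems.append("outside the Monday-Wednesday window")
    return problems


tuesday = date(2018, 3, 6)
rc = ReleaseCandidate(True, True, True, True, scheduled_for=tuesday)
print(blockers(rc, tuesday))  # []
```

Returning the list of reasons, rather than a single boolean, keeps the "all green lights" check readable for whoever clicks the button.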
Additionally, some of our components operate using a blue/green code deployment model. This allows newly deployed code to be automatically reverted to a working state if problems are detected by the system itself. Kubernetes is great at this.
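A minimal sketch of the blue/green idea follows: new code is deployed to the idle color, traffic is switched over, and the switch is undone automatically if a health check fails. The `BlueGreenRouter` class and the `deploy` and `healthy` callbacks are hypothetical stand-ins; in Kubernetes the equivalent behavior would come from the platform's own rollout and health-probe mechanisms rather than from code like this.

```python
from typing import Callable


class BlueGreenRouter:
    """Tracks which of two identical environments receives live traffic."""

    def __init__(self, live: str = "blue"):
        self.live = live

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def release(self,
                deploy: Callable[[str], None],
                healthy: Callable[[str], bool]) -> bool:
        """Deploy to the idle color, switch traffic, revert if unhealthy."""
        previous, target = self.live, self.idle
        deploy(target)           # new code lands on the idle environment
        self.live = target       # switch traffic to the new code
        if healthy(target):
            return True
        self.live = previous     # automatic revert to the known-good color
        return False


router = BlueGreenRouter()
ok = router.release(deploy=lambda color: None, healthy=lambda color: False)
print(ok, router.live)  # False blue
```

Because the previous environment is left untouched during the release, reverting is only a traffic switch, not a redeploy.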
How have Microservices impacted the way you approach testing? What are lessons learned or advice around this? Can you give some examples?
What's that deployment saying? It's immutable infrastructure. We want to maintain this sort of approach. It allows us to deploy a whole new cluster with some new logic that has changed and then run tests. And that's something that was very hard for us to do before. Usually what we did was deploy to a running cluster in EC2, off in another account somewhere, that wasn't touching production. We would deploy our updated component and we'd do the smoke test. With that approach we had mutated the infrastructure. After a while that one cluster would be so mutated, and stuff changed so constantly, that it became a nightmare to maintain a copy of that in the data center. So it's better to use a microservices approach where you spin up brand new infrastructure on the spot and every component is provisioned with the provision script. So there's no cruft. There's nothing left over. You get the opportunity to test your upgrades from clean code, clean configuration. And we mirror that easily in production now. So that's the biggest change for our approach to testing: an immutable approach to our infrastructure deployment in operations and testing methods.
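The difference between patching a long-lived test cluster and provisioning a fresh one per run can be sketched in a few lines. Everything here is a toy stand-in: `BASE_SPEC`, `provision`, `smoke_test`, and `verify_release` are hypothetical names, and a dictionary plays the role of a cluster built by a provision script.

```python
import copy

# Hypothetical declarative spec; in practice this is the provision script's input.
BASE_SPEC = {"image": "service:candidate", "config": {"workers": 4}}


def provision(spec: dict) -> dict:
    """Build a brand-new environment from the spec; nothing is reused or patched."""
    return {"spec": copy.deepcopy(spec), "state": {}}


def smoke_test(env: dict) -> bool:
    env["state"]["requests_served"] = 1  # tests are free to dirty the environment
    return env["spec"]["config"]["workers"] > 0


def verify_release(spec: dict) -> bool:
    env = provision(spec)   # fresh on every run
    try:
        return smoke_test(env)
    finally:
        env.clear()         # discarded afterwards, never carried into the next run


print(verify_release(BASE_SPEC))  # True
```

Since every run starts from the spec alone, whatever passes here was tested against the same clean configuration that production will be provisioned from.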
How have Microservices impacted security and controlling access to data?
We're already using iptables. We were kind of already using other ways to connect and authenticate services by key. It was built into the API. We still cryptographically sign and run data ownership checks for administration commands. I don't know. From my perspective we haven't changed anything for our move to using Kubernetes.
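As an illustration of signing administration commands and checking data ownership, the sketch below uses HMAC-SHA256 from Python's standard library. The interview does not say which signing scheme PubNub uses, so the algorithm, the command fields, and the `owners` lookup are all assumptions.

```python
import hashlib
import hmac
import json


def sign(command: dict, secret: bytes) -> str:
    """Sign a canonical JSON encoding of the command with a shared secret."""
    payload = json.dumps(command, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def authorize(command: dict, signature: str, secret: bytes, owners: dict) -> bool:
    """Accept a command only if the signature is valid and the caller owns the resource."""
    # compare_digest avoids leaking how many characters matched via timing.
    if not hmac.compare_digest(sign(command, secret), signature):
        return False
    return owners.get(command["resource"]) == command["account"]


secret = b"hypothetical-shared-secret"
owners = {"keyset-9": "acct-1"}  # resource -> owning account (hypothetical data)
command = {"account": "acct-1", "action": "rotate", "resource": "keyset-9"}
signature = sign(command, secret)

print(authorize(command, signature, secret, owners))                          # True
print(authorize({**command, "action": "delete"}, signature, secret, owners))  # False
```

Both checks are independent of where the service runs, which is consistent with the remark that the move to Kubernetes did not change them.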