I've been working predominantly on the frontend for more than 10 years now. While I've contributed to different services and backends, I never spent the time to see how microservices tie together. How does it all not fall apart?
To me, this is what a backend looks like: an app that has some API endpoints defined and returns a JSON object. The problem is that this is how the world was 10 years ago.
I had to look closer and find out how that old image I have in my head translates to today's gold standard: microservices. I broke it down into four sections.
This bit is easy. This is the microservice itself. You take all that code and move it to its own repo. Easy. Use the framework of your choice.
Doing that, we should end up with two services: a User service and a Transfer service.
OK. This is now a bit more complicated. Where should this annotation live? Inside the microservice? Outside? If outside, where? How would the system know where everything is?
We need something that knows how to translate a URL into a meaningful route and talk to the right service. Kinda like a traffic controller.
- We need a gateway. This is one server that acts as a single entry point to the system. It's basically the facade design pattern. Netflix has an open source project for this called Zuul.
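The gateway's routing idea can be sketched very roughly like this. This is a toy illustration, not how Zuul actually works; the route prefixes and service names are invented for the User/Transfer example above.

```typescript
// A minimal sketch of a gateway's routing table: map URL prefixes to
// services. The prefixes and service names are hypothetical examples.
type Route = { prefix: string; service: string };

const routes: Route[] = [
  { prefix: "/users", service: "user-service" },
  { prefix: "/transfers", service: "transfer-service" },
];

// Given an incoming URL, figure out which service should handle it.
function resolveService(url: string): string | undefined {
  return routes.find((r) => url.startsWith(r.prefix))?.service;
}
```

So a request to `/users/42` resolves to `user-service`, and the gateway forwards it there.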
Cool, solved it! Nope. How does the gateway know where each service is? It needs an IP! Yeah, but what happens if the IP changes? Someone needs to let it know.
- We need a discovery service. Blame the cloud, not me. If this were 10 years ago we would have hardwired locations. Now we need a service registry. A service registry is basically a database of services, their instances and their locations. You call it, you ask where service A is, and it tells you. Simples!
There are various tools for that:
- Netflix Eureka (TW uses that)
- Apache ZooKeeper
- Consul (we use that too)
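To make the "database of services" idea concrete, here is a toy in-memory registry sketching what Eureka or Consul do. The class and method names are made up for illustration and are not any real tool's API.

```typescript
// A toy service registry: instances register their location, clients
// look up where a service lives. Real registries (Eureka, Consul) also
// do health checks and expiry; this sketch skips all of that.
class ServiceRegistry {
  private services = new Map<string, string[]>();

  // An instance announces itself: "user-service lives at 10.0.0.5:8080".
  register(name: string, address: string): void {
    const addrs = this.services.get(name) ?? [];
    addrs.push(address);
    this.services.set(name, addrs);
  }

  // A client asks: "where is user-service?"
  lookup(name: string): string[] {
    return this.services.get(name) ?? [];
  }
}
```

When an instance moves, it re-registers with its new address, so callers never hardwire a location.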
OK! We are done. Almost…
How do you deal with failures? What do you return when one service fails? Worse, what happens when a service doesn't fail outright but just hangs?
In the old days of monoliths you would have implemented a timeout. You still need to, but that alone isn't enough. A useful pattern is the circuit breaker. When things go wrong for long enough, the circuit breaks, and you find out in a sensible amount of time that you need to take some action or try again. Netflix Hystrix is a useful tool that deals with this. Alternatives are Envoy (which we use), Akka and probably more that I can't possibly know. Note that most of these tools are a lot more powerful than just circuit breaking.
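The core of the pattern fits in a few lines. This is a bare-bones sketch of the idea Hystrix popularised, with an invented threshold and no half-open/retry state that real implementations have.

```typescript
// A bare-bones circuit breaker: after `threshold` consecutive failures,
// stop calling the downstream service and fail fast instead of waiting.
class CircuitBreaker {
  private failures = 0;
  constructor(private threshold: number) {}

  call<T>(fn: () => T): T {
    if (this.failures >= this.threshold) {
      // Circuit is open: fail immediately instead of hanging on a dead service.
      throw new Error("circuit open");
    }
    try {
      const result = fn();
      this.failures = 0; // a success resets the failure count
      return result;
    } catch (err) {
      this.failures += 1;
      throw err;
    }
  }
}
```

The point is the fast failure: callers get an error in microseconds instead of piling up threads waiting on timeouts.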
That should be good for now.
In an old monolithic system, having shared resources was easy. Put them in a folder and share them wherever you want. No problem. Now sharing things is not so straightforward. How do you share configuration across services?
We need a configuration server. You can think of this as yet another service that just stores configuration we should share between all our other services. There are various tools that do that:
- Netflix Archaius
- Spring Cloud offers a solution (we use that currently)
- AWS offers its own config service
- Kubernetes also offers a config service (we are heading there at TW)
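The idea is simple enough to sketch: one place holds the settings, and every service reads from it instead of keeping its own copy. The keys and values below are invented examples, and a real config server would sit behind HTTP rather than a Map.

```typescript
// A sketch of shared configuration: one store that every service reads,
// so a setting like a timeout is consistent across the whole system.
const configServer = new Map<string, string>([
  ["log.level", "info"],
  ["request.timeout.ms", "2000"],
]);

// Each service asks the config server rather than hardcoding the value.
function getConfig(key: string, fallback: string): string {
  return configServer.get(key) ?? fallback;
}
```

Change `request.timeout.ms` in one place and every service picks it up, instead of hunting through a dozen repos.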
Yeah cool, but why share configuration?
“One of the key ways to identify what should be constant from service to service is to define what a well-behaved, good service looks like. What is a “good citizen” service in your system? What capabilities does it need to have to ensure that your system is manageable and that one bad service doesn’t bring down the whole system? “ - Sam Newman, Building Microservices
Configuration is only one aspect of the answer to the above. It’s just one thing that makes our services consistent and well behaved (hopefully). There is a lot more obviously.
This, to me, was the trickiest concept to come to terms with. How do I share functions between microservices? Also, how do microservices talk to each other? How does the Transfer service talk to the User service now?
OK, regarding sharing functions: this is a matter of principle. Here is another quote from Sam Newman and his book Building Microservices:
My general rule of thumb: don’t violate DRY within a microservice, but be relaxed about violating DRY across all services
So the answer to my first question is: copy functions across services if you need to, but don't copy them within your microservice. Keep it DRY there. Unfortunately, school made us feel guilty about copy/pasting things around. It's not always bad.
The second part of my question is more interesting though. How do services talk to each other? To answer that, we need to go back to service discovery. When the Transfer service wants to talk to the User service, the following needs to happen:
- The Transfer service goes to the discovery service.
- It asks: give me the address of the User service.
- Discovery replies with an IP or an address.
- The Transfer service makes an API call to that address.
This is called client-side discovery.
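The steps above can be sketched as follows. The registry contents and the `userUrl` helper are hypothetical; the point is that the *client* (Transfer service) owns the discovery logic.

```typescript
// Client-side discovery sketch: the Transfer service asks the registry
// where the User service lives, then calls that address directly.
const registry = new Map<string, string>([
  ["user-service", "10.0.0.5:8080"],
]);

function discover(service: string): string {
  const address = registry.get(service);
  if (!address) throw new Error(`no instances of ${service}`);
  return address;
}

// The URL the Transfer service would call for a given user.
function userUrl(userId: number): string {
  const address = discover("user-service"); // steps 1-3: ask the registry
  return `http://${address}/users/${userId}`; // step 4: call that address
}
```

Notice that this lookup-then-call dance has to live inside every service that talks to another one, which is exactly the boilerplate complaint below.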
Client-side discovery is good because there are fewer moving parts: you are just utilising the existing parts of your system. It's also bad. The bad part is that you are tightly coupled to the discovery service. What happens if discovery dies? Also, you need to implement the discovery logic I described above in every one of your services. That's a lot of boilerplate.
We have an alternative: server-side discovery. In server-side discovery the client makes a request via a router (a.k.a. a load balancer) that runs at a well-known location. The router then calls the service registry (no, you can't avoid that completely) and you get the address you were looking for. Solutions for this method include:
- AWS Elastic Load Balancer (ELB)
- Kubernetes proxy
- Marathon proxy
This is simpler compared with client-side discovery, because the service now only has to make a call to the router. No implementation of the discovery logic. Also, if you are on AWS it makes sense to just use ELB. The downside is that you add yet another moving part, yet another network hop, to your system. Also, whatever you choose has to support all the protocols you use (gRPC, Thrift, HTTP etc.).
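Here is the server-side variant as a sketch, for contrast with the client-side one. Everything here is illustrative (not ELB's or kube-proxy's real behaviour): the registry now lists multiple instances and the *router*, not the client, picks one.

```typescript
// Server-side discovery sketch: the client only knows the router's
// well-known address; the router consults the registry and picks an
// instance (simple round-robin here) before forwarding the request.
const registry = new Map<string, string[]>([
  ["user-service", ["10.0.0.5:8080", "10.0.0.6:8080"]],
]);

let next = 0;
function route(service: string): string {
  const instances = registry.get(service) ?? [];
  if (instances.length === 0) throw new Error(`no instances of ${service}`);
  const instance = instances[next % instances.length];
  next += 1;
  return instance;
}
```

Each call spreads load across instances, and the calling service never touches the registry itself, which is the boilerplate saving.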
If monitoring is important in a monolith, it's twice as important in a distributed system with many microservices lying around. Figuring out why things failed can be a nerve-wracking problem. To be able to get answers and avoid failure you need solid monitoring in place. There are a lot of options here.
The most important thing is visualisation of the metrics we collect. On that front, Graphite is a monitoring tool a lot of companies (including TW) use. You can use a tool like Logstash, for example, to push metrics into Graphite and then get interesting graphs. That can help teams know when their services are performing badly. Collecting metrics like inbound response times, error rates and application-level metrics is only a start. On top of that you can add alarms that let your team know when things go bad.
While this is a great solution, it's not without problems. The biggest problem with Graphite is that services need to push to it so it can be aware of what's happening. If there are networking issues, or the service is dead, your team won't even know what's wrong. Plus, it adds another burden to the service.
An alternative is Prometheus. With Prometheus, instead of services pushing metrics to it, Prometheus pulls them. That immediately takes the burden off the services. Prometheus also offers alarms without the need for extra software. So that's one less thing to worry about.
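The pull model is easy to picture: the service just exposes its counters as plain text and Prometheus scrapes them on a schedule. A minimal sketch of what a service's `/metrics` response body could look like (the metric name is an invented example, and real exposition also includes `# HELP`/`# TYPE` lines):

```typescript
// Sketch of Prometheus-style pull: the service keeps counters and
// renders them as "name value" lines for a scraper to collect.
const counters = new Map<string, number>([["http_requests_total", 0]]);

// Called from the request handler: bump the counter.
function recordRequest(): void {
  counters.set(
    "http_requests_total",
    (counters.get("http_requests_total") ?? 0) + 1,
  );
}

// What a GET /metrics response body would contain.
function metricsText(): string {
  return [...counters]
    .map(([name, value]) => `${name} ${value}`)
    .join("\n");
}
```

The service does nothing but count; if it dies, the scrape fails and Prometheus itself notices, which is exactly the advantage over the push model.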
In my eyes there are no clear winners on this. The only clear thing is that if you are not monitoring things you will have a horrible time.
Basic concepts that you need in a micro service architecture
- Automate everything, and if the technology you have doesn’t allow this, get some new technology!
- If possible, move to a single service per host/container. You need one CI build per microservice.
- Focus on maintaining the ability to release one service independently from another, and make sure that whatever technology you select supports this.
- At scale, “partially broken” is the normal state of operation.
- CAP theorem (Consistency, Availability, Partition Tolerance). At its heart it tells us that in a distributed system, we have three things we can trade off against each other: consistency, availability, and partition tolerance. Specifically, the theorem tells us that we get to keep two in a failure mode.
- Consistency is the system characteristic by which I will get the same answer if I go to multiple nodes.
- Availability means that every request receives a response
- Partition tolerance is the system’s ability to handle the fact that communication between its parts is sometimes impossible.
So here’s the ingredients you need for a micro service architecture:
- Gateway + Discovery service. Software that might be of interest:
- Some circuit breaker to deal with failures.
- Some configuration sharing service/server
- AWS config
- Spring cloud
There is clearly a lot more to this. This is just one little piece of the puzzle. There are a lot of tools to deal with provisioning of servers, availability based on traffic, orchestration of containers etc.