Sharon Grossman

Posted on May 20, 2022

Level up your microservices — to production grade

#node #typescript #microservices #programming

In a world full of microservices, written in different languages and using various frameworks and technologies, how do we keep a clear standard for what a service needs in order to be production grade?
In this post, I will share my key takeaways from creating a system consisting of a dozen microservices with clear standards and consistency.

Motivation — What’s in it for me? 😵

Follow key principles to make sure every service is ready for production
Apply the same standards across different tech stacks and teams
Gain visibility into errors, anomalies and bugs
Analyze service performance over time
Follow and debug flows across multiple services

Shall we?

Enforcing client headers ✋

This one is pretty straightforward: a service should be able to identify the clients or other services sending it requests, along with their versions.
We do that by enforcing certain headers, which limits the service’s surface area to known clients and/or versions, for security or business reasons.

In this example I am using @osskit/enforce-client-headers as an Express middleware to enforce a list of headers (by default x-api-client and x-api-client-version).

Any request that does not send the required headers is denied with a 400 Bad Request error.
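To make the rule concrete, here is a minimal, framework-free sketch of the check such a middleware performs. It is not the @osskit/enforce-client-headers API: the checkClientHeaders helper and its return shape are hypothetical, and only the two default header names come from the text above.

```typescript
// Minimal header shape so the sketch runs without Express installed.
type HeaderMap = Record<string, string | undefined>;

interface CheckResult {
  status: number;    // HTTP status a middleware would respond with
  missing: string[]; // required headers that were absent or empty
}

// Default header names taken from the text above.
const DEFAULT_REQUIRED = ["x-api-client", "x-api-client-version"];

// Hypothetical helper: reports 400 when any required header is missing.
function checkClientHeaders(
  headers: HeaderMap,
  required: string[] = DEFAULT_REQUIRED,
): CheckResult {
  // Header names are case-insensitive, so compare them in lowercase.
  const present = new Set(
    Object.entries(headers)
      .filter(([, value]) => value !== undefined && value.trim() !== "")
      .map(([headerName]) => headerName.toLowerCase()),
  );
  const missing = required.filter((headerName) => !present.has(headerName.toLowerCase()));
  return { status: missing.length === 0 ? 200 : 400, missing };
}

// Usage: the first request identifies itself fully, the second does not.
const ok = checkClientHeaders({ "x-api-client": "web", "x-api-client-version": "1.2.0" });
const bad = checkClientHeaders({ "x-api-client": "web" });
console.log(ok.status, bad.status, bad.missing);
```

In a real Express service the same check would sit in a middleware that responds with the 400 status instead of returning it.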

Monitoring 📊

Metrics

Dashboards, dashboards, dashboards.
The ability to view the service’s performance, success and error counters is irreplaceable.
You can create, collect and display your metrics in many ways, but I’ve chosen to use @osskit/monitor which is a tool that creates Prometheus counters and histograms.

An example for Mongo-scoped operations:
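A rough, self-contained sketch of the idea looks like this. It does not use the @osskit/monitor API: the monitored wrapper, the in-memory counters and the operation names are all hypothetical stand-ins for real Prometheus counters and histograms.

```typescript
// In-memory stand-ins for Prometheus counters and a histogram (assumption:
// a real service would export these through a Prometheus client library).
interface OperationMetrics {
  success: number;
  error: number;
  durationsMs: number[];
}

const metrics = new Map<string, OperationMetrics>();

function metricsFor(operationName: string): OperationMetrics {
  let entry = metrics.get(operationName);
  if (!entry) {
    entry = { success: 0, error: 0, durationsMs: [] };
    metrics.set(operationName, entry);
  }
  return entry;
}

// Hypothetical wrapper: runs an operation, counts the outcome, records the duration.
async function monitored<T>(operationName: string, operation: () => Promise<T>): Promise<T> {
  const entry = metricsFor(operationName);
  const start = Date.now();
  try {
    const result = await operation();
    entry.success += 1;
    return result;
  } catch (error) {
    entry.error += 1;
    throw error; // rethrow so the caller's error handling still runs
  } finally {
    entry.durationsMs.push(Date.now() - start);
  }
}

// Usage: "mongo.create" and "mongo.update" are illustrative operation names.
async function demo(): Promise<void> {
  await monitored("mongo.create", async () => ({ id: 1 }));
  await monitored("mongo.update", async () => {
    throw new Error("write conflict");
  }).catch(() => undefined);
  console.log(metrics.get("mongo.create"), metrics.get("mongo.update"));
}

demo().catch(console.error);
```

Wrapping the operation, rather than sprinkling counters inside it, keeps the business code free of metrics calls and makes every operation report the same three signals.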

The results can be viewed in Grafana. Every time the create or update functions are called, new data points are recorded that we can aggregate and display.

Success counter

Execution Times Histogram

Errors counter

To summarize, metrics are an important piece of monitoring your services: they give you visibility into the service, let you visualize performance over time, and provide SLA figures for operations centers, developers and stakeholders.

Logging

I cannot stress enough the importance of logging and how it ties together metrics, alerts and tracing.
There are tons of logging frameworks and tools, but the question is what do we want to log?

Errors and exceptions: a gateway to finding and understanding bugs, anomalies and use cases
Info logs for critical operations or requests
Tracing headers and metadata that enrich your logs, so you can retrieve the context of a log and find it quickly

@osskit/monitor can also log success or error results with the context you provide to the operation:

The above is an example of a JSON log with the provided context and an informative message.
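A log line of that kind could be produced by a sketch like the one below. The formatLog helper and its field names (timestamp, level, message, context, error) are my own assumptions for illustration, not the format @osskit/monitor emits.

```typescript
type LogLevel = "info" | "error";
type LogContext = Record<string, string | number | boolean>;

// Builds one JSON log line; the field names are illustrative assumptions.
function formatLog(
  level: LogLevel,
  message: string,
  context: LogContext = {},
  error?: Error,
): string {
  const entry: Record<string, unknown> = {
    timestamp: new Date().toISOString(),
    level,
    message,
    context, // kept under its own key so it cannot overwrite the fields above
  };
  if (error) {
    entry["error"] = { name: error.name, message: error.message };
  }
  return JSON.stringify(entry);
}

// Usage: the request id is the kind of tracing metadata mentioned above.
console.log(
  formatLog(
    "error",
    "failed to update order",
    { orderId: "o-123", "x-request-id": "req-42" },
    new Error("timeout"),
  ),
);
```

Emitting one JSON object per line keeps the logs easy to index and to filter by any context field.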

Alerts 💤

Alerting is an important step in knowing when your services are degrading, throwing errors, or just not working the way they should. The goal is to notify the relevant people so they can take a look, ideally before customers have a bad time.

You can use whichever alert engine works for you. For example, Cortex (which Grafana Mimir was forked from) lets you define alert rules in YAML files that you can keep in your project and sync to Cortex.

I like to differentiate between two types of alerts.

Kubernetes Alerts: Kubernetes resources of the service

Health checks
CPU, Memory metrics
Networking

Service Specific Alerts

Business logic, operational
HTTP Requests long durations
HTTP Requests 5xx or 4xx errors passing thresholds

Splitting alerts this way helps you quickly understand the type of alert you are dealing with. Prometheus queries can also be combined into more complex expressions that let you monitor and alert on very specific conditions.
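As a tiny illustration of the "errors passing thresholds" case, here is the condition such an alert boils down to, written as plain TypeScript rather than a Prometheus query. The shouldAlert function, the RequestSample shape and the 5% default are assumptions made up for this sketch.

```typescript
interface RequestSample {
  total: number;        // requests observed in the evaluation window
  serverErrors: number; // responses with a 5xx status in the same window
}

// Returns true when the 5xx ratio is above the threshold (5% is an assumed default).
function shouldAlert(sample: RequestSample, threshold = 0.05): boolean {
  if (sample.total === 0) return false; // no traffic, nothing to judge
  return sample.serverErrors / sample.total > threshold;
}

// Usage: 6 errors out of 100 requests is above 5%, 5 out of 100 is not.
console.log(shouldAlert({ total: 100, serverErrors: 6 }));
console.log(shouldAlert({ total: 100, serverErrors: 5 }));
```

In a real setup the alert engine evaluates this over a sliding window and usually requires the condition to hold for some time before it fires.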

Tracing 🔃

One could say this comes from a disadvantage of microservices: tracing logs across services is difficult.
Tooling around tracing microservices is improving, from a basic correlation ID sent with every request to more advanced setups with Jaeger and Istio.
Here’s an example of retrieving certain headers from every incoming request, and forwarding them to outgoing requests using express-http-context.
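The core of that idea can be sketched without any framework. This version does not use express-http-context: it passes the extracted headers around explicitly, and the header list and both helper names are assumptions for illustration.

```typescript
type IncomingHeaders = Record<string, string | undefined>;

// Assumed header names; use whatever your tracing setup propagates.
const TRACING_HEADERS = ["x-request-id", "x-b3-traceid", "x-b3-spanid"];

// Pick the tracing headers out of an incoming request.
function extractTracingHeaders(incoming: IncomingHeaders): Record<string, string> {
  const picked: Record<string, string> = {};
  for (const headerName of TRACING_HEADERS) {
    const value = incoming[headerName];
    if (value !== undefined) picked[headerName] = value;
  }
  return picked;
}

// Merge the tracing headers into the headers of an outgoing call.
function withTracing(
  outgoing: Record<string, string>,
  tracing: Record<string, string>,
): Record<string, string> {
  return { ...outgoing, ...tracing };
}

// Usage: only the tracing header is carried over to the outgoing request.
const tracing = extractTracingHeaders({ "x-request-id": "abc-123", host: "orders.internal" });
const outgoingHeaders = withTracing({ accept: "application/json" }, tracing);
console.log(outgoingHeaders);
```

A request-scoped context store removes the need to thread the tracing object through every function call, which is the convenience a library like express-http-context adds on top of this.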

Tracing is important for understanding and debugging flows that involve multiple microservices. It simplifies correlating requests that span several services, and tracing tools can even visualize the flow to make it easier to follow.

Honorable Mentions 👑

Tests: a bit obvious I think, but unit tests, closed-box tests, integration tests, smoke tests and periodic monitors are just some of the ways you can fortify your applications and services against bugs and edge cases when running in a production environment.

Gradual Rollout: there are many ways to gradually roll out software, from load balancing to Argo Rollouts in combination with Istio. This can dramatically increase your confidence in releasing software by giving you a glimpse of the new version running in production.

Service Ownership: microservices can be split across different teams and domains, so clear ownership of each service is important for code reviews, answering questions and general responsibility and maintainability.
There are many ways to define and establish ownership, but here are some of my thoughts:

CODEOWNERS file per repository (microservice)
Branch protection
Request code reviews automatically from code owners

Final Thoughts 👋

So what did we have?

Monitoring — keep your friends close and your metrics closer
Headers — make sure you identify those requesting your assets
Alerts — the earlier, the better
Tracing — a bug that resides in a 50-services flow? easy.
Logging — because who needs live debugging?

Enhancing and improving your microservices standards is an incremental process, but hopefully these key principles will point you in a direction that fits your organization’s needs, and bring you closer to an easy life debugging your services at night. 😳

DEV Community
