Microservices make life easier by breaking big applications into smaller, focused services that can be built, deployed, and scaled independently. You can work on just one part of the system without touching the rest, and teams can move faster without stepping on each other’s toes.
However, microservices also come with challenges, particularly when handling errors. With so many moving parts, it becomes harder to pinpoint exactly where something went wrong. In most cases, you’re dealing with multiple connected layers, which raises the chances of an error getting lost as it moves through the system.
This becomes even more critical in fintech, where you're handling real money. A missed error can trigger a ripple effect that impacts multiple customers and might lead to revenue loss.
In this guide, you’ll learn how to build a centralized pipeline to capture, diagnose, and respond to errors across services.
Before diving into the technical details, let’s walk through why traditional error handling doesn't hold up well in a microservices environment and what that means for your application.
Why Traditional Error Handling Falls Short in Microservices
In monolithic applications, error handling is usually more straightforward. You can often wrap a large portion of the logic in a `try-catch` block, and all the logs typically land in one place. That makes it easier to track and manage errors. But that simplicity comes with trade-offs. For example:
- Deployment cycles are slower. A change in one module means retesting and redeploying the entire application.
- When one part of the app (say, KYC verification) is under heavy load, the whole system scales to match. That leads to inefficient resource usage and higher infrastructure costs.
- A single breach can compromise the entire system.
- Compliance checks that require data separation for audits become harder to manage, since monolithic systems don’t naturally isolate data.
- A bug in one part of the app can affect all components, even those that aren't directly involved. That raises the risk of downtime across the board.
While this list isn’t exhaustive, it’s clear why monoliths struggle to keep up with the needs of modern fintech systems. That’s part of what led to the rise of microservices as the go-to architectural approach. Microservices solved many of the scaling and flexibility issues, but they also introduced new challenges, especially when it comes to handling errors across distributed services.
The most common pattern in microservices is to localize errors. Each service is responsible for managing its own errors and handling issues specific to its functionality. This isn’t necessarily a bad approach, but it creates gaps when you're trying to understand how an error ripples through the system. When something breaks in a transaction that spans multiple services, you’ll find yourself asking questions like:
- Which service encountered the initial issue?
- Did that error affect downstream services?
- Were compensating transactions triggered correctly?
- Was the payment gateway API down, rate-limiting you, or rejecting malformed requests?
Without a centralized way to capture and trace these errors, your team ends up spending more time figuring out what went wrong than actually fixing the problem.
Building Your Centralized Error Handling Pipeline
Now that you understand the risk, let's walk through how to implement a comprehensive error-handling system for your microservices architecture. This approach works particularly well with payment gateway integrations.
Note: This implementation uses the Flutterwave v3 API.
Step 1: Standardize Error Formats Across Services
The first step is to standardize error structures across all your services. Your structure can be as exhaustive as you need, but it should capture key details like the affected service, the context that helps explain the error, and any message or data returned by the payment API.
For example, imagine a virtual account management service that uses Flutterwave’s virtual account API to create accounts. If you forget to include the customer’s BVN in your request, Flutterwave responds like this:
```json
{
  "status": "error",
  "message": "BVN is required for static account number",
  "data": null
}
```
You can propagate and enrich that response into a standardized error like:
```json
{
  "errorId": "uuid-for-tracking",
  "timestamp": "2025-05-16T14:30:45Z",
  "service": "virtual-account-management",
  "endpoint": "/api/payments/virtual",
  "statusCode": 400,
  "errorType": "FLUTTERWAVE_API_ERROR",
  "details": {
    "flutterwaveMessage": "BVN is required for static account number",
    "flutterwaveErrorCode": "400",
    "flutterwaveData": null
  },
  "context": {
    "userId": "user-123",
    "transactionId": "tx-456",
    "requestBody": "{...}"
  }
}
```
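To keep that mapping consistent, it helps to build the standardized error in one shared helper rather than in every route. Here's a minimal sketch of what that could look like, assuming an Axios-style error object (with Flutterwave's response body on `err.response.data`); the helper name and option fields are illustrative, so adapt them to what your services already pass around.

```javascript
// standardError.js — illustrative helper; names and fields are assumptions
const { v4: uuidv4 } = require('uuid');

// Map an Axios-style error from a Flutterwave call into the standardized format above
function buildStandardError(err, { service, endpoint, userId, transactionId, requestBody }) {
  const fw = err.response?.data || {}; // Flutterwave's { status, message, data } body, if present

  return {
    errorId: uuidv4(),
    timestamp: new Date().toISOString(),
    service,
    endpoint,
    statusCode: err.response?.status || 500,
    errorType: fw.status === 'error' ? 'FLUTTERWAVE_API_ERROR' : 'INTERNAL_ERROR',
    details: {
      flutterwaveMessage: fw.message,
      flutterwaveErrorCode: String(err.response?.status || ''),
      flutterwaveData: fw.data ?? null,
    },
    context: {
      userId,
      transactionId,
      requestBody: JSON.stringify(requestBody),
    },
  };
}

module.exports = { buildStandardError };
```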
Step 2: Implement Consistent Logging Middleware
Next, add middleware that handles logging across your services. Logging both errors and key successful responses helps with debugging, monitoring, auditing, and compliance. You don't have to log every successful response, but log the ones that help trace behavior or track critical operations.
For example, you can create reusable middleware that logs all finished requests (success or error) and catches and reports unhandled errors. Used with the Flutterwave API, it looks similar to this:
```javascript
// loggerMiddleware.js
const { v4: uuidv4 } = require('uuid');
const { ErrorReporter } = require('./centralized-error-service');
const { LogReporter } = require('./centralized-log-service');

function requestLogger(req, res, next) {
  // Capture start time
  const start = Date.now();

  // When response is done, log success or error based on statusCode
  res.on('finish', () => {
    const durationMs = Date.now() - start;
    const logLevel = res.statusCode >= 400 ? 'error' : 'info';

    const logPayload = {
      timestamp: new Date().toISOString(),
      level: logLevel,
      service: process.env.SERVICE_NAME,
      correlationId: req.headers['x-correlation-id'] || uuidv4(), // for end-to-end request tracing
      method: req.method,
      endpoint: req.originalUrl,
      statusCode: res.statusCode,
      durationMs,
      context: {
        userId: req.user?.id,
        transactionId: req.body?.transactionId,
      },
    };

    // Send to centralized log service
    LogReporter.report(logPayload);
  });

  next();
}

function errorHandler(err, req, res, next) {
  const errorId = uuidv4();

  // Pull out Flutterwave specifics if available
  const fw = err.response?.data || {};
  const flutterwaveDetails = {
    flutterwaveRequestId: fw.flw_ref,
    flutterwaveErrorCode: fw.status,
  };

  const errorPayload = {
    errorId,
    timestamp: new Date().toISOString(),
    level: 'error',
    service: process.env.SERVICE_NAME,
    correlationId: req.headers['x-correlation-id'] || uuidv4(),
    method: req.method,
    endpoint: req.originalUrl,
    statusCode: err.statusCode || 500,
    errorType: err.name || 'InternalServerError',
    message: err.message,
    details: {
      ...flutterwaveDetails,
      stack: process.env.NODE_ENV === 'production' ? undefined : err.stack,
    },
    context: {
      userId: req.user?.id,
      transactionId: req.body?.transactionId,
      requestBody: JSON.stringify(req.body),
    },
  };

  // Report to your error-tracking service
  ErrorReporter.report(errorPayload);

  // Respond with minimal info
  res.status(errorPayload.statusCode).json({
    errorId,
    status: 'error',
    message: err.message,
  });
}

module.exports = { requestLogger, errorHandler };
```
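Wiring this into a service is then a one-time setup. Below is a minimal sketch, assuming an Express app and a hypothetical `/api/payments/virtual` route; what matters is that `requestLogger` is registered before your routes and `errorHandler` after them, so Express treats it as the error-handling middleware.

```javascript
// app.js — minimal wiring sketch; the route is illustrative
const express = require('express');
const { requestLogger, errorHandler } = require('./loggerMiddleware');

const app = express();
app.use(express.json());
app.use(requestLogger); // logs every finished request

app.post('/api/payments/virtual', async (req, res, next) => {
  try {
    // ...call Flutterwave's virtual account API here...
    res.status(201).json({ status: 'success' });
  } catch (err) {
    next(err); // hand the error to errorHandler
  }
});

app.use(errorHandler); // must come after the routes

app.listen(3000);
```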
Step 3: Route Logs to a Central Store
With standardized logs in place, you should now send them to a centralized system like:
- Datadog: For end-to-end observability and dashboards.
- AWS CloudWatch: Tightly integrated with AWS services for log aggregation and alerts.
- ELK Stack (Elasticsearch, Logstash, Kibana): For powerful search and visualizations.
- Grafana Loki: For log aggregation with minimal infrastructure overhead.
These tools let you:
- Search logs by fields like `userId` or `requestId`.
- Visualize trends and correlations.
- Alert on error spikes or patterns.
- Create a unified view across microservices.
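What `LogReporter.report` actually does depends on which of these tools you pick. As a rough sketch, assuming a collector that accepts JSON over HTTP (a Logstash HTTP input, for example) at a hypothetical `LOG_COLLECTOR_URL`, it can be as simple as a fire-and-forget POST:

```javascript
// centralized-log-service.js — illustrative sketch, not tied to a specific vendor
const LOG_COLLECTOR_URL = process.env.LOG_COLLECTOR_URL; // e.g. your Logstash/Loki HTTP endpoint

const LogReporter = {
  async report(payload) {
    try {
      // Node 18+ ships a global fetch; swap in axios or your tool's SDK if you prefer
      await fetch(LOG_COLLECTOR_URL, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload),
      });
    } catch (err) {
      // Never let a logging failure break the request path
      console.error('Failed to ship log:', err.message);
    }
  },
};

module.exports = { LogReporter };
```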
As a rule of thumb, log enough metadata to trace the origin and impact of an error. Include fields like:
- `requestId`: For end-to-end traceability.
- `userId`: Useful for debugging and support.
- `service` and `operation`: So you know what failed.
- `timestamp`, `severity`, and `region`: For filtering and alerting.
While it's advisable to log as much information about your requests as possible, don't log sensitive information like API keys and secrets.
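One way to enforce that is to sanitize payloads before they ever reach the logger. The sketch below assumes a hypothetical list of sensitive field names; extend it to whatever your requests actually carry.

```javascript
// sanitize.js — illustrative helper; the field list is an assumption
const SENSITIVE_FIELDS = ['apiKey', 'secret', 'authorization', 'cardNumber', 'cvv', 'pin'];

function sanitize(body = {}) {
  const clean = { ...body };
  for (const field of SENSITIVE_FIELDS) {
    if (field in clean) clean[field] = '[REDACTED]';
  }
  return clean;
}

// e.g. in errorHandler: requestBody: JSON.stringify(sanitize(req.body))
module.exports = { sanitize };
```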
Step 4: Set Up Real-Time Monitoring Dashboards
The point of collecting and storing these logs isn't just to have them on hand; it's to gain insight. Set up a dashboard to visualize:
- Application response times.
- Error breakdowns by type or HTTP status.
- Error rate per service.
- Most frequent external API failures.
Tools like Kibana and Grafana make it easy to filter by region, environment, and service. A good dashboard helps you answer questions like:
- Which services are triggering the most errors?
- Are errors increasing after a deployment?
- Which endpoints are the most fragile?
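If your logs land in Elasticsearch, for example, the first question comes down to a terms aggregation — the kind of query a Kibana panel runs for you behind the scenes. Here's a rough sketch, assuming a hypothetical `service-logs` index and that `service` and `level` are indexed as keyword fields (with default dynamic mapping you'd query `service.keyword` instead):

```javascript
// errorsByService.js — illustrative query; the index and field names are assumptions
async function errorsByService() {
  const res = await fetch('http://localhost:9200/service-logs/_search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      size: 0, // we only want the aggregation buckets, not the documents
      query: {
        bool: {
          filter: [
            { term: { level: 'error' } },
            { range: { timestamp: { gte: 'now-24h' } } },
          ],
        },
      },
      aggs: {
        errors_by_service: { terms: { field: 'service' } },
      },
    }),
  });
  const body = await res.json();
  // e.g. [{ key: 'virtual-account-management', doc_count: 42 }, ...]
  return body.aggregations.errors_by_service.buckets;
}
```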
Step 5: Use Alerts That Tell the Whole Story
Monitoring without alerting is incomplete. Set up alerts that trigger based on:
- Critical failures, like failed payments or service outages.
- Performance warnings, such as increased latency.
- Informational issues that may indicate a degraded experience.
Categorizing alerts helps avoid alert fatigue. One failure might not matter, but five in a row might point to a major issue.
Consider routing alerts to tools your team already uses, like Slack or Microsoft Teams, to boost visibility and get a faster response.
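As a concrete example, a small helper could count consecutive failures per category and only notify once a threshold is crossed. The sketch below assumes a Slack incoming webhook URL in `SLACK_WEBHOOK_URL` and in-memory counters; in practice you'd usually lean on your monitoring tool's alert rules, but the idea is the same.

```javascript
// alerts.js — illustrative sketch; thresholds and webhook are assumptions
const SLACK_WEBHOOK_URL = process.env.SLACK_WEBHOOK_URL;
const THRESHOLDS = { critical: 1, warning: 5, info: 20 }; // alert after N consecutive failures
const counters = {};

async function recordFailure(category, message) {
  counters[category] = (counters[category] || 0) + 1;
  if (counters[category] >= (THRESHOLDS[category] || 5)) {
    counters[category] = 0; // reset the streak once we've alerted
    // Slack incoming webhooks accept a simple { text } payload
    await fetch(SLACK_WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: `[${category.toUpperCase()}] ${message}` }),
    });
  }
}

function recordSuccess(category) {
  counters[category] = 0; // a success breaks the streak
}

module.exports = { recordFailure, recordSuccess };
```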
Step 6: Measure and Track Success
As the popular saying goes, "What gets measured gets improved." To know your system is working, track metrics like:
- Mean Time to Detect (MTTD): How long it takes to identify an issue after it occurs.
- Mean Time to Resolve (MTTR): How long it takes to fix the issue once detected.
- Error Reduction Post-Deployment: Compare the volume and severity of errors before and after new releases.
- False-Positive Alert Ratio: How many alerts didn’t require action, which helps you tune your alerting thresholds.
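MTTD and MTTR in particular are straightforward to compute once each incident records when it occurred, when it was detected, and when it was resolved. A minimal sketch, assuming hypothetical `occurredAt`, `detectedAt`, and `resolvedAt` fields on your incident records:

```javascript
// metrics.js — illustrative; the incident shape is an assumption
function meanMinutes(incidents, fromField, toField) {
  if (incidents.length === 0) return 0;
  const totalMs = incidents.reduce(
    (sum, i) => sum + (new Date(i[toField]) - new Date(i[fromField])),
    0
  );
  return totalMs / incidents.length / 60000; // milliseconds -> minutes
}

const incidents = [
  { occurredAt: '2025-05-16T14:30:45Z', detectedAt: '2025-05-16T14:33:05Z', resolvedAt: '2025-05-16T15:02:00Z' },
];

console.log('MTTD (min):', meanMinutes(incidents, 'occurredAt', 'detectedAt'));
console.log('MTTR (min):', meanMinutes(incidents, 'detectedAt', 'resolvedAt'));
```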
A strong centralized error-handling setup should lead to faster detection, quicker resolution, and fewer user-facing incidents.
Wrapping Up
Errors are inevitable. Whether they come from your own service or a third-party API, they’re bound to happen. What matters is how quickly you can find them, understand them, and fix them.
If you're still relying on scattered logs and random alerts, it's time to rethink your approach. Start small, log consistently, and route intelligently. Set alerts with enough context to give you and your team the visibility needed to support a reliable, resilient microservices platform.