Monitoring is a crucial part of keeping a web app stable and of reacting quickly to critical incidents.
When an incident happens and you, or even worse your customer, find a bug, you drop your feature work and hop on the debugging train. Having metrics and monitoring in place is incredibly helpful for tracing where the issue lies, what its impact is and how it can be fixed quickly.
In this blog post I'd like to share some ideas around health metrics that can be added to crucial parts of the app to increase visibility into how our code works: what's working as expected and which code paths fail most often. I'm not gonna dive into tools like Airbrake or Sentry, which are used to catch errors and report them. Instead, I'll explore what we can implement in our code to get as much useful information as possible into our logs (or tools like Datadog).
Health metrics and where to put them
Start by finding the crucial parts of the app, the ones without which the project is broken. For me, the first thing that comes to mind is payment processing. If we fail to process a payment, the business is not making money, which means that we're losing it. We pay for hosting the app but fail to satisfy users' needs, which may lead to users abandoning our app: our income goes down while our costs stay.
attempt, success, failure
So we have our most vulnerable area: payment processing. Implementing health metrics is extremely easy, but also extremely useful. Let's have a look at this example class.
class PaymentProcessor
  # ...
  def perform(amount, user)
    payment = create_payment(amount, user)
    payment.process_payment
    payment.mark_as_finished
  rescue Payments::Errors::PaymentFailed => e
    payment.mark_as_failed(e)
  ensure
    set_result(payment.status)
  end
  # ...
end
We can define three phases of what's going on here.
- A payment is being created, we attempt to process a payment.
- If payment processing succeeds, we mark it as finished and return the payment's status.
- If payment processing fails, we mark it as failed, save the error message and return the payment's status.
This gives us an idea of what health metrics we can implement here. And this rule is rather universal. We attempt to perform an action, then we either receive success or failure from the operation.
Here's what the metrics might look like in the simplest form.
class PaymentProcessor
  # ...
  def perform(amount, user)
    metric_recorder.record_attempt
    payment = create_payment(amount, user)
    payment.process_payment
    payment.mark_as_finished
    metric_recorder.record_success
  rescue Payments::Errors::PaymentFailed => e
    payment.mark_as_failed(e)
    metric_recorder.record_failure(e.error_code)
  ensure
    set_result(payment.status)
  end

  private

  def metric_recorder
    @metric_recorder ||= MetricRecorder.new(domain: 'payment_processing')
  end
end
This would already give us some insight into how many payment processing attempts we had, how many of them succeeded and how many failed.
attempts = success + failed
If the numbers don't add up, it means we have some issue in between the metrics being recorded. Maybe we get an unhandled error in payment.mark_as_finished, so the closing metric is never recorded? That's also valuable information for debugging purposes: we can see how far the code execution went.
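To make that bookkeeping concrete, here's a minimal sketch with an in-memory counter. InMemoryMetricRecorder and unaccounted_attempts are hypothetical names used only for illustration; a real recorder would send the counts to a metrics backend instead of keeping them in a hash.

```ruby
# Hypothetical in-memory recorder, for illustration only.
class InMemoryMetricRecorder
  attr_reader :counts

  def initialize(domain:)
    @domain = domain
    @counts = Hash.new(0) # missing keys read as 0
  end

  def record_attempt
    @counts[:attempt] += 1
  end

  def record_success
    @counts[:success] += 1
  end

  def record_failure(error_code)
    @counts[:failure] += 1
    @counts["failure.#{error_code}"] += 1
  end

  # Attempts that ended in neither a recorded success nor a recorded failure.
  def unaccounted_attempts
    @counts[:attempt] - @counts[:success] - @counts[:failure]
  end
end

recorder = InMemoryMetricRecorder.new(domain: 'payment_processing')

3.times { recorder.record_attempt }
recorder.record_success
recorder.record_failure('card_declined')
# Imagine the third attempt raised an unhandled error: nothing else got recorded.

puts recorder.unaccounted_attempts # => 1
```

A non-zero value here is the signal that some code path exits between the attempt metric and the closing metric.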
Wrapper
That was the simplest version, but it pollutes the code, and we need to remember to add it each time we implement a new service. Ruby lets us do this in a much cleaner way and makes the code reusable. We can add a method to our MetricRecorder module that accepts a block and handles the rest.
module MetricRecorder
  def record_health(&block)
    record_attempt
    block.call
    record_success
  rescue StandardError => e
    record_failure(e)
    raise e # we want to re-raise error to handle it outside of the recorder
  end
end
And here's how the record_health wrapper method can be used in the PaymentProcessor.
class PaymentProcessor
  include MetricRecorder

  def perform(amount, user)
    record_health do
      payment = create_payment(amount, user)
      payment.process_payment
      payment.mark_as_finished
    rescue Payments::Errors::PaymentFailed => e
      payment.mark_as_failed(e)
      raise e # we re-raise the error to propagate it to recorder and further
    ensure
      set_result(payment.status)
    end
  end
end
This makes our class clean again, reduces the need to think about where to put metric calls, and moves the recording logic out of this class, making it reusable across the whole project. Once we want to add some extra data to metrics, we can update just MetricRecorder instead of every class that uses the record_... methods.
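To show the reuse in action, here's a small self-contained sketch. HealthRecorder is a simplified stand-in for the module above that keeps its counts in memory, and InvoiceSender is a hypothetical second service; neither comes from a real codebase.

```ruby
# Simplified stand-in for the MetricRecorder module; counts live in memory.
module HealthRecorder
  def health_counts
    @health_counts ||= Hash.new(0)
  end

  def record_health
    health_counts[:attempt] += 1
    result = yield
    health_counts[:success] += 1
    result
  rescue StandardError
    health_counts[:failure] += 1
    raise # re-raise so the caller can still handle the error
  end
end

# Hypothetical second service reusing the same wrapper.
class InvoiceSender
  include HealthRecorder

  def perform(invoice_id)
    record_health do
      raise ArgumentError, 'missing invoice id' if invoice_id.nil?

      "sent invoice #{invoice_id}"
    end
  end
end

sender = InvoiceSender.new
sender.perform(42)

begin
  sender.perform(nil)
rescue ArgumentError => e
  puts "handled outside the recorder: #{e.message}"
end

p sender.health_counts
```

The sketch also returns the block's value from record_health, so a caller can keep using the result of the wrapped code.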
The recorder module
Okay, we know how this should look from the code execution perspective, but what should record_attempt, record_success and record_failure actually do?
Metrics are designed to just give an idea of the traffic in the app. If we want more detailed data, we should probably incorporate logs as well.
If we're using Datadog, we'd probably just increment the attempt, success and failure metrics, adding an error_code tag to failure. That's due to costs and to how metrics are designed in Datadog: we should only send a defined, bounded set of metrics and tag values. We can't put IDs, amounts or user-specific data into those metrics, because the set of possible values is unbounded.
module MetricRecorder
  def record_health(metric_name, &block)
    record_attempt(metric_name)
    block.call
    record_success(metric_name)
  rescue StandardError => e
    record_failure(metric_name, e)
    raise e
  end

  private

  def record_attempt(metric_name)
    DatadogService.increment(metric_name, 'attempt')
  end

  def record_success(metric_name)
    DatadogService.increment(metric_name, 'success')
  end

  def record_failure(metric_name, e)
    # Not every StandardError responds to error_code, so fall back to a generic value.
    error_code = e.respond_to?(:error_code) ? e.error_code : 'unknown'
    DatadogService.increment(metric_name, 'failure', error_code: error_code)
  end
end
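DatadogService isn't shown in this post, so here's one way it might look. This is only a sketch under assumptions: the injected client is assumed to respond to increment(name, tags:), the way StatsD-style clients usually do, and the error code allow-list is made up for the example. The allow-list is what keeps tag values bounded.

```ruby
# Hypothetical sketch of DatadogService. The injected client is assumed to
# respond to `increment(name, tags:)`, as StatsD-style clients usually do.
class DatadogService
  # Made-up allow-list; anything outside it is reported as "other".
  KNOWN_ERROR_CODES = %w[card_declined insufficient_funds gateway_timeout].freeze

  class << self
    attr_writer :client

    def increment(metric_name, action, error_code: nil)
      tags = []
      tags << "error_code:#{bounded(error_code)}" if error_code
      @client.increment("#{metric_name}.#{action}", tags: tags)
    end

    private

    def bounded(error_code)
      KNOWN_ERROR_CODES.include?(error_code.to_s) ? error_code.to_s : 'other'
    end
  end
end

# Fake client that records calls instead of sending them anywhere.
class FakeStatsClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def increment(name, tags: [])
    @calls << [name, tags]
  end
end

fake = FakeStatsClient.new
DatadogService.client = fake

DatadogService.increment('payment_processing', 'attempt')
DatadogService.increment('payment_processing', 'failure', error_code: 'card_declined')
DatadogService.increment('payment_processing', 'failure', error_code: 'user-1234-specific')

fake.calls.each { |name, tags| puts "#{name} #{tags.inspect}" }
```

The last call shows why the allow-list matters: a value that looks user-specific ends up in the "other" bucket instead of creating a new tag value.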
To get more data and really monitor our app, we could also write a logger module.
module LogsRecorder
  def record_log(method_name, &block)
    log(method_name, { action_type: 'attempt' })
    result = block.call
    log(method_name, { action_type: 'success', result: result })
    result
  rescue StandardError => e
    log(method_name, {
      action_type: 'failure',
      result: { error_class: e.class.name.demodulize, error_message: e.message }
    })
    raise e
  end

  private

  def log(method_name, additional_tags)
    Rails.logger.info(
      class_name: self.class.name.demodulize.underscore,
      method_name: method_name,
      additional_tags: additional_tags
    )
  end
end
And here's how we could incorporate both modules, having metrics and logs in place. This would let us quickly monitor metrics from the Datadog web dashboard, for example, and if something feels off, we can jump into the detailed logs, grep for action_type: 'failure' and see what went wrong.
class PaymentProcessor
  include MetricRecorder
  include LogsRecorder

  def perform(amount, user)
    record_health('payment_processing') do
      record_log('perform') do
        payment = create_payment(amount, user)
        payment.process_payment
        payment.mark_as_finished
      rescue Payments::Errors::PaymentFailed => e
        payment.mark_as_failed(e)
        raise e
      ensure
        set_result(payment.status)
      end
    end
  end
end
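Grepping for failures is easiest when every log entry is a single machine-readable line. Here's a minimal sketch of that idea outside Rails, using only Ruby's standard Logger and JSON. JsonLogsRecorder and Demo are hypothetical names, and the logger is injected; in a Rails app it could be Rails.logger with a suitable formatter.

```ruby
require 'json'
require 'logger'
require 'stringio'

# Stand-alone variant of LogsRecorder that writes one JSON object per line.
# The including class is assumed to expose a `logger` method.
module JsonLogsRecorder
  def record_log(method_name)
    log(method_name, action_type: 'attempt')
    result = yield
    log(method_name, action_type: 'success')
    result
  rescue StandardError => e
    log(method_name, action_type: 'failure', error_class: e.class.name, error_message: e.message)
    raise
  end

  private

  def log(method_name, tags)
    payload = { class_name: self.class.name, method_name: method_name }.merge(tags)
    logger.info(JSON.generate(payload))
  end
end

class Demo
  include JsonLogsRecorder

  attr_reader :logger

  def initialize(logger)
    @logger = logger
  end

  def perform
    record_log('perform') { raise 'gateway timeout' }
  end
end

output = StringIO.new
logger = Logger.new(output)
# Emit only the message so each line is pure JSON.
logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }

begin
  Demo.new(logger).perform
rescue RuntimeError
  # already logged by the recorder
end

failures = output.string.lines.select { |line| JSON.parse(line)['action_type'] == 'failure' }
puts failures
```

With one JSON object per line, the same filter works from the shell with grep or any log tool that understands JSON.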