DEV Community

Phylis Korir


Monitoring vs Evaluation — What's the Difference (and Why It Matters)

Ask ten people working in international development what the difference is between monitoring and evaluation, and at least six will pause before answering.

They're always mentioned together. Abbreviated together (M&E). Budgeted together. Often assigned to the same person. And yet they are fundamentally different activities — different questions, different timing, different methods, different purposes.

Conflating them doesn't just create conceptual confusion. It produces M&E systems that are expensive to run, produce reports nobody reads, and fail to help anyone make a better decision.

This article draws the line clearly — and shows what breaks when you don't.


The one-sentence version

Monitoring asks: is the program running as planned?
Evaluation asks: did the program cause the change we intended?

Monitoring is continuous, operational, and descriptive. Evaluation is periodic, analytical and causal. One tells you the engine is running. The other tells you whether you're going somewhere worth going.


Monitoring: the continuous heartbeat

Monitoring is the ongoing collection and review of data throughout a program's life. It tracks inputs, activities and outputs — the things the program does and produces — against a pre-agreed plan.

Think of it as the program's vital signs. It answers:

  • Are activities happening on schedule?
  • Are resources being used as planned?
  • Are we reaching the people we intended to reach?
  • Are outputs being produced at the expected rate?

Monitoring data is typically collected at regular intervals — weekly field reports, monthly attendance registers, quarterly budget reviews. The data is mostly quantitative, mostly operational, and mostly about delivery fidelity.
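Here is a minimal Python sketch of what "tracking outputs against a pre-agreed plan" can look like. The indicator names, the figures and the 90% tolerance are all hypothetical. They are chosen for illustration and are not a standard.

```python
from dataclasses import dataclass


@dataclass
class OutputIndicator:
    """One output indicator from a (hypothetical) monitoring plan."""
    name: str
    planned: float  # target for the reporting period
    actual: float   # what field reports say was delivered


def flag_off_track(indicators, tolerance=0.9):
    """Return names of indicators delivering below `tolerance` x plan.

    The 0.9 default is an illustrative assumption; real programs set
    their own thresholds in the monitoring plan.
    """
    flagged = []
    for ind in indicators:
        if ind.planned > 0 and ind.actual / ind.planned < tolerance:
            flagged.append(ind.name)
    return flagged


# Hypothetical quarter: figures are invented for illustration.
quarter = [
    OutputIndicator("training sessions held", planned=12, actual=12),
    OutputIndicator("participants reached", planned=300, actual=240),
    OutputIndicator("toolkits distributed", planned=300, actual=290),
]

print(flag_off_track(quarter))  # ['participants reached']
```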

What monitoring is not: it does not tell you whether your program is working. A training program can run on time, reach its target number of participants and produce every planned output — and still fail to change a single behaviour. Monitoring will not catch that. That's evaluation's job.


Evaluation: the periodic deep dive

Evaluation is a structured, time-bound assessment of a program's effectiveness, efficiency, relevance and impact. It happens at defined points — midterm, endline or post-program — not continuously.

Where monitoring tracks what happened, evaluation asks why and so what:

  • Did the program cause the intended outcomes?
  • Would the outcomes have happened anyway, without the intervention?
  • Was this the most efficient way to produce this change?
  • What should we do differently next time?

Evaluation requires a different kind of rigour. To answer causal questions, you need a comparison: a baseline, a control group or a counterfactual. You need evaluation design, not just data collection. The methods range from randomised controlled trials (the most rigorous and the most expensive) to pre/post comparisons and qualitative case studies.
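A small worked example shows why the comparison matters. The sketch below contrasts a naive pre/post change with a difference-in-differences estimate, which is one common way of using a comparison group to approximate the counterfactual. All figures are hypothetical. Difference-in-differences also rests on an assumption that a real evaluation would need to justify: without the program, both groups would have followed parallel trends.

```python
def pre_post_change(baseline_mean, endline_mean):
    """Naive estimate: all change in the participant group is credited to the program."""
    return endline_mean - baseline_mean


def diff_in_diff(treat_baseline, treat_endline, comp_baseline, comp_endline):
    """Difference-in-differences: participants' change minus the comparison group's change.

    Assumes parallel trends: without the program, both groups would have
    changed by the same amount.
    """
    return (treat_endline - treat_baseline) - (comp_endline - comp_baseline)


# Hypothetical average monthly incomes (USD), invented for illustration.
participants = {"baseline": 100.0, "endline": 130.0}
comparison = {"baseline": 98.0, "endline": 120.0}

naive = pre_post_change(participants["baseline"], participants["endline"])
did = diff_in_diff(participants["baseline"], participants["endline"],
                   comparison["baseline"], comparison["endline"])

print(naive)  # 30.0
print(did)    # 8.0
```

In this made-up case, most of the 30-dollar rise also happened to people outside the program. Under the parallel-trends assumption, only about 8 dollars of it is plausibly attributable to the intervention.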

What evaluation is not: it is not a performance review of the M&E officer. It is not a box-ticking exercise for the donor. And it is not a substitute for ongoing monitoring — if you've been flying blind for two years and only look at impact at endline, it's too late to course-correct.


The project lifecycle view

Here's how the two functions sit across a typical program cycle:

[Figure: Project Lifecycle]

The baseline sits at the start of implementation — it's the "before" picture that makes the "after" picture meaningful. Without a baseline, an endline evaluation can describe what exists, but cannot measure change.
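The same lifecycle can also be sketched as a simple schedule. This assumes a hypothetical 24-month program with three elements: a baseline at month 0 before implementation, a monitoring report every month, and evaluations at midterm (month 12) and endline (month 24). Real programs set their own timing.

```python
def build_schedule(months=24, midterm=12):
    """Map each month to the M&E activities that happen in it.

    Hypothetical layout: baseline at month 0 (before implementation),
    monitoring every month of implementation, midterm and endline
    evaluations at fixed points.
    """
    schedule = {m: [] for m in range(months + 1)}
    schedule[0].append("baseline data collection")
    for m in range(1, months + 1):
        schedule[m].append("monitoring report")     # continuous: every month
    schedule[midterm].append("midterm evaluation")  # periodic
    schedule[months].append("endline evaluation")   # periodic
    return schedule


plan = build_schedule()
monitoring = [m for m, acts in plan.items() if "monitoring report" in acts]
evaluations = [m for m, acts in plan.items()
               if any(a.endswith("evaluation") for a in acts)]

print(len(monitoring))  # 24
print(evaluations)      # [12, 24]
```

Monitoring appears in every implementation month. Evaluation appears at two points, and both of them rely on the month-0 baseline to measure change.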


Four evaluation types — and when they happen

Not all evaluations are the same. The type you run depends on the question you need to answer and when you need the answer.

| Type | When | Core question | Analogy |
|---|---|---|---|
| Formative | During implementation | How can we improve delivery? | Sprint retrospective |
| Process | During or after | Was it implemented as designed? | Code / deployment audit |
| Summative | At the end | Did it achieve its objectives? | Post-mortem review |
| Impact | At the end (with counterfactual) | Did the program cause the outcome? | A/B test |

Impact evaluation is the gold standard — and the most misunderstood. Many programs report outcome data at endline and call it an impact evaluation. It isn't, unless you've established what would have happened without the program. Without a counterfactual, you're describing correlation, not causation.


What breaks when you conflate them

This is where it gets practical. Here are the four failure modes that show up repeatedly when programs don't hold the distinction clearly.

1. Treating output data as evidence of impact

A program collects monthly attendance records (monitoring data) and presents them in a donor report as proof that the program is working. The outputs look good — high attendance, activities completed on schedule. But nobody has measured whether participants changed their behaviour, improved their livelihoods, or experienced any actual outcome.

The monitoring system is functioning. The evidence of effectiveness does not exist. The report conflates the two.

2. Running an evaluation without adequate monitoring data

A program reaches endline and commissions an impact evaluation. The evaluators ask for historical program data — attendance records, beneficiary lists, activity logs, budget actuals. The data is incomplete, inconsistently formatted, and partially missing.

The evaluation can only assess what happened at a single point in time. It cannot explain why outcomes did or didn't materialise, because there's no process data to draw on. A strong evaluation depends on strong monitoring, and you can't go back and instrument a system after it has already run without instrumentation.
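Gaps like these are cheaper to catch during implementation than at endline. Below is a minimal sketch of a routine completeness check. It assumes monitoring records are stored as dictionaries, and the field names are hypothetical.

```python
# Hypothetical schema for a monthly field report.
REQUIRED_FIELDS = ("month", "site", "attendance", "activities_completed")


def completeness_report(records, required=REQUIRED_FIELDS):
    """Return the share of records missing each required field (0.0 = complete).

    A field counts as missing if the key is absent, None, or an empty string.
    """
    if not records:
        return {f: 1.0 for f in required}
    report = {}
    for f in required:
        missing = sum(1 for r in records if r.get(f) in (None, ""))
        report[f] = missing / len(records)
    return report


# Hypothetical monthly field reports.
records = [
    {"month": "2024-01", "site": "A", "attendance": 42, "activities_completed": 3},
    {"month": "2024-02", "site": "A", "attendance": None, "activities_completed": 2},
    {"month": "2024-03", "site": "", "attendance": 38, "activities_completed": 3},
    {"month": "2024-04", "site": "A", "attendance": 40},
]

for name, share in completeness_report(records).items():
    print(f"{name}: {share:.0%} missing")
```

If someone reviews it monthly, a report like this turns missing data into a monitoring flag instead of an evaluation surprise.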

3. Using evaluation questions to guide daily management

A program manager, anxious about whether the program is having impact, commissions a qualitative study every quarter to check. This is expensive, slow, and uses the wrong tool. The question "is our attendance rate dropping?" is a monitoring question — answer it with a dashboard, not an evaluation.

Evaluation methods (surveys, focus groups, qualitative interviews) are expensive to run well. Applying them to operational questions that monitoring should handle burns budget and generates noise.
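Answering "is our attendance rate dropping?" from routine data takes a few lines, not a study. The sketch below assumes monthly attendance rates stored as fractions. It applies a hypothetical rule: flag any month that falls more than 10 percentage points below the average of the previous three.

```python
def attendance_alerts(rates, window=3, drop=0.10):
    """Flag month indexes whose rate falls more than `drop` below the
    average of the preceding `window` months. Thresholds are illustrative."""
    alerts = []
    for i in range(window, len(rates)):
        recent_avg = sum(rates[i - window:i]) / window
        if recent_avg - rates[i] > drop:
            alerts.append(i)
    return alerts


monthly_rates = [0.82, 0.80, 0.84, 0.81, 0.66, 0.79]  # hypothetical data
print(attendance_alerts(monthly_rates))  # [4]
```

A check like this tells you that attendance fell. It does not tell you why.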

4. Skipping baseline data collection

This is the most irreversible mistake. A program launches, runs for two years, and then wants to evaluate whether participants' income improved. But nobody collected income data at the start. There is no baseline. The endline figures describe the current situation — but against what do you compare them?

You cannot retrospectively collect baseline data. Once the program has started, the "before" picture is gone. Every M&E system needs baseline data collection built into the design phase, before implementation begins. This is a monitoring responsibility, but it exists to make evaluation possible.
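The arithmetic behind this point is simple: change is endline minus baseline, and you cannot compute it without the baseline term. The sketch below assumes hypothetical per-participant income records keyed by participant ID.

```python
def income_change(baseline, endline):
    """Per-participant change in income, for participants measured at both points.

    `baseline` and `endline` map a (hypothetical) participant ID to income.
    Participants without a baseline value are skipped: their change is unknowable.
    """
    return {pid: endline[pid] - baseline[pid] for pid in endline if pid in baseline}


baseline = {"p1": 100, "p2": 80}
endline = {"p1": 120, "p2": 95, "p3": 150}  # p3 was never surveyed at baseline

print(income_change(baseline, endline))  # {'p1': 20, 'p2': 15}
print(income_change({}, endline))        # {}
```

Even with a baseline, this measures change, not impact. Attributing the change to the program still needs a comparison group or some other counterfactual.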


A clean way to think about the split

If you're ever unsure which function a given activity belongs to, run it through these questions:

Is it continuous or periodic?
Continuous = monitoring. Periodic = evaluation.

Is it about delivery or about change?
Delivery (activities, outputs, reach) = monitoring. Change (outcomes, impact, causation) = evaluation.

Does it answer "what's happening?" or "did it work?"
What's happening = monitoring. Did it work = evaluation.

Will the findings be used to manage the program or to judge it?
Manage = monitoring. Judge (and learn, and redesign) = evaluation.
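The four questions can be written down as a rough rule of thumb. In this sketch, each answer is a boolean and the activity is labelled by majority vote. The function name and the tie-break rule are assumptions for illustration and are not part of any M&E standard.

```python
def classify_activity(periodic, about_change, asks_did_it_work, used_to_judge):
    """Score an activity against the four questions.

    Each True answer points towards evaluation; each False answer points
    towards monitoring. A 2-2 tie returns "mixed" so a person looks closer.
    """
    votes = sum([periodic, about_change, asks_did_it_work, used_to_judge])
    if votes >= 3:
        return "evaluation"
    if votes <= 1:
        return "monitoring"
    return "mixed"


# A monthly attendance dashboard: continuous, about delivery, "what's happening?", used to manage.
print(classify_activity(False, False, False, False))  # monitoring
# An endline study comparing participants with a control group.
print(classify_activity(True, True, True, True))      # evaluation
```

A "mixed" result may deserve a second look, because it can mean one activity is trying to do both jobs.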


The relationship: not either/or, but sequential

Monitoring and evaluation aren't competing functions — they depend on each other.

Good monitoring data makes evaluation possible. The baseline, activity logs, output records, and beneficiary data that monitoring produces are the raw material an evaluator works with. An evaluation conducted on a program with weak monitoring is like a post-mortem on a system with no logs — you can describe the outcome, but you can't explain it.

Good evaluation findings improve future monitoring. After an evaluation, you understand which indicators actually predicted outcomes, which data points were noise, and where your results chain had faulty assumptions. That informs a sharper, leaner monitoring framework for the next program cycle.

The two functions are sequential and mutually reinforcing. The mistake is running them as if they're the same thing — or as if only one of them matters.


The practical checklist

Before a program launches, the team should be able to answer yes to all of these questions about its M&E system:

  • Do we have a results chain (inputs → outputs → outcomes → impact)?
  • Have we defined indicators at each level — output and outcome?
  • Have we collected baseline data before implementation starts?
  • Is there a monitoring plan — who collects what, when, and how?
  • Is there an evaluation plan — what type, at what point, with what comparison?
  • Are monitoring and evaluation budgeted separately?
  • Does someone own monitoring as an operational function (not just at reporting time)?
  • Does the evaluation plan include a counterfactual strategy?
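Once the results chain is written down as data, several of these checks can be automated. The sketch below is minimal: the `Level` enum mirrors the chain in the first bullet, but the `MEPlan` fields and the rules are hypothetical and cover only part of the checklist.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Level(Enum):
    INPUT = 1
    OUTPUT = 2
    OUTCOME = 3
    IMPACT = 4


@dataclass
class Indicator:
    name: str
    level: Level


@dataclass
class MEPlan:
    # Hypothetical fields covering a subset of the checklist.
    indicators: List[Indicator] = field(default_factory=list)
    implementation_start: Optional[date] = None
    baseline_collected: Optional[date] = None
    evaluation_comparison: Optional[str] = None  # e.g. "control group"
    monitoring_owner: Optional[str] = None


def checklist_gaps(plan):
    """Return the checklist items this plan fails (a subset of the full list)."""
    gaps = []
    levels = {ind.level for ind in plan.indicators}
    for needed in (Level.OUTPUT, Level.OUTCOME):
        if needed not in levels:
            gaps.append(f"no {needed.name.lower()} indicators")
    if (plan.baseline_collected is None or plan.implementation_start is None
            or plan.baseline_collected >= plan.implementation_start):
        gaps.append("baseline not collected before implementation")
    if not plan.evaluation_comparison:
        gaps.append("no counterfactual strategy")
    if not plan.monitoring_owner:
        gaps.append("nobody owns monitoring")
    return gaps


plan = MEPlan(
    indicators=[Indicator("sessions held", Level.OUTPUT),
                Indicator("participants reached", Level.OUTPUT)],
    implementation_start=date(2024, 3, 1),
    baseline_collected=date(2024, 2, 1),
    monitoring_owner="field M&E officer",
)
for gap in checklist_gaps(plan):
    print(gap)
```

This example plan has plenty of output indicators. It still fails two of the checks that make evaluation possible: it has no outcome indicators and no counterfactual strategy.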

If monitoring and evaluation are both assigned to one person with no dedicated budget, neither will be done well. They require different skills, different timing, and different outputs. A field officer building a dashboard is not running an evaluation. An external evaluator arriving at endline is not a substitute for two years of monitoring.


Summary

| | Monitoring | Evaluation |
|---|---|---|
| When | Continuous, throughout | Periodic: baseline, midterm, endline |
| Question | Is the program running as planned? | Did the program cause the intended change? |
| Focus | Inputs, activities, outputs | Outcomes, impact, causation |
| Methods | Routine data collection, dashboards | Surveys, qualitative studies, experimental design |
| Used by | Program managers (daily decisions) | Donors, leadership, future program designers |
| Output | Progress reports, dashboards, flags | Evaluation reports, lessons learned |
| Requires | Consistent systems and discipline | Design rigour and a comparison point |

They are not interchangeable. They are not redundant. And a program that has one but not the other is, in important ways, flying blind.


What's the worst M&E conflation you've seen in practice? Drop it in the comments.
