Uptime Architect

Posted on Jun 22 • Edited on Jul 8 • Originally published at uptimearchitect.com

Data Guard Switchover vs Failover: Which Role Transition, and When

#oracle #database #dataguard #sre

The two words get used interchangeably in incident bridges, and that confusion costs people data. A
switchover and a failover both end with your standby running as the primary — but they are not
the same operation, they don't carry the same risk, and they leave your old primary in very different
states. Pick the wrong one under pressure and you either lose data you didn't have to, or you stall a
healthy database for no reason.

Here's the distinction that matters, what actually happens to each database, and how to automate the
one you can't afford to do by hand.

The short version. A switchover is a planned, graceful role reversal: primary and standby
swap roles with zero data loss, and it's fully reversible — it's for maintenance, rolling
upgrades, and DR tests. A failover is what you do when the primary is gone: a standby is
promoted, possibly with some data loss (your protection mode decides how much), and the old
primary drops out of the configuration until you reinstate it (Flashback Database) or rebuild it.
Switchover is a choice; failover is a response. Fast-Start Failover (FSFO) automates the
response via an Observer.

The one-sentence test

Is the primary still healthy and reachable? If yes and you want to move off it deliberately, that's
a switchover. If no — it's crashed, the site is gone, it's unreachable — that's a failover. A
switchover negotiates a clean hand-off with a primary that's still talking; a failover promotes the
standby precisely because the primary isn't.
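
One way to answer that question from the Broker is to ask for the configuration and member status before choosing a verb. This is a sketch: primary_db is a hypothetical member name, so substitute your own. A healthy configuration reports SUCCESS as its status; errors about an unreachable primary point toward failover.

```text
-- Run from a DGMGRL session on any member you can still reach
DGMGRL> SHOW CONFIGURATION;

-- Drill into the primary itself (the name is an assumption)
DGMGRL> SHOW DATABASE 'primary_db';
```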

The role-transition decision. A switchover requires a living primary to hand off cleanly; a failover promotes the standby because the primary is gone — manually, or automatically via the Fast-Start Failover Observer.

What actually happens in a switchover

A switchover is a coordinated role reversal. The primary stops accepting new transactions, ships its
final redo, and becomes a standby; the chosen standby applies that last redo and becomes the
primary. Because the old primary participates in the hand-off, there is no data loss, and nothing
is thrown away — your old primary is now a perfectly good standby, already in the configuration,
already protecting the new primary. You can switch back whenever you like.

That's why switchover is the workhorse of planned availability: rolling patching, hardware
maintenance, OS upgrades, and — most importantly — DR rehearsals. If you've never run a switchover,
you don't actually know your standby works.
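
Before a planned switchover, the Broker can be asked whether the target is ready. A minimal sketch, assuming a 12c or later Broker and the hypothetical member names primary_db and standby_db:

```text
-- Check that the standby is ready for a role change
DGMGRL> VALIDATE DATABASE 'standby_db';

-- Swap roles, then confirm the new layout
DGMGRL> SWITCHOVER TO 'standby_db';
DGMGRL> SHOW CONFIGURATION;

-- Later, swap back the same way
DGMGRL> SWITCHOVER TO 'primary_db';
```

Running the pair on a schedule doubles as the DR rehearsal.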

What actually happens in a failover

A failover is a promotion under duress. The primary is gone, so there's no graceful hand-off — the
standby is told to become the primary now, applying whatever redo it has already received. Two
consequences follow that catch people out:

You may lose data. How much depends on your protection mode and on whether the standby was synchronized at the moment of failure (next section). In the default Maximum Performance mode you lose whatever redo had not yet reached the standby, which on a healthy link is typically a few seconds' worth; in a synchronous mode, it can be zero.
The old primary is out. After a failover, the former primary is disabled and can no longer participate in the Data Guard configuration. When it comes back, its timeline has diverged from the new primary, so you can't just plug it back in. If you enabled Flashback Database, you can reinstate it (flash it back and turn it into a standby of the new primary) in minutes. Without Flashback, you're rebuilding it from a backup or a fresh copy — hours, not minutes.

This is the single biggest reason to run Data Guard with Flashback Database on both databases: it
turns "rebuild the old primary" into one REINSTATE command.
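
A hedged sketch of that reinstate path, assuming Flashback Database was on before the failure, the old primary's instance has been restarted to the mounted state, and old_primary is its (hypothetical) Broker name:

```text
-- Connected to the new primary
DGMGRL> SHOW CONFIGURATION;

-- Flash the old primary back and convert it to a standby
DGMGRL> REINSTATE DATABASE 'old_primary';

-- Confirm it has rejoined the configuration as a standby
DGMGRL> SHOW DATABASE 'old_primary';
```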

Fast-Start Failover: automating the response

A failover is the operation you least want to perform by hand at 3am, so Data Guard can do it for you.
Fast-Start Failover (FSFO) uses the Broker plus a separate process called the Observer to detect
that the primary is gone and promote the standby automatically, with no DBA intervention — and then
automatically reinstate the old primary when it returns (if Flashback is enabled).

The non-negotiable details:

The Observer should run on a separate, independent host — ideally a third location, and never on the primary itself: if the observer lives on the primary, the thing that's supposed to notice the primary died dies with it. (Oracle recommends a third site; a host in the standby's data center is an accepted fallback when one isn't available.)
FSFO promotes only when the failure is real and the standby is recoverable: the Observer waits for a configurable threshold (the Broker property FastStartFailoverThreshold, 30 seconds by default) before acting, so a brief blip doesn't trigger a needless failover.
Run it through the Broker (DGMGRL); FSFO is not a manual-SQL feature.
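
Those requirements translate into a short Broker setup. This is a sketch, not a complete runbook: primary_db and standby_db are hypothetical names, the 30-second threshold is only an example value, and START OBSERVER is shown in its simplest foreground form, which occupies the DGMGRL session on the observer host.

```text
-- Tell each database which member it should fail over to
DGMGRL> EDIT DATABASE 'primary_db' SET PROPERTY FastStartFailoverTarget = 'standby_db';
DGMGRL> EDIT DATABASE 'standby_db' SET PROPERTY FastStartFailoverTarget = 'primary_db';

-- Seconds the primary must be unreachable before a failover is attempted
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 30;

-- In a separate DGMGRL session on the observer host (blocks while running)
DGMGRL> START OBSERVER;

-- Back in the first session
DGMGRL> ENABLE FAST_START FAILOVER;
DGMGRL> SHOW FAST_START FAILOVER;
```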

Protection mode decides your failover data loss

Switchover is always zero-loss. Failover loss is set long before the incident, by your protection
mode:

Protection mode	Redo transport	Failover data loss	Trade-off
Maximum Performance (default)	ASYNC	possibly seconds of redo	no commit latency on the primary
Maximum Availability	SYNC	zero if synchronized at failure	small commit latency; runs like Maximum Performance while the standby is unreachable, then resynchronizes
Maximum Protection	SYNC	zero, guaranteed	the primary shuts down rather than commit without a standby ack

The mode is a business decision (what is a transaction worth?), not a technical default to accept
blindly. Maximum Availability with Fast-Start Failover is a common choice for the
zero-loss-without-the-hard-stall sweet spot. (FSFO is supported in Maximum Availability and, with a
configured lag limit, in Maximum Performance; 12.2 and later also allow it in Maximum Protection.)
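
As an illustration of how a mode is set through the Broker, the sketch below moves a two-member configuration to Maximum Availability. The member names are hypothetical, and the last command applies only if you stay in Maximum Performance with FSFO, where 30 seconds is an example lag limit rather than a recommendation.

```text
-- Synchronous redo transport in both directions, so the roles can swap
DGMGRL> EDIT DATABASE 'primary_db' SET PROPERTY LogXptMode = 'SYNC';
DGMGRL> EDIT DATABASE 'standby_db' SET PROPERTY LogXptMode = 'SYNC';

-- Raise the protection mode and verify
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
DGMGRL> SHOW CONFIGURATION;

-- Alternative for Maximum Performance with FSFO: cap acceptable lag in seconds
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverLagLimit = 30;
```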

How to run each (the Broker way)

The Broker turns both transitions into one verb each, with built-in validation:

-- Planned: swap roles, no data loss, fully reversible
DGMGRL> SWITCHOVER TO 'standby_db';

-- Unplanned: promote the standby because the primary is gone
DGMGRL> FAILOVER TO 'standby_db';

-- After a failover, bring the old primary back as a standby (needs Flashback Database)
DGMGRL> REINSTATE DATABASE 'old_primary';

-- Turn on automatic failover (after setting the target, protection mode, and starting the observer)
DGMGRL> ENABLE FAST_START FAILOVER;

You can do role transitions with raw SQL — ALTER DATABASE SWITCHOVER TO <db> on the primary, and
ALTER DATABASE FAILOVER TO <db> on the standby (12c+ syntax; the older ACTIVATE PHYSICAL STANDBY DATABASE is a legacy, last-resort path) — but the Broker validates prerequisites, orders the steps, and
handles the observer and reinstate for you. For anything beyond a learning exercise, use the Broker.

Want to practice this? The Data Guard switchover/failover forensics lab gives you five Broker
situations to read (decide switchover vs failover, quantify the data loss, and handle the old
primary), with a grade.sh self-check. No standby required; it's transcripts and bash.

What teams get wrong

Confusing the two under pressure — calling a failover when the primary is fine (and needlessly losing data), or attempting a switchover against a primary that's already dead (it can't hand off). The one-sentence test prevents both.
No Flashback Database, so every failover means rebuilding the old primary from scratch instead of a one-command reinstate.
The observer co-located on the primary — it dies with the very failure it exists to detect.
Never testing a switchover. An untested standby is a hope, not a DR plan. Switchover is the test; run it on a schedule.
Accepting the default protection mode without deciding what a lost transaction actually costs.

Role transitions are one piece of the bigger picture: see where Data Guard sits against RAC and
backups in The Oracle HA Decision Tree. To drill the decision itself (switchover vs failover, the
data loss, the reinstate), work through the five Broker situations in the no-Docker Data Guard
switchover/failover forensics lab. And to stand up a real physical standby and run an actual
switchover end-to-end (your own Enterprise Edition binaries), the opt-in Data Guard module walks
through it.

Frequently asked questions

What is the difference between switchover and failover in Oracle Data Guard?

A switchover is a planned, graceful role reversal between a healthy primary and a standby, with no data loss, and it is fully reversible. A failover promotes a standby to primary because the original primary is gone or unreachable; it may involve data loss depending on the protection mode, and the old primary must be reinstated or rebuilt afterward.

Does a Data Guard switchover lose data?

No. A switchover is a coordinated hand-off in which the primary ships its final redo before giving up the primary role, so there is no data loss. The old primary becomes a standby and remains in the configuration.

How much data does a failover lose?

It depends on the protection mode and whether the standby was synchronized at the moment of failure. In the default Maximum Performance (asynchronous) mode, a failover typically loses a few seconds of redo. In the synchronous modes, Maximum Availability and Maximum Protection, a failover loses no data provided the standby was synchronized when the primary failed.

What happens to the old primary after a failover?

It is disabled and can no longer participate in the configuration because its timeline has diverged from the new primary. If Flashback Database was enabled, you can reinstate it as a standby of the new primary with a single REINSTATE command. Without Flashback, you must rebuild it from a backup or a fresh copy.

What is Fast-Start Failover and when does it trigger?

Fast-Start Failover (FSFO) uses the Data Guard Broker and a separate Observer process to automatically fail over to the standby when the primary is lost and conditions are met, with no DBA intervention, then automatically reinstate the old primary when it returns if Flashback is enabled. It respects a configurable threshold so a brief outage does not cause a needless failover.

Where should the Fast-Start Failover observer run?

On a separate host from both databases — ideally a third, independent location, and never on the primary itself, since it would fail along with the primary and could not initiate the failover it exists to perform.

Do I need Flashback Database for Data Guard?

It is not strictly required, but it is strongly recommended. Flashback Database lets you reinstate the old primary as a standby after a failover with one command instead of rebuilding it, and it is what makes automatic reinstatement under Fast-Start Failover possible.

Can I reverse a failover?

Not directly. After a failover the standby is the new primary and the old primary is out of the configuration. You bring the old primary back by reinstating it (with Flashback Database) or rebuilding it, after which you can switch back if you want the original roles.

