<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Uptime Architect</title>
    <description>The latest articles on DEV Community by Uptime Architect (@uptimearchitect).</description>
    <link>https://dev.to/uptimearchitect</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3973084%2Fdc838b73-4aab-46b5-8272-3b9ced38ce77.png</url>
      <title>DEV Community: Uptime Architect</title>
      <link>https://dev.to/uptimearchitect</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/uptimearchitect"/>
    <language>en</language>
    <item>
      <title>The Oracle HA Decision Tree: RAC vs Data Guard vs Both</title>
      <dc:creator>Uptime Architect</dc:creator>
      <pubDate>Sun, 07 Jun 2026 23:02:33 +0000</pubDate>
      <link>https://dev.to/uptimearchitect/the-oracle-ha-decision-tree-rac-vs-data-guard-vs-both-27ln</link>
      <guid>https://dev.to/uptimearchitect/the-oracle-ha-decision-tree-rac-vs-data-guard-vs-both-27ln</guid>
      <description>&lt;p&gt;"We have RAC, so we're covered for DR." It's one of the most expensive sentences in Oracle operations,&lt;br&gt;
and I've watched variations of it play out more than once. Real Application Clusters (RAC) and Data&lt;br&gt;
Guard both live under the "high availability" umbrella, so it's easy to assume they're interchangeable&lt;br&gt;
— or that having one means you don't need the other. They are not interchangeable. They solve&lt;br&gt;
&lt;em&gt;different&lt;/em&gt; failures, and the cost of confusing them is usually discovered at the worst possible time.&lt;/p&gt;

&lt;p&gt;This is the long version of how I think about the choice. We'll start where every good HA design&lt;br&gt;
starts — not with a feature, but with the failure you're trying to survive — then work through what RAC&lt;br&gt;
and Data Guard each actually do, what they cost (in licensing and in complexity), how to reason about&lt;br&gt;
RTO and RPO, and finally a decision tree you can apply to a real system. Everything here targets&lt;br&gt;
&lt;strong&gt;Oracle 19c&lt;/strong&gt;, the enterprise workhorse, with notes on where the newer releases — &lt;strong&gt;23ai&lt;/strong&gt; and the&lt;br&gt;
current &lt;strong&gt;26ai&lt;/strong&gt; — change the picture. It's written from general industry practice and lab work — your&lt;br&gt;
environment will differ, so test before you trust.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The short version.&lt;/strong&gt; &lt;strong&gt;RAC&lt;/strong&gt; keeps you running through a &lt;em&gt;node&lt;/em&gt; failure — but it's one copy of your&lt;br&gt;
data on shared storage, so it is &lt;strong&gt;not&lt;/strong&gt; disaster recovery. &lt;strong&gt;Data Guard&lt;/strong&gt; keeps you running through&lt;br&gt;
&lt;em&gt;site loss and corruption&lt;/em&gt; by maintaining an independent standby you fail over to. &lt;strong&gt;Neither&lt;/strong&gt; saves&lt;br&gt;
you from a bad &lt;code&gt;DELETE&lt;/code&gt; — only &lt;strong&gt;backups and Flashback&lt;/strong&gt; do. Set RTO and RPO with the business, then&lt;br&gt;
buy the cheapest combination that meets them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Start with the failure, not the feature
&lt;/h2&gt;

&lt;p&gt;Before you evaluate any technology, write down the failure modes you actually need to survive. There&lt;br&gt;
are four that matter for an Oracle database, and they are genuinely different problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instance or node failure&lt;/strong&gt; — a database instance crashes, or the server it runs on dies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Site or region loss&lt;/strong&gt; — a data center, availability zone, or whole region becomes unavailable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data corruption&lt;/strong&gt; — physical block corruption (bad storage, lost writes) or logical corruption.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human error&lt;/strong&gt; — an accidental &lt;code&gt;DROP TABLE&lt;/code&gt;, a bad deploy, a &lt;code&gt;DELETE&lt;/code&gt; without a &lt;code&gt;WHERE&lt;/code&gt; clause.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No single feature covers all four. That is the entire reason this article exists. Here is the map we'll&lt;br&gt;
spend the rest of the post justifying:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure mode&lt;/th&gt;
&lt;th&gt;RAC&lt;/th&gt;
&lt;th&gt;Data Guard&lt;/th&gt;
&lt;th&gt;Backups + Flashback&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Instance / node failure&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Partial (failover)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Site / region loss&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Partial (slow, if offsite)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Block corruption&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human / logical error&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice that the bottom row — human error — is covered by &lt;em&gt;neither&lt;/em&gt; RAC nor Data Guard. Hold that&lt;br&gt;
thought; it's the mistake I see most often.&lt;/p&gt;
&lt;h2&gt;
  
  
  What RAC actually solves
&lt;/h2&gt;

&lt;p&gt;RAC runs &lt;strong&gt;multiple database instances on multiple servers (nodes) against one shared copy of the&lt;br&gt;
database&lt;/strong&gt;. The instances coordinate through Oracle Grid Infrastructure (Clusterware) and a private&lt;br&gt;
interconnect, using Cache Fusion to ship blocks between node memories. Clients connect through the SCAN&lt;br&gt;
listener and node VIPs, so a failed node's sessions are redirected to survivors.&lt;/p&gt;

&lt;p&gt;What that buys you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instance and node resilience.&lt;/strong&gt; If a node dies, the surviving instances keep serving the &lt;em&gt;same&lt;/em&gt;
database. There's no "restore" and no "fail over to a copy" — the data was already open on the other
nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Online scale-out for reads and writes.&lt;/strong&gt; Add a node, add capacity, without re-architecting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rolling maintenance.&lt;/strong&gt; Patch or relocate one node at a time while the service stays up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brownout masking.&lt;/strong&gt; With application services and Application Continuity / TAF, in-flight work can
be replayed or transparently redirected during a node loss.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You check on it with Clusterware and &lt;code&gt;srvctl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Cluster resource overview&lt;/span&gt;
crsctl status resource &lt;span class="nt"&gt;-t&lt;/span&gt;

&lt;span class="c"&gt;# Is the database up, and on which instances?&lt;/span&gt;
srvctl status database &lt;span class="nt"&gt;-d&lt;/span&gt; ORCLCDB

&lt;span class="c"&gt;# Service placement (services are how you steer connections across nodes)&lt;/span&gt;
srvctl status service &lt;span class="nt"&gt;-d&lt;/span&gt; ORCLCDB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
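
&lt;p&gt;You can get the same picture from inside the database. This is a minimal sketch, assuming a session that can read the &lt;code&gt;GV$&lt;/code&gt; views; it shows one row per running instance and where each service is currently running:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- One row per running instance in the cluster
SELECT inst_id, instance_name, host_name, status
FROM   gv$instance
ORDER  BY inst_id;

-- Which services are running on which instance right now
SELECT inst_id, name
FROM   gv$active_services
ORDER  BY name, inst_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;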



&lt;p&gt;Now the part that the "RAC is our DR" crowd misses: &lt;strong&gt;every RAC instance points at the same storage.&lt;/strong&gt;&lt;br&gt;
There is exactly one copy of your data. A storage array failure, a site outage, or a corrupt block is&lt;br&gt;
seen identically by all nodes. RAC gives you redundancy of &lt;em&gt;compute&lt;/em&gt;, not redundancy of &lt;em&gt;data&lt;/em&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A composite scenario (illustrative).&lt;/strong&gt; Picture a shop running a healthy 3-node RAC cluster. Uptime&lt;br&gt;
dashboards are green for two years; leadership is told the database is "fully redundant." Then a SAN&lt;br&gt;
controller pushes bad firmware and the shared LUNs go offline. All three nodes go down at once,&lt;br&gt;
because all three were reading the same storage. The cluster did exactly what it was designed to do —&lt;br&gt;
it just was never designed for &lt;em&gt;that&lt;/em&gt; failure. That's not a RAC flaw; it's a design gap.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  Licensing and complexity (the honest cost)
&lt;/h3&gt;

&lt;p&gt;RAC is a &lt;strong&gt;separately licensed option on top of Oracle Database Enterprise Edition&lt;/strong&gt;, priced per&lt;br&gt;
processor (or in the cloud, baked into certain shapes/editions). On top of license cost you're taking&lt;br&gt;
on real operational weight: Clusterware, a redundant private interconnect, shared storage (typically&lt;br&gt;
ASM), and the skills to run all of it. That complexity is itself a source of outages if the team isn't&lt;br&gt;
staffed for it — a &lt;a href="https://uptimearchitect.com/blog/oracle-rac-node-eviction-troubleshooting/" rel="noopener noreferrer"&gt;RAC node eviction&lt;/a&gt;, where&lt;br&gt;
Clusterware fences a node it can't verify is healthy, is the canonical 3am example.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAC One Node&lt;/strong&gt; is the pragmatic middle ground: a single active instance that Clusterware can fail&lt;br&gt;
over (or you can online-relocate) to another node, with online rolling patching — most of the&lt;br&gt;
availability benefit, far less of the multi-instance complexity, and you can scale up to full RAC later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# RAC One Node: relocate the running instance to another node, online&lt;/span&gt;
srvctl relocate database &lt;span class="nt"&gt;-d&lt;/span&gt; ORCLCDB &lt;span class="nt"&gt;-node&lt;/span&gt; racnode2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Data Guard actually solves
&lt;/h2&gt;

&lt;p&gt;Data Guard maintains one or more &lt;strong&gt;standby databases&lt;/strong&gt; — independent, physically separate copies of&lt;br&gt;
your primary — kept in sync by shipping redo and applying it. A &lt;em&gt;physical&lt;/em&gt; standby applies redo&lt;br&gt;
block-for-block (Redo Apply); a &lt;em&gt;logical&lt;/em&gt; standby reconstructs SQL (SQL Apply). For HA/DR, physical&lt;br&gt;
standby is the default and the one I'll focus on. The Data Guard Broker (&lt;code&gt;dgmgrl&lt;/code&gt;) is how you should&lt;br&gt;
manage it — it removes most of the manual &lt;code&gt;ALTER DATABASE&lt;/code&gt; foot-guns.&lt;/p&gt;

&lt;p&gt;What it buys you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Site and region survival.&lt;/strong&gt; The standby is a &lt;em&gt;different&lt;/em&gt; database on &lt;em&gt;different&lt;/em&gt; storage, usually
in a &lt;em&gt;different&lt;/em&gt; location. Lose the primary site and you fail over to the standby.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Corruption protection.&lt;/strong&gt; Because the standby is an independent copy with its own writes, it doesn't
inherit the primary's physical block corruption. With Active Data Guard, &lt;strong&gt;Automatic Block Media
Recovery&lt;/strong&gt; can transparently repair a corrupt block on either side from the other.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A real failover/switchover target.&lt;/strong&gt; Planned role transitions (switchover) for maintenance, and
unplanned ones (failover) for disasters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read offload and more.&lt;/strong&gt; With Active Data Guard, an open read-only standby for reporting. Even without it,
backups offloaded to the standby and snapshot standbys you can open read-write for testing and then flip back.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You watch role and lag with SQL and the broker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Where am I, and what mode am I in?&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;database_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;open_mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;protection_mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;switchover_status&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;database&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- How far behind is apply? (the number that matters during an incident)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time_computed&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;dataguard_stats&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;  &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'transport lag'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'apply lag'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dgmgrl sys@ORCLCDB
DGMGRL&amp;gt; SHOW CONFIGURATION&lt;span class="p"&gt;;&lt;/span&gt;
DGMGRL&amp;gt; SHOW DATABASE &lt;span class="s1"&gt;'ORCLCDB_STBY'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nt"&gt;--&lt;/span&gt; Planned role swap &lt;span class="o"&gt;(&lt;/span&gt;maintenance&lt;span class="o"&gt;)&lt;/span&gt;: primary and standby trade places
DGMGRL&amp;gt; SWITCHOVER TO &lt;span class="s1"&gt;'ORCLCDB_STBY'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nt"&gt;--&lt;/span&gt; Unplanned &lt;span class="o"&gt;(&lt;/span&gt;disaster&lt;span class="o"&gt;)&lt;/span&gt;: promote the standby
DGMGRL&amp;gt; FAILOVER TO &lt;span class="s1"&gt;'ORCLCDB_STBY'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Protection modes set your RPO
&lt;/h3&gt;

&lt;p&gt;Data Guard's protection mode is the dial that trades data-loss risk against primary performance:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Protection mode&lt;/th&gt;
&lt;th&gt;Redo transport&lt;/th&gt;
&lt;th&gt;Data loss (RPO)&lt;/th&gt;
&lt;th&gt;Effect on primary&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Maximum Performance (default)&lt;/td&gt;
&lt;td&gt;ASYNC&lt;/td&gt;
&lt;td&gt;Possible — seconds of redo&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maximum Availability&lt;/td&gt;
&lt;td&gt;SYNC&lt;/td&gt;
&lt;td&gt;Zero while in sync; if no standby can acknowledge, the primary keeps running as in Maximum Performance until it resynchronizes&lt;/td&gt;
&lt;td&gt;Small commit latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maximum Protection&lt;/td&gt;
&lt;td&gt;SYNC&lt;/td&gt;
&lt;td&gt;Zero, guaranteed&lt;/td&gt;
&lt;td&gt;Primary &lt;strong&gt;stalls&lt;/strong&gt; if no standby can acknowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A common enterprise choice is &lt;strong&gt;Maximum Availability&lt;/strong&gt; with SYNC transport to a nearby standby: zero data loss&lt;br&gt;
in normal operation, without the "halt production if the standby is down" behavior of Maximum&lt;br&gt;
Protection.&lt;/p&gt;
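
&lt;p&gt;The mode you configured and the protection you are actually getting can differ, so it's worth checking both. A minimal sketch, run on the primary; how you alert on the result is up to you:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- PROTECTION_MODE is what you configured; PROTECTION_LEVEL is what is in effect now.
-- If the two differ (for example RESYNCHRONIZATION), zero data loss is not currently guaranteed.
SELECT protection_mode, protection_level
FROM   v$database;

-- Per-destination view of transport health
SELECT dest_id, status, synchronization_status, gap_status
FROM   v$archive_dest_status
WHERE  status != 'INACTIVE';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;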
&lt;h3&gt;
  
  
  Going further: Fast-Start Failover and Far Sync
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fast-Start Failover (FSFO)&lt;/strong&gt; adds automatic failover. A lightweight &lt;strong&gt;Observer&lt;/strong&gt; process (run it on
a third, independent host) watches both databases and promotes the standby automatically if the
primary disappears — turning a 2am page into an event you read about in the morning.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  DGMGRL&amp;gt; ENABLE FAST_START FAILOVER&lt;span class="p"&gt;;&lt;/span&gt;
  DGMGRL&amp;gt; START OBSERVER&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
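
&lt;p&gt;Once FSFO is enabled, the database itself reports whether automatic failover is really armed. A minimal sketch, run on the primary; an Observer that isn't present means nothing will fail over for you:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Is Fast-Start Failover armed, who is the target, and is an Observer connected?
SELECT fs_failover_status,
       fs_failover_current_target,
       fs_failover_threshold,
       fs_failover_observer_present,
       fs_failover_observer_host
FROM   v$database;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
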


&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Far Sync&lt;/strong&gt; solves the distance problem. SYNC gives you zero data loss but adds latency proportional
to distance, so a DR site 2,000 km away can't be SYNC without hurting production. A Far Sync instance
— a tiny control-file-and-redo-only instance placed &lt;em&gt;near&lt;/em&gt; the primary — receives redo SYNC (zero
loss, low latency) and forwards it ASYNC to the distant standby. You get RPO ≈ 0 &lt;em&gt;and&lt;/em&gt; geographic
distance.&lt;/li&gt;
&lt;/ul&gt;
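
&lt;p&gt;Building a Far Sync instance starts on the primary, with a dedicated control file. This is only a sketch of that first step: the path is a placeholder, and the rest of the setup (starting the instance from that control file, adding it to the Broker configuration, and routing redo through it) isn't shown here:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Run on the primary. The file path is a hypothetical example.
ALTER DATABASE CREATE FAR SYNC INSTANCE CONTROLFILE AS '/u01/app/oracle/farsync/control01.ctl';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;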

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fpako%3AeNpdj8EKwjAQRH9lybniXaRghZ7EFvck1UOabGmgbWQTLbX0343Rg3jcYefNzCyU1SQ2IJrOjqqV7OFwugwAZVWy6SVP25rXKRpPsLvCagV4Pu6BSdsEnsQWOutc0FPIscolA06Dip6BwnX7QK5vZI5v_-4HYB_EoI3zclAUIViV7eSMkh1gUHX9k59FSpGFHMwLKGpHHACx1Si9aulTpPy-_esoEhA9cS-NDpNn4Vvq43hNjbx3XizLCwaGVuY" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fpako%3AeNpdj8EKwjAQRH9lybniXaRghZ7EFvck1UOabGmgbWQTLbX0343Rg3jcYefNzCyU1SQ2IJrOjqqV7OFwugwAZVWy6SVP25rXKRpPsLvCagV4Pu6BSdsEnsQWOutc0FPIscolA06Dip6BwnX7QK5vZI5v_-4HYB_EoI3zclAUIViV7eSMkh1gUHX9k59FSpGFHMwLKGpHHACx1Si9aulTpPy-_esoEhA9cS-NDpNn4Vvq43hNjbx3XizLCwaGVuY" alt="Far Sync gives you zero data loss over distance: synchronous redo to a nearby Far Sync ins" width="1195" height="141"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Far Sync gives you zero data loss over distance: synchronous redo to a nearby Far Sync instance, then asynchronous onward to a far-off standby. A Fast-Start Failover Observer in a third location promotes the standby automatically.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Licensing note
&lt;/h3&gt;

&lt;p&gt;Plain Data Guard (a physical standby in mount mode, doing Redo Apply) is &lt;strong&gt;included with Enterprise&lt;br&gt;
Edition&lt;/strong&gt; — there's no excuse not to have one. &lt;strong&gt;Active Data Guard&lt;/strong&gt; — the open read-only standby,&lt;br&gt;
Automatic Block Media Recovery, Far Sync, and friends — is a &lt;strong&gt;separately licensed option&lt;/strong&gt;. Decide&lt;br&gt;
deliberately which capabilities you're actually licensed for.&lt;/p&gt;
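
&lt;p&gt;One way to see what the database thinks you've been using is the feature usage view. Treat this as a rough first look, not a license audit; exact feature names vary by release, so the &lt;code&gt;LIKE&lt;/code&gt; patterns below are assumptions you may need to adjust:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Features the database has recorded as used (sampled periodically, not real time)
SELECT name, version, detected_usages, currently_used, last_usage_date
FROM   dba_feature_usage_statistics
WHERE  name LIKE '%Data Guard%'
   OR  name LIKE '%Real Application Clusters%'
ORDER  BY name, version;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;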

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A composite scenario (illustrative).&lt;/strong&gt; A team has a standby and a green broker status, so DR is&lt;br&gt;
"done." Nobody has ever run a switchover. During a real failover they discover apply has been lagging&lt;br&gt;
for weeks behind a quietly-stuck archive gap, the network team never opened the ports for client&lt;br&gt;
redirection, and the runbook references a host that was decommissioned. The technology worked; the&lt;br&gt;
&lt;em&gt;operational readiness&lt;/em&gt; didn't. A standby you've never failed over to is a hope, not a plan.&lt;/p&gt;
&lt;/blockquote&gt;
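
&lt;p&gt;The quietly-stuck archive gap in that scenario is cheap to check for. A minimal sketch, run on the standby; a green broker status doesn't replace looking at this regularly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Any row here is a range of redo the standby is missing and cannot apply past
SELECT thread#, low_sequence#, high_sequence#
FROM   v$archive_gap;

-- Last archived log sequence applied per redo thread
SELECT thread#, MAX(sequence#) AS last_applied
FROM   v$archived_log
WHERE  applied = 'YES'
GROUP  BY thread#;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;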
&lt;h2&gt;
  
  
  The combined topology: RAC + Data Guard
&lt;/h2&gt;

&lt;p&gt;When you genuinely need both local zero-downtime &lt;em&gt;and&lt;/em&gt; cross-site survival, you run &lt;strong&gt;RAC at each site&lt;br&gt;
with Data Guard between them&lt;/strong&gt;. This is the heart of Oracle's Maximum Availability Architecture (MAA):&lt;br&gt;
local node failures are absorbed by RAC with no failover at all, while a site loss triggers a Data&lt;br&gt;
Guard role transition.&lt;/p&gt;

&lt;p&gt;It's the gold standard, and it's also the most expensive and most complex thing on the menu — you're&lt;br&gt;
paying for (and operating) RAC &lt;em&gt;and&lt;/em&gt; Active Data Guard, in two locations. The honest question is&lt;br&gt;
whether your RTO/RPO targets and the business cost of downtime justify it. MAA frames this as tiers, so&lt;br&gt;
you can match spend to requirement:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;MAA tier&lt;/th&gt;
&lt;th&gt;Adds&lt;/th&gt;
&lt;th&gt;Protects against&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bronze&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single instance + RMAN backups + Flashback&lt;/td&gt;
&lt;td&gt;Corruption, human error (slow recovery)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Silver&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+ RAC or RAC One Node&lt;/td&gt;
&lt;td&gt;Instance/node failure (near-zero RTO locally)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gold&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+ Active Data Guard&lt;/td&gt;
&lt;td&gt;Site loss, corruption; read offload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Platinum&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+ GoldenGate, Application Continuity, Edition-Based Redefinition&lt;/td&gt;
&lt;td&gt;Zero-downtime maintenance, app-transparent failover&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A useful way to read this table: &lt;strong&gt;you don't start at Gold.&lt;/strong&gt; You start at Bronze and climb only as far&lt;br&gt;
as your RTO/RPO and budget require.&lt;/p&gt;
&lt;h3&gt;
  
  
  What MAA Gold actually looks like
&lt;/h3&gt;

&lt;p&gt;It helps to picture the topology. RAC handles failures &lt;em&gt;inside&lt;/em&gt; each site; Data Guard handles losing a&lt;br&gt;
site; and the Observer — deliberately in a &lt;em&gt;third&lt;/em&gt; location — is what makes failover automatic without&lt;br&gt;
becoming a casualty of the outage it's supposed to detect.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fpako%3AeNp1kbFuwjAURX_lynMzxGOHSk4RnapUcTfD8BI_SCSSVLYDQoh_rxMFWqAdfe_xO9bzSVS9ZfEMsdn1h6omF_CZrTrAD-XW0VcNZT5c05I7wjeB12MHFKkp1Cu6eBnpGkmSoJA_kbxiU7VIjY6j2cKH3tGWofT7BZEzMh65szfuzOhAnS1v3PrRrf9w69kt_3fr2S1_uVWMoKrQ7BkLCoS3gZxFHNDH5gXTcvJMm6Ve5shLz27PbnwGDhSqmv2EqRm7zzPxBNGya6mxce0nEWpupw-wvKFhF8T5_A3bl3fy" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fpako%3AeNp1kbFuwjAURX_lynMzxGOHSk4RnapUcTfD8BI_SCSSVLYDQoh_rxMFWqAdfe_xO9bzSVS9ZfEMsdn1h6omF_CZrTrAD-XW0VcNZT5c05I7wjeB12MHFKkp1Cu6eBnpGkmSoJA_kbxiU7VIjY6j2cKH3tGWofT7BZEzMh65szfuzOhAnS1v3PrRrf9w69kt_3fr2S1_uVWMoKrQ7BkLCoS3gZxFHNDH5gXTcvJMm6Ve5shLz27PbnwGDhSqmv2EqRm7zzPxBNGya6mxce0nEWpupw-wvKFhF8T5_A3bl3fy" alt="MAA Gold: RAC at each site for local node resilience, Active Data Guard between sites for " width="957" height="548"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;MAA Gold: RAC at each site for local node resilience, Active Data Guard between sites for DR + corruption protection + read offload, and an FSFO Observer in a third location for automatic failover.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Read it as two independent failure domains: lose a &lt;em&gt;node&lt;/em&gt; and RAC absorbs it with no role change at all;&lt;br&gt;
lose a &lt;em&gt;site&lt;/em&gt; and Data Guard promotes the standby. The reporting team can run on the open Active Data&lt;br&gt;
Guard standby, and backups can be offloaded there too — so the DR copy earns its keep every day, not&lt;br&gt;
just during a disaster.&lt;/p&gt;
&lt;h2&gt;
  
  
  Don't forget the two failure modes nobody licensed for
&lt;/h2&gt;

&lt;p&gt;Look back at that first table. RAC and Data Guard together still leave two rows poorly covered, and one&lt;br&gt;
of them is the most common cause of "lost data" incidents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Block corruption&lt;/strong&gt; is partly handled by Data Guard (independent copy, Automatic Block Media Recovery)&lt;br&gt;
but your baseline defenses are configuration and backups: enable &lt;code&gt;DB_BLOCK_CHECKING&lt;/code&gt; and&lt;br&gt;
&lt;code&gt;DB_LOST_WRITE_PROTECT&lt;/code&gt;, run periodic &lt;code&gt;RMAN VALIDATE&lt;/code&gt;/&lt;code&gt;BACKUP VALIDATE&lt;/code&gt;, and keep recoverable RMAN&lt;br&gt;
backups.&lt;/p&gt;
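
&lt;p&gt;A minimal sketch of that baseline. The values shown are examples, not recommendations: both parameters add overhead, so test them against your workload first:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- What is set today?
SELECT name, value
FROM   v$parameter
WHERE  name IN ('db_block_checking', 'db_lost_write_protect');

-- Example settings (assumed values; pick yours after testing)
ALTER SYSTEM SET db_block_checking = MEDIUM SCOPE = BOTH;
ALTER SYSTEM SET db_lost_write_protect = TYPICAL SCOPE = BOTH;

-- Blocks that RMAN validation or backups have flagged as corrupt
SELECT file#, block#, blocks, corruption_type
FROM   v$database_block_corruption;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;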

&lt;p&gt;&lt;strong&gt;Human and logical error is the trap.&lt;/strong&gt; A &lt;code&gt;DELETE&lt;/code&gt; with no &lt;code&gt;WHERE&lt;/code&gt; clause is a perfectly valid&lt;br&gt;
transaction, so Data Guard faithfully ships it to the standby and applies it within seconds. Your&lt;br&gt;
"redundancy" just replicated the mistake to every copy. The defenses here are a different toolset&lt;br&gt;
entirely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Flashback Database: rewind the whole database to just before the mistake&lt;/span&gt;
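&lt;span class="c1"&gt;-- (requires flashback logging or a guaranteed restore point; run FLASHBACK DATABASE&lt;/span&gt;
&lt;span class="c1"&gt;--  with the database mounted, not open, then ALTER DATABASE OPEN RESETLOGS)&lt;/span&gt;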
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;flashback_on&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;database&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;FLASHBACK&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;RESTORE&lt;/span&gt; &lt;span class="n"&gt;POINT&lt;/span&gt; &lt;span class="n"&gt;before_bad_deploy&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Or recover a single object after an accidental drop (needs the recycle bin, which is on by default)&lt;/span&gt;
&lt;span class="n"&gt;FLASHBACK&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="k"&gt;BEFORE&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Guaranteed restore points before risky changes, Flashback Database/Table/Query, and RMAN&lt;br&gt;
point-in-time recovery are what save you here — &lt;strong&gt;not&lt;/strong&gt; replication. If you take one thing from this&lt;br&gt;
article beyond "RAC ≠ DR," take this: &lt;em&gt;replication is not a backup.&lt;/em&gt;&lt;/p&gt;
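
&lt;p&gt;In practice that looks something like the sketch below. The restore point name and the &lt;code&gt;app.orders&lt;/code&gt; table are the illustrative ones from the earlier example, and the 15-minute window is an assumption; how far back Flashback Query can see depends on your undo retention:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Before a risky change: pin a point you can rewind to
CREATE RESTORE POINT before_bad_deploy GUARANTEE FLASHBACK DATABASE;

-- Confirm it exists and is guaranteed
SELECT name, guarantee_flashback_database, time
FROM   v$restore_point;

-- Flashback Query: read the rows as they were 15 minutes ago
SELECT *
FROM   app.orders AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE);

-- Once the change is verified, drop it so the fast recovery area does not fill up
DROP RESTORE POINT before_bad_deploy;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;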
&lt;h2&gt;
  
  
  Make the decision with RTO and RPO first
&lt;/h2&gt;

&lt;p&gt;Every choice above maps cleanly onto two numbers you should set &lt;em&gt;with the business&lt;/em&gt;, not in IT:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RTO (Recovery Time Objective):&lt;/strong&gt; how long can you be down? RAC handles node failure in ~seconds with
no failover. Data Guard with FSFO recovers a site loss in seconds-to-minutes. Backups mean hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RPO (Recovery Point Objective):&lt;/strong&gt; how much data can you lose? RAC: zero (same data). Data Guard:
zero with SYNC/Far Sync, seconds with ASYNC. Backups: back to your last backup plus available redo.&lt;/li&gt;
&lt;/ul&gt;
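
&lt;p&gt;On a Data Guard standby you can compare those targets with what the system reports. A rough sketch, run on the standby: transport lag approximates your data-loss exposure, while apply finish time and estimated startup time hint at the database part of a failover. None of it includes client reconnection or human decision time:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- RPO-ish: transport lag. RTO-ish: apply finish time + estimated startup time.
SELECT name, value, unit, datum_time
FROM   v$dataguard_stats
WHERE  name IN ('transport lag', 'apply lag', 'apply finish time', 'estimated startup time');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;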

&lt;p&gt;Get those two numbers agreed and most of the architecture chooses itself. Here's the tree I walk:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fpako%3AeNplUk1PwzAM_StWTyCYuE8IVNZ17LAOFYQEpYesdZeINJnysUlU--84GaxIXFpH9nt-z_aQNLrFZApJJ_Wh4cw4eMk-FEB6UWXYCYVQvqyBqRbKpzUchOPgOMLGW8pZW1_CZHIHD8PKWwfWm73YI0hN2e3txtzcMThwLRGscAjagMGt0Or-GHo8EBbe0EaKWVUgttQJhGpxh_RRjsp3UjRsGrky5hgsPDNtHeCziMuGVFoNKoC_0OhJqw_KiR4jxHGj_ZaTooZJUGQWOiakN3iSkP2VMK_KdAZXkDYuuBjbRapVmsJCy1PviCt0hOXVWAlXsTZn1k2eXRhnTu30Hk19dvwDWwzFP83wq5dFmqD3RijrmGrOwoE50Oo00ZOJxV8Tj9EETTr81lRXEEl9LvtpvqyeaUO0l1_2WDGPufcQ5mP4OIbLMXy_qJYK5q_z8g02hij4FMpVWsCGNZ9-Z2mOuWSWh2c005GmRhvjd44uIJ4U9z3tG43Rpr5MriHp0fRMtHSQQ0Jn1sfTbLFjXrrkePwG6XPbdw" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2Fpako%3AeNplUk1PwzAM_StWTyCYuE8IVNZ17LAOFYQEpYesdZeINJnysUlU--84GaxIXFpH9nt-z_aQNLrFZApJJ_Wh4cw4eMk-FEB6UWXYCYVQvqyBqRbKpzUchOPgOMLGW8pZW1_CZHIHD8PKWwfWm73YI0hN2e3txtzcMThwLRGscAjagMGt0Or-GHo8EBbe0EaKWVUgttQJhGpxh_RRjsp3UjRsGrky5hgsPDNtHeCziMuGVFoNKoC_0OhJqw_KiR4jxHGj_ZaTooZJUGQWOiakN3iSkP2VMK_KdAZXkDYuuBjbRapVmsJCy1PviCt0hOXVWAlXsTZn1k2eXRhnTu30Hk19dvwDWwzFP83wq5dFmqD3RijrmGrOwoE50Oo00ZOJxV8Tj9EETTr81lRXEEl9LvtpvqyeaUO0l1_2WDGPufcQ5mP4OIbLMXy_qJYK5q_z8g02hij4FMpVWsCGNZ9-Z2mOuWSWh2c005GmRhvjd44uIJ4U9z3tG43Rpr5MriHp0fRMtHSQQ0Jn1sfTbLFjXrrkePwG6XPbdw" alt="A practical RAC vs Data Guard vs Both decision tree. Backups + Flashback are mandatory in " width="996" height="1233"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A practical RAC vs Data Guard vs Both decision tree. Backups + Flashback are mandatory in every branch.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  A side-by-side, for the architecture review
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;RAC&lt;/th&gt;
&lt;th&gt;Data Guard&lt;/th&gt;
&lt;th&gt;RAC + DG&lt;/th&gt;
&lt;th&gt;Backups only&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Node/instance failure&lt;/td&gt;
&lt;td&gt;Yes (instant)&lt;/td&gt;
&lt;td&gt;Partial (failover)&lt;/td&gt;
&lt;td&gt;Yes (instant)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Site/region loss&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Partial (slow)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Block corruption&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (ADG repair)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (restore)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human/logical error&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (Flashback/PITR)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typical RTO&lt;/td&gt;
&lt;td&gt;seconds&lt;/td&gt;
&lt;td&gt;seconds–minutes&lt;/td&gt;
&lt;td&gt;seconds&lt;/td&gt;
&lt;td&gt;hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typical RPO&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0 (SYNC) / seconds (ASYNC)&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;last backup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read offload&lt;/td&gt;
&lt;td&gt;Yes (all nodes)&lt;/td&gt;
&lt;td&gt;Yes (Active DG)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rolling patching&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (standby-first)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scale-out writes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost beyond EE&lt;/td&gt;
&lt;td&gt;RAC option ($$)&lt;/td&gt;
&lt;td&gt;included; ADG extra&lt;/td&gt;
&lt;td&gt;both ($$$)&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational complexity&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;td&gt;highest&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Where GoldenGate fits
&lt;/h2&gt;

&lt;p&gt;GoldenGate is the other tool people reach for, and it's worth knowing why it's &lt;em&gt;not&lt;/em&gt; usually the answer&lt;br&gt;
to this particular question. It does &lt;strong&gt;logical&lt;/strong&gt; replication — capturing changes and applying them&lt;br&gt;
elsewhere — which makes it brilliant for things Data Guard can't do: heterogeneous targets, cross-version&lt;br&gt;
and near-zero-downtime migrations and upgrades, active-active multi-master, and replicating a &lt;em&gt;subset&lt;/em&gt;&lt;br&gt;
of the data. But it's a separately licensed option, it's operationally heavier, and for plain "keep an&lt;br&gt;
identical standby for DR," physical Data Guard is simpler and tighter. Use GoldenGate when you need its&lt;br&gt;
logical flexibility (it's a Platinum-tier component for a reason) — not as a default DR mechanism.&lt;/p&gt;
&lt;h2&gt;
  
  
  A worked switchover (planned, zero data loss)
&lt;/h2&gt;

&lt;p&gt;Choosing the architecture is half the job; the other half is being able to &lt;em&gt;operate&lt;/em&gt; it under pressure.&lt;br&gt;
A &lt;strong&gt;switchover&lt;/strong&gt; is a planned, lossless role reversal — the primary becomes a standby and a standby&lt;br&gt;
becomes the primary. You'll do this for site maintenance, hardware refreshes, and — critically — as the&lt;br&gt;
rehearsal that proves your DR actually works. Always drive it through the Broker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Validate before you touch anything.&lt;/strong&gt; Modern Broker gives you a pre-flight check that catches&lt;br&gt;
gaps, missing standby redo logs, and flashback problems &lt;em&gt;before&lt;/em&gt; you commit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;DGMGRL&amp;gt; SHOW CONFIGURATION&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="nt"&gt;--&lt;/span&gt; expect: Status SUCCESS
DGMGRL&amp;gt; VALIDATE DATABASE &lt;span class="s1"&gt;'ORCLCDB_STBY'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A healthy result looks roughly like this (trimmed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Database Role:       Physical standby database
  Primary Database:    ORCLCDB
  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)
  Flashback Database Status:
    ORCLCDB       : On
    ORCLCDB_STBY  : On
  Transport-Related Information:
    Transport lag:   +00 00:00:00
  Apply-Related Information:
    Apply lag:       +00 00:00:00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If "Ready for Switchover" isn't &lt;strong&gt;Yes&lt;/strong&gt;, stop and fix that first — usually an archive gap, missing&lt;br&gt;
standby redo logs, or apply lag.&lt;/p&gt;
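
&lt;p&gt;If you want an automated runbook to refuse to continue when that check fails, the gate can be sketched in Python. This is a minimal sketch, not part of the Broker: &lt;code&gt;ready_for_switchover&lt;/code&gt; is a hypothetical helper, and it assumes you have already captured the &lt;code&gt;VALIDATE DATABASE&lt;/code&gt; text in the layout shown above, which may differ between releases.&lt;/p&gt;

```python
def ready_for_switchover(validate_output):
    """Return True only if the captured text reports 'Ready for Switchover: Yes'.

    Assumes the 'Label:  Value' layout from the sample output above.
    A missing line is treated as not ready, so the gate fails closed.
    """
    for line in validate_output.splitlines():
        label, _, value = line.partition(":")
        if label.strip().lower() == "ready for switchover":
            return value.strip().lower().startswith("yes")
    return False


# Trimmed sample, copied from the healthy result shown above.
SAMPLE = """\
  Database Role:       Physical standby database
  Primary Database:    ORCLCDB
  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)
"""

if __name__ == "__main__":
    print(ready_for_switchover(SAMPLE))
```

&lt;p&gt;Failing closed is the design choice that matters here: if the line is absent or the format changes, the helper answers "not ready" rather than letting the switchover proceed.&lt;/p&gt;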

&lt;p&gt;&lt;strong&gt;Step 2 — Switch over.&lt;/strong&gt; One command; the Broker orchestrates both databases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;DGMGRL&amp;gt; SWITCHOVER TO &lt;span class="s1"&gt;'ORCLCDB_STBY'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3 — Verify the new roles and that redo is flowing the other way:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- On the NEW primary (formerly the standby)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;database_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;open_mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;switchover_status&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;database&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- DATABASE_ROLE should now be PRIMARY, OPEN_MODE READ WRITE&lt;/span&gt;

&lt;span class="c1"&gt;-- Confirm the configuration is healthy again&lt;/span&gt;
&lt;span class="c1"&gt;-- DGMGRL&amp;gt; SHOW CONFIGURATION;   -&amp;gt; Status SUCCESS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4 — Redirect the application.&lt;/strong&gt; This is the step people forget. Clients need to land on the new&lt;br&gt;
primary — via a role-based service that only starts in the PRIMARY role, or via a connect string that&lt;br&gt;
lists both hosts. Test it, don't assume it.&lt;/p&gt;
&lt;h3&gt;
  
  
  Failover (unplanned) and reinstate
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;failover&lt;/strong&gt; is what you run when the primary is &lt;em&gt;gone&lt;/em&gt; and not coming back soon. It's faster and more&lt;br&gt;
decisive than a switchover, and with asynchronous transport it may cost you a small amount of redo (your&lt;br&gt;
RPO):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;DGMGRL&amp;gt; FAILOVER TO &lt;span class="s1"&gt;'ORCLCDB_STBY'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;strong&gt;Fast-Start Failover&lt;/strong&gt; enabled, you don't type that at all — the Observer detects the outage and&lt;br&gt;
promotes the standby automatically once the failure has persisted past the FastStartFailoverThreshold (30 seconds by default). Either way, when the old primary comes back to&lt;br&gt;
life, you don't rebuild it from scratch: if it had Flashback Database enabled, the Broker can rewind and&lt;br&gt;
re-enrol it as the new standby in one step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;DGMGRL&amp;gt; REINSTATE DATABASE &lt;span class="s1"&gt;'ORCLCDB'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That Flashback-Database prerequisite is exactly why "enable Flashback on both databases" belongs in your&lt;br&gt;
standard build — without it, a failover turns a returning primary into a full rebuild.&lt;/p&gt;
&lt;h2&gt;
  
  
  Monitoring: what to watch, and when to page
&lt;/h2&gt;

&lt;p&gt;A standby silently falling behind is the classic way DR rots. You need two numbers alarmed at all times —&lt;br&gt;
&lt;strong&gt;transport lag&lt;/strong&gt; (redo not yet received) and &lt;strong&gt;apply lag&lt;/strong&gt; (redo received but not yet applied) — plus&lt;br&gt;
the health of the apply process and, if you use it, the FSFO state.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- The two numbers that define your real-world RPO/RTO right now (run on the standby; the view returns no rows on a primary)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time_computed&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;dataguard_stats&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;  &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'transport lag'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'apply lag'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Is the apply process actually running? (run on the standby; from 12.2 on, v$dataguard_process supersedes the deprecated v$managed_standby)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sequence&lt;/span&gt;&lt;span class="o"&gt;#&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;gv&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;managed_standby&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;  &lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'MRP%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Fast-Start Failover health (run on the primary)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;fs_failover_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fs_failover_current_target&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fs_failover_observer_present&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;database&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sensible starting thresholds — tune them to &lt;em&gt;your&lt;/em&gt; RPO/RTO, not these defaults:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Warning&lt;/th&gt;
&lt;th&gt;Critical&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Transport lag&lt;/td&gt;
&lt;td&gt;&amp;gt; 60s&lt;/td&gt;
&lt;td&gt;&amp;gt; your RPO&lt;/td&gt;
&lt;td&gt;Redo isn't reaching the standby — data-loss exposure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apply lag&lt;/td&gt;
&lt;td&gt;&amp;gt; 5 min&lt;/td&gt;
&lt;td&gt;&amp;gt; your RTO&lt;/td&gt;
&lt;td&gt;Standby is "behind"; failover would replay slowly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MRP process&lt;/td&gt;
&lt;td&gt;not running&lt;/td&gt;
&lt;td&gt;absent after retry&lt;/td&gt;
&lt;td&gt;Apply has stopped — lag will grow unbounded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FSFO status&lt;/td&gt;
&lt;td&gt;not SYNCHRONIZED / not within lag limit&lt;/td&gt;
&lt;td&gt;observer absent&lt;/td&gt;
&lt;td&gt;Automatic failover is &lt;em&gt;not&lt;/em&gt; currently possible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Archive gap&lt;/td&gt;
&lt;td&gt;any persistent gap&lt;/td&gt;
&lt;td&gt;growing&lt;/td&gt;
&lt;td&gt;A missing sequence blocks all further apply&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
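
&lt;p&gt;To alert on the two lag rows, you first have to turn the lag text into a number. The sketch below assumes the lag arrives as a day-to-second string such as &lt;code&gt;+00 00:01:30&lt;/code&gt; (the format in the sample output earlier). The names &lt;code&gt;lag_to_seconds&lt;/code&gt; and &lt;code&gt;classify&lt;/code&gt; are hypothetical helpers, not an Oracle API, and the 5-minute critical value stands in for your own RPO.&lt;/p&gt;

```python
import re

# Matches a day-to-second lag string such as "+00 00:01:30".
LAG_PATTERN = re.compile(r"^\+(\d+) (\d{2}):(\d{2}):(\d{2})$")


def lag_to_seconds(value):
    """Convert a '+DD HH:MM:SS' lag string to whole seconds."""
    match = LAG_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised lag value: {value!r}")
    days, hours, minutes, seconds = (int(part) for part in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def classify(lag_seconds, warning_after, critical_after):
    """Return 'ok', 'warning' or 'critical' for a lag measured in seconds."""
    if lag_seconds > critical_after:
        return "critical"
    if lag_seconds > warning_after:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Transport lag: warn above 60s, critical above an assumed 5-minute RPO.
    print(classify(lag_to_seconds("+00 00:01:30"), 60, 300))
```

&lt;p&gt;Raising on an unrecognised value is deliberate: a lag that cannot be parsed should page someone rather than be silently reported as healthy.&lt;/p&gt;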

&lt;p&gt;Two operational notes: run the &lt;strong&gt;Observer on a third, independent host&lt;/strong&gt; (not on either database server —&lt;br&gt;
otherwise the thing that watches for failure can die &lt;em&gt;with&lt;/em&gt; the failure), and if you run Oracle Enterprise&lt;br&gt;
Manager, its Data Guard metrics wrap all of the above in alerting so you're not hand-rolling every check.&lt;/p&gt;

&lt;p&gt;One subtlety worth calling out: when &lt;strong&gt;apply lag&lt;/strong&gt; grows but transport is healthy and there's no archive&lt;br&gt;
gap, the standby itself is usually the bottleneck — redo is arriving but the apply can't keep up because&lt;br&gt;
the standby is I/O- or CPU-bound. That's not a Data Guard problem, it's a performance problem, and you&lt;br&gt;
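diagnose it the same way you'd diagnose any slow database: pull an AWR report for the standby (on a read-only Active Data Guard standby, snapshots are collected through the Remote Management Framework) and read it.&lt;br&gt;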
If that's unfamiliar territory, start with &lt;a href="https://uptimearchitect.com/blog/how-to-read-an-awr-report/" rel="noopener noreferrer"&gt;How to Read an AWR Report Without&lt;br&gt;
Drowning&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Troubleshooting the usual suspects
&lt;/h2&gt;

&lt;p&gt;When Data Guard misbehaves, it's almost always one of a handful of patterns. The Broker surfaces these as&lt;br&gt;
&lt;strong&gt;ORA-16xxx&lt;/strong&gt; messages — always read the Broker's StatusReport for the specific code and its recommended&lt;br&gt;
action rather than guessing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;DGMGRL&amp;gt; SHOW CONFIGURATION&lt;span class="p"&gt;;&lt;/span&gt;                 &lt;span class="nt"&gt;--&lt;/span&gt; look &lt;span class="k"&gt;for &lt;/span&gt;WARNING/ERROR
DGMGRL&amp;gt; SHOW DATABASE &lt;span class="s1"&gt;'ORCLCDB_STBY'&lt;/span&gt; StatusReport&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Likely cause&lt;/th&gt;
&lt;th&gt;Where to look&lt;/th&gt;
&lt;th&gt;Typical fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Apply lag climbing, sequence stuck&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Archive gap&lt;/strong&gt; — a missing redo sequence&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;v$archive_gap&lt;/code&gt;, &lt;code&gt;gv$archived_log&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Broker/FAL usually auto-resolves; if not, ship the missing logs and re-register&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standby block corruption after a bulk load&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;NOLOGGING&lt;/strong&gt; operation on the primary&lt;/td&gt;
&lt;td&gt;alert log, &lt;code&gt;v$database.force_logging&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ALTER DATABASE FORCE LOGGING&lt;/code&gt;; restore affected datafile from primary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transport lag grows under load&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Network throughput &amp;lt; redo rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;v$dataguard_stats&lt;/code&gt;, redo generation rate&lt;/td&gt;
&lt;td&gt;Tune TCP/socket buffers, enable redo transport compression (requires the Advanced Compression option), or use Far Sync&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time apply won't start&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Standby redo logs missing/undersized&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;v$standby_log&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Add standby redo logs (per thread: one more group than the online redo log groups, same size)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apply stopped after a failover test&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Flashback off&lt;/strong&gt;, can't reinstate&lt;/td&gt;
&lt;td&gt;&lt;code&gt;v$database.flashback_on&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enable Flashback Database; reinstate via the Broker&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The meta-lesson: most "Data Guard is broken" tickets are really &lt;em&gt;force logging wasn't set&lt;/em&gt;, &lt;em&gt;standby&lt;br&gt;
redo logs were never created&lt;/em&gt;, or &lt;em&gt;the network can't keep up with peak redo&lt;/em&gt;. Get those three right at&lt;br&gt;
build time and you'll prevent the majority of incidents.&lt;/p&gt;
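
&lt;p&gt;The network item is arithmetic you can do before the standby even exists. A rough sketch, assuming you have measured peak redo generation in MB per second (from AWR or a similar source). The &lt;code&gt;headroom&lt;/code&gt; factor is an assumed planning value that reserves capacity for protocol overhead and bursts, not a fixed Oracle constant, and both function names are hypothetical.&lt;/p&gt;

```python
def required_mbps(peak_redo_mb_per_sec, headroom=0.75):
    """Estimate the link speed (megabits/s) needed to ship redo at peak.

    headroom is the fraction of the link you are willing to fill with redo;
    0.75 is an assumed planning value, so tune it for your own network.
    Treats 1 MB as 10**6 bytes.
    """
    if headroom > 1 or not headroom > 0:
        raise ValueError("headroom must be greater than 0 and at most 1")
    megabits_per_sec = peak_redo_mb_per_sec * 8  # 1 byte = 8 bits
    return megabits_per_sec / headroom


def link_keeps_up(peak_redo_mb_per_sec, link_mbps, headroom=0.75):
    """True if the link can carry peak redo with the chosen headroom."""
    return link_mbps >= required_mbps(peak_redo_mb_per_sec, headroom)


if __name__ == "__main__":
    # 15 MB/s of peak redo needs 120 Mbit/s raw, 160 Mbit/s with 0.75 headroom.
    print(required_mbps(15), link_keeps_up(15, 100))
```

&lt;p&gt;Size against the &lt;em&gt;peak&lt;/em&gt; redo rate, not the daily average: transport lag builds during the batch window, which is exactly when the average hides the problem.&lt;/p&gt;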
&lt;h2&gt;
  
  
  Test it for real: a DR game-day
&lt;/h2&gt;

&lt;p&gt;A standby you have never failed over to is a hope, not a plan — so put it on a schedule. A practical&lt;br&gt;
cadence is a &lt;strong&gt;switchover every quarter&lt;/strong&gt; (it's lossless and reversible) and a &lt;strong&gt;full failover drill at&lt;br&gt;
least annually&lt;/strong&gt;. To exercise the &lt;em&gt;application&lt;/em&gt; against standby data without disturbing replication, use&lt;br&gt;
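a &lt;strong&gt;snapshot standby&lt;/strong&gt;: it opens read-write for testing, then discards its changes and catches back up. Redo is still received while it is converted but not applied, so a failover during the test takes longer.&lt;br&gt;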
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;--&lt;/span&gt; Open the standby read-write &lt;span class="k"&gt;for &lt;/span&gt;application testing
DGMGRL&amp;gt; CONVERT DATABASE &lt;span class="s1"&gt;'ORCLCDB_STBY'&lt;/span&gt; TO SNAPSHOT STANDBY&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nt"&gt;--&lt;/span&gt; ... run your app &lt;span class="nb"&gt;test &lt;/span&gt;suite against it ...
&lt;span class="nt"&gt;--&lt;/span&gt; Roll it back and resume keeping pace with the primary
DGMGRL&amp;gt; CONVERT DATABASE &lt;span class="s1"&gt;'ORCLCDB_STBY'&lt;/span&gt; TO PHYSICAL STANDBY&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A repeatable game-day runbook:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Announce&lt;/strong&gt; the window and the rollback plan.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-check&lt;/strong&gt; with &lt;code&gt;VALIDATE DATABASE&lt;/code&gt; (Ready for Switchover = Yes).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute&lt;/strong&gt; the switchover (or failover, for the annual drill).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify the application&lt;/strong&gt; actually reconnects through your role-based service — this is the test, not
the database role itself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure&lt;/strong&gt; the real RTO and RPO and compare them to target. Numbers, not vibes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch back&lt;/strong&gt; and confirm the configuration returns to SUCCESS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Report&lt;/strong&gt;: measured RTO/RPO, every gap you hit, and the owner/date for each fix.&lt;/li&gt;
&lt;/ol&gt;
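
&lt;p&gt;For the &lt;strong&gt;Measure&lt;/strong&gt; step, it helps to record timestamps during the drill and compute the numbers afterwards. A minimal sketch, assuming four timestamps you capture yourself: when the outage (or switchover) began, when the application was serving again, the last commit made on the old primary, and the newest commit that survived on the new primary. The class and field names are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    outage_start: datetime           # when the primary was taken away
    service_restored: datetime       # when the application reconnected and worked
    last_primary_commit: datetime    # newest commit on the old primary
    last_surviving_commit: datetime  # newest commit present on the new primary

    @property
    def rto(self):
        """Measured recovery time: how long the application was down."""
        return self.service_restored - self.outage_start

    @property
    def rpo(self):
        """Measured data-loss window; zero when nothing committed was lost."""
        return max(self.last_primary_commit - self.last_surviving_commit, timedelta(0))

    def meets(self, rto_target, rpo_target):
        """Compare the measured values against the agreed targets."""
        return rto_target >= self.rto and rpo_target >= self.rpo


if __name__ == "__main__":
    drill = DrillResult(
        outage_start=datetime(2026, 1, 10, 2, 0, 0),
        service_restored=datetime(2026, 1, 10, 2, 1, 30),
        last_primary_commit=datetime(2026, 1, 10, 1, 59, 58),
        last_surviving_commit=datetime(2026, 1, 10, 1, 59, 55),
    )
    print(drill.rto, drill.rpo, drill.meets(timedelta(minutes=5), timedelta(seconds=10)))
```

&lt;p&gt;For a switchover the two commit timestamps should be identical, so a non-zero measured RPO there is itself a finding for the report.&lt;/p&gt;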

&lt;p&gt;That report is also the artifact that turns "I think we're covered" into something leadership can&lt;br&gt;
actually rely on — and it's how you find the decommissioned-host-in-the-runbook problem in a drill&lt;br&gt;
instead of during a real outage.&lt;/p&gt;
&lt;h2&gt;
  
  
  Patching and upgrading without downtime
&lt;/h2&gt;

&lt;p&gt;Here's the payoff most teams undersell: the biggest &lt;em&gt;day-to-day&lt;/em&gt; return on HA isn't surviving disasters&lt;br&gt;
— it's making &lt;strong&gt;planned&lt;/strong&gt; maintenance nearly invisible. The same building blocks let you patch and&lt;br&gt;
upgrade with little or no downtime, and that benefit cashes in every single patch cycle.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rolling patches with RAC.&lt;/strong&gt; Most quarterly Release Updates are &lt;em&gt;RAC-rolling&lt;/em&gt;: you patch one node at
a time while the others keep serving the database. Connections drain off the node you're working on
(via services with a drain timeout, or Application Continuity) and return when it rejoins. No outage,
just a brief capacity dip.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standby-first patching.&lt;/strong&gt; For patches that aren't RAC-rolling, Data Guard gives you another route:
apply the patch to the &lt;strong&gt;standby&lt;/strong&gt; first, verify it there, switch over to the patched standby, then
patch the old primary. The application sees one short switchover instead of a maintenance window.
(Oracle marks which patches are "Standby-First Installable.")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Major upgrades with &lt;code&gt;DBMS_ROLLING&lt;/code&gt;.&lt;/strong&gt; A full release upgrade (say 19c → 23ai) normally means real
downtime. &lt;code&gt;DBMS_ROLLING&lt;/code&gt; (licensed as part of Active Data Guard) converts your physical standby into a &lt;em&gt;transient logical standby&lt;/em&gt;, upgrades
it while the primary keeps running, and then switches over — so the application's downtime collapses
to a single switchover rather than the whole upgrade window:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- sketch of a DBMS_ROLLING upgrade, driven from the primary&lt;/span&gt;
&lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="n"&gt;DBMS_ROLLING&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INIT_PLAN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;future_primary&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'ORCLCDB_STBY'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="n"&gt;DBMS_ROLLING&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BUILD_PLAN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="n"&gt;DBMS_ROLLING&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;START_PLAN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;-- standby becomes a transient logical standby&lt;/span&gt;
&lt;span class="c1"&gt;-- ... upgrade the transient logical standby to the new release ...&lt;/span&gt;
&lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="n"&gt;DBMS_ROLLING&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SWITCHOVER&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;-- the application flips to the upgraded database&lt;/span&gt;
&lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="n"&gt;DBMS_ROLLING&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FINISH_PLAN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The thread tying all three together: &lt;strong&gt;planned downtime is a choice, not a law of physics.&lt;/strong&gt; If your&lt;br&gt;
SLA can't spare a maintenance window, the HA you built for disasters quietly pays for itself every time&lt;br&gt;
you patch.&lt;/p&gt;
&lt;h2&gt;
  
  
  Try it yourself: a runnable lab
&lt;/h2&gt;

&lt;p&gt;Reading about recovery is one thing; &lt;em&gt;doing&lt;/em&gt; it is what builds the reflex. I put together a small lab&lt;br&gt;
you can run on a laptop with nothing but Docker — no Oracle account required — so you can feel the most&lt;br&gt;
important lessons here first-hand. It uses the community &lt;strong&gt;Oracle Database Free&lt;/strong&gt; image and runs&lt;br&gt;
every command inside the container, so you don't even need a local Oracle client.&lt;/p&gt;

&lt;p&gt;A quick honesty note about scope, because it maps exactly to this article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RAC isn't something you can meaningfully run on a single laptop.&lt;/strong&gt; It needs shared storage, a private
interconnect, and clusterware across nodes — a real cluster, not a container trick. So the lab doesn't
pretend to.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Guard is an Enterprise Edition feature&lt;/strong&gt;, and the zero-login Free image doesn't include it. So
the no-setup lab focuses on the failure modes you &lt;em&gt;can&lt;/em&gt; reproduce — and which this post argues are the
most commonly mishandled: &lt;strong&gt;human error, media loss, and corruption.&lt;/strong&gt; A separate, opt-in Enterprise
Edition module covers a real primary/standby switchover and failover for when you want to rehearse
those too.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Getting started is three commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./run.sh up        &lt;span class="c"&gt;# pulls the image and creates the database (first run takes a few minutes)&lt;/span&gt;
./run.sh setup     &lt;span class="c"&gt;# enables archivelog and creates a small demo schema&lt;/span&gt;
./run.sh all       &lt;span class="c"&gt;# runs all three drills end to end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The three drills, and the lesson each one drives home:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Human-error recovery.&lt;/strong&gt; The lab deletes every row (committed) and then drops the table — two
perfectly valid statements a standby would have replicated in milliseconds — and recovers both
&lt;em&gt;locally&lt;/em&gt; with Flashback Query and Flashback Table. This is the "replication is not a backup" point
you can now prove to yourself (and to a skeptical colleague) in thirty seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RMAN backup &amp;amp; restore.&lt;/strong&gt; Take a backup, take a datafile offline and delete it from disk to simulate
media failure, then restore and recover just that file while the rest of the database stays open.
That's the restore-drill muscle this post keeps insisting you build.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Block-corruption detection &amp;amp; recovery.&lt;/strong&gt; Write garbage into a single on-disk block, detect it with
&lt;code&gt;RMAN VALIDATE CHECK LOGICAL&lt;/code&gt;, and repair it with block media recovery — no full restore needed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The full lab — the &lt;code&gt;docker compose&lt;/code&gt; file, the &lt;code&gt;run.sh&lt;/code&gt; driver, every drill script, and the optional&lt;br&gt;
Enterprise Edition Data Guard module — is the &lt;code&gt;ha/&lt;/code&gt; lab in&lt;br&gt;
&lt;a href="https://github.com/pyaroslav/oracle-labs" rel="noopener noreferrer"&gt;github.com/pyaroslav/oracle-labs&lt;/a&gt;. Clone it, run it, break&lt;br&gt;
things on purpose. (No spare RAM on your laptop? The repo includes a guide to run the whole thing&lt;br&gt;
&lt;strong&gt;free&lt;/strong&gt; on an OCI Always Free cloud VM.) Discovering that your runbook references a decommissioned host&lt;br&gt;
is a great thing to learn in a lab on a Tuesday afternoon — and a terrible thing to learn at 2am.&lt;/p&gt;

&lt;h2&gt;
  
  
  What about 23ai and 26ai?
&lt;/h2&gt;

&lt;p&gt;If you're on or moving to a newer release — &lt;strong&gt;23ai&lt;/strong&gt;, or the current &lt;strong&gt;26ai&lt;/strong&gt; — the good news is that&lt;br&gt;
none of the &lt;em&gt;decision framework&lt;/em&gt; above changes: the failure modes are the same, RAC still protects&lt;br&gt;
compute, Data Guard still protects data, and backups + Flashback still own corruption and human error.&lt;br&gt;
The "ai"-era releases continue the same Maximum Availability Architecture lineage and add incremental&lt;br&gt;
improvements across the stack (redo transport/apply efficiency, manageability, and — notably in 23ai —&lt;br&gt;
new in-database capabilities like AI Vector Search that change &lt;em&gt;what&lt;/em&gt; you run, not &lt;em&gt;how&lt;/em&gt; you protect&lt;br&gt;
it). What does shift between releases is the small print: default parameter values, which features are&lt;br&gt;
enabled, and option licensing. So when you implement on 23ai or 26ai, confirm the exact behavior and&lt;br&gt;
licensing against that release's documentation rather than assuming 19c defaults carry over — and, if&lt;br&gt;
you want a free place to check, the &lt;strong&gt;Oracle Database Free&lt;/strong&gt; image (currently 26ai) and &lt;strong&gt;OCI Always&lt;br&gt;
Free&lt;/strong&gt; Autonomous Database both let you verify on a real instance at no cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  What teams get wrong (the short list)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Treating RAC as DR.&lt;/strong&gt; It isn't. One copy of the data, one storage system, one site.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An untested standby.&lt;/strong&gt; If you haven't done a real switchover, you don't have DR — you have a theory.
Schedule game-days.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assuming replication protects against mistakes.&lt;/strong&gt; A bad &lt;code&gt;DELETE&lt;/code&gt; reaches the standby before you can
cancel it. Flashback and backups are your safety net, every time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buying Gold when Bronze/Silver was the requirement.&lt;/strong&gt; Match the MAA tier to a &lt;em&gt;stated&lt;/em&gt; RTO/RPO, not
to fear. Complexity you can't operate is a liability, not insurance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ignoring the licensing line.&lt;/strong&gt; RAC and Active Data Guard are paid options. Design within what you're
actually licensed for, or get the budget approved on purpose.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Oracle RAC a disaster recovery solution?
&lt;/h3&gt;

&lt;p&gt;No. RAC protects against instance and node failure by running multiple instances against one shared copy of the database. Because there is only one copy of the data on shared storage, a site outage, storage failure, or block corruption affects all RAC nodes at once. Disaster recovery requires an independent copy, which is what Data Guard provides.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I still need Data Guard if I already have RAC?
&lt;/h3&gt;

&lt;p&gt;Yes, if you need to survive losing a site or region, or to protect against data corruption. RAC and Data Guard solve different failures: RAC handles local node failure, while Data Guard maintains a separate standby database for site loss and corruption protection. Many mission-critical systems run both.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Data Guard protect against accidental data deletion?
&lt;/h3&gt;

&lt;p&gt;No. An accidental DELETE or DROP is a valid transaction, so Data Guard faithfully ships and applies it to the standby within seconds. Protection against human and logical errors comes from Flashback Database, Flashback Table, guaranteed restore points, and RMAN point-in-time recovery — not from replication.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between a switchover and a failover?
&lt;/h3&gt;

&lt;p&gt;A switchover is a planned, lossless role reversal between the primary and standby, used for maintenance and DR testing. A failover is an unplanned promotion of the standby when the primary is lost; with asynchronous transport it may incur a small amount of data loss. Fast-Start Failover can perform failovers automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Data Guard included with Oracle Enterprise Edition?
&lt;/h3&gt;

&lt;p&gt;Basic Data Guard — a physical standby in mount mode doing Redo Apply — is included with Enterprise Edition. Active Data Guard, which adds a read-only open standby, Automatic Block Media Recovery, and Far Sync, is a separately licensed option. RAC is also a separately licensed option.&lt;/p&gt;

&lt;h3&gt;
  
  
  What RPO can Data Guard achieve?
&lt;/h3&gt;

&lt;p&gt;Zero data loss is achievable using synchronous redo transport in Maximum Availability or Maximum Protection mode, optionally with a Far Sync instance to preserve zero RPO over long distances. Asynchronous transport (Maximum Performance) typically loses only seconds of redo but adds no commit latency on the primary.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between RAC and RAC One Node?
&lt;/h3&gt;

&lt;p&gt;Full RAC runs multiple active instances across nodes for both high availability and scale-out. RAC One Node runs a single active instance that Oracle Clusterware can fail over or online-relocate to another node, with rolling patching. RAC One Node offers most of the availability benefit with less complexity, and can be scaled up to full RAC later.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Oracle Maximum Availability Architecture (MAA)?
&lt;/h3&gt;

&lt;p&gt;MAA is Oracle's set of best-practice reference architectures for high availability and disaster recovery, organized into tiers: Bronze (a single instance with RMAN backups and Flashback), Silver (adds RAC or RAC One Node for local failure), Gold (adds Active Data Guard for site loss and corruption), and Platinum (adds GoldenGate, Application Continuity, and Edition-Based Redefinition for zero-downtime maintenance). You choose the lowest tier that meets your RTO and RPO targets.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is an Oracle Data Guard Far Sync instance?
&lt;/h3&gt;

&lt;p&gt;A Far Sync instance is a lightweight Data Guard member (just a control file and redo logs, no datafiles) placed close to the primary. The primary ships redo to it synchronously, which keeps commit latency low because the round trip is short, and Far Sync forwards that redo asynchronously to a distant standby. This gives zero-data-loss failover across long geographic distances without the commit latency that synchronous transport directly to a far-away standby would impose.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one-paragraph version
&lt;/h2&gt;

&lt;p&gt;Set RTO and RPO with the business. Use &lt;strong&gt;RAC&lt;/strong&gt; (or RAC One Node) to survive instance and node failure&lt;br&gt;
at a site with no downtime. Use &lt;strong&gt;Data Guard&lt;/strong&gt; to survive site loss and corruption, with Fast-Start&lt;br&gt;
Failover for automatic recovery and Far Sync if you need zero data loss over distance. Use &lt;strong&gt;both&lt;/strong&gt; —&lt;br&gt;
MAA Gold — only when your targets genuinely demand it. And in &lt;em&gt;every&lt;/em&gt; design, no exceptions, keep RMAN&lt;br&gt;
backups and Flashback Database, because those are the only things that save you from the failure RAC and&lt;br&gt;
Data Guard can't: the human one.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://uptimearchitect.com/blog/oracle-ha-decision-tree-rac-vs-data-guard/" rel="noopener noreferrer"&gt;uptimearchitect.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>oracle</category>
      <category>rac</category>
      <category>dataguard</category>
      <category>highavailability</category>
    </item>
  </channel>
</rss>
