DEV Community

Series Week 10/52 — Integrated Change Management in Oracle Services

{ Abhilash Kumar Bhattaram : Follow on LinkedIn }

Regulatory compliance in Oracle environments is not a paperwork exercise — it is operational discipline. As I noted in my Week 9 blog, regulators of both banks (RBI) and insurance companies (IRDAI) explicitly demand tight control over database security, audit trails, patching hygiene, DR readiness, and privileged access governance.

Most compliance failures happen NOT because teams lack tools — but because Oracle operations, business peaks, and regulatory clauses are not mapped to each other.

Here are some common situations of bad change management in Oracle databases.



+--------------------------------------------------------------------------------------+
| Examples of Bad Change Management in Oracle Environments                             |
+--------------------------------------------------------------------------------------+
| 1. Uncoordinated Patching Cycles                                                     |
|--------------------------------------------------------------------------------------|
| - DEV on RU 19.29, UAT on RU 19.28, PROD on RU 19.17 with missing PSUs               |
| - SQL plans behave differently across environments                                   |
| - Bugs appear only in PROD because lower tiers are not aligned                       |
| >> Result: No predictable testing, no predictable performance                        |
+--------------------------------------------------------------------------------------+
| 2. Parameter Changes Without Governance                                              |
|--------------------------------------------------------------------------------------|
| - Sessions/processes updated by DBA during peak load without CAB approval            |
| - Requires restart → but business denies downtime                                    |
| - Leads to half-implemented changes, inconsistent runtime behavior                   |
| >> Result: Configuration drift and unexpected outages                                |
+--------------------------------------------------------------------------------------+
| 3. Schema Deployments Done Directly in Production                                    |
|--------------------------------------------------------------------------------------|
| - Developers run DDL directly on PROD to "fix an issue fast"                         |
| - Invalid objects, wrong grants, missing synonyms                                    |
| - No rollback scripts, no deployment evidence                                        |
| >> Result: PROD instability and audit non-compliance                                 |
+--------------------------------------------------------------------------------------+
| 4. Unreviewed Performance Fixes During Incidents                                     |
|--------------------------------------------------------------------------------------|
| - AWR-based patching of SQL hints directly in PROD                                   |
| - Lack of regression testing for critical business workloads                         |
| - Fixes become permanent without validation                                          |
| >> Result: Later month-end/quarter-end failures due to untested execution plans      |
+--------------------------------------------------------------------------------------+
| 5. DR & HA Configurations Drifting Out of Sync                                       |
|--------------------------------------------------------------------------------------|
| - Data Guard not patched when primary is patched                                     |
| - Lag increases, redo transport errors ignored                                       |
| - Switchover tests fail when actually needed                                         |
| >> Result: RTO/RPO commitments fail during real events                               |
+--------------------------------------------------------------------------------------+
| 6. No Unified Change Calendar Across Infra – App – DB                                |
|--------------------------------------------------------------------------------------|
| - App team deploys a new feature                                                     |
| - Middleware upgraded on a different weekend                                         |
| - DB parameters changed by DBA on another day                                        |
| >> Result: Outages caused by independent, conflicting changes                        |
+--------------------------------------------------------------------------------------+
| 7. Ad-Hoc Storage or ASM Rebalancing During Peak Hours                               |
|--------------------------------------------------------------------------------------|
| - Storage team expands LUNs without notifying DB team                                |
| - ASM starts auto-rebalance during business peak                                     |
| - Spikes I/O latency → application slowdown                                          |
| >> Result: “Database is slow” escalations with no clear RCA                          |
+--------------------------------------------------------------------------------------+
| 8. No Version Control for Database Objects                                           |
|--------------------------------------------------------------------------------------|
| - Different object definitions between DEV / UAT / PROD                              |
| - Invalid or outdated packages deployed unknowingly                                  |
| - No tracking of who changed what and when                                           |
| >> Result: Debugging failures becomes impossible                                     |
+--------------------------------------------------------------------------------------+
| >> Poor change management doesn’t cause incidents — it *accumulates* them.           |
+--------------------------------------------------------------------------------------+
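Situation 1 above can be caught mechanically. Below is a minimal sketch of a patch-drift checker; the environment names and RU versions are illustrative assumptions, not a real inventory:

```python
# Sketch: detect Oracle RU (Release Update) drift across environments.
# Environment names and versions below are illustrative assumptions.

def parse_ru(version: str) -> tuple:
    """Turn an RU string like '19.17' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def find_drift(env_versions: dict) -> list:
    """Return (env, version, target) for environments behind the highest RU."""
    highest = max(env_versions.values(), key=parse_ru)
    return [
        (env, ver, highest)
        for env, ver in env_versions.items()
        if parse_ru(ver) < parse_ru(highest)
    ]

if __name__ == "__main__":
    envs = {"DEV": "19.29", "UAT": "19.28", "PROD": "19.17"}
    for env, ver, target in find_drift(envs):
        print(f"{env} is on RU {ver}, behind target RU {target}")
```

In practice the version map would be populated from `DBA_REGISTRY_SQLPATCH` on each tier rather than hardcoded; the point is that drift detection is scriptable and belongs in the patch cycle, not in a spreadsheet.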



1. Ground Zero: Uncontrolled Change Management

Let us first understand where the uncontrolled chaos begins.

+--------------------------------------------------------------------------------------+
| 1. Ground Zero: Where Change Chaos Begins                                            |
+--------------------------------------------------------------------------------------+
| - DB, App, Infra teams follow separate change cycles — no shared calendar            |
| - Changes about to go live without dependency mapping                                |
| - Emergency fixes bypass standard review & approvals                                 |
| - No unified deployment history or audit trail across stack                          |
| - Pre-checks or impact analysis often skipped, especially under time pressure        |
| - Rollback plans missing or untested — causing extended downtime on failures         |
+--------------------------------------------------------------------------------------+
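The "no shared calendar" problem above is concrete enough to prototype. Here is a hedged sketch of a conflict detector over per-team change windows; the team names and dates are illustrative assumptions:

```python
# Sketch: flag overlapping change windows across DB, App and Infra teams.
# All calendar entries below are illustrative, not real schedules.
from datetime import datetime
from itertools import combinations

def overlaps(a, b):
    """True if two (team, start, end) change windows intersect in time."""
    return a[1] < b[2] and b[1] < a[2]

def find_conflicts(changes):
    """Return every pair of changes from different teams that overlap."""
    return [
        (a[0], b[0])
        for a, b in combinations(changes, 2)
        if a[0] != b[0] and overlaps(a, b)
    ]

if __name__ == "__main__":
    fmt = "%Y-%m-%d %H:%M"
    calendar = [
        ("DB",    datetime.strptime("2024-06-01 22:00", fmt),
                  datetime.strptime("2024-06-02 02:00", fmt)),
        ("App",   datetime.strptime("2024-06-02 01:00", fmt),
                  datetime.strptime("2024-06-02 03:00", fmt)),
        ("Infra", datetime.strptime("2024-06-08 22:00", fmt),
                  datetime.strptime("2024-06-09 01:00", fmt)),
    ]
    for t1, t2 in find_conflicts(calendar):
        print(f"Conflict: {t1} and {t2} changes overlap")
```

Even this naive pairwise check surfaces the "App deploy collides with DB parameter change" scenario before the weekend, which is exactly what a unified calendar is for.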

2. Underneath Ground Zero: The root cause of change drift

There needs to be a change management system that is relevant to the organization; using ready-made products as plug-and-play for database changes is not the right way. The change management process should be aligned with the technical teams who have to follow it.

+--------------------------------------------------------------------------------------+
| 2. Underneath Ground Zero: Understanding the Hidden Risk                             |
+--------------------------------------------------------------------------------------+
| - Risk not visible until after change — no predictive risk rating                    |
| - No correlation between change events and past incidents                            |
| - Multiple independent change tools & ticketing systems — no traceability            |
| - Environments (Dev/QA/Prod) inconsistent — tests pass in lower env, fail in Prod    |
| - Lack of schema / DB / infra / network dependency maps                              |
| - No baseline comparison (performance, capacity, configs) before/after changes       |
|                                                                                      |
| >> The problem isn’t changes themselves — it's unmanaged change process that         |
|    turns changes into outages.                                                       |
+--------------------------------------------------------------------------------------+
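The "no baseline comparison" gap above is one of the cheapest to close. A minimal sketch of a before/after configuration diff follows; the parameter names and values are illustrative assumptions:

```python
# Sketch: compare a pre-change baseline of DB parameters against the
# post-change state. Parameter names/values are illustrative assumptions.

def diff_baseline(before: dict, after: dict) -> dict:
    """Return {param: (old, new)} for anything added, removed or changed."""
    changes = {}
    for key in before.keys() | after.keys():
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes

if __name__ == "__main__":
    before = {"processes": "300", "sessions": "472", "sga_target": "8G"}
    after  = {"processes": "600", "sessions": "472", "sga_target": "8G",
              "pga_aggregate_target": "2G"}
    for param, (old, new) in sorted(diff_baseline(before, after).items()):
        print(f"{param}: {old} -> {new}")
```

The same diff pattern applies to capacity and performance baselines: snapshot before the change window, snapshot after, and attach the diff to the change ticket as evidence.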

3. Working Upwards: Building a Managed Change-First Oracle Operation

To define a change process, one needs to understand how the business works, and to do that, leadership at every layer should provide inputs on the maintainability of the systems.

+--------------------------------------------------------------------------------------+
| 3. Working Upwards: Building a Managed Change-First Oracle Operation                 |
+--------------------------------------------------------------------------------------+
| - Establish unified Change Calendar (DB, App, Infra, Cloud)                          |
| - Require Dependency Mapping & Impact Analysis before any change                     |
| - Implement Automated Pre-Change Checks (space, backups, code quality, configs)      |
| - Use Version-controlled Deployments + Rollback-ready packages                       |
| - Enforce Post-Change Validation (performance baseline, functionality smoke tests)   |
| - Maintain central Change + Audit History for traceability and compliance            |
| - Introduce “Change Risk Scoring” based on past impact & complexity                  |
| - Align change windows with business load calendar — avoid high-traffic periods      |
|                                                                                      |
| >> Change management should not be an afterthought — it must be the backbone         |
|    of stable, predictable Oracle operations.                                         |
+--------------------------------------------------------------------------------------+ 
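The "Change Risk Scoring" item above can start as a simple weighted heuristic before any tooling is bought. The factors and weights in this sketch are illustrative assumptions, to be tuned against each organization's incident history:

```python
# Sketch: naive change risk score from past impact and complexity.
# Factor names and weights below are illustrative, not a standard model.

def risk_score(change: dict) -> int:
    """Score 0-100; higher means riskier, gating the approval path."""
    score = 0
    score += 30 if change.get("touches_prod") else 0
    score += 20 if not change.get("has_rollback") else 0
    score += 15 if change.get("past_incidents", 0) > 0 else 0
    score += 20 if change.get("in_business_peak") else 0
    score += min(15, 5 * change.get("dependent_systems", 0))
    return score

def risk_band(score: int) -> str:
    """Map a numeric score to an approval band."""
    return "HIGH" if score >= 60 else "MEDIUM" if score >= 30 else "LOW"

if __name__ == "__main__":
    change = {"touches_prod": True, "has_rollback": False,
              "past_incidents": 1, "in_business_peak": False,
              "dependent_systems": 2}
    s = risk_score(change)
    print(f"Risk score {s} -> {risk_band(s)}")  # HIGH: prod, no rollback
```

A HIGH band might require full CAB review, MEDIUM a peer review, and LOW a standard pre-approved change; the bands matter more than the exact numbers.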

How Nabhaas helps you

At Nabhaas, we work closely with teams to uncover dependencies, knowledge gaps, and process inefficiencies to ensure the patching cycle is smooth and predictable.

TAB ( Total Automation Box ) is how we automate patching lifecycles. https://www.nabhaas.com/tab

  • There is no straight answer to the points mentioned above, but all of them need to be addressed in the way that best fits the organization.

  • At Nabhaas, we ensure we identify all of the above before beginning a patch cycle. Feel free to download our whitepaper here.
