Ryan Giggs

Posted on Nov 7

MySQL HeatWave: Creating High Availability DB Systems and Understanding Limitations

#mysqlheatwave #highavailability #oraclecloudinfrastructure

MySQL HeatWave offers enterprise-grade high availability features that ensure your database remains operational even during hardware failures or maintenance events. Understanding how to properly configure high availability and its associated limitations is crucial for designing resilient database architectures.

This guide walks you through creating a high availability DB system and explores the important limitations you should consider when planning your deployment.

Creating a High Availability DB System

Overview of the Creation Process

Creating a high availability DB system in MySQL HeatWave follows a structured workflow with three main phases:

Set Up Basic DB System Configuration
Specify Database and Infrastructure Details
Configure Backup and Advanced Options

Let's explore each phase in detail.

Phase 1: Set Up Basic DB System Configuration

Step 1: Access the Console

Step 2: Navigate to DB Systems

Open the navigation menu
Under Databases, select HeatWave MySQL (OCI) or MySQL HeatWave (AWS)
Click DB Systems
Click Create DB System

Step 3: Choose Deployment Template

Select the Production template, which automatically configures your system for high availability with recommended settings for production workloads.

Alternatively, you can start with Development or Testing and manually enable high availability later.

Step 4: Basic Configuration

Compartment

Select the compartment where the DB system will reside
Compartments help organize resources and control access

Name Your DB System

Display Name: Provide a meaningful name (e.g., "prod-mysql-ha-primary")
Description: (Optional) Add details about the system's purpose

Database System Type

Ensure "High Availability" is selected if using the Production template
For manual setup, you can enable this option here

Phase 2: Specify Database Information

Administrator Credentials

Username Configuration

Specify the administrator username (default is typically "admin")
Avoid reserved usernames like "root", "mysql", "sys"
Username can be up to 32 characters

Password Requirements

Must be 8-32 characters long
Include at least one:
- Uppercase letter
- Lowercase letter
- Number
- Special character from: ,.~-+*;:_!
Store credentials securely (consider OCI Vault or AWS Secrets Manager)

Network Configuration

Virtual Cloud Network (VCN)

Select an existing VCN or create a new one
Subnet: Choose a private subnet for enhanced security
Public subnets are not recommended for production databases

Availability Domain Selection

Choose the preferred availability domain for your primary instance
This becomes your preferred placement
The system will automatically distribute secondary instances across different ADs or fault domains

Security Configuration

Ensure security lists or network security groups allow:
- Port 3306 for MySQL connections
- Port 33060 for MySQL X Protocol
Restrict source CIDR blocks to known application IP ranges

Hardware Configuration (Shape)

Select DB System Shape

Choose a shape that meets your performance requirements:

ECPU Shapes (Recommended)

Architecture-agnostic and future-proof
Examples: MySQL.8, MySQL.16, MySQL.32, MySQL.64, MySQL.128, MySQL.256
Can be changed after deployment
Support high availability and HeatWave clusters

Shape Characteristics

Defines CPU count, memory, and network bandwidth
Higher shapes provide better performance but cost more
Use Autopilot Shape Advisor for recommendations based on workload

Important: All OCPU shapes are deprecated and unavailable to new users created after September 5, 2024. Existing OCPU shapes will be unavailable after March 13, 2026.

Storage Configuration

Data Storage Size

Minimum: 50 GB
Maximum: 131,072 GB (128 TB)
Plan for current data plus growth projections
Storage can only be increased, never decreased

Automatic Storage Expansion

Recommended: Enable this feature
Automatically increases storage when usage approaches limits
Prevents downtime due to storage exhaustion
Based on Health Monitor read-only thresholds

Storage Performance

Higher storage allocations provide better IOPS
Storage is backed by high-performance block storage

Phase 3: Configure Backups

Enable Automatic Backups

Why Backups Matter

Protects against data loss
Enables point-in-time recovery
Required for disaster recovery planning

Backup Configuration Options

Enable Automatic Backups

Toggle: ON (highly recommended for HA systems)
Backups run automatically at scheduled times

Backup Retention Period

Range: 1 to 35 days
Recommended: At least 7 days for production
Longer retention = higher storage costs but better recovery options

Backup Window Selection

Choose a time when database activity is lowest
Typically during off-peak hours (e.g., 2:00 AM - 4:00 AM)
Avoid peak transaction periods to minimize performance impact

Point-in-Time Recovery (PITR)

Enable to restore to any point within retention period
Uses binary logs for granular recovery
Essential for recovering from accidental data modifications

Regional Backup Copy (OCI only)

Schedule automatic backup copies to another OCI region
Provides geographic redundancy
Protects against regional disasters

Advanced Backup Options

Backup Validation

Verifies backup integrity
Ensures backups are restorable
Available on supported MySQL versions

Soft Delete for Backups

Enabled by default
Backups remain in DELETED state before permanent deletion
Provides grace period for recovery of accidentally deleted backups

Additional Configuration Options

HeatWave Cluster Configuration (Optional)

If you need analytics capabilities:

Enable HeatWave Cluster

Checkbox: Select to add analytics processing
Choose HeatWave node shape (HeatWave.512GB, HeatWave.256GB, HeatWave.128GB)
Specify number of nodes (1 to 512 depending on shape and region)

Important HA Consideration: On OCI, HeatWave clusters are fully supported with high availability. On AWS, HeatWave features (including Lakehouse, GenAI, and AutoML) are currently not supported on HA DB systems.

MySQL Configuration

Configuration Selection

Use default HA-compatible configuration (recommended)
Configurations with "HA" suffix support high availability
Custom configurations must be based on HA-compatible templates

Configuration Characteristics

Defines system variables and initialization parameters
Can be changed after deployment with rolling restarts
Affects memory allocation, connection limits, and performance tuning

Tagging (Optional)

Resource Tags

Add metadata for organization
Enable cost tracking and allocation
Support automation and bulk operations

Tag Types

Defined Tags: Created by administrators with controlled vocabularies
Free-form Tags: Simple key-value pairs for flexible labeling

Finalizing Creation

Review Configuration

Before creating:

Verify all settings are correct
Check estimated monthly cost
Ensure network connectivity is properly configured
Confirm backup settings meet compliance requirements

Create DB System

Click Create to provision your high availability DB system.

Provisioning Timeline

DB system enters CREATING state
Typical creation time: 15-30 minutes for HA systems
Three instances are provisioned across availability/fault domains
Status changes to ACTIVE when ready

What Happens During Creation

Primary instance is provisioned in preferred placement
Two secondary instances created in different domains
MySQL Group Replication is configured automatically
Initial backup is taken (if enabled)
Monitoring and health checks are activated

Understanding High Availability Limitations

While MySQL HeatWave high availability provides robust fault tolerance, it's important to understand its limitations to design systems appropriately.

1. Rolling Upgrades and Brief Downtime

Characteristic: High availability DB systems require rolling upgrades for maintenance, during which there is a brief period of downtime before the newly promoted MySQL instance resumes connections.

What This Means:

During maintenance, instances are upgraded one at a time
When the primary instance is upgraded, a failover occurs
Connection strings remain the same, but applications must reconnect
Downtime is typically measured in seconds, not minutes

Best Practice:

Implement connection retry logic in applications
Use connection pooling with automatic reconnection
Schedule maintenance during low-traffic periods
Test application behavior during failovers

2. MySQL Version Requirements

Limitation: High availability is only supported for MySQL version 8.0.24 or higher.

Implications:

Older MySQL versions cannot use HA features
Must upgrade to 8.0.24+ before enabling high availability
Consider upgrading to MySQL 8.4.x (LTS) or 9.x (Innovation) for latest features

Migration Path:

Upgrade standalone DB system to 8.0.24 or higher
Ensure all prerequisites are met (primary keys, etc.)
Enable high availability after upgrade completes

3. Direct Access Limitations

Limitation: You can only access the primary MySQL instance directly; the two secondary instances cannot be accessed using MySQL Shell or any other client.

Why This Matters:

Cannot distribute read load across secondary instances for HA systems
All connections go through the primary instance
Read replicas are separate feature for read scaling (not part of HA)

Alternative Solutions:

Use Read Replicas for read scaling (separate from HA)
Read replicas can be created on HA or standalone systems
Offload analytics queries to HeatWave cluster instead

4. Transaction Size Limits Based on Shape Memory

Limitation: The maximum transaction size depends on the amount of memory of the shape.

Transaction Size Limits by Shape:

Smaller shapes have lower transaction size limits
Larger transactions require more memory for Group Replication
Exceeding limits causes transaction failure

Example Limits (approximate):

MySQL.8 (8 ECPU, 128 GB RAM): ~2 GB transaction limit
MySQL.16 (16 ECPU, 256 GB RAM): ~4 GB transaction limit
MySQL.32 (32 ECPU, 512 GB RAM): ~8 GB transaction limit

Workarounds:

Break large transactions into smaller batches
Use bulk loading features with smaller commit sizes
Select appropriate shape for your transaction patterns
Consider upgrading shape if hitting limits regularly

5. HeatWave Cluster Operations and HA Changes

Limitation: If a HeatWave cluster load or reload operation is ongoing, enabling or disabling high availability can fail.

Why This Happens:

HeatWave cluster operations lock resources during data loading
HA changes require cluster reconfiguration
Concurrent operations can cause conflicts

Best Practice:

Wait for HeatWave operations to complete before HA changes
Monitor HeatWave operation status
Schedule HA changes during maintenance windows
Avoid simultaneous infrastructure and data operations

6. HeatWave Lakehouse Compatibility

Previous Limitation: Prior to MySQL 8.3.0-u2, you could not enable high availability on a DB system with HeatWave Lakehouse enabled.

Current Status: This limitation has been lifted in MySQL 8.3.0-u2 and later versions.

Upgrade Path:

Systems on older versions must upgrade to use HA with Lakehouse
After upgrade, both features can coexist
On OCI, full support is available
On AWS, HeatWave features remain unsupported on HA systems

7. Platform-Specific Limitations

MySQL HeatWave on AWS

On AWS, MySQL HeatWave Clusters and features requiring a cluster (MySQL HeatWave Lakehouse, GenAI, AutoML) are not currently supported on high availability DB systems.

Impact:

HA systems on AWS are limited to OLTP workloads
Analytics features require standalone systems
Must choose between HA and advanced analytics on AWS

Query PrivateLink Considerations

When using Query PrivateLink endpoints, failovers and switchovers take longer than on public endpoints.

Mitigation:

Set connect_timeout to small value (e.g., 5 seconds)
Implement aggressive connection retry logic
Monitor failover times
Consider public endpoints for latency-sensitive applications

8. Configuration and Infrastructure Limitations

Bulk Ingest: The Bulk Ingest feature is not supported on high availability DB systems.

Alternative: Use standard data loading methods like LOAD DATA INFILE or MySQL Shell utilities.

PrivateLink Management: High availability cannot be enabled or disabled on a DB system with PrivateLink(s) configured.

Workaround: Delete PrivateLinks, make HA changes, then recreate PrivateLinks.

9. Rolling Restart for Configuration Changes

Behavior: When updating configuration of a high availability DB system that changes only dynamic variables, a rolling restart is performed, which shortens downtime and causes a change in current primary placement.

What Happens:

Configuration updated on primary instance
Controlled failover to secondary instance
Original primary updated and becomes secondary
System automatically rebalances if needed
HeatWave data reloaded automatically if cluster attached

Implications:

Brief connection interruption during rolling restart
Current placement may differ from preferred placement
Applications must handle reconnections
HeatWave queries may be briefly unavailable during reload

Best Practices for HA DB Systems

1. Application Design

Connection Management

Implement exponential backoff for reconnection attempts
Use connection pooling with health checks
Set appropriate timeout values
Handle transient errors gracefully

Transaction Design

Keep transactions small and fast
Avoid long-running transactions that span minutes
Break large data modifications into batches
Monitor transaction sizes against shape limits

2. Monitoring and Alerting

Key Metrics to Monitor

Replication lag between primary and secondary instances
Storage utilization and growth trends
Transaction throughput and latency
Failover events and frequency
Backup success rates

Set Up Alerts For

Failover events
High replication lag
Storage approaching limits
Backup failures
Configuration change events

3. Testing and Validation

Regular Testing

Perform manual switchovers during maintenance windows
Test application behavior during failovers
Validate backup and restore procedures
Verify monitoring and alerting systems

Chaos Engineering

Simulate instance failures
Test network partition scenarios
Validate automatic recovery mechanisms
Document recovery procedures

4. Maintenance Planning

Scheduling

Plan configuration changes during low-traffic periods
Coordinate with application teams for maintenance windows
Avoid changes during critical business periods
Document rollback procedures

Pre-Change Validation

Test configuration changes in non-production first
Verify compatibility with HA requirements
Ensure backups are current
Have rollback plan ready

5. Cost Optimization

Resource Planning

Start with appropriate shape size
Monitor utilization with Autopilot Shape Advisor
Adjust shapes based on actual workload patterns
Balance performance needs with cost constraints

Backup Management

Set retention periods based on compliance requirements
Use regional backup copies only when needed
Enable backup validation selectively
Clean up old backups per policy

When NOT to Use High Availability

While high availability is excellent for many use cases, it may not be appropriate when:

Development/Testing: Standalone systems are more cost-effective
Analytics-Heavy on AWS: HA doesn't support HeatWave features on AWS
Very Large Transactions: Shape memory limits may be restrictive
Budget Constraints: HA requires 3x compute resources vs standalone
Read-Heavy Workloads: Read replicas might be more appropriate than HA

Alternative High Availability Strategies

If MySQL HeatWave HA limitations are too restrictive:

Read Replicas for Read Scaling

Distribute read traffic across multiple instances
Works with both standalone and HA systems
Up to 10 read replicas per DB system
Better for read-heavy workloads

Multi-Region Deployment

Deploy separate DB systems in different regions
Use application-level failover logic
Replicate data using MySQL replication channels
Provides geographic redundancy

Hybrid Architectures

Use HA for OLTP workloads
Separate standalone system with HeatWave for analytics (AWS)
Replicate data between systems using channels
Best of both worlds but adds complexity

Conclusion

MySQL HeatWave high availability provides enterprise-grade fault tolerance with automatic failover, data protection, and minimal downtime. Understanding the limitations—particularly around rolling upgrades, transaction sizes, and platform-specific constraints—is essential for designing robust database architectures.

By following best practices for application design, monitoring, and maintenance, you can build resilient systems that meet stringent uptime requirements while staying within the boundaries of high availability limitations.

The key is matching your requirements to the capabilities: if you need maximum uptime for OLTP workloads with automatic failover, MySQL HeatWave HA is an excellent choice. If you need advanced analytics features on AWS or have very specific requirements, consider alternative architectures that better fit your needs.

Additional Resources: