DEV Community

Cover image for MySQL HeatWave: Creating High Availability DB Systems and Understanding Limitations
Ryan Giggs
Ryan Giggs

Posted on

MySQL HeatWave: Creating High Availability DB Systems and Understanding Limitations

MySQL HeatWave offers enterprise-grade high availability features that ensure your database remains operational even during hardware failures or maintenance events. Understanding how to properly configure high availability and its associated limitations is crucial for designing resilient database architectures.

This guide walks you through creating a high availability DB system and explores the important limitations you should consider when planning your deployment.

Creating a High Availability DB System

Overview of the Creation Process

Creating a high availability DB system in MySQL HeatWave follows a structured workflow with three main phases:

  1. Set Up Basic DB System Configuration
  2. Specify Database and Infrastructure Details
  3. Configure Backup and Advanced Options

Let's explore each phase in detail.

Phase 1: Set Up Basic DB System Configuration

Step 1: Access the Console

Log in to your Oracle Cloud Infrastructure (OCI) console or MySQL HeatWave on AWS console at the respective portal.

Step 2: Navigate to DB Systems

  • Open the navigation menu
  • Under Databases, select HeatWave MySQL (OCI) or MySQL HeatWave (AWS)
  • Click DB Systems
  • Click Create DB System

Step 3: Choose Deployment Template

Select the Production template, which automatically configures your system for high availability with recommended settings for production workloads.

Alternatively, you can start with Development or Testing and manually enable high availability later.

Step 4: Basic Configuration

Compartment

  • Select the compartment where the DB system will reside
  • Compartments help organize resources and control access

Name Your DB System

  • Display Name: Provide a meaningful name (e.g., "prod-mysql-ha-primary")
  • Description: (Optional) Add details about the system's purpose

Database System Type

  • Ensure "High Availability" is selected if using the Production template
  • For manual setup, you can enable this option here

Phase 2: Specify Database Information

Administrator Credentials

Username Configuration

  • Specify the administrator username (default is typically "admin")
  • Avoid reserved usernames like "root", "mysql", "sys"
  • Username can be up to 32 characters

Password Requirements

  • Must be 8-32 characters long
  • Include at least one:
    • Uppercase letter
    • Lowercase letter
    • Number
    • Special character from: ,.~-+*;:_!
  • Store credentials securely (consider OCI Vault or AWS Secrets Manager)

Network Configuration

Virtual Cloud Network (VCN)

  • Select an existing VCN or create a new one
  • Subnet: Choose a private subnet for enhanced security
  • Public subnets are not recommended for production databases

Availability Domain Selection

  • Choose the preferred availability domain for your primary instance
  • This becomes your preferred placement
  • The system will automatically distribute secondary instances across different ADs or fault domains

Security Configuration

  • Ensure security lists or network security groups allow:
    • Port 3306 for MySQL connections
    • Port 33060 for MySQL X Protocol
  • Restrict source CIDR blocks to known application IP ranges

Hardware Configuration (Shape)

Select DB System Shape

Choose a shape that meets your performance requirements:

ECPU Shapes (Recommended)

  • Architecture-agnostic and future-proof
  • Examples: MySQL.8, MySQL.16, MySQL.32, MySQL.64, MySQL.128, MySQL.256
  • Can be changed after deployment
  • Support high availability and HeatWave clusters

Shape Characteristics

  • Defines CPU count, memory, and network bandwidth
  • Higher shapes provide better performance but cost more
  • Use Autopilot Shape Advisor for recommendations based on workload

Important: All OCPU shapes are deprecated and unavailable to new users created after September 5, 2024. Existing OCPU shapes will be unavailable after March 13, 2026.

Storage Configuration

Data Storage Size

  • Minimum: 50 GB
  • Maximum: 131,072 GB (128 TB)
  • Plan for current data plus growth projections
  • Storage can only be increased, never decreased

Automatic Storage Expansion

  • Recommended: Enable this feature
  • Automatically increases storage when usage approaches limits
  • Prevents downtime due to storage exhaustion
  • Based on Health Monitor read-only thresholds

Storage Performance

  • Higher storage allocations provide better IOPS
  • Storage is backed by high-performance block storage

Phase 3: Configure Backups

Enable Automatic Backups

Why Backups Matter

  • Protects against data loss
  • Enables point-in-time recovery
  • Required for disaster recovery planning

Backup Configuration Options

Enable Automatic Backups

  • Toggle: ON (highly recommended for HA systems)
  • Backups run automatically at scheduled times

Backup Retention Period

  • Range: 1 to 35 days
  • Recommended: At least 7 days for production
  • Longer retention = higher storage costs but better recovery options

Backup Window Selection

  • Choose a time when database activity is lowest
  • Typically during off-peak hours (e.g., 2:00 AM - 4:00 AM)
  • Avoid peak transaction periods to minimize performance impact

Point-in-Time Recovery (PITR)

  • Enable to restore to any point within retention period
  • Uses binary logs for granular recovery
  • Essential for recovering from accidental data modifications

Regional Backup Copy (OCI only)

  • Schedule automatic backup copies to another OCI region
  • Provides geographic redundancy
  • Protects against regional disasters

Advanced Backup Options

Backup Validation

  • Verifies backup integrity
  • Ensures backups are restorable
  • Available on supported MySQL versions

Soft Delete for Backups

  • Enabled by default
  • Backups remain in DELETED state before permanent deletion
  • Provides grace period for recovery of accidentally deleted backups

Additional Configuration Options

HeatWave Cluster Configuration (Optional)

If you need analytics capabilities:

Enable HeatWave Cluster

  • Checkbox: Select to add analytics processing
  • Choose HeatWave node shape (HeatWave.512GB, HeatWave.256GB, HeatWave.128GB)
  • Specify number of nodes (1 to 512 depending on shape and region)

Important HA Consideration: On OCI, HeatWave clusters are fully supported with high availability. On AWS, HeatWave features (including Lakehouse, GenAI, and AutoML) are currently not supported on HA DB systems.

MySQL Configuration

Configuration Selection

  • Use default HA-compatible configuration (recommended)
  • Configurations with "HA" suffix support high availability
  • Custom configurations must be based on HA-compatible templates

Configuration Characteristics

  • Defines system variables and initialization parameters
  • Can be changed after deployment with rolling restarts
  • Affects memory allocation, connection limits, and performance tuning

Tagging (Optional)

Resource Tags

  • Add metadata for organization
  • Enable cost tracking and allocation
  • Support automation and bulk operations

Tag Types

  • Defined Tags: Created by administrators with controlled vocabularies
  • Free-form Tags: Simple key-value pairs for flexible labeling

Finalizing Creation

Review Configuration

Before creating:

  • Verify all settings are correct
  • Check estimated monthly cost
  • Ensure network connectivity is properly configured
  • Confirm backup settings meet compliance requirements

Create DB System

Click Create to provision your high availability DB system.

Provisioning Timeline

  • DB system enters CREATING state
  • Typical creation time: 15-30 minutes for HA systems
  • Three instances are provisioned across availability/fault domains
  • Status changes to ACTIVE when ready

What Happens During Creation

  1. Primary instance is provisioned in preferred placement
  2. Two secondary instances created in different domains
  3. MySQL Group Replication is configured automatically
  4. Initial backup is taken (if enabled)
  5. Monitoring and health checks are activated

Understanding High Availability Limitations

While MySQL HeatWave high availability provides robust fault tolerance, it's important to understand its limitations to design systems appropriately.

1. Rolling Upgrades and Brief Downtime

Characteristic: High availability DB systems require rolling upgrades for maintenance, during which there is a brief period of downtime before the newly promoted MySQL instance resumes connections.

What This Means:

  • During maintenance, instances are upgraded one at a time
  • When the primary instance is upgraded, a failover occurs
  • Connection strings remain the same, but applications must reconnect
  • Downtime is typically measured in seconds, not minutes

Best Practice:

  • Implement connection retry logic in applications
  • Use connection pooling with automatic reconnection
  • Schedule maintenance during low-traffic periods
  • Test application behavior during failovers

2. MySQL Version Requirements

Limitation: High availability is only supported for MySQL version 8.0.24 or higher.

Implications:

  • Older MySQL versions cannot use HA features
  • Must upgrade to 8.0.24+ before enabling high availability
  • Consider upgrading to MySQL 8.4.x (LTS) or 9.x (Innovation) for latest features

Migration Path:

  1. Upgrade standalone DB system to 8.0.24 or higher
  2. Ensure all prerequisites are met (primary keys, etc.)
  3. Enable high availability after upgrade completes

3. Direct Access Limitations

Limitation: You can only access the primary MySQL instance directly; the two secondary instances cannot be accessed using MySQL Shell or any other client.

Why This Matters:

  • Cannot distribute read load across secondary instances for HA systems
  • All connections go through the primary instance
  • Read replicas are separate feature for read scaling (not part of HA)

Alternative Solutions:

  • Use Read Replicas for read scaling (separate from HA)
  • Read replicas can be created on HA or standalone systems
  • Offload analytics queries to HeatWave cluster instead

4. Transaction Size Limits Based on Shape Memory

Limitation: The maximum transaction size depends on the amount of memory of the shape.

Transaction Size Limits by Shape:

  • Smaller shapes have lower transaction size limits
  • Larger transactions require more memory for Group Replication
  • Exceeding limits causes transaction failure

Example Limits (approximate):

  • MySQL.8 (8 ECPU, 128 GB RAM): ~2 GB transaction limit
  • MySQL.16 (16 ECPU, 256 GB RAM): ~4 GB transaction limit
  • MySQL.32 (32 ECPU, 512 GB RAM): ~8 GB transaction limit

Workarounds:

  • Break large transactions into smaller batches
  • Use bulk loading features with smaller commit sizes
  • Select appropriate shape for your transaction patterns
  • Consider upgrading shape if hitting limits regularly

5. HeatWave Cluster Operations and HA Changes

Limitation: If a HeatWave cluster load or reload operation is ongoing, enabling or disabling high availability can fail.

Why This Happens:

  • HeatWave cluster operations lock resources during data loading
  • HA changes require cluster reconfiguration
  • Concurrent operations can cause conflicts

Best Practice:

  • Wait for HeatWave operations to complete before HA changes
  • Monitor HeatWave operation status
  • Schedule HA changes during maintenance windows
  • Avoid simultaneous infrastructure and data operations

6. HeatWave Lakehouse Compatibility

Previous Limitation: Prior to MySQL 8.3.0-u2, you could not enable high availability on a DB system with HeatWave Lakehouse enabled.

Current Status: This limitation has been lifted in MySQL 8.3.0-u2 and later versions.

Upgrade Path:

  • Systems on older versions must upgrade to use HA with Lakehouse
  • After upgrade, both features can coexist
  • On OCI, full support is available
  • On AWS, HeatWave features remain unsupported on HA systems

7. Platform-Specific Limitations

MySQL HeatWave on AWS

On AWS, MySQL HeatWave Clusters and features requiring a cluster (MySQL HeatWave Lakehouse, GenAI, AutoML) are not currently supported on high availability DB systems.

Impact:

  • HA systems on AWS are limited to OLTP workloads
  • Analytics features require standalone systems
  • Must choose between HA and advanced analytics on AWS

Query PrivateLink Considerations

When using Query PrivateLink endpoints, failovers and switchovers take longer than on public endpoints.

Mitigation:

  • Set connect_timeout to small value (e.g., 5 seconds)
  • Implement aggressive connection retry logic
  • Monitor failover times
  • Consider public endpoints for latency-sensitive applications

8. Configuration and Infrastructure Limitations

Bulk Ingest: The Bulk Ingest feature is not supported on high availability DB systems.

Alternative: Use standard data loading methods like LOAD DATA INFILE or MySQL Shell utilities.

PrivateLink Management: High availability cannot be enabled or disabled on a DB system with PrivateLink(s) configured.

Workaround: Delete PrivateLinks, make HA changes, then recreate PrivateLinks.

9. Rolling Restart for Configuration Changes

Behavior: When updating configuration of a high availability DB system that changes only dynamic variables, a rolling restart is performed, which shortens downtime and causes a change in current primary placement.

What Happens:

  1. Configuration updated on primary instance
  2. Controlled failover to secondary instance
  3. Original primary updated and becomes secondary
  4. System automatically rebalances if needed
  5. HeatWave data reloaded automatically if cluster attached

Implications:

  • Brief connection interruption during rolling restart
  • Current placement may differ from preferred placement
  • Applications must handle reconnections
  • HeatWave queries may be briefly unavailable during reload

Best Practices for HA DB Systems

1. Application Design

Connection Management

  • Implement exponential backoff for reconnection attempts
  • Use connection pooling with health checks
  • Set appropriate timeout values
  • Handle transient errors gracefully

Transaction Design

  • Keep transactions small and fast
  • Avoid long-running transactions that span minutes
  • Break large data modifications into batches
  • Monitor transaction sizes against shape limits

2. Monitoring and Alerting

Key Metrics to Monitor

  • Replication lag between primary and secondary instances
  • Storage utilization and growth trends
  • Transaction throughput and latency
  • Failover events and frequency
  • Backup success rates

Set Up Alerts For

  • Failover events
  • High replication lag
  • Storage approaching limits
  • Backup failures
  • Configuration change events

3. Testing and Validation

Regular Testing

  • Perform manual switchovers during maintenance windows
  • Test application behavior during failovers
  • Validate backup and restore procedures
  • Verify monitoring and alerting systems

Chaos Engineering

  • Simulate instance failures
  • Test network partition scenarios
  • Validate automatic recovery mechanisms
  • Document recovery procedures

4. Maintenance Planning

Scheduling

  • Plan configuration changes during low-traffic periods
  • Coordinate with application teams for maintenance windows
  • Avoid changes during critical business periods
  • Document rollback procedures

Pre-Change Validation

  • Test configuration changes in non-production first
  • Verify compatibility with HA requirements
  • Ensure backups are current
  • Have rollback plan ready

5. Cost Optimization

Resource Planning

  • Start with appropriate shape size
  • Monitor utilization with Autopilot Shape Advisor
  • Adjust shapes based on actual workload patterns
  • Balance performance needs with cost constraints

Backup Management

  • Set retention periods based on compliance requirements
  • Use regional backup copies only when needed
  • Enable backup validation selectively
  • Clean up old backups per policy

When NOT to Use High Availability

While high availability is excellent for many use cases, it may not be appropriate when:

  1. Development/Testing: Standalone systems are more cost-effective
  2. Analytics-Heavy on AWS: HA doesn't support HeatWave features on AWS
  3. Very Large Transactions: Shape memory limits may be restrictive
  4. Budget Constraints: HA requires 3x compute resources vs standalone
  5. Read-Heavy Workloads: Read replicas might be more appropriate than HA

Alternative High Availability Strategies

If MySQL HeatWave HA limitations are too restrictive:

Read Replicas for Read Scaling

  • Distribute read traffic across multiple instances
  • Works with both standalone and HA systems
  • Up to 10 read replicas per DB system
  • Better for read-heavy workloads

Multi-Region Deployment

  • Deploy separate DB systems in different regions
  • Use application-level failover logic
  • Replicate data using MySQL replication channels
  • Provides geographic redundancy

Hybrid Architectures

  • Use HA for OLTP workloads
  • Separate standalone system with HeatWave for analytics (AWS)
  • Replicate data between systems using channels
  • Best of both worlds but adds complexity

Conclusion

MySQL HeatWave high availability provides enterprise-grade fault tolerance with automatic failover, data protection, and minimal downtime. Understanding the limitations—particularly around rolling upgrades, transaction sizes, and platform-specific constraints—is essential for designing robust database architectures.

By following best practices for application design, monitoring, and maintenance, you can build resilient systems that meet stringent uptime requirements while staying within the boundaries of high availability limitations.

The key is matching your requirements to the capabilities: if you need maximum uptime for OLTP workloads with automatic failover, MySQL HeatWave HA is an excellent choice. If you need advanced analytics features on AWS or have very specific requirements, consider alternative architectures that better fit your needs.

Additional Resources:

Top comments (0)