Ahmed Rakan

A Primer on NewSQL for Professional Software Engineers

Introduction: The Database Evolution

For decades, software engineers have faced a fundamental trade-off in database selection: choose traditional SQL databases for ACID guarantees and relational integrity, or opt for NoSQL solutions to achieve horizontal scalability and high performance. NewSQL databases emerged to challenge this dichotomy, promising the best of both worlds—the consistency and transactional guarantees of SQL with the scalability and performance characteristics of NoSQL.

What is NewSQL?

NewSQL is a class of modern database management systems that attempts to bridge the gap between traditional RDBMS and NoSQL databases. The key defining characteristics include:

  • Relational data model: Full SQL support with tables, schemas, and relationships
  • ACID compliance: Strong consistency guarantees across distributed systems
  • Horizontal scalability: Ability to scale out by adding commodity hardware
  • High performance: Optimized for modern hardware and distributed architectures
  • Fault tolerance: Built-in replication and automatic failover mechanisms

Unlike many NoSQL databases, which sacrifice strong consistency for availability under the CAP theorem, NewSQL systems choose consistency and partition tolerance, then use techniques such as consensus-based replication (e.g., Raft or Paxos) to keep availability high in practice.

The NewSQL Ecosystem

Major NewSQL Databases

Open Source & Self-Hosted:

  • VoltDB: In-memory database with strong consistency and partition-level serialization
  • CockroachDB: PostgreSQL-compatible, geo-distributed database with automatic rebalancing
  • TiDB: MySQL-compatible, horizontally scalable database with distributed SQL execution
  • NuoDB: Cloud-native database with elastic scalability

Cloud-Native Solutions:

  • Google Spanner: Globally distributed database with external consistency
  • Amazon Aurora: MySQL- and PostgreSQL-compatible, with up to 5x MySQL (and 3x PostgreSQL) throughput, per AWS's benchmarks
  • MySQL Cluster (NDB): high-availability, real-time database with auto-sharding (equally at home self-hosted)

Partitioning in NewSQL

Partitioning (or sharding) is fundamental to NewSQL's scalability promise. Understanding how data is distributed is crucial for optimal performance.

Partitioning Strategy

Most NewSQL databases implement hash-based partitioning with these characteristics:

  • Based on a single column: The partition key determines data distribution
  • Primary key requirement: The partition column must be the primary key or part of a composite primary key
  • Even distribution: Hash functions ensure balanced data across nodes

Why This Matters:

  • Performance: Queries targeting a single partition execute locally without cross-node coordination
  • Scalability: Adding nodes redistributes data automatically, maintaining balance
  • Predictability: Developers can design schemas to optimize data locality

For example, when an orders table is partitioned on customer_id, all orders for a customer reside on the same partition, enabling fast single-partition transactions for customer-specific queries.
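The routing idea can be sketched in a few lines of Python. The partition count and hash choice below are illustrative assumptions, not any particular database's internals:

```python
import hashlib

NUM_PARTITIONS = 8  # illustrative cluster size

def partition_for(key: int) -> int:
    """Map a partition key (e.g., customer_id) to a partition with a
    stable hash, so every node computes the same placement."""
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# All rows sharing a key land on the same partition...
assert partition_for(42) == partition_for(42)

# ...and a large keyspace spreads roughly evenly across partitions.
counts = [0] * NUM_PARTITIONS
for customer_id in range(100_000):
    counts[partition_for(customer_id)] += 1
print(min(counts), max(counts))  # both close to 100_000 / NUM_PARTITIONS
```

Because placement is a pure function of the key, the client (or a router) can send a customer's transaction straight to the node that owns it.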

Consistency Model: The VoltDB Approach

VoltDB exemplifies NewSQL's innovative approach to consistency without sacrificing performance.

Serialized Execution at Partition Level

Traditional databases use locks to manage concurrent access, which introduces overhead and potential deadlocks. VoltDB takes a radically different approach:

  • Single-threaded execution per partition: Each partition processes transactions serially
  • Timestamp ordering: Transactions receive timestamps for global ordering
  • No locks required: Eliminates locking overhead entirely
  • Deadlock-free: Serial execution makes deadlocks architecturally impossible
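A toy model of this design: one worker thread per partition drains a queue of transactions strictly in order. The class below is a sketch of the idea, not VoltDB's implementation:

```python
import queue
import threading

class Partition:
    """Toy single-threaded partition: transactions are functions over
    the partition's data, executed serially by one worker thread, so
    no locks on the data are ever needed and deadlock cannot occur."""

    def __init__(self):
        self.data = {}               # this partition's rows
        self._tasks = queue.Queue()  # pending transactions, in order
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            txn, done = self._tasks.get()
            txn(self.data)           # runs alone: serial == race-free
            done.set()

    def execute(self, txn):
        """Submit a transaction and block until it has run."""
        done = threading.Event()
        self._tasks.put((txn, done))
        done.wait()

p = Partition()
def credit_100(data):
    data["balance"] = data.get("balance", 0) + 100

for _ in range(10):
    p.execute(credit_100)
print(p.data["balance"])  # 100 * 10 = 1000
```

Note that `credit_100` needs no lock even under concurrent callers: every transaction for this partition funnels through one queue and one thread.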

Multi-Partition Transactions

When transactions span multiple partitions:

  • Partitions process their portion in parallel
  • Coordination occurs only at commit time
  • Two-phase commit ensures atomicity across partitions

This design maximizes parallelism while maintaining strict serializability—the strongest consistency model.
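The commit-time coordination can be sketched as classic two-phase commit: every partition votes in a prepare phase, and the transaction commits only if all votes are yes. The classes below are a minimal illustration, omitting the logging and recovery a real coordinator needs:

```python
class Participant:
    """One partition's slice of a multi-partition transaction."""

    def __init__(self, can_commit: bool):
        self.can_commit = can_commit
        self.state = "pending"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if this partition's work can commit.
        return self.can_commit

    def finish(self, commit: bool):
        # Phase 2: apply the coordinator's unanimous decision.
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants) -> bool:
    all_yes = all(p.prepare() for p in participants)   # phase 1
    for p in participants:                             # phase 2
        p.finish(all_yes)
    return all_yes

# Atomicity across partitions: all commit together or all abort.
print(two_phase_commit([Participant(True), Participant(True)]))   # True
print(two_phase_commit([Participant(True), Participant(False)]))  # False
```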

Consistency Trade-offs Comparison

Database Type      Consistency Model      Concurrency Control    Deadlocks
---------------    -------------------    -------------------    ----------
Traditional SQL    ACID, Serializable     Locks (2PL)            Possible
NoSQL              Eventual Consistency   Optimistic/None        N/A
NewSQL (VoltDB)    Strict Serializable    Timestamp Ordering     Impossible

NewSQL vs SQL vs NoSQL: When to Use What

Traditional SQL (PostgreSQL, MySQL)

Use when:

  • Data size fits comfortably on a single node (< 1TB typically)
  • Complex queries with joins across many tables
  • Strict transactional guarantees with moderate throughput requirements
  • Existing tooling and ORM integration is critical

Limitations:

  • Vertical scaling only (eventually hits hardware limits)
  • Replication adds complexity and potential consistency issues
  • Performance degrades with large datasets

NoSQL (MongoDB, Cassandra, Redis)

Use when:

  • Massive horizontal scalability is required (petabyte-scale)
  • Eventual consistency is acceptable
  • Data model is primarily key-value or document-oriented
  • Write-heavy workloads with simple query patterns
  • Schema flexibility is important

Limitations:

  • No native JOIN support
  • Complex transactions are difficult or impossible
  • Application-level consistency management required

NewSQL (VoltDB, CockroachDB, TiDB)

Use when:

  • Need horizontal scalability beyond single-server SQL
  • ACID guarantees are non-negotiable
  • High-throughput transactional workloads (financial, gaming, ad tech)
  • Real-time analytics on operational data
  • Global distribution with strong consistency

Limitations:

  • More complex operational overhead than traditional SQL
  • Some SQL features may be limited or optimized differently
  • Cross-partition transactions have performance cost
  • Newer ecosystem with fewer mature tools

Use Cases: Where NewSQL Shines

1. Financial Services

  • Requirements: Strict ACID compliance, high throughput, low latency
  • Example: Real-time fraud detection processing thousands of transactions per second
  • Why NewSQL: Cannot sacrifice consistency; traditional SQL can't scale to required throughput

2. Gaming and Betting

  • Requirements: Real-time leaderboards, inventory management, transaction processing
  • Example: Massively multiplayer online game with millions of concurrent users
  • Why NewSQL: Need ACID for in-game purchases; NoSQL too weak; SQL won't scale

3. Ad Tech and Analytics

  • Requirements: Real-time bidding, click tracking, campaign analytics
  • Example: Ad exchange processing billions of events daily with real-time attribution
  • Why NewSQL: Combines operational transactions with analytical queries on live data

4. IoT and Telemetry

  • Requirements: High ingest rates, time-series queries, device state management
  • Example: Smart city infrastructure with millions of sensors
  • Why NewSQL: Need transactional updates with complex analytical queries

5. E-commerce at Scale

  • Requirements: Inventory management, order processing, pricing consistency
  • Example: Global marketplace with complex pricing rules and inventory across regions
  • Why NewSQL: Traditional SQL can't handle scale; NoSQL can't guarantee inventory consistency

Hands-On: VoltDB with Python

Let's build a practical example to see NewSQL in action.

Setup VoltDB Locally

# Download VoltDB Community Edition
wget https://downloads.voltdb.com/technologies/server/voltdb-community-latest.tar.gz
tar -xzf voltdb-community-latest.tar.gz
cd voltdb-*

# Initialize and start single-node cluster
./bin/voltdb init
./bin/voltdb start


Python Integration

Install the VoltDB Python client (the voltdbclient module, distributed through VoltDB's voltdb-client-python repository on GitHub):

pip install git+https://github.com/VoltDB/voltdb-client-python.git


Example: Real-Time Order Processing System

import time

# The Python client lives in the voltdbclient module
# (from the voltdb-client-python repository)
from voltdbclient import FastSerializer, VoltProcedure

# Connect to VoltDB; FastSerializer doubles as the client connection
client = FastSerializer("localhost", 21212)

# Schema DDL. Two VoltDB specifics: partitioning is declared in a
# separate PARTITION TABLE statement, and DECIMAL takes no precision
# arguments. Apply this DDL with the sqlcmd tool, not the client.
schema_ddl = """
CREATE TABLE customers (
    customer_id INTEGER NOT NULL,
    customer_name VARCHAR(100),
    balance DECIMAL,
    PRIMARY KEY (customer_id)
);
PARTITION TABLE customers ON COLUMN customer_id;

CREATE TABLE orders (
    customer_id INTEGER NOT NULL,
    order_id INTEGER NOT NULL,
    order_date TIMESTAMP,
    amount DECIMAL,
    status VARCHAR(20),
    PRIMARY KEY (customer_id, order_id)
);
PARTITION TABLE orders ON COLUMN customer_id;

CREATE INDEX idx_order_date ON orders (order_date);
"""

# Stored procedure for atomic order placement. VoltDB's SQL procedures
# have no DECLARE/IF control flow (that requires a Java procedure), so
# the funds check is folded into the UPDATE's WHERE clause; callers
# should confirm the UPDATE modified a row before trusting the order.
# PARAMETER 1 marks the second argument (customer_id) as the
# partitioning value.
procedure_ddl = """
CREATE PROCEDURE place_order
    PARTITION ON TABLE customers COLUMN customer_id PARAMETER 1
AS BEGIN
    UPDATE customers SET balance = balance - ?
        WHERE customer_id = ? AND balance >= ?;
    INSERT INTO orders VALUES (?, ?, NOW, ?, 'CONFIRMED');
END;
"""

def place_order(customer_id, order_id, amount):
    """
    Places an order atomically: the balance debit and the order insert
    run as one single-partition transaction, so either both happen or
    neither does.
    """
    try:
        proc = VoltProcedure(client, "place_order", [
            FastSerializer.VOLTTYPE_FLOAT,    # debit amount
            FastSerializer.VOLTTYPE_INTEGER,  # customer_id (partition key)
            FastSerializer.VOLTTYPE_FLOAT,    # minimum required balance
            FastSerializer.VOLTTYPE_INTEGER,  # customer_id for the order row
            FastSerializer.VOLTTYPE_INTEGER,  # order_id
            FastSerializer.VOLTTYPE_FLOAT,    # order amount
        ])
        response = proc.call([amount, customer_id, amount,
                              customer_id, order_id, amount])
        return response.status  # 1 indicates success
    except Exception as e:
        print(f"Transaction failed: {e}")
        return None

def get_customer_orders(customer_id):
    """
    Retrieve all orders for a customer: a single-partition read.
    (In production, prefer a dedicated read procedure over @AdHoc.)
    """
    proc = VoltProcedure(client, "@AdHoc", [FastSerializer.VOLTTYPE_STRING])
    response = proc.call(
        [f"SELECT * FROM orders WHERE customer_id = {int(customer_id)}"]
    )
    return response.tables[0].tuples

# Performance test: High-throughput order processing
def benchmark_throughput():
    start_time = time.time()
    num_transactions = 10000

    for i in range(num_transactions):
        customer_id = i % 1000  # 1000 customers
        place_order(customer_id, i, 99.99)

    elapsed = time.time() - start_time
    tps = num_transactions / elapsed

    print(f"Processed {num_transactions} transactions in {elapsed:.2f}s")
    print(f"Throughput: {tps:.0f} TPS")

if __name__ == "__main__":
    benchmark_throughput()


Key Observations

  1. Single-Partition Optimization: The place_order procedure operates on one partition (declared in its partitioning clause), enabling maximum throughput
  2. ACID Guarantees: Balance check and order insertion are atomic—no possibility of overselling
  3. No Application Locks: VoltDB's architecture eliminates the need for explicit locking in application code
  4. Performance: Expect 10,000+ TPS on modest hardware for single-partition transactions

Alternative NewSQL Options for Local Development

CockroachDB (PostgreSQL-Compatible)

# Install CockroachDB
wget -qO- https://binaries.cockroachdb.com/cockroach-latest.linux-amd64.tgz | tar xvz
sudo cp -i cockroach-latest.linux-amd64/cockroach /usr/local/bin/

# Start single-node cluster
cockroach start-single-node --insecure --listen-addr=localhost

# Connect with psycopg2 (PostgreSQL driver)
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=26257,
    user="root",
    database="defaultdb"
)


Advantages:

  • Drop-in PostgreSQL replacement
  • Excellent documentation and community support
  • Strong consistency with automatic rebalancing
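One practical detail worth knowing before adopting it: CockroachDB runs every transaction at SERIALIZABLE isolation and may reject one with SQLSTATE 40001, which the client is expected to retry. A retry wrapper in the style below is the usual pattern; it assumes a psycopg2-style connection object, and the helper names are illustrative:

```python
RETRYABLE_SQLSTATE = "40001"  # CockroachDB's "retry this transaction" signal

def run_transaction(conn, op, max_retries=5):
    """Run op(cursor) in a transaction, retrying serialization
    failures. Assumes a psycopg2-style connection: `with conn` commits
    on success and rolls back on error, and retryable errors expose
    pgcode == "40001"."""
    last_error = None
    for _ in range(max_retries):
        try:
            with conn:
                with conn.cursor() as cur:
                    return op(cur)
        except Exception as e:
            if getattr(e, "pgcode", None) != RETRYABLE_SQLSTATE:
                raise              # not a retry signal: a real failure
            last_error = e         # serialization conflict: try again
    raise last_error

def transfer_100(cur):
    # Two statements, one atomic transaction: consistent even when
    # rows 1 and 2 live on different nodes.
    cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")

# Usage against a running cluster (connection as shown above):
# run_transaction(conn, transfer_100)
```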

TiDB (MySQL-Compatible)

# Install TiUP (TiDB installer)
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh

# Start local cluster
tiup playground

# Connect with MySQL connector
import mysql.connector

conn = mysql.connector.connect(
    host="127.0.0.1",
    port=4000,
    user="root"
)


Advantages:

  • MySQL compatibility (works with existing MySQL tools)
  • Horizontal scalability with HTAP capabilities
  • Mature ecosystem with commercial support

Design Patterns for NewSQL Success

1. Choose Partition Keys Wisely

Good: customer_id for e-commerce (queries naturally filter by customer)
Bad: timestamp (hot partitions, uneven distribution)

2. Minimize Cross-Partition Transactions

Design schemas so that most transactions touch a single partition. Cross-partition operations are significantly slower.

3. Leverage Stored Procedures

For VoltDB especially, stored procedures execute entirely in-database, eliminating network round-trips and maximizing throughput.

4. Denormalize Strategically

Unlike traditional SQL where normalization is king, NewSQL often benefits from denormalization to keep related data co-located on the same partition.

5. Monitor Partition Skew

Uneven data distribution kills performance. Monitor partition sizes and adjust partition keys if skew develops.
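A crude but useful skew signal is the ratio of the largest partition to the mean, computed from per-partition row counts (most NewSQL systems expose these through statistics tables; the helper below is illustrative):

```python
def skew_ratio(partition_sizes):
    """Largest partition divided by the mean partition size.
    1.0 means perfectly balanced; alert when it drifts upward."""
    mean = sum(partition_sizes) / len(partition_sizes)
    return max(partition_sizes) / mean

print(skew_ratio([100, 100, 100, 100]))  # 1.0 -- balanced
print(skew_ratio([400, 50, 50, 100]))    # ~2.67 -- one hot partition
```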

Operational Considerations

Monitoring and Observability

Distributed NewSQL deployments need observability beyond what a single-node database requires. Track at minimum:

  • Partition distribution and balance
  • Cross-partition transaction percentage
  • Node health and replication lag
  • Query latency percentiles (p50, p95, p99)

Backup and Recovery

Most NewSQL systems provide:

  • Continuous backup with point-in-time recovery
  • Snapshot capabilities
  • Multi-region replication for disaster recovery

Cost Considerations

NewSQL databases typically cost more than traditional SQL due to:

  • Multiple nodes required for distributed architecture
  • Higher memory requirements (especially for in-memory systems like VoltDB)
  • More complex operational overhead

However, they're often cheaper than vertical scaling traditional databases to extreme sizes or building custom sharding layers.

Conclusion: Is NewSQL Right for Your Project?

NewSQL represents a significant architectural evolution, but it's not a silver bullet.

You need it if:

  • Current SQL database is hitting scalability limits
  • ACID guarantees are business-critical
  • High-throughput transactional workload (> 10k TPS)
  • Growing beyond single-server capacity
  • Real-time analytics on transactional data

Stick with traditional SQL if:

  • Data size fits comfortably on a single node (< 1TB, as above)
  • Moderate throughput requirements (< 1k TPS)
  • Complex analytical queries with unpredictable patterns
  • Team lacks distributed systems expertise

Choose NoSQL instead if:

  • Eventual consistency is acceptable
  • Simple key-value or document access patterns
  • Extreme scale (> 100TB)
  • Schema flexibility is paramount

NewSQL databases like VoltDB, CockroachDB, and TiDB offer compelling solutions for the growing number of applications that need both the scale of NoSQL and the guarantees of SQL. As these systems mature and tooling improves, they're becoming increasingly viable for professional software engineers building the next generation of data-intensive applications.

Resources for Further Learning

  • VoltDB Documentation: https://docs.voltdb.com/
  • CockroachDB University: Free courses on distributed SQL
  • TiDB Documentation: https://docs.pingcap.com/
  • Google Spanner Paper: Original research on globally distributed databases
  • "Designing Data-Intensive Applications" by Martin Kleppmann: Essential reading on database architecture

Have you implemented NewSQL in production? What challenges did you face? Share your experiences in the comments below.
