Ahmed Rakan

Posted on Oct 20

Premiere on NewSQL for Professional Software Engineers

#webdev #database #programming

Introduction: The Database Evolution

For decades, software engineers have faced a fundamental trade-off in database selection: choose traditional SQL databases for ACID guarantees and relational integrity, or opt for NoSQL solutions to achieve horizontal scalability and high performance. NewSQL databases emerged to challenge this dichotomy, promising the best of both worlds—the consistency and transactional guarantees of SQL with the scalability and performance characteristics of NoSQL.

What is NewSQL?

NewSQL is a class of modern database management systems that attempts to bridge the gap between traditional RDBMS and NoSQL databases. The key defining characteristics include:

Relational data model: Full SQL support with tables, schemas, and relationships
ACID compliance: Strong consistency guarantees across distributed systems
Horizontal scalability: Ability to scale out by adding commodity hardware
High performance: Optimized for modern hardware and distributed architectures
Fault tolerance: Built-in replication and automatic failover mechanisms

Unlike NoSQL databases that sacrifice consistency for availability (following the CAP theorem), NewSQL systems employ sophisticated techniques to maintain both consistency and partition tolerance while maximizing availability.

The NewSQL Ecosystem

Major NewSQL Databases

Open Source & Self-Hosted:

VoltDB: In-memory database with strong consistency and partition-level serialization
CockroachDB: PostgreSQL-compatible, geo-distributed database with automatic rebalancing
TiDB: MySQL-compatible, horizontally scalable database with distributed SQL execution
NuoDB: Cloud-native database with elastic scalability

Cloud-Native Solutions:

Google Spanner: Globally distributed database with external consistency
Amazon Aurora: MySQL and PostgreSQL-compatible with up to 5x performance improvements
MySQL Cluster: High-availability, real-time database with auto-sharding

Partitioning in NewSQL

Partitioning (or sharding) is fundamental to NewSQL's scalability promise. Understanding how data is distributed is crucial for optimal performance.

Partitioning Strategy

Most NewSQL databases implement hash-based partitioning with these characteristics:

Based on a single column: The partition key determines data distribution
Primary key requirement: The partition column must be the primary key or part of a composite primary key
Even distribution: Hash functions ensure balanced data across nodes

Why This Matters:

Performance: Queries targeting a single partition execute locally without cross-node coordination
Scalability: Adding nodes redistributes data automatically, maintaining balance
Predictability: Developers can design schemas to optimize data locality

In this example, all orders for a customer reside on the same partition, enabling fast single-partition transactions for customer-specific queries.

Consistency Model: The VoltDB Approach

VoltDB exemplifies NewSQL's innovative approach to consistency without sacrificing performance.

Serialized Execution at Partition Level

Traditional databases use locks to manage concurrent access, which introduces overhead and potential deadlocks. VoltDB takes a radically different approach:

Single-threaded execution per partition: Each partition processes transactions serially
Timestamp ordering: Transactions receive timestamps for global ordering
No locks required: Eliminates locking overhead entirely
Deadlock-free: Serial execution makes deadlocks architecturally impossible

Multi-Partition Transactions

When transactions span multiple partitions:

Partitions process their portion in parallel
Coordination occurs only at commit time
Two-phase commit ensures atomicity across partitions

This design maximizes parallelism while maintaining strict serializability—the strongest consistency model.

Consistency Trade-offs Comparison

Database Type	Consistency Model	Concurrency Control	Deadlocks
Traditional SQL	ACID, Serializable	Locks (2PL)	Possible
NoSQL	Eventual Consistency	Optimistic/None	N/A
NewSQL (VoltDB)	Strict Serializable	Timestamp Ordering	Impossible

NewSQL vs SQL vs NoSQL: When to Use What

Traditional SQL (PostgreSQL, MySQL)

Use when:

Data size fits comfortably on a single node (< 1TB typically)
Complex queries with joins across many tables
Strict transactional guarantees with moderate throughput requirements
Existing tooling and ORM integration is critical

Limitations:

Vertical scaling only (eventually hits hardware limits)
Replication adds complexity and potential consistency issues
Performance degrades with large datasets

NoSQL (MongoDB, Cassandra, Redis)

Use when:

Massive horizontal scalability is required (petabyte-scale)
Eventual consistency is acceptable
Data model is primarily key-value or document-oriented
Write-heavy workloads with simple query patterns
Schema flexibility is important

Limitations:

No native JOIN support
Complex transactions are difficult or impossible
Application-level consistency management required

NewSQL (VoltDB, CockroachDB, TiDB)

Use when:

Need horizontal scalability beyond single-server SQL
ACID guarantees are non-negotiable
High-throughput transactional workloads (financial, gaming, ad tech)
Real-time analytics on operational data
Global distribution with strong consistency

Limitations:

More complex operational overhead than traditional SQL
Some SQL features may be limited or optimized differently
Cross-partition transactions have performance cost
Newer ecosystem with fewer mature tools

Use Cases: Where NewSQL Shines

1. Financial Services

Requirements: Strict ACID compliance, high throughput, low latency
Example: Real-time fraud detection processing thousands of transactions per second
Why NewSQL: Cannot sacrifice consistency; traditional SQL can't scale to required throughput

2. Gaming and Betting

Requirements: Real-time leaderboards, inventory management, transaction processing
Example: Massively multiplayer online game with millions of concurrent users
Why NewSQL: Need ACID for in-game purchases; NoSQL too weak; SQL won't scale

3. Ad Tech and Analytics

Requirements: Real-time bidding, click tracking, campaign analytics
Example: Ad exchange processing billions of events daily with real-time attribution
Why NewSQL: Combines operational transactions with analytical queries on live data

4. IoT and Telemetry

Requirements: High ingest rates, time-series queries, device state management
Example: Smart city infrastructure with millions of sensors
Why NewSQL: Need transactional updates with complex analytical queries

5. E-commerce at Scale

Requirements: Inventory management, order processing, pricing consistency
Example: Global marketplace with complex pricing rules and inventory across regions
Why NewSQL: Traditional SQL can't handle scale; NoSQL can't guarantee inventory consistency

Hands-On: VoltDB with Python

Let's build a practical example to see NewSQL in action.

Setup VoltDB Locally

# Download VoltDB Community Edition
wget https://downloads.voltdb.com/technologies/server/voltdb-community-latest.tar.gz
tar -xzf voltdb-community-latest.tar.gz
cd voltdb-*

# Initialize and start single-node cluster
./bin/voltdb init
./bin/voltdb start

Python Integration

Install the VoltDB Python client:

pip install voltdb

Example: Real-Time Order Processing System

import voltdb
import time
from datetime import datetime

# Connect to VoltDB
client = voltdb.FastSerializer("localhost", 21212)

# Create schema
schema_ddl = """
CREATE TABLE customers (
    customer_id INTEGER NOT NULL,
    customer_name VARCHAR(100),
    balance DECIMAL(12,2),
    PRIMARY KEY (customer_id)
) PARTITION BY customer_id;

CREATE TABLE orders (
    customer_id INTEGER NOT NULL,
    order_id INTEGER NOT NULL,
    order_date TIMESTAMP,
    amount DECIMAL(10,2),
    status VARCHAR(20),
    PRIMARY KEY (customer_id, order_id)
) PARTITION BY customer_id;

CREATE INDEX idx_order_date ON orders (order_date);
"""

# Define stored procedure for atomic order placement
procedure_sql = """
CREATE PROCEDURE place_order
    PARTITION BY customer_id ON customers
    AS
BEGIN
    DECLARE available_balance DECIMAL;

    -- Check customer balance
    SELECT balance INTO available_balance
    FROM customers
    WHERE customer_id = ?;

    -- Verify sufficient funds
    IF available_balance >= ? THEN
        -- Deduct from balance
        UPDATE customers
        SET balance = balance - ?
        WHERE customer_id = ?;

        -- Insert order
        INSERT INTO orders VALUES (?, ?, NOW(), ?, 'CONFIRMED');

        RETURN 'SUCCESS';
    ELSE
        RETURN 'INSUFFICIENT_FUNDS';
    END IF;
END;
"""

def place_order(customer_id, order_id, amount):
    """
    Places an order atomically - either balance is updated and order created,
    or neither happens. Single partition transaction = maximum performance.
    """
    try:
        response = client.call_procedure(
            "place_order",
            [customer_id, amount, amount, customer_id,
             customer_id, order_id, amount]
        )
        return response.status
    except Exception as e:
        print(f"Transaction failed: {e}")
        return None

def get_customer_orders(customer_id):
    """
    Retrieve all orders for a customer.
    Single partition query = sub-millisecond response.
    """
    response = client.call_procedure(
        "ORDER_HISTORY.select",
        [customer_id]
    )
    return response.tables[0].tuples

# Performance test: High-throughput order processing
def benchmark_throughput():
    start_time = time.time()
    num_transactions = 10000

    for i in range(num_transactions):
        customer_id = i % 1000  # 1000 customers
        place_order(customer_id, i, 99.99)

    elapsed = time.time() - start_time
    tps = num_transactions / elapsed

    print(f"Processed {num_transactions} transactions in {elapsed:.2f}s")
    print(f"Throughput: {tps:.0f} TPS")

if __name__ == "__main__":
    benchmark_throughput()

Key Observations

Single-Partition Optimization: The place_order procedure operates on one partition (indicated by PARTITION BY clause), enabling maximum throughput
ACID Guarantees: Balance check and order insertion are atomic—no possibility of overselling
No Application Locks: VoltDB's architecture eliminates the need for explicit locking in application code
Performance: Expect 10,000+ TPS on modest hardware for single-partition transactions

Alternative NewSQL Options for Local Development

CockroachDB (PostgreSQL-Compatible)

# Install CockroachDB
wget -qO- https://binaries.cockroachdb.com/cockroach-latest.linux-amd64.tgz | tar xvz
sudo cp -i cockroach-latest.linux-amd64/cockroach /usr/local/bin/

# Start single-node cluster
cockroach start-single-node --insecure --listen-addr=localhost

# Connect with psycopg2 (PostgreSQL driver)
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=26257,
    user="root",
    database="defaultdb"
)

Advantages:

Drop-in PostgreSQL replacement
Excellent documentation and community support
Strong consistency with automatic rebalancing

TiDB (MySQL-Compatible)

# Install TiUP (TiDB installer)
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh

# Start local cluster
tiup playground

# Connect with MySQL connector
import mysql.connector

conn = mysql.connector.connect(
    host="127.0.0.1",
    port=4000,
    user="root"
)

Advantages:

MySQL compatibility (works with existing MySQL tools)
Horizontal scalability with HTAP capabilities
Mature ecosystem with commercial support

Design Patterns for NewSQL Success

1. Choose Partition Keys Wisely

Good: customer_id for e-commerce (queries naturally filter by customer)
Bad: timestamp (hot partitions, uneven distribution)

2. Minimize Cross-Partition Transactions

Design schemas so that most transactions touch a single partition. Cross-partition operations are significantly slower.

3. Leverage Stored Procedures

For VoltDB especially, stored procedures execute entirely in-database, eliminating network round-trips and maximizing throughput.

4. Denormalize Strategically

Unlike traditional SQL where normalization is king, NewSQL often benefits from denormalization to keep related data co-located on the same partition.

5. Monitor Partition Skew

Uneven data distribution kills performance. Monitor partition sizes and adjust partition keys if skew develops.

Operational Considerations

Monitoring and Observability

NewSQL databases require monitoring:

Partition distribution and balance
Cross-partition transaction percentage
Node health and replication lag
Query latency percentiles (p50, p95, p99)

Backup and Recovery

Most NewSQL systems provide:

Continuous backup with point-in-time recovery
Snapshot capabilities
Multi-region replication for disaster recovery

Cost Considerations

NewSQL databases typically cost more than traditional SQL due to:

Multiple nodes required for distributed architecture
Higher memory requirements (especially for in-memory systems like VoltDB)
More complex operational overhead

However, they're often cheaper than vertical scaling traditional databases to extreme sizes or building custom sharding layers.

Conclusion: Is NewSQL Right for Your Project?

NewSQL represents a significant architectural evolution, but it's not a silver bullet. Consider NewSQL when:

✅ You need it if:

Current SQL database is hitting scalability limits
ACID guarantees are business-critical
High-throughput transactional workload (> 10k TPS)
Growing beyond single-server capacity
Real-time analytics on transactional data

❌ Stick with traditional SQL if:

Data size < 500GB
Moderate throughput requirements (< 1k TPS)
Complex analytical queries with unpredictable patterns
Team lacks distributed systems expertise

❌ Choose NoSQL instead if:

Eventual consistency is acceptable
Simple key-value or document access patterns
Extreme scale (> 100TB)
Schema flexibility is paramount

NewSQL databases like VoltDB, CockroachDB, and TiDB offer compelling solutions for the growing number of applications that need both the scale of NoSQL and the guarantees of SQL. As these systems mature and tooling improves, they're becoming increasingly viable for professional software engineers building the next generation of data-intensive applications.

Resources for Further Learning

VoltDB Documentation: https://docs.voltdb.com/
CockroachDB University: Free courses on distributed SQL
TiDB Documentation: https://docs.pingcap.com/
Google Spanner Paper: Original research on globally distributed databases
"Designing Data-Intensive Applications" by Martin Kleppmann: Essential reading on database architecture

Have you implemented NewSQL in production? What challenges did you face? Share your experiences in the comments below.

Top comments (0)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.