Kenji Tomita

Posted on Jul 29

Learning About SWAR

Introduction

Digit-wise addition and carry detection are typically performed one digit at a time, with individual checks for carry.
However, with SWAR(SIMD Within A Register) techniques, it's possible to process multiple digits at once.

※ All code examples below are written in Rust.

What is SWAR?

SWAR(SIMD Within A Register) is a technique that allows the same operation to be performed simultaneously on multiple data within a single register.

Example:

A 32-bit value is treated as four 8-bit(byte) values
Each byte is added simultaneously

Comparison of SWAR and Conventional Addition

Conventional Addition

fn main() {
    let a = [8, 6, 7, 4];
    let b = [4, 3, 2, 3];
    let mut carry = [false; 4];

    for i in 0..4 {
        let sum = a[i] + b[i];
        if sum >= 10 {
            carry[i] = true;
        }
    }

    println!("{:?}", carry); // [true, false, false, false]
}

In this case, the sums of each element are [12, 9, 9, 7], so carry becomes [true, false, false, false].
The carry array stores whether a carry(sum of 10 or more) occurred in each digit addition.

Addition Using SWAR

fn main() {
    let a: i64 = 0x04070608;
    let b: i64 = 0x03020304;
    let sum = a + b;
    let adjusted = sum + 0xF6F6F6F6;
    let carry = (!adjusted) & 0x80808080;

    println!("0x{:08X}", carry); // 0x00808080
}

No loop
No branching
Bitwise operations only

carry = 0x00808080 means that a carry occurred in the 1st, 2nd, and 3rd bytes (from LSB). The MSB (most significant bit) of each byte is set if a carry occurred in that byte during addition.

Note: Why use numbers like 0x04070608?

This is a packed 32-bit integer representation of the following byte array:

let a = [8, 6, 7, 4];

0x04070608 is a 32-bit integer encoding the byte array [8, 6, 7, 4] in little-endian order. This allows SWAR to handle it as a single integer, enabling faster processing.

Why Add 0xF6?

For digit-wise addition, a carry occurs if the result is 10 or more.
Adding 246 (0xF6) ensures:

x + 246 > 255 only when x is 10 or more
If the sum exceeds 255, the MSB(Most Significant Bit) becomes 0, allowing carry detection with & 0x80 == 0

Why is it Fast?

Comparison

Operation	Instruction Count	Branching
`x == 10`	Multiple	Yes
`x & 0x80`	Single	No

No branching or loops = Better for CPU pipelines
Multiple digits processed at once = Fewer loop iterations

Actual Code

Here's an example of checking carry per byte using bitwise operations:

fn main() {
    let a = 5;
    let b = 7;
    let (result, carry) = add_digits_with_carry(a, b);
    println!("Result: {}, Carry: {}", result, carry); // Result: 2, Carry: true
}

fn is_carry(x: u16) -> bool {
    (x + 246) & 0x80 == 0
}

fn add_digits_with_carry(a: u16, b: u16) -> (u16, bool) {
    let sum = a + b;
    let carry = is_carry(sum);
    (sum % 10, carry)
}

This code defines a function for adding single digits with carry detection. The is_carry function determines carry presence without branching by using bitwise operations.

What is bcmath?

bcmath is a high-precision math library included with PHP. It handles arbitrary-precision integers and decimals instead of floating-point numbers.
This enables accurate calculations in cases where precision is critical, such as financial computations and cryptocurrency transactions.

Use of SWAR-like Techniques in bcmath

The source code for PHP's bcmach (doaddsub.c) includes the following snippet:

n1bytes += SWAR_REPEAT(0xF6) + n2bytes + carry;
carry = !(n1bytes & ((BC_VECTOR) 1 << (8 * sizeof(BC_VECTOR) - 1)));
BC_VECTOR sum_mask = ((n1bytes & SWAR_REPEAT(0x80)) >> 7) * 0xF6;
n1bytes -= sum_mask;

By adding 0xF6 to each byte to detect carry and then correcting the result, this is an example of a SWAR-like technique.
bcmath performs arbitrary-precision arithmetic by operating on multi-byte arrays internally. Form this, we can see that SWAR-like bit manipulation is used in parts of its implementation.

Conclusion

SWAR stands out in its ability to avoid loops and branches by handing multiple digit calculations simultaneously using bitwise operations.

DEV Community