DEV Community

Kenji Tomita
Kenji Tomita

Posted on

Learning About SWAR

Introduction

Digit-wise addition and carry detection are typically performed one digit at a time, with individual checks for carry.
However, with SWAR(SIMD Within A Register) techniques, it's possible to process multiple digits at once.

※ All code examples below are written in Rust.

What is SWAR?

SWAR(SIMD Within A Register) is a technique that allows the same operation to be performed simultaneously on multiple data within a single register.

Example:

  • A 32-bit value is treated as four 8-bit(byte) values
  • Each byte is added simultaneously

Comparison of SWAR and Conventional Addition

Conventional Addition

fn main() {
    let a = [8, 6, 7, 4];
    let b = [4, 3, 2, 3];
    let mut carry = [false; 4];

    for i in 0..4 {
        let sum = a[i] + b[i];
        if sum >= 10 {
            carry[i] = true;
        }
    }

    println!("{:?}", carry); // [true, false, false, false]
}
Enter fullscreen mode Exit fullscreen mode

In this case, the sums of each element are [12, 9, 9, 7], so carry becomes [true, false, false, false].
The carry array stores whether a carry(sum of 10 or more) occurred in each digit addition.

Addition Using SWAR

fn main() {
    let a: i64 = 0x04070608;
    let b: i64 = 0x03020304;
    let sum = a + b;
    let adjusted = sum + 0xF6F6F6F6;
    let carry = (!adjusted) & 0x80808080;

    println!("0x{:08X}", carry); // 0x00808080
}
Enter fullscreen mode Exit fullscreen mode
  • No loop
  • No branching
  • Bitwise operations only

carry = 0x00808080 means that a carry occurred in the 1st, 2nd, and 3rd bytes (from LSB). The MSB (most significant bit) of each byte is set if a carry occurred in that byte during addition.

Note: Why use numbers like 0x04070608?

This is a packed 32-bit integer representation of the following byte array:

let a = [8, 6, 7, 4];
Enter fullscreen mode Exit fullscreen mode

0x04070608 is a 32-bit integer encoding the byte array [8, 6, 7, 4] in little-endian order. This allows SWAR to handle it as a single integer, enabling faster processing.

Why Add 0xF6?

For digit-wise addition, a carry occurs if the result is 10 or more.
Adding 246 (0xF6) ensures:

  • x + 246 > 255 only when x is 10 or more
  • If the sum exceeds 255, the MSB(Most Significant Bit) becomes 0, allowing carry detection with & 0x80 == 0

Why is it Fast?

Comparison

Operation Instruction Count Branching
x == 10 Multiple Yes
x & 0x80 Single No
  • No branching or loops = Better for CPU pipelines
  • Multiple digits processed at once = Fewer loop iterations

Actual Code

Here's an example of checking carry per byte using bitwise operations:

fn main() {
    let a = 5;
    let b = 7;
    let (result, carry) = add_digits_with_carry(a, b);
    println!("Result: {}, Carry: {}", result, carry); // Result: 2, Carry: true
}

fn is_carry(x: u16) -> bool {
    (x + 246) & 0x80 == 0
}

fn add_digits_with_carry(a: u16, b: u16) -> (u16, bool) {
    let sum = a + b;
    let carry = is_carry(sum);
    (sum % 10, carry)
}
Enter fullscreen mode Exit fullscreen mode

This code defines a function for adding single digits with carry detection. The is_carry function determines carry presence without branching by using bitwise operations.

What is bcmath?

bcmath is a high-precision math library included with PHP. It handles arbitrary-precision integers and decimals instead of floating-point numbers.
This enables accurate calculations in cases where precision is critical, such as financial computations and cryptocurrency transactions.

Use of SWAR-like Techniques in bcmath

The source code for PHP's bcmach (doaddsub.c) includes the following snippet:

n1bytes += SWAR_REPEAT(0xF6) + n2bytes + carry;
carry = !(n1bytes & ((BC_VECTOR) 1 << (8 * sizeof(BC_VECTOR) - 1)));
BC_VECTOR sum_mask = ((n1bytes & SWAR_REPEAT(0x80)) >> 7) * 0xF6;
n1bytes -= sum_mask;
Enter fullscreen mode Exit fullscreen mode

By adding 0xF6 to each byte to detect carry and then correcting the result, this is an example of a SWAR-like technique.
bcmath performs arbitrary-precision arithmetic by operating on multi-byte arrays internally. Form this, we can see that SWAR-like bit manipulation is used in parts of its implementation.

Conclusion

SWAR stands out in its ability to avoid loops and branches by handing multiple digit calculations simultaneously using bitwise operations.

Top comments (0)