Introduction
Digit-wise addition and carry detection are typically performed one digit at a time, with individual checks for carry.
However, with SWAR(SIMD Within A Register) techniques, it's possible to process multiple digits at once.
※ All code examples below are written in Rust.
What is SWAR?
SWAR(SIMD Within A Register) is a technique that allows the same operation to be performed simultaneously on multiple data within a single register.
Example:
- A 32-bit value is treated as four 8-bit(byte) values
- Each byte is added simultaneously
Comparison of SWAR and Conventional Addition
Conventional Addition
fn main() {
let a = [8, 6, 7, 4];
let b = [4, 3, 2, 3];
let mut carry = [false; 4];
for i in 0..4 {
let sum = a[i] + b[i];
if sum >= 10 {
carry[i] = true;
}
}
println!("{:?}", carry); // [true, false, false, false]
}
In this case, the sums of each element are [12, 9, 9, 7], so carry becomes [true, false, false, false].
The carry array stores whether a carry(sum of 10 or more) occurred in each digit addition.
Addition Using SWAR
fn main() {
let a: i64 = 0x04070608;
let b: i64 = 0x03020304;
let sum = a + b;
let adjusted = sum + 0xF6F6F6F6;
let carry = (!adjusted) & 0x80808080;
println!("0x{:08X}", carry); // 0x00808080
}
- No loop
- No branching
- Bitwise operations only
carry = 0x00808080 means that a carry occurred in the 1st, 2nd, and 3rd bytes (from LSB). The MSB (most significant bit) of each byte is set if a carry occurred in that byte during addition.
Note: Why use numbers like 0x04070608?
This is a packed 32-bit integer representation of the following byte array:
let a = [8, 6, 7, 4];
0x04070608 is a 32-bit integer encoding the byte array [8, 6, 7, 4] in little-endian order. This allows SWAR to handle it as a single integer, enabling faster processing.
Why Add 0xF6?
For digit-wise addition, a carry occurs if the result is 10 or more.
Adding 246 (0xF6) ensures:
-
x + 246 > 255
only when x is 10 or more - If the sum exceeds 255, the MSB(Most Significant Bit) becomes 0, allowing carry detection with
& 0x80 == 0
Why is it Fast?
Comparison
Operation | Instruction Count | Branching |
---|---|---|
x == 10 |
Multiple | Yes |
x & 0x80 |
Single | No |
- No branching or loops = Better for CPU pipelines
- Multiple digits processed at once = Fewer loop iterations
Actual Code
Here's an example of checking carry per byte using bitwise operations:
fn main() {
let a = 5;
let b = 7;
let (result, carry) = add_digits_with_carry(a, b);
println!("Result: {}, Carry: {}", result, carry); // Result: 2, Carry: true
}
fn is_carry(x: u16) -> bool {
(x + 246) & 0x80 == 0
}
fn add_digits_with_carry(a: u16, b: u16) -> (u16, bool) {
let sum = a + b;
let carry = is_carry(sum);
(sum % 10, carry)
}
This code defines a function for adding single digits with carry detection. The is_carry function determines carry presence without branching by using bitwise operations.
What is bcmath?
bcmath is a high-precision math library included with PHP. It handles arbitrary-precision integers and decimals instead of floating-point numbers.
This enables accurate calculations in cases where precision is critical, such as financial computations and cryptocurrency transactions.
Use of SWAR-like Techniques in bcmath
The source code for PHP's bcmach (doaddsub.c) includes the following snippet:
n1bytes += SWAR_REPEAT(0xF6) + n2bytes + carry;
carry = !(n1bytes & ((BC_VECTOR) 1 << (8 * sizeof(BC_VECTOR) - 1)));
BC_VECTOR sum_mask = ((n1bytes & SWAR_REPEAT(0x80)) >> 7) * 0xF6;
n1bytes -= sum_mask;
By adding 0xF6 to each byte to detect carry and then correcting the result, this is an example of a SWAR-like technique.
bcmath performs arbitrary-precision arithmetic by operating on multi-byte arrays internally. Form this, we can see that SWAR-like bit manipulation is used in parts of its implementation.
Conclusion
SWAR stands out in its ability to avoid loops and branches by handing multiple digit calculations simultaneously using bitwise operations.
Top comments (0)