Joe Yan

Posted on

# Optimizing an N-Queens Solution (in JavaScript)

There are many things you can do to optimize a solver for the n-queens problem. I'll describe some of the approaches that I took recently. This won't be a tutorial or anything like that, but maybe you'll find a couple things interesting.

Let's take a look at five techniques:

1. The basic solution
2. Involving indexes
3. Seize the symmetry
4. Bring in the bits
5. Unroll it! And other terrible things
6. Whack it with web workers

(Any terminology used herein is just what came to mind, some of it's probably correct, hopefully most of it gets the point across... and if there's anything wrong, you can let me know.)

### 0. The Basic Solution

Suppose you have a `Board` object with some useful methods: `size()`, `togglePiece(row, col)`, and `hasAnyQueenConflictsOn(row, col)`. We can then solve the n-queens problem with this approach:

• `for each unoccupied space in the current board state`
• `toggle a queen onto the current space`
• `check if the board remains unconflicted`
• `if so, recur`
• `toggle the queen off of the current space`

Add in a base case (`number of queens placed = n`), track the number of solutions encountered, and it's complete. But, realizing that placing n queens on an n-by-n board requires placing exactly one queen per row, we can simply check each column on each recursive call, rather than every unoccupied space.

That brings us to this initial solution (in JavaScript):

``````countNQueensSolutions = function(n) {
return countNQueensHelper(new Board(n), 0);

function countNQueensHelper(board, numPlaced) {
var n = board.size();
// numPlaced is both our current row and the number of queens on the board
// if we've placed n queens, we've found a solution
if (numPlaced === n) {
return 1;
}

var nSols = 0;

// go through each column, testing if placement is valid
for (var c = 0; c < n; c++) {
board.togglePiece(numPlaced, c);
// if current position is unconflicted, recur and
// count the solutions stemming from our current position
if (!board.hasAnyQueenConflictsOn(numPlaced, c)) {
nSols += countNQueensHelper(board, numPlaced + 1);
}
// undo the current position when we've finished exploring this subtree
board.togglePiece(numPlaced, c);
}
return nSols;
}
};
``````
##### Performance:
n time (s)
8 0.06
9 0.16
10 0.73
11 3.8
12 21.2

N.B.: this timing was done with a Backbone-based `Board` implementation, which will make it extra slow

### 1. Involving indexes

Insight: Columns are just like rows. We're not re-checking rows -- why do we need to re-check the same columns every time?
So... Keep an n-length boolean array, with `true` and `false` identifying occupied and free columns respectively. Instead of testing every column, test only those that are free.

Insight 2: Diagonals aren't that different either, are they? If we say that the board has 2n-1 left (top-left to bottom-right) and right (top-right to bottom-left) diagonals and add in an indexing convention, then they're just like rows and columns.
So... Keep two (2n-1)-length boolean arrays, one for left and one for right diagonals.

Last insight: If we track free columns, rows, and diagonals, then by definition, we know which spaces are unconflicted - who needs that pesky `hasAnyQueenConflictsOn`? And if we're not using that, then we don't need `togglePiece` ...so... toss out the whole `Board`!

The solution now:

``````countNQueensSolutions = function(n) {
return countNQueensHelper(
0,
new Array(n).fill(false),
new Array(2 * n - 1).fill(false),
new Array(2 * n - 1).fill(false)
);

function countNQueensHelper(numPlaced, qInCol, qInLDiag, qInRDiag) {
var n = qInCol.length;
if (numPlaced === n) return 1;

var r = numPlaced;
var nSols = 0;

// go through each column, testing if placement is valid
for (var c = 0; c < n; c++) {
var ld = c - r;
var rd = r + c;

// if current position is valid, recur
if (!qInCol[c] && !qInLDiag[ld] && !qInRDiag[rd]) {
(qInCol[c] = true), (qInLDiag[ld] = true), (qInRDiag[rd] = true);

nSols += countNQueensHelper(r + 1, qInCol, qInLDiag, qInRDiag);

(qInCol[c] = false), (qInLDiag[ld] = false), (qInRDiag[rd] = false);
}
}
return nSols;
}
};
``````

Note that left diagonals are being indexed from `-n+1` to `n-1` and right diagonals are being indexed from `0` to `2*(n-1)`

n basic indexes
8 0.06 <0.01
9 0.16 <0.01
10 0.73 0.02
11 3.8 0.08
12 21.2 0.38
13 - 2.3
14 - 14.7

### 2. Seize the symmetry

Insight: The board is symmetric. If we take all solutions wherein the queen on the first row is on the left side of the board, there is exactly one paired solution to be found by reflecting the board across its center.

So... Cut the search space (roughly) in half: for even n, consider only solutions where the first-row queen is on the left half of the board and double the resulting solution count. For odd n, consider solutions where the first-row queen is on the left half of the board or in the center column, but don't double the solution count in the latter case.

The solution now:

``````countNQueensSolutions = function(n) {
return countNQueensHelper(
0,
new Array(n).fill(false),
new Array(2 * n - 1).fill(false),
new Array(2 * n - 1).fill(false)
);

function countNQueensHelper(numPlaced, qInCol, qInLDiag, qInRDiag) {
var n = qInCol.length;
if (numPlaced === n) return 1;

var r = numPlaced;
var nSols = 0;

// go through each column, testing if placement is valid
for (var c = 0; c < n; c++) {
var ld = c - r;
var rd = r + c;

// if current position is valid, recur
if (!qInCol[c] && !qInLDiag[ld] && !qInRDiag[rd]) {
(qInCol[c] = true), (qInLDiag[ld] = true), (qInRDiag[rd] = true);

nSols += countNQueensHelper(r + 1, qInCol, qInLDiag, qInRDiag);

(qInCol[c] = false), (qInLDiag[ld] = false), (qInRDiag[rd] = false);
}
}
return nSols;
}
};
``````

Note: You can apply a further optimization to odd-n boards by applying the same symmetry argument in the second row when the first-row queen is in the center column. I read about this in this article by Joyce Liu, though I imagine it's been written about more widely as well. We'll save this implementation for later.

##### Performance:
n basic indexes symmetry
8 0.06 <0.01 <0.01
9 0.16 <0.01 <0.01
10 0.73 0.02 0.01
11 3.8 0.08 0.05
12 21.2 0.38 0.22
13 - 2.3 1.3
14 - 14.7 7.5
15 - - 51.7

### 3. Bring in the bits

Insight: Our state can actually be represented by just the boolean arrays for columns, left diagonals, and right diagonals. We don't even need to pass `numPlaced` above, as we could simply count the number of `true` values in the columns array. This boolean array representation should lend itself well to bitwise manipulation
So... let's translate the previous solution into a version that uses numbers instead of boolean arrays and bitwise operators instead of loops.

There are lots of resources on bitwise solutions to n-queens, so I won't belabor the details. Here's my code (no symmetry optimization in this version):

``````countNQueensSolutions = function(n) {
var nMask = 2 ** n - 1;

return countNQueensHelper(0, 0, 0);

function countNQueensHelper(qInCol, qInLDiag, qInRDiag) {
if (qInCol === nMask) return 1;

var nSols = 0;

// shift major diagonal to the right, minor diagonal to the left
qInLDiag >>>= 1;
qInRDiag <<= 1;

// get openPositions bitstring
var openPositions = ~(qInCol | qInLDiag | qInRDiag) & nMask;

// recur once with every set bit in possibilities array
for (var poss of bitstringDecomps[openPositions])
nSols += countNQueensHelper(qInCol | poss, qInLDiag | poss, qInRDiag | poss);

return nSols;
}
};
``````

Why does this save time? Unlike in C or similar, JavaScript doesn't have inherent performance advantages for bitwise operations. However, as bitwise operations operate on the whole "array" at once, they are still much quicker than the boolean arrays.

Let's examine what it takes to find unconflicted positions. Above, we loop through each column and each iteration checks whether the column, left diagonal, and right diagonal are free. This involves n iterations, each with 10 operations: 4 operations to get the indexes for the diagonals (2 subtraction and 2 assignment), 3 array lookups, 2 logical `or`s, and 1 `if` evaluation. The solution above also has 3 negations, but these could be refactored out. Call it ~10n operations.

In the bitwise solution, this can all be done in constant time: 2 bit-shifts, 2 bitwise `or`s, 1 bitwise `not`, 1 bitwise `and`, and 3 assignments. We still need a loop to recur on each unconflicted position, but we come out far ahead overall.

One cool thing to point out is that you don't actually need to track 2n-1 diagonals, as the diagonals "shift out of" the board. Only n diagonals are relevant in each row, and the bitshift operators kindly remove irrelevant ones and shift the relevant ones into their correct positions.

This solution cheats, a little bit: it uses a precomputed array called bitstringDecomps. This array "decomposes" an "open positions" bitstring into an array of "possibility" bitstrings, i.e. bitstrings with only one bit set. That is, if `openPositions = 161 = 0b10100001` for an n=8 board, indicating that columns 0, 2, and 7 in the current row are unconflicted, `bitStringDecomps[161] = [128, 32, 1] = [0b10000000, 0b100000, 0b1]`. Looping through the decomposition allows us to test each unconflicted position.

This can be avoided with some more bitwise magic (`openPositions & -openPositions`), which you can see in this paper by Martin Richards. But I didn't figure that out, thus the above.

##### Performance:
n basic indexes symmetry bit (no symmetry)
8 0.06 <0.01 <0.01 <0.01
9 0.16 <0.01 <0.01 <0.01
10 0.73 0.02 0.01 <0.01
11 3.8 0.08 0.05 0.01
12 21.2 0.38 0.22 0.04
13 - 2.3 1.3 0.18
14 - 14.7 7.5 1.0
15 - - 51.7 6.6
16 - - - 44.2

### 4. Unroll it! And other terrible things

This time around, code first. This is a bitwise solution with symmetry (including the additional optimization for odd-n mentioned above), plus a bunch of bad stuff:

``````countNQueensSolutions = function(n) {
if (n === 0) return 1;
window.nMask = 2 ** n - 1;
window.solutionCount = 0;

_start(0, 0, 0);

return window.solutionCount;

function _start(col, lDiag, rDiag) {

var open = ~(col | lDiag | rDiag) & nMask;

// run the right half of possibilities (small half for odd n)
var halfNumPoss = window.bitstringDecomps[open].length >> 1;
for (let i = 0; i < halfNumPoss; i++) {
var poss = window.bitstringDecomps[open][i];
_main(col | poss, (lDiag | poss) >> 1, (rDiag | poss) << 1);
}
window.solutionCount <<= 1;

// run the central branch if N is odd
if (n % 2 === 1) {
var poss = window.bitstringDecomps[open][halfNumPoss];

var prevCt = window.solutionCount;
window.solutionCount = 0;
_oddOpt(col | poss, (lDiag | poss) >> 1, (rDiag | poss) << 1);
window.solutionCount = window.solutionCount * 2 + prevCt;
}
}

// function to split the branches of the central tree when N is odd
function _oddOpt(col, lDiag, rDiag) {

var open = ~(col | lDiag | rDiag) & window.nMask;
// split the tree
var halfNumPoss = window.bitstringDecomps[open].length >> 1;
for (let i = 0; i < halfNumPoss; i++) {
var poss = window.bitstringDecomps[open][i];
_main(col | poss, (lDiag | poss) >> 1, (rDiag | poss) << 1);
}
}

// main recursive function
function _main(col, lDiag, rDiag) {

var open = ~(col | lDiag | rDiag) & window.nMask;

while (open > 0) {
var poss = open & -open;
_main(col | poss, (lDiag | poss) >> 1, (rDiag | poss) << 1);
open ^= poss;
}
}
};
``````

Insight: Not a whole lot of insight in this one, just a lot of bad practices and dirty tricks.
So... here are some of the things that have been done:

• code repeated/unrolled in 3 nearly-identical functions to avoid repeated extra conditional checks in main recursive loop
• use of global variable `solutionCount` to avoid passing values back up the call stack and save an operation or two (global vs. closure to avoid navigating the scope chain)
• global `nMask`, again to avoid both passing as an argument repeatedly and navigating the scope chain
• passing shifted diagonals bitstrings into next recursive call rather than shifting in body of function to avoid an extra assignment
• application of the `openPositions & -openPositions` trick described above, but only in the main recursive function, as elsewhere the precomputed array of bit decompositions was faster

Obviously eking out tiny little gains here, and some of this stuff might not do anything, but here we are. Better to do this before the next step anyways.

##### Performance:
n basic indexes symmetry bitwise unrolled+
8 0.06 <0.01 <0.01 <0.01 <0.01
9 0.16 <0.01 <0.01 <0.01 <0.01
10 0.73 0.02 0.01 <0.01 <0.01
11 3.8 0.08 0.05 0.01 <0.01
12 21.2 0.38 0.22 0.04 0.01
13 - 2.3 1.3 0.18 0.06
14 - 14.7 7.5 1.0 0.36
15 - - 51.7 6.6 2.2
16 - - - 44.2 6.5
17 - - - - 44.3

### 5. Whack it with web workers

JavaScript is single threaded, but web workers run in a separate thread and can thereby be used for parallelization. Communication between threads occurs through events: the main thread and worker threads can each `postMessage`s to the other and add `message` event listeners to execute code upon receiving messages from the other. Communication does not exist otherwise, as each worker exists in its own unique global scope.

Worker code:

``````// queensWorker.js

"message",
function({ data: msgData }) {
let { col, lDiag, rDiag, nMask } = msgData;

nSols = 0;
solve(col, lDiag, rDiag);

function solve(col, lDiag, rDiag) {

var open = ~(col | lDiag | rDiag) & nMask;

while (open > 0) {
var poss = open & -open;
solve(col | poss, (lDiag | poss) >> 1, (rDiag | poss) << 1);
open ^= poss;
}
}

postMessage(nSols);
},
false
);

``````

The worker is pretty straightforward, and it's essentially just the `_main` function from earlier. It takes in a message with a subproblem definition (consisting of `col`, `lDiag`, `rDiag`, and `nMask`) solves it recursively, and posts the number of solutions found for that subproblem back to the main thread.

Main code:

``````nQueensWorkers = function(n, nWorkers = navigator.hardwareConcurrency || 4) {
if (n === 0) return 1;

// set up closure variables
let nMask = 2 ** n - 1;
let halfSolCount = 0;

// set up work for workers
let subProbs = [];

// set up workers
let createWorker = function() {
let worker = new Worker("src/queensWorker.js");
halfSolCount += event.data;
if (subProbs.length > 0) {
worker.postMessage(subProbs.pop());
} else {
worker.terminate();
nWorkers--;
if (nWorkers === 0) {
console.log(`nQueensWorkers(\${n}): \${halfSolCount * 2} solutions`);
}
}
});
return worker;
};
let workers = [...Array(nWorkers)].map(() => createWorker());

// queue the "side branch" problems to be solved
let open = nMask; // initially all columns are open, which is represented by nMask
let halfNumPoss = bitstringDecomps[open].length >> 1;
for (let i = 0; i < halfNumPoss; i++) {
let poss = bitstringDecomps[open][i];
subProbs.push({ col: poss, lDiag: poss >> 1, rDiag: poss << 1, nMask: nMask });
}

// if n is odd, queue the "center branch" queue
if (n % 2 === 1) {

let col = bitstringDecomps[open][halfNumPoss];
let lDiag = col >> 1;
let rDiag = col << 1;
open = ~(col | lDiag | rDiag) & nMask;
halfNumPoss = bitstringDecomps[open].length >> 1;
for (let i = 0; i < halfNumPoss; i++) {
let poss = window.bitstringDecomps[open][i];
subProbs.push(
);
}
}

// begin running the workers
for (let worker of workers) {
if (subProbs.length > 0) {
let subProb = subProbs.pop();
subProb.timeProblemPost = new Date().getTime();
worker.postMessage(subProb);
} else {
worker.terminate();
nWorkers--;
}
}
}
``````

The main thread is a bit more involved, but it still follows the logic in the previous solution. We follow these steps:

1. Instantiate web workers
2. Split the n-queens problem into subproblems (one per column on the left side of the board)
3. If n is odd, also add the "center column" subproblems
4. Begin running web workers

This implementation uses a "thread pool" for the workers, where each worker picks up a new subproblem after completing its current workload see `createWorker()`. An alternative strategy would be to instantiate workers for each individual subproblem, either up front or as they are formulated. I expected the optimal number of workers to be 8 on my quad-core multi-threaded machine (this is `navigator.hardwareConcurrency`) but that turns out not to be the case. Empirically, performance improves noticeably up to 16 workers on n=17 and n=18. That could either be because of CPU hardware magic or because of messaging problems with the workers; I suspect the former. It's not a great setup though, because workload is very unevenly distributed - the position of the first-row queen greatly affects the number of possible solutions in that subproblem.

##### Performance:
n basic indexes symmetry bitwise unrolled+ web workers
8 0.06 <0.01 <0.01 <0.01 <0.01 0.04
9 0.16 <0.01 <0.01 <0.01 <0.01 0.05
10 0.73 0.02 0.01 <0.01 <0.01 0.05
11 3.8 0.08 0.05 0.01 <0.01 0.08
12 21.2 0.38 0.22 0.04 0.01 0.07
13 - 2.3 1.3 0.18 0.06 0.12
14 - 14.7 7.5 1.0 0.36 0.13
15 - - 51.7 6.6 2.2 0.42
16 - - - 44.2 6.5 1.8
17 - - - - 44.3 12.3
18 - - - - - 102.7

From here, it could be interesting to explored shared web workers, which share global scope. This would make it easier to communicate between workers, facilitating passing of subproblems and improving distribution of computing workload.