There are many things you can do to optimize a solver for the nqueens problem. I'll describe some of the approaches that I took recently. This won't be a tutorial or anything like that, but maybe you'll find a couple things interesting.
Let's take a look at five techniques:
 The basic solution
 Involving indexes
 Seize the symmetry
 Bring in the bits
 Unroll it! And other terrible things
 Whack it with web workers
(Any terminology used herein is just what came to mind, some of it's probably correct, hopefully most of it gets the point across... and if there's anything wrong, you can let me know.)
0. The Basic Solution
Suppose you have a Board
object with some useful methods: size()
, togglePiece(row, col)
, and hasAnyQueenConflictsOn(row, col)
. We can then solve the nqueens problem with this approach:

for each unoccupied space in the current board state
toggle a queen onto the current space

check if the board remains unconflicted
if so, recur
toggle the queen off of the current space
Add in a base case (number of queens placed = n
), track the number of solutions encountered, and it's complete. But, realizing that placing n queens on an nbyn board requires placing exactly one queen per row, we can simply check each column on each recursive call, rather than every unoccupied space.
That brings us to this initial solution (in JavaScript):
countNQueensSolutions = function(n) {
return countNQueensHelper(new Board(n), 0);
function countNQueensHelper(board, numPlaced) {
var n = board.size();
// numPlaced is both our current row and the number of queens on the board
// if we've placed n queens, we've found a solution
if (numPlaced === n) {
return 1;
}
var nSols = 0;
// go through each column, testing if placement is valid
for (var c = 0; c < n; c++) {
board.togglePiece(numPlaced, c);
// if current position is unconflicted, recur and
// count the solutions stemming from our current position
if (!board.hasAnyQueenConflictsOn(numPlaced, c)) {
nSols += countNQueensHelper(board, numPlaced + 1);
}
// undo the current position when we've finished exploring this subtree
board.togglePiece(numPlaced, c);
}
return nSols;
}
};
Performance:
n  time (s) 

8  0.06 
9  0.16 
10  0.73 
11  3.8 
12  21.2 
N.B.: this timing was done with a Backbonebased Board
implementation, which will make it extra slow
1. Involving indexes
Insight: Columns are just like rows. We're not rechecking rows  why do we need to recheck the same columns every time?
So... Keep an nlength boolean array, with true
and false
identifying occupied and free columns respectively. Instead of testing every column, test only those that are free.
Insight 2: Diagonals aren't that different either, are they? If we say that the board has 2n1 left (topleft to bottomright) and right (topright to bottomleft) diagonals and add in an indexing convention, then they're just like rows and columns.
So... Keep two (2n1)length boolean arrays, one for left and one for right diagonals.
Last insight: If we track free columns, rows, and diagonals, then by definition, we know which spaces are unconflicted  who needs that pesky hasAnyQueenConflictsOn
? And if we're not using that, then we don't need togglePiece
...so... toss out the whole Board
!
The solution now:
countNQueensSolutions = function(n) {
return countNQueensHelper(
0,
new Array(n).fill(false),
new Array(2 * n  1).fill(false),
new Array(2 * n  1).fill(false)
);
function countNQueensHelper(numPlaced, qInCol, qInLDiag, qInRDiag) {
var n = qInCol.length;
if (numPlaced === n) return 1;
var r = numPlaced;
var nSols = 0;
// go through each column, testing if placement is valid
for (var c = 0; c < n; c++) {
var ld = c  r;
var rd = r + c;
// if current position is valid, recur
if (!qInCol[c] && !qInLDiag[ld] && !qInRDiag[rd]) {
(qInCol[c] = true), (qInLDiag[ld] = true), (qInRDiag[rd] = true);
nSols += countNQueensHelper(r + 1, qInCol, qInLDiag, qInRDiag);
(qInCol[c] = false), (qInLDiag[ld] = false), (qInRDiag[rd] = false);
}
}
return nSols;
}
};
Note that left diagonals are being indexed from n+1
to n1
and right diagonals are being indexed from 0
to 2*(n1)
Performance:
n  basic  indexes 

8  0.06  <0.01 
9  0.16  <0.01 
10  0.73  0.02 
11  3.8  0.08 
12  21.2  0.38 
13    2.3 
14    14.7 
2. Seize the symmetry
Insight: The board is symmetric. If we take all solutions wherein the queen on the first row is on the left side of the board, there is exactly one paired solution to be found by reflecting the board across its center.
So... Cut the search space (roughly) in half: for even n, consider only solutions where the firstrow queen is on the left half of the board and double the resulting solution count. For odd n, consider solutions where the firstrow queen is on the left half of the board or in the center column, but don't double the solution count in the latter case.
The solution now:
countNQueensSolutions = function(n) {
return countNQueensHelper(
0,
new Array(n).fill(false),
new Array(2 * n  1).fill(false),
new Array(2 * n  1).fill(false)
);
function countNQueensHelper(numPlaced, qInCol, qInLDiag, qInRDiag) {
var n = qInCol.length;
if (numPlaced === n) return 1;
var r = numPlaced;
var nSols = 0;
// go through each column, testing if placement is valid
for (var c = 0; c < n; c++) {
var ld = c  r;
var rd = r + c;
// if current position is valid, recur
if (!qInCol[c] && !qInLDiag[ld] && !qInRDiag[rd]) {
(qInCol[c] = true), (qInLDiag[ld] = true), (qInRDiag[rd] = true);
nSols += countNQueensHelper(r + 1, qInCol, qInLDiag, qInRDiag);
(qInCol[c] = false), (qInLDiag[ld] = false), (qInRDiag[rd] = false);
}
}
return nSols;
}
};
Note: You can apply a further optimization to oddn boards by applying the same symmetry argument in the second row when the firstrow queen is in the center column. I read about this in this article by Joyce Liu, though I imagine it's been written about more widely as well. We'll save this implementation for later.
Performance:
n  basic  indexes  symmetry 

8  0.06  <0.01  <0.01 
9  0.16  <0.01  <0.01 
10  0.73  0.02  0.01 
11  3.8  0.08  0.05 
12  21.2  0.38  0.22 
13    2.3  1.3 
14    14.7  7.5 
15      51.7 
3. Bring in the bits
Insight: Our state can actually be represented by just the boolean arrays for columns, left diagonals, and right diagonals. We don't even need to pass numPlaced
above, as we could simply count the number of true
values in the columns array. This boolean array representation should lend itself well to bitwise manipulation
So... let's translate the previous solution into a version that uses numbers instead of boolean arrays and bitwise operators instead of loops.
There are lots of resources on bitwise solutions to nqueens, so I won't belabor the details. Here's my code (no symmetry optimization in this version):
countNQueensSolutions = function(n) {
var nMask = 2 ** n  1;
return countNQueensHelper(0, 0, 0);
function countNQueensHelper(qInCol, qInLDiag, qInRDiag) {
if (qInCol === nMask) return 1;
var nSols = 0;
// shift major diagonal to the right, minor diagonal to the left
qInLDiag >>>= 1;
qInRDiag <<= 1;
// get openPositions bitstring
var openPositions = ~(qInCol  qInLDiag  qInRDiag) & nMask;
// recur once with every set bit in possibilities array
for (var poss of bitstringDecomps[openPositions])
nSols += countNQueensHelper(qInCol  poss, qInLDiag  poss, qInRDiag  poss);
return nSols;
}
};
Why does this save time? Unlike in C or similar, JavaScript doesn't have inherent performance advantages for bitwise operations. However, as bitwise operations operate on the whole "array" at once, they are still much quicker than the boolean arrays.
Let's examine what it takes to find unconflicted positions. Above, we loop through each column and each iteration checks whether the column, left diagonal, and right diagonal are free. This involves n iterations, each with 10 operations: 4 operations to get the indexes for the diagonals (2 subtraction and 2 assignment), 3 array lookups, 2 logical or
s, and 1 if
evaluation. The solution above also has 3 negations, but these could be refactored out. Call it ~10n operations.
In the bitwise solution, this can all be done in constant time: 2 bitshifts, 2 bitwise or
s, 1 bitwise not
, 1 bitwise and
, and 3 assignments. We still need a loop to recur on each unconflicted position, but we come out far ahead overall.
One cool thing to point out is that you don't actually need to track 2n1 diagonals, as the diagonals "shift out of" the board. Only n diagonals are relevant in each row, and the bitshift operators kindly remove irrelevant ones and shift the relevant ones into their correct positions.
This solution cheats, a little bit: it uses a precomputed array called bitstringDecomps. This array "decomposes" an "open positions" bitstring into an array of "possibility" bitstrings, i.e. bitstrings with only one bit set. That is, if openPositions = 161 = 0b10100001
for an n=8 board, indicating that columns 0, 2, and 7 in the current row are unconflicted, bitStringDecomps[161] = [128, 32, 1] = [0b10000000, 0b100000, 0b1]
. Looping through the decomposition allows us to test each unconflicted position.
This can be avoided with some more bitwise magic (openPositions & openPositions
), which you can see in this paper by Martin Richards. But I didn't figure that out, thus the above.
Performance:
n  basic  indexes  symmetry  bit (no symmetry) 

8  0.06  <0.01  <0.01  <0.01 
9  0.16  <0.01  <0.01  <0.01 
10  0.73  0.02  0.01  <0.01 
11  3.8  0.08  0.05  0.01 
12  21.2  0.38  0.22  0.04 
13    2.3  1.3  0.18 
14    14.7  7.5  1.0 
15      51.7  6.6 
16        44.2 
4. Unroll it! And other terrible things
This time around, code first. This is a bitwise solution with symmetry (including the additional optimization for oddn mentioned above), plus a bunch of bad stuff:
countNQueensSolutions = function(n) {
if (n === 0) return 1;
window.nMask = 2 ** n  1;
window.solutionCount = 0;
_start(0, 0, 0);
return window.solutionCount;
function _start(col, lDiag, rDiag) {
if (col === window.nMask) window.solutionCount++;
var open = ~(col  lDiag  rDiag) & nMask;
// run the right half of possibilities (small half for odd n)
var halfNumPoss = window.bitstringDecomps[open].length >> 1;
for (let i = 0; i < halfNumPoss; i++) {
var poss = window.bitstringDecomps[open][i];
_main(col  poss, (lDiag  poss) >> 1, (rDiag  poss) << 1);
}
window.solutionCount <<= 1;
// run the central branch if N is odd
if (n % 2 === 1) {
var poss = window.bitstringDecomps[open][halfNumPoss];
var prevCt = window.solutionCount;
window.solutionCount = 0;
_oddOpt(col  poss, (lDiag  poss) >> 1, (rDiag  poss) << 1);
window.solutionCount = window.solutionCount * 2 + prevCt;
}
}
// function to split the branches of the central tree when N is odd
function _oddOpt(col, lDiag, rDiag) {
if (col === window.nMask) window.solutionCount++;
var open = ~(col  lDiag  rDiag) & window.nMask;
// split the tree
var halfNumPoss = window.bitstringDecomps[open].length >> 1;
for (let i = 0; i < halfNumPoss; i++) {
var poss = window.bitstringDecomps[open][i];
_main(col  poss, (lDiag  poss) >> 1, (rDiag  poss) << 1);
}
}
// main recursive function
function _main(col, lDiag, rDiag) {
if (col === window.nMask) window.solutionCount++;
var open = ~(col  lDiag  rDiag) & window.nMask;
while (open > 0) {
var poss = open & open;
_main(col  poss, (lDiag  poss) >> 1, (rDiag  poss) << 1);
open ^= poss;
}
}
};
Insight: Not a whole lot of insight in this one, just a lot of bad practices and dirty tricks.
So... here are some of the things that have been done:
 code repeated/unrolled in 3 nearlyidentical functions to avoid repeated extra conditional checks in main recursive loop
 use of global variable
solutionCount
to avoid passing values back up the call stack and save an operation or two (global vs. closure to avoid navigating the scope chain)  global
nMask
, again to avoid both passing as an argument repeatedly and navigating the scope chain  passing shifted diagonals bitstrings into next recursive call rather than shifting in body of function to avoid an extra assignment
 application of the
openPositions & openPositions
trick described above, but only in the main recursive function, as elsewhere the precomputed array of bit decompositions was faster
Obviously eking out tiny little gains here, and some of this stuff might not do anything, but here we are. Better to do this before the next step anyways.
Performance:
n  basic  indexes  symmetry  bitwise  unrolled+ 

8  0.06  <0.01  <0.01  <0.01  <0.01 
9  0.16  <0.01  <0.01  <0.01  <0.01 
10  0.73  0.02  0.01  <0.01  <0.01 
11  3.8  0.08  0.05  0.01  <0.01 
12  21.2  0.38  0.22  0.04  0.01 
13    2.3  1.3  0.18  0.06 
14    14.7  7.5  1.0  0.36 
15      51.7  6.6  2.2 
16        44.2  6.5 
17          44.3 
5. Whack it with web workers
JavaScript is single threaded, but web workers run in a separate thread and can thereby be used for parallelization. Communication between threads occurs through events: the main thread and worker threads can each postMessage
s to the other and add message
event listeners to execute code upon receiving messages from the other. Communication does not exist otherwise, as each worker exists in its own unique global scope.
Worker code:
// queensWorker.js
self.addEventListener(
"message",
function({ data: msgData }) {
let { col, lDiag, rDiag, nMask } = msgData;
nSols = 0;
solve(col, lDiag, rDiag);
function solve(col, lDiag, rDiag) {
if (col === nMask) nSols++;
var open = ~(col  lDiag  rDiag) & nMask;
while (open > 0) {
var poss = open & open;
solve(col  poss, (lDiag  poss) >> 1, (rDiag  poss) << 1);
open ^= poss;
}
}
postMessage(nSols);
},
false
);
The worker is pretty straightforward, and it's essentially just the _main
function from earlier. It takes in a message with a subproblem definition (consisting of col
, lDiag
, rDiag
, and nMask
) solves it recursively, and posts the number of solutions found for that subproblem back to the main thread.
Main code:
nQueensWorkers = function(n, nWorkers = navigator.hardwareConcurrency  4) {
if (n === 0) return 1;
// set up closure variables
let nMask = 2 ** n  1;
let halfSolCount = 0;
// set up work for workers
let subProbs = [];
// set up workers
let createWorker = function() {
let worker = new Worker("src/queensWorker.js");
worker.addEventListener("message", event => {
halfSolCount += event.data;
if (subProbs.length > 0) {
worker.postMessage(subProbs.pop());
} else {
worker.terminate();
nWorkers;
if (nWorkers === 0) {
console.log(`nQueensWorkers(${n}): ${halfSolCount * 2} solutions`);
}
}
});
worker.addEventListener("error", err => console.error(err));
return worker;
};
let workers = [...Array(nWorkers)].map(() => createWorker());
// queue the "side branch" problems to be solved
let open = nMask; // initially all columns are open, which is represented by nMask
let halfNumPoss = bitstringDecomps[open].length >> 1;
for (let i = 0; i < halfNumPoss; i++) {
let poss = bitstringDecomps[open][i];
subProbs.push({ col: poss, lDiag: poss >> 1, rDiag: poss << 1, nMask: nMask });
}
// if n is odd, queue the "center branch" queue
if (n % 2 === 1) {
open = nMask;
let col = bitstringDecomps[open][halfNumPoss];
let lDiag = col >> 1;
let rDiag = col << 1;
open = ~(col  lDiag  rDiag) & nMask;
halfNumPoss = bitstringDecomps[open].length >> 1;
for (let i = 0; i < halfNumPoss; i++) {
let poss = window.bitstringDecomps[open][i];
subProbs.push(
{ col: colposs, lDiag: (lDiagposs)>>1, rDiag: (rDiagposs)<<1, nMask: nMask }
);
}
}
// begin running the workers
for (let worker of workers) {
if (subProbs.length > 0) {
let subProb = subProbs.pop();
subProb.timeProblemPost = new Date().getTime();
worker.postMessage(subProb);
} else {
worker.terminate();
nWorkers;
}
}
}
The main thread is a bit more involved, but it still follows the logic in the previous solution. We follow these steps:
 Instantiate web workers
 Split the nqueens problem into subproblems (one per column on the left side of the board)
 If n is odd, also add the "center column" subproblems
 Begin running web workers
This implementation uses a "thread pool" for the workers, where each worker picks up a new subproblem after completing its current workload see createWorker()
. An alternative strategy would be to instantiate workers for each individual subproblem, either up front or as they are formulated. I expected the optimal number of workers to be 8 on my quadcore multithreaded machine (this is navigator.hardwareConcurrency
) but that turns out not to be the case. Empirically, performance improves noticeably up to 16 workers on n=17 and n=18. That could either be because of CPU hardware magic or because of messaging problems with the workers; I suspect the former. It's not a great setup though, because workload is very unevenly distributed  the position of the firstrow queen greatly affects the number of possible solutions in that subproblem.
Performance:
n  basic  indexes  symmetry  bitwise  unrolled+  web workers 

8  0.06  <0.01  <0.01  <0.01  <0.01  0.04 
9  0.16  <0.01  <0.01  <0.01  <0.01  0.05 
10  0.73  0.02  0.01  <0.01  <0.01  0.05 
11  3.8  0.08  0.05  0.01  <0.01  0.08 
12  21.2  0.38  0.22  0.04  0.01  0.07 
13    2.3  1.3  0.18  0.06  0.12 
14    14.7  7.5  1.0  0.36  0.13 
15      51.7  6.6  2.2  0.42 
16        44.2  6.5  1.8 
17          44.3  12.3 
18            102.7 
From here, it could be interesting to explored shared web workers, which share global scope. This would make it easier to communicate between workers, facilitating passing of subproblems and improving distribution of computing workload.
Top comments (0)