DEV Community

yw662
yw662

Posted on

How good is the shuffleArray from 7-killer-one-liners ?

As many of us may like this post about 7-killer-one-liners, we all know that shuffling looks not very promising, compared with the "correct" way, Fisher-Yates and its variants.

const shuffleArray = (arr) => arr.sort(() => Math.random() - 0.5)
Enter fullscreen mode Exit fullscreen mode

But, how bad can it be? Basically it depends on the sorting algorithm. It is usually some kind of intro-sort, with is usually some mixture of quick sort, insertion sort and heap sort. The randomness makes it hard to predict the result. So, let's do some experiments instead.

Firstly it is the shuffle function:

declare global {
  interface Array<T> {
    shuffle: () => T[]
  }
}

Array.prototype.shuffle = function <T>(this: T[]) {
  return this.sort(() => Math.random() - 0.5)
}

export {}
Enter fullscreen mode Exit fullscreen mode

And now we can:

const experiment = (N: number, times?: number) => {
  times = times ?? N ** 2
  const original = [...Array(N).keys()]
  const samples = Array.from(Array(times), () => [...original].shuffle())
}
Enter fullscreen mode Exit fullscreen mode

We now have so many shuffled samples, but how can we assess them ?

Here we gonna calculate the frequency each number may appear on each position.

const NumberPosition = (numbers: number[], samples: number[][]) => {
  return numbers.map(
    n => samples.map(sample => [n, sample.indexOf(n)] as const)
    // (n, k) => samples.map(sample => [sample[k], k] as const)
  ).flat(1)
}

const experiment = (N: number, times?: number) => {
  times = times ?? N ** 2
  const original = [...Array(N).keys()]
  const samples = Array.from(Array(times), () => [...original].shuffle())
  const pairs = NumberPosition(original, samples)
}
Enter fullscreen mode Exit fullscreen mode

Both methods work. The former one seems more "understandable", and we don't care about performance at all.

Here we gonna count the pairs. We need a Map<[number, number], number> for that. But here is a problem:

const m = new Map<[number, number], number>()
m.set([0, 0], 1)
m.set([0, 0], 2)
console.log(m)

> Map(2) { [ 0, 0 ] => 1, [ 0, 0 ] => 2 }
Enter fullscreen mode Exit fullscreen mode

To make things cool, we use a pool, which is a [number, number][][], to keep the reference unique.

  const map = new Map<readonly [number, number], number>()
  const pool = original.map(
    n => original.map((_, k) => [n, k] as const)
  )
  const keyOf = (pair: readonly [number, number]) =>
    pool[pair[0]][pair[1]]
  for (const pair of pairs) {
    const key = keyOf(pair)
    map.set(key, (map.get(key) ?? 0) + 1)
  }
Enter fullscreen mode Exit fullscreen mode

All these as const are there for a reason.

Now we have the stats. We gonna sort it by counts.

  return Array.from(map.entries())
    .sort(([, a], [, b]) => b - a)
Enter fullscreen mode Exit fullscreen mode

Now the whole script looks like:

declare global {
  interface Array<T> {
    shuffle: () => T[]
  }
}

Array.prototype.shuffle = function <T>(this: T[]) {
  return this.sort(() => Math.random() - 0.5)
}

const experiment = (N: number, times?: number) => {
  times = times ?? N ** 2
  const original = [...Array(N).keys()]
  const samples = Array.from(Array(times), () => [...original].shuffle())
  const pairs = original.map(
    n => samples.map(sample => [n, sample.indexOf(n)] as const)
    // (n, k) => samples.map(sample => [sample[k], k] as const)
  ).flat(1)

  const map = new Map<readonly [number, number], number>()
  const pool = original.map(n => original.map((_, k) => [n, k] as const))
  const keyOf = (pair: readonly [number, number]) => pool[pair[0]][pair[1]]
  for (const pair of pairs) {
    const key = keyOf(pair)
    map.set(key, (map.get(key) ?? 0) + 1)
  }
  return Array.from(map.entries()).sort(([, a], [, b]) => b - a)
}

export { }
Enter fullscreen mode Exit fullscreen mode

So now let's try sth easy:

console.table(experiment(3, 65536))
Enter fullscreen mode Exit fullscreen mode

and the result:

┌─────────┬──────────┬───────┐
│ (index) │    0     │   1   │
├─────────┼──────────┼───────┤
│    0    │ [ 1, 1 ] │ 45117 │
│    1    │ [ 2, 2 ] │ 32746 │
│    2    │ [ 0, 0 ] │ 28609 │
│    3    │ [ 0, 2 ] │ 24666 │
│    4    │ [ 2, 0 ] │ 24632 │
│    5    │ [ 1, 0 ] │ 12295 │
│    6    │ [ 0, 1 ] │ 12261 │
│    7    │ [ 2, 1 ] │ 8158  │
│    8    │ [ 1, 2 ] │ 8124  │
└─────────┴──────────┴───────┘
Enter fullscreen mode Exit fullscreen mode

[1, 1] 45117 and [2, 2] 32746 vs [1, 2] 8124 and [2, 1] 8158, That means some elements are more likely to stay where they originally are: and it is 45117/65536, not a very good one.

Let's try a larger array. For larger ones, we care only about first few and last few records, so let's do a filter:

const endN = 4
console.table(
  experiment(40, 100000)
    .filter(
      (_, k, a) => k < endN || a.length - k < endN)
)
Enter fullscreen mode Exit fullscreen mode
┌─────────┬────────────┬──────┐
│ (index) │     0      │  1   │
├─────────┼────────────┼──────┤
│    0    │  [ 0, 0 ]  │ 7031 │
│    1    │  [ 0, 1 ]  │ 6308 │
│    2    │ [ 30, 39 ] │ 4650 │
│    3    │  [ 3, 0 ]  │ 4624 │
│    4    │ [ 1, 37 ]  │ 772  │
│    5    │ [ 1, 38 ]  │ 579  │
│    6    │ [ 1, 39 ]  │ 378  │
└─────────┴────────────┴──────┘
Enter fullscreen mode Exit fullscreen mode

10 times, but it is 0.07, seems better. And it means "there are a possibility of 0.07 that 0 stays on position 0".

Things are kept near where they were, typical insertion sort. This is how intro-sort looks like when N is low.

And a larger one, 1000. I have to do less iterations (down to 10000) or there will be no enough address space for node.js to use.

┌─────────┬──────────────┬────┐
│ (index) │      0       │ 1  │
├─────────┼──────────────┼────┤
│    0    │  [ 441, 0 ]  │ 55 │
│    1    │   [ 0, 4 ]   │ 53 │
│    2    │  [ 315, 1 ]  │ 52 │
│    3    │   [ 0, 3 ]   │ 52 │
│    4    │  [ 252, 2 ]  │ 49 │
│    5    │  [ 0, 10 ]   │ 48 │
│    6    │  [ 0, 13 ]   │ 48 │
│    7    │  [ 63, 4 ]   │ 47 │
│    8    │   [ 0, 9 ]   │ 47 │
│    9    │  [ 189, 3 ]  │ 46 │
│   10    │ [ 190, 999 ] │ 1  │
│   11    │ [ 134, 999 ] │ 1  │
│   12    │ [ 887, 999 ] │ 1  │
│   13    │ [ 946, 999 ] │ 1  │
│   14    │ [ 63, 999 ]  │ 1  │
│   15    │ [ 632, 999 ] │ 1  │
│   16    │ [ 883, 999 ] │ 1  │
│   17    │ [ 71, 999 ]  │ 1  │
│   18    │ [ 889, 999 ] │ 1  │
└─────────┴──────────────┴────┘
Enter fullscreen mode Exit fullscreen mode

No much data but a stable one. 55/10000 is not a very big issue, but 55:1 is still bad.

At the end let's try a real Fisher-Yates and see how good it is:

Array.prototype.shuffle = function <T>(this: T[]) {
  for (let i = this.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [this[i], this[j]] = [this[j], this[i]]
  }
  return this
}
Enter fullscreen mode Exit fullscreen mode

You can tell from above I don't like semis, but I have to keep this one :-).
and

┌─────────┬──────────┬──────┐
│ (index) │    0     │  1   │
├─────────┼──────────┼──────┤
│    0    │ [ 2, 0 ] │ 3370 │
│    1    │ [ 1, 2 ] │ 3369 │
│    2    │ [ 0, 2 ] │ 3360 │
│    3    │ [ 2, 1 ] │ 3359 │
│    4    │ [ 0, 1 ] │ 3344 │
│    5    │ [ 1, 0 ] │ 3334 │
│    6    │ [ 1, 1 ] │ 3297 │
│    7    │ [ 0, 0 ] │ 3296 │
│    8    │ [ 2, 2 ] │ 3271 │
└─────────┴──────────┴──────┘
Enter fullscreen mode Exit fullscreen mode

Looks good.

and 40

┌─────────┬────────────┬──────┐
│ (index) │     0      │  1   │
├─────────┼────────────┼──────┤
│    0    │ [ 39, 11 ] │ 2638 │
│    1    │ [ 11, 11 ] │ 2636 │
│    2    │ [ 38, 34 ] │ 2634 │
│    3    │ [ 4, 36 ]  │ 2633 │
│    4    │ [ 20, 21 ] │ 2348 │
│    5    │ [ 27, 25 ] │ 2348 │
│    6    │ [ 32, 20 ] │ 2345 │
└─────────┴────────────┴──────┘
Enter fullscreen mode Exit fullscreen mode

and 100

┌─────────┬────────────┬──────┐
│ (index) │     0      │  1   │
├─────────┼────────────┼──────┤
│    0    │ [ 74, 70 ] │ 2168 │
│    1    │ [ 55, 2 ]  │ 2167 │
│    2    │ [ 68, 74 ] │ 2164 │
│    3    │ [ 50, 20 ] │ 2157 │
│    4    │ [ 35, 54 ] │ 1830 │
│    5    │ [ 3, 92 ]  │ 1823 │
│    6    │ [ 27, 69 ] │ 1794 │
└─────────┴────────────┴──────┘
Enter fullscreen mode Exit fullscreen mode

The GC becomes unhappy when I increase the size, due to the address space limitation, and I am unhappy to make the code GC friendly :), but this is enough.

Top comments (0)