DEV Community

Discussion on: Performance measurement of JavaScript solutions to common algorithmic questions (part 1)

Collapse
 
alephnaught2tog profile image
Max Cerrina

You can preallocate memory using the Buffer class if you're in a node environment.

Collapse
 
detunized profile image
Dmitry Yakimenko

I measured it, the conversion to string afterwards makes it quite slow. Slower than the alternatives.

Collapse
 
qm3ster profile image
Mihail Malo

Did you try latin1 or ascii conversion (both ways)?

Thread Thread
 
detunized profile image
Dmitry Yakimenko

Not sure. I didn't enforce any encoding. The content of the string was "a a a a...".

Thread Thread
 
qm3ster profile image
Mihail Malo • Edited

The default is 'utf8', which is variable-length characters. So naturally it's much slower.
'latin1' is 1-1 char for byte (Obviously high Unicode will get corrupted, but if you start with a Buffer, you will always be able to get it back.
'ascii' decodes the same as 'latin1', but the encoder drops the first bit.

> Buffer.from(Buffer.from([0xff]).toString('latin1'),'latin1')
<Buffer ff>
> Buffer.from(Buffer.from([0xff]).toString('latin1'),'ascii')
<Buffer ff>
> Buffer.from(Buffer.from([0xff]).toString('ascii'),'latin1')
<Buffer 7f>

UTF8, lol:

Buffer.from(Buffer.from([0xff]).toString())
<Buffer ef bf bd>

Notably, V8 doesn't use UTF8 to represent strings/string chunks in memory. It uses something like UTF16 (2 or 4 bytes per character, never 1 or 3) and sometimes opportunistically optimizes to using something like 'latin1' when high characters aren't used.
So, for weird strings 'utf16le' is fastest, and for basic strings, 'latin1'/'ascii'. The default of 'utf8' is the slowest in all cases (but the most compatible and compact without compression)

const { log, time, timeEnd, error } = console
const announce = msg => {
  const bar = "=".repeat(msg.length)
  log(bar); log(msg); log(bar)
}

const [a, z, _] = Buffer.from("az ", "latin1")
const len = 1 + z - a
const strings = new Array(len)
for (let i = 0; i < len; i++)
  strings[i] = Buffer.alloc(0x1000000, Buffer.from([a + i, _]))
    .toString("latin1")

const badStuff = ["💩 💩 💩", "ਸਮਾਜਵਾਦ", "สังคมนิยม"]
  .map(x => x.repeat(0x200000))

const tainted = strings.map(x => "💩" + x)

const benchmarkEncoding = strings => enc => {
  let failed
  time(enc)
  for (const x of strings) {
    const out = Buffer.from(x, enc).toString(enc)
    if (out !== x && !failed) failed = { enc, x, out }
  }
  timeEnd(enc)
  if (failed)
    error(failed.enc, "failed:", failed.x.slice(0, 6), failed.out.slice(0, 6))
}
const encodings = ["utf16le", "utf8", "latin1", "ascii"]

announce("Normal")
encodings.forEach(benchmarkEncoding(strings))
announce("And now, the bad stuff!")
encodings.forEach(benchmarkEncoding(badStuff))
announce("What about tainted ASCII?")
encodings.forEach(benchmarkEncoding(tainted))

Edit: So, I was wrong. 'utf16le' is slower than 'utf8' on pure ASCII (But 3+ times faster otherwise)

Thread Thread
 
detunized profile image
Dmitry Yakimenko

Some thorough research. Well done! One small problem with this, is all of this doesn't really help with the original problem =) We wanted to count letters or reverse them or whatever. It's not correct to assume any trivial encoding, like latin1 or ascii. Nor it helps to put everything in the byte buffer, even correctly encoded sequence of bytes. But I'm sure there are use cases where it's ok to assume 7 or 8 bit encoding, like massive amounts of base64, for example.

Thread Thread
 
qm3ster profile image
Mihail Malo

There's Buffer.from('ffff','base64') for that :D
I was just responding to you saying that in your quest to to count letters or reverse them or whatever you rejected Buffer because of the conversion time.
So I was wondering if it would be more competitive, on your admittedly ASCII 'a a a a a ... data, without the overhead of the default encoding.

Thread Thread
 
detunized profile image
Dmitry Yakimenko • Edited

You are right, it's much faster on Node using Buffer to reverse a string if it's known it's ASCII. I assume the same goes for any fixed character size encoding. Code:

function reverseStringBuffer(str) {
    let b = Buffer.from(str, "ascii");
    for (let i = 0, n = str.length; i < n / 2; i++) {
        let t = b[i];
        b[i] = b[n - 1 - i];
        b[n - 1 - i] = t;
    }
    return b.toString("ascii");
}

Look at the green line:

Thread Thread
 
qm3ster profile image
Mihail Malo

Heeeey, that's pretty fast :v

I don't know what you're using to benchmark and draw these nice graphs, so I'm forced to bother you again:
Can you try it with Buffer.reverse()? :D
It's in place, like the Array one, and inherited from Uint8Array.

Thread Thread
 
detunized profile image
Dmitry Yakimenko

The code is mentioned in the bottom of the post: gist.github.com/detunized/17f524d7...
The graphs I draw using Google Sheets. I can add this code later to my set. I actually looked for reverse on Buffet but didn't see it. Weird.

Thread Thread
 
qm3ster profile image
Mihail Malo • Edited

Yeah, it's not listed on the Node docs, because it's inherited.
It's on MDN though.
This is over twice as fast :D

function reverseStringBuffer2(str) {
  return Buffer.from(str, "ascii").reverse().toString("ascii");
}

This was 💩 though:

function findLongestWordLengthBuffer(str) {
  const buf = Buffer.from(str, 'ascii')
  const length = buf.length
  let maxLength = 0;
  let start = 1;
  let offset = 0;
  let last = false;
  while (!last) {
    const found = buf.indexOf(32, offset)
    offset = found+1
    last = offset === 0
    const len = found - start
    start = found
    if (len > maxLength) maxLength = len
  }
  return maxLength
}

Just plugging a buffer into your Fast is twice faster on long strings, same and maybe a little worse on short:

function findLongestWordLengthBufferFast(str) {
  const buf = Buffer.from(str,'ascii')
  let l = buf.length;

  let maxLength = 0;
  let currentLength = 0;

  for (let i = 0; i < l; ++i) {
      if (buf[i] === 32) {
          if (currentLength > maxLength) {
              maxLength = currentLength;
          }
          currentLength = 0;
      } else {
          ++currentLength;
     }
  }

  // Account for the last word
  return currentLength > maxLength ? currentLength : maxLength;
}