DEV Community

Vee Satayamas

Reading lines using Flexi-streams on a Zstd stream is not fast.

I used Flexi-streams to read the first 1,000 lines from a Zstd archive and compute their average length. My program looks like this:

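(defun average-1000-etipitaka-flexi-streams ()
  ;; Average length, in characters, of the first 1,000 lines of the archive.
  (with-open-file (f #P"etipitaka.txt.zst" :element-type '(unsigned-byte 8))
    (zstd:with-decompressing-stream (zs f)
      ;; Wrap the binary Zstd stream so that READ-LINE can decode UTF-8 text.
      (let ((s (flexi-streams:make-flexi-stream zs :external-format :utf-8)))
        (loop for line = (read-line s nil nil)
              while line
              count line into l
              sum (length line) into c
              ;; Stop after exactly 1,000 lines.
              when (>= l 1000) do (loop-finish)
              ;; Guard against an empty file to avoid dividing by zero.
              finally (return (if (zerop l) 0.0 (float (/ c l)))))))))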

The text in etipitaka.txt.zst looks like this:

สิงสถิต เขตเมืองเวรัญชา พร้อมด้วยภิกษุสงฆ์หมู่ใหญ่ประมาณ ๕๐๐ รูป เวรัญชพราหมณ์                                                                                              
ได้สดับข่าวถนัดแน่ว่า ท่านผู้เจริญ พระสมณะโคดมศากยบุตร ทรงผนวชจากศากยตระกูล                                                                                              
ประทับอยู่ ณ บริเวณต้นไม้สะเดาที่นเฬรุยักษ์สิงสถิต เขตเมืองเวรัญชา พร้อมด้วยภิกษุสงฆ์   

The average line length is 68.8 characters.
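
Note that `length` on a decoded line counts characters rather than bytes, and each Thai character occupies three bytes in UTF-8. A minimal sketch of the difference on SBCL follows; the helper name `char-and-byte-length` is hypothetical, and `sb-ext:string-to-octets` is SBCL-specific:

```lisp
;; Compare the character count and the UTF-8 byte count of a string.
;; SB-EXT:STRING-TO-OCTETS exists only on SBCL; other implementations
;; need a library such as Babel or Flexi-streams for this conversion.
(defun char-and-byte-length (string)
  "Return two values: the number of characters and the number of UTF-8
octets in STRING."
  (values (length string)
          (length (sb-ext:string-to-octets string :external-format :utf-8))))

;; The first word of the sample text has 7 characters and 21 UTF-8 bytes.
(char-and-byte-length "สิงสถิต")
```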

I ran average-1000-etipitaka-flexi-streams on SBCL 2.2.5-1.1-suse on my laptop, which has a Celeron N4500. It took 1.591 seconds.
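
One way to take a timing like this is the standard `time` macro at the REPL. To get the elapsed time as a number instead, a minimal sketch with `get-internal-real-time` could look like the following; the helper name `elapsed-seconds` is an assumption for illustration and not part of the original program:

```lisp
(defun elapsed-seconds (thunk)
  "Call THUNK with no arguments. Return the wall-clock seconds it took,
followed by the values that THUNK returned."
  (let* ((start (get-internal-real-time))
         (results (multiple-value-list (funcall thunk)))
         (end (get-internal-real-time)))
    (values-list
     (cons (float (/ (- end start) internal-time-units-per-second))
           results))))

;; Example: time a small computation.
(elapsed-seconds (lambda () (loop for i below 1000000 sum i)))
```

With the function from this post loaded, `(elapsed-seconds #'average-1000-etipitaka-flexi-streams)` would return the elapsed seconds and the average line length.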

Then I changed the file to my-data.ndjson.zst, whose average line length is 515.5 characters. Running average-1000-ndjson-flexi-streams took 4.411 seconds.

I also tested my customized utf8-input-stream. Running average-1000-etipitaka-utf8-input-stream and average-1000-ndjson-utf8-input-stream took 0.019 seconds and 0.043 seconds, respectively. In these tests, utf8-input-stream was therefore about 83 times faster than Flexi-streams for short lines and about 102 times faster for long lines.
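
The speedup figures are simply the ratios of the measured times. A short sketch of the arithmetic, using the four timings reported above (the function name `speedup` is hypothetical):

```lisp
(defun speedup (slow-seconds fast-seconds)
  "Return how many times faster the run that took FAST-SECONDS is
compared with the run that took SLOW-SECONDS."
  (/ slow-seconds fast-seconds))

;; Flexi-streams time divided by utf8-input-stream time for each file.
(format t "short lines: ~,1Fx~%" (speedup 1.591 0.019))
(format t "long lines:  ~,1Fx~%" (speedup 4.411 0.043))
```

The ratios come out at roughly 83.7 and 102.6, which the figures above truncate to 83 and 102.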
