Vee Satayamas

Reading lines using Flexi-streams on a Zstd stream is not fast.

I used Flexi-streams to read 1,000 lines from a Zstandard-compressed file and compute their average length. My program looks like this:

(defun average-1000-etipitaka-flexi-streams ()                                                                                                                                             
  (with-open-file (f #P"etipitaka.txt.zst" :element-type '(unsigned-byte 8))                                                                                 
    (zstd:with-decompressing-stream (zs f)                                                                                                                   
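      ;; Wrap the decompressed byte stream in a Flexi-streams UTF-8 character stream.
      ;; The utf8-input-stream variant calls
      ;; (utf8-input-stream:make-utf8-input-stream zs) here instead.
      (let ((s (flexi-streams:make-flexi-stream zs :external-format :utf-8)))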
        (loop for line = (read-line s nil nil)                                                                                                               
              until (null line)                                                                                                                              
              count 1 into l                                                                                                                                 
              sum (length line) into c                                                                                                                       
              ;; Stop as soon as 1,000 lines have been counted.
              when (>= l 1000) do (return (float (/ c l)))
              finally (return (float (/ c l))))))))    
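The averaging loop does not depend on where the characters come from, so it can be tried on an in-memory stream first. Below is a minimal sketch; `average-line-length` and its `limit` parameter are hypothetical names that are not part of the original program, and the function returns NIL for empty input instead of dividing by zero.

```lisp
;; Hypothetical helper: the same loop as above, but taking any character
;; input stream and a line limit as arguments.
(defun average-line-length (stream &optional (limit 1000))
  "Average length in characters of up to LIMIT lines read from STREAM.
Returns NIL when STREAM contains no lines."
  (loop for line = (read-line stream nil nil)
        while line
        count 1 into n
        sum (length line) into total
        until (>= n limit)
        finally (return (when (plusp n) (float (/ total n))))))

;; Three lines of 2, 4 and 6 characters; only the first two are read.
(with-input-from-string (s (format nil "ab~%abcd~%abcdef~%"))
  (print (average-line-length s 2)))   ; prints 3.0
```

Passing the character stream built on top of the Zstd stream instead of the string stream would give the same computation as the function above.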

The text in etipitaka.txt.zst looks like this:

สิงสถิต เขตเมืองเวรัญชา พร้อมด้วยภิกษุสงฆ์หมู่ใหญ่ประมาณ ๕๐๐ รูป เวรัญชพราหมณ์                                                                                              
ได้สดับข่าวถนัดแน่ว่า ท่านผู้เจริญ พระสมณะโคดมศากยบุตร ทรงผนวชจากศากยตระกูล                                                                                              
ประทับอยู่ ณ บริเวณต้นไม้สะเดาที่นเฬรุยักษ์สิงสถิต เขตเมืองเวรัญชา พร้อมด้วยภิกษุสงฆ์   

The average line length is 68.8 bytes.
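One thing to keep in mind when reading such figures: `length` on a decoded Lisp string counts characters, while each Thai character occupies three bytes in UTF-8, so character counts and byte counts can differ a lot for this kind of text. The sketch below illustrates the difference. `utf8-byte-length` is a hypothetical helper, not part of the original program, and it assumes `char-code` returns Unicode code points, as it does on SBCL.

```lisp
;; Hypothetical helper; assumes CHAR-CODE yields Unicode code points.
(defun utf8-byte-length (string)
  "Number of bytes STRING would occupy when encoded as UTF-8."
  (loop for ch across string
        for code = (char-code ch)
        sum (cond ((< code #x80) 1)
                  ((< code #x800) 2)
                  ((< code #x10000) 3)
                  (t 4))))

;; The first three characters of the sample text: U+0E2A, U+0E34, U+0E07.
(let ((text (coerce (list (code-char #x0E2A) (code-char #x0E34) (code-char #x0E07))
                    'string)))
  (format t "~D characters, ~D bytes~%" (length text) (utf8-byte-length text)))
;; prints: 3 characters, 9 bytes
```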

I ran average-1000-etipitaka-flexi-streams on SBCL 2.2.5-1.1-suse on my laptop, which has a Celeron N4500. It took 1.591 seconds.
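The post does not say how the time was measured; the standard `time` macro is one option. As an alternative, here is a small sketch of a wall-clock timer built on `get-internal-real-time`. The name `elapsed-seconds` is hypothetical and not from the original program.

```lisp
;; Hypothetical helper: wall-clock timing with standard Common Lisp only.
(defun elapsed-seconds (thunk)
  "Call THUNK with no arguments and return the elapsed real time in seconds."
  (let ((start (get-internal-real-time)))
    (funcall thunk)
    (/ (- (get-internal-real-time) start)
       (float internal-time-units-per-second))))

;; Example: time a short sleep.
(format t "~,3F seconds~%" (elapsed-seconds (lambda () (sleep 0.1))))
```

With the benchmark function loaded, `(elapsed-seconds #'average-1000-etipitaka-flexi-streams)` would time a single run.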

Then I changed the input file to my-data.ndjson.zst, whose average line length is 515.5 bytes. Running average-1000-ndjson-flexi-streams took 4.411 seconds.

I also tested my customized utf8-input-stream. Running average-1000-etipitaka-utf8-input-stream and average-1000-ndjson-utf8-input-stream took 0.019 seconds and 0.043 seconds, respectively. In these tests, utf8-input-stream was roughly 83X faster than Flexi-streams for short lines (1.591 / 0.019) and roughly 102X faster for long lines (4.411 / 0.043).
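The speedup factors follow directly from the four timings reported above. A small worked example, where `speedup` is a hypothetical helper introduced only for illustration:

```lisp
;; Hypothetical helper: ratio of the slower time to the faster time.
(defun speedup (slow-seconds fast-seconds)
  "How many times faster the fast run was than the slow run."
  (/ slow-seconds fast-seconds))

;; Timings in seconds, taken from the measurements in this post.
(format t "short lines: ~,1Fx~%" (speedup 1.591 0.019))  ; prints 83.7x
(format t "long lines:  ~,1Fx~%" (speedup 4.411 0.043))  ; prints 102.6x
```

Each figure comes from a single run, so the ratios are best read as orders of magnitude rather than precise measurements.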
