vindarel

Common Lisp's groupBy is Serapeum:assort

This is the second time I have searched for such a function, so here it is: the functional "group by" utility you are looking for is Serapeum's assort.

Example

I have a list of sale records:

((:|isbn| "9782290252499" :|quantity| 2 :|price| 6.5d0 :|vat| 5.5d0
  :|distributor| "UNION DISTRIBUTION - UD" :|discount| 35.0d0 :|type_name|
  "Livre" :|type_vat| 5.5d0 :|price_bought| "price_bought" :|price_sold|
  5.915d0 :|quantity_sold| 15 :|sold_date| "2024-04-03 10:31:56")
 (:|isbn| "9791034742752" :|quantity| 1 :|price| 23.5d0 :|vat| 5.5d0
  :|distributor| "MDS" :|discount| 35.0d0 :|type_name| "Livre" :|type_vat|
  5.5d0 :|price_bought| "price_bought" :|price_sold| 22.325d0 :|quantity_sold|
  1 :|sold_date| "2024-04-03 08:41:09")
  
)

This is a list of plists: a plist (property list) is a list that alternates keys (here, symbols) and values.
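The post later calls #'isbn without showing its definition. A minimal sketch of what such an accessor could look like, assuming it simply reads the :|isbn| key with getf (the names isbn, quantity-sold and *sale* are hypothetical, and the plist is abbreviated):

```lisp
;; Hypothetical accessors: the post uses #'isbn but does not define it.
;; The keys are keywords with lowercase names, hence the |...| escaping.
(defun isbn (sale)
  "Return the ISBN string stored in the plist SALE, or NIL if absent."
  (getf sale :|isbn|))

(defun quantity-sold (sale)
  "Return the number of copies sold recorded in the plist SALE."
  (getf sale :|quantity_sold|))

;; An abbreviated record, using values from the first plist above.
(defparameter *sale*
  '(:|isbn| "9782290252499" :|quantity| 2 :|quantity_sold| 15))

(isbn *sale*)          ; => "9782290252499"
(quantity-sold *sale*) ; => 15
```

getf compares keys with eq, which is fine here because keywords are interned symbols.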

I can have more than one plist for the same ISBN number (the "978…"). I want to group all of them together, so that it will be easier to work with them (I need to sum the total sold for each unique ISBN).

I could write my own loop, but I can also just use Serapeum's assort:

CL-USER> (assort *SELLS* :key #'isbn)
(((:|isbn| "9782290252499" :|quantity| 2 :|price| 6.5d0 :|vat| 5.5d0
   :|distributor| "UNION DISTRIBUTION - UD" :|discount| 35.0d0 :|type_name|
   "Livre" :|type_vat| 5.5d0 :|price_bought| "price_bought" :|price_sold|
   5.915d0 :|quantity_sold| 15 :|sold_date| "2024-04-03 10:31:56")
  (:|isbn| "9782290252499" :|quantity| 1 :|price| 6.5d0 :|vat| 5.5d0
   :|distributor| "UNION DISTRIBUTION - UD" :|discount| 35.0d0 :|type_name|
   "Livre" :|type_vat| 5.5d0 :|price_bought| 0 :|price_sold|
   5.915d0 :|quantity_sold| 3 :|sold_date| "2024-04-03 10:55:56"))
  
)

Yes, we now have a triple ((( to keep track of: the outer list holds the groups, each group is a list, and each element of a group is a plist. That's why it is sometimes easier to define a class and look at printed object representations, or to use hash-tables (aka dictionaries). Serapeum has the great dict helper, if you don't know it; I have included it in my workflow. But so far I am keeping up with the parentheses.
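Once the records are grouped, summing the quantity sold per ISBN is a short step. Here is a sketch in plain Common Lisp, assuming the groups have the shape shown above; *groups* is a hand-written stand-in for assort's return value with abbreviated plists, and total-sold-per-isbn is a hypothetical name:

```lisp
;; *GROUPS* mimics the shape ASSORT returns: a list of groups,
;; each group being a list of plists that share one ISBN.
;; The plists are abbreviated to the two keys used here.
(defparameter *groups*
  '(((:|isbn| "9782290252499" :|quantity_sold| 15)
     (:|isbn| "9782290252499" :|quantity_sold| 3))
    ((:|isbn| "9791034742752" :|quantity_sold| 1))))

(defun total-sold-per-isbn (groups)
  "Return an alist of (ISBN . total quantity sold), one entry per group."
  (mapcar (lambda (group)
            ;; Every plist in a group has the same ISBN, so read it from the first.
            (cons (getf (first group) :|isbn|)
                  (reduce #'+ group
                          :key (lambda (sale) (getf sale :|quantity_sold|)))))
          groups))

(total-sold-per-isbn *groups*)
;; => (("9782290252499" . 18) ("9791034742752" . 1))
```

With the real data, the argument would be the result of the assort call instead of the literal.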

assort's full docstring

(assort seq &key key test start end hash)

Return SEQ assorted by KEY.

 (assort (iota 10)
         :key (lambda (n) (mod n 3)))
 => '((0 3 6 9) (1 4 7) (2 5 8))

Groups are ordered as encountered. This property means you could, in principle, use assort to implement remove-duplicates by taking the first element of each group:

 (mapcar #'first (assort list))
 ≡ (remove-duplicates list :from-end t)

However, if TEST is ambiguous (a partial order), and an element could qualify as a member of more than one group, then it is not guaranteed that it will end up in the leftmost group that it could be a member of.

(assort '(1 2 1 2 1 2) :test #'<=)
=> '((1 1) (2 2 1 2))

The default algorithm used by assort is, in the worst case, O(n) in the number of groups. If HASH is specified, then a hash table is used instead. However TEST must be acceptable as the :test argument to make-hash-table.
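To make the hash-table idea concrete, here is a sketch of hash-based grouping in plain Common Lisp. It is not Serapeum's implementation; group-by-hash is a hypothetical helper that only illustrates why TEST has to be something make-hash-table accepts:

```lisp
;; Hypothetical helper, not Serapeum's code.
(defun group-by-hash (list &key (key #'identity) (test #'eql))
  "Group the elements of LIST by KEY, keeping groups in first-seen order.
TEST must be a valid :test for MAKE-HASH-TABLE (eq, eql, equal or equalp)."
  (let ((table (make-hash-table :test test))
        (keys-in-order '()))
    (dolist (item list)
      (let ((k (funcall key item)))
        ;; Remember each key the first time it is encountered.
        (unless (nth-value 1 (gethash k table))
          (push k keys-in-order))
        (push item (gethash k table))))
    ;; Items and keys were pushed in reverse; restore encounter order.
    (mapcar (lambda (k) (reverse (gethash k table)))
            (nreverse keys-in-order))))

(group-by-hash '(0 1 2 3 4 5 6 7 8 9)
               :key (lambda (n) (mod n 3)))
;; => ((0 3 6 9) (1 4 7) (2 5 8))
```

For string keys such as ISBNs, the test would need to be #'equal, since eql does not compare strings by content.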


Serapeum also has frequencies, which we can use to count how many times each ISBN (EAN13) occurs:

CL-USER> (frequencies *SELLS* :key #'isbn)

 (dict  
  "9782290252499" 2
  "9791034742752" 1
  "9782361936150" 1
  "9782956296348" 1
  "9782846405287" 1
  "9782492939075" 1
  "9782889755462" 1
  "9791034747979" 1
  "9782203226692" 1
  "9791092752953" 1
  "9782874263699" 1 
 ) 
11

Look at this "dict" representation: it's a hash-table, but printed in a human-readable form that the Lisp reader can read back in (if you serialize it, for instance). You know this already if you have read the CL Cookbook.
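If Serapeum is not at hand, the counting itself can be sketched in a few lines of plain Common Lisp. count-by is a hypothetical helper, not Serapeum's frequencies, and the sample records are abbreviated:

```lisp
;; Hypothetical helper, not Serapeum's code.
(defun count-by (list &key (key #'identity) (test #'equal))
  "Return a hash table mapping each KEY value in LIST to its number of occurrences."
  (let ((counts (make-hash-table :test test)))
    (dolist (item list counts)
      ;; The third argument to GETHASH is the default for unseen keys.
      (incf (gethash (funcall key item) counts 0)))))

(defparameter *counts*
  (count-by '((:|isbn| "9782290252499")
              (:|isbn| "9791034742752")
              (:|isbn| "9782290252499"))
            :key (lambda (sale) (getf sale :|isbn|))))

(gethash "9782290252499" *counts*) ; => 2, T
(gethash "9791034742752" *counts*) ; => 1, T
```

The result is an ordinary hash table, so it prints in the implementation's default unreadable form rather than the dict form shown above.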

That's all, googlers o/

