Common Lisp's groupBy is Serapeum:assort

#lisp #commonlisp

It's the second time I search for such a function so here is it: the functional "group by" utility you are looking for is Serapeum's assort.

Example

I have a list of sell objects:

((:|isbn| "9782290252499" :|quantity| 2 :|price| 6.5d0 :|vat| 5.5d0
  :|distributor| "UNION DISTRIBUTION - UD" :|discount| 35.0d0 :|type_name|
  "Livre" :|type_vat| 5.5d0 :|price_bought| "price_bought" :|price_sold|
  5.915d0 :|quantity_sold| 15 :|sold_date| "2024-04-03 10:31:56")
 (:|isbn| "9791034742752" :|quantity| 1 :|price| 23.5d0 :|vat| 5.5d0
  :|distributor| "MDS" :|discount| 35.0d0 :|type_name| "Livre" :|type_vat|
  5.5d0 :|price_bought| "price_bought" :|price_sold| 22.325d0 :|quantity_sold|
  1 :|sold_date| "2024-04-03 08:41:09")
  …
)

Here it is a list of plists: a plist is a list that alternates a key (as a symbol) and a value.

I can have more than one plist for the same ISBN number (the "978…"). I want to group all of them together, so that it will be easier to work with them (I need to sum the total sold for each unique ISBN).

I can write my own loop, but I can also just use serapeum's assort:

CL-USER> (assort *SELLS* :key #'isbn)
(((:|isbn| "9782290252499" :|quantity| 2 :|price| 6.5d0 :|vat| 5.5d0
   :|distributor| "UNION DISTRIBUTION - UD" :|discount| 35.0d0 :|type_name|
   "Livre" :|type_vat| 5.5d0 :|price_bought| "price_bought" :|price_sold|
   5.915d0 :|quantity_sold| 15 :|sold_date| "2024-04-03 10:31:56")
  (:|isbn| "9782290252499" :|quantity| 1 :|price| 6.5d0 :|vat| 5.5d0
   :|distributor| "UNION DISTRIBUTION - UD" :|discount| 35.0d0 :|type_name|
   "Livre" :|type_vat| 5.5d0 :|price_bought| 0 :|price_sold|
   5.915d0 :|quantity_sold| 3 :|sold_date| "2024-04-03 10:55:56"))
  …
)

Yes, we have a triple (((, we have to follow. That's why it's easier sometimes to create an object class and to see printed object representations, or to use hash-tables (aka dictionaries). Serapeum has the great dict helper if you don't know it. I included it in my workflow. But so far I am following.

`assort`'s full docstring

(assort seq &key key test start end hash)

Return SEQ assorted by KEY.

 (assort (iota 10)
         :key (lambda (n) (mod n 3)))
 => '((0 3 6 9) (1 4 7) (2 5 8))

Groups are ordered as encountered. This property means you could, in principle, use assort to implement remove-duplicates by taking the first element of each group:

 (mapcar #'first (assort list))
 ≡ (remove-duplicates list :from-end t)

However, if TEST is ambiguous (a partial order), and an element could qualify as a member of more than one group, then it is not guaranteed that it will end up in the leftmost group that it could be a member of.

(assort '(1 2 1 2 1 2) :test #'<=)
=> '((1 1) (2 2 1 2))

The default algorithm used by assort is, in the worst case, O(n) in the number of groups. If HASH is specified, then a hash table is used instead. However TEST must be acceptable as the :test argument to make-hash-table.

We also have serapeum's frequencies to check for our EAN13 frequencies:

CL-USER> (frequencies *SELLS* :key #'isbn)

 (dict  
  "9782290252499" 2
  "9791034742752" 1
  "9782361936150" 1
  "9782956296348" 1
  "9782846405287" 1
  "9782492939075" 1
  "9782889755462" 1
  "9791034747979" 1
  "9782203226692" 1
  "9791092752953" 1
  "9782874263699" 1 
 ) 
11

Look at this "dict" representation, it's a hash-table, but user-readable, and that can be read back in by the lisp reader (if you serialize it for instance). You know this already if you read the CL Cookbook.

you also have this library: https://github.com/AccelerationNet/group-by Its result is different, it returns an alist with the common parameter as the key.

That's all, googlers o/