DEV Community

vindarel


Common Lisp's groupBy is Serapeum:assort

This is the second time I've searched for such a function, so here it is: the functional "group by" utility you are looking for is Serapeum's assort.

Example

I have a list of sale records:

((:|isbn| "9782290252499" :|quantity| 2 :|price| 6.5d0 :|vat| 5.5d0
  :|distributor| "UNION DISTRIBUTION - UD" :|discount| 35.0d0 :|type_name|
  "Livre" :|type_vat| 5.5d0 :|price_bought| "price_bought" :|price_sold|
  5.915d0 :|quantity_sold| 15 :|sold_date| "2024-04-03 10:31:56")
 (:|isbn| "9791034742752" :|quantity| 1 :|price| 23.5d0 :|vat| 5.5d0
  :|distributor| "MDS" :|discount| 35.0d0 :|type_name| "Livre" :|type_vat|
  5.5d0 :|price_bought| "price_bought" :|price_sold| 22.325d0 :|quantity_sold|
  1 :|sold_date| "2024-04-03 08:41:09")
 …)

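This is a list of plists: a plist (property list) is a flat list that alternates keys and values. Here the keys are keywords written with vertical bars, like :|isbn|, which keeps their names lowercase.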
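To make the plist layout concrete, here is a minimal sketch in plain Common Lisp: getf returns the value that follows a key. The #'isbn key function passed to assort is not defined in this post, so the one-line isbn accessor below is a hypothetical definition, assumed for illustration.

```lisp
;; One sale as a plist: keys and values alternate.
;; The vertical bars keep the keyword names lowercase ("isbn", not "ISBN").
(defparameter *sale*
  '(:|isbn| "9782290252499" :|quantity| 2 :|quantity_sold| 15))

;; GETF returns the value that follows a key (keys are compared with EQ).
(getf *sale* :|isbn|)           ; => "9782290252499"
(getf *sale* :|quantity_sold|)  ; => 15

;; Hypothetical accessor: an assumption about what the post's #'isbn does.
(defun isbn (sale)
  (getf sale :|isbn|))

(isbn *sale*)                   ; => "9782290252499"
```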

I can have more than one plist for the same ISBN (the "978…" numbers). I want to group them together so they are easier to work with (I need to sum the total sold for each unique ISBN).

I could write my own loop, but I can also just use Serapeum's assort:

CL-USER> (assort *SELLS* :key #'isbn)
(((:|isbn| "9782290252499" :|quantity| 2 :|price| 6.5d0 :|vat| 5.5d0
   :|distributor| "UNION DISTRIBUTION - UD" :|discount| 35.0d0 :|type_name|
   "Livre" :|type_vat| 5.5d0 :|price_bought| "price_bought" :|price_sold|
   5.915d0 :|quantity_sold| 15 :|sold_date| "2024-04-03 10:31:56")
  (:|isbn| "9782290252499" :|quantity| 1 :|price| 6.5d0 :|vat| 5.5d0
   :|distributor| "UNION DISTRIBUTION - UD" :|discount| 35.0d0 :|type_name|
   "Livre" :|type_vat| 5.5d0 :|price_bought| 0 :|price_sold|
   5.915d0 :|quantity_sold| 3 :|sold_date| "2024-04-03 10:55:56"))
 …)

Yes, the output starts with a triple (((, so you have to follow along: it is a list of groups, and each group is a list of plists. That's why it is sometimes easier to define a class and look at printed object representations, or to use hash tables (aka dictionaries). If you don't know it, Serapeum has a great dict helper; I've included it in my workflow. But so far, I can still follow.
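Once the sales are grouped, summing per ISBN is a short nested loop. This is a sketch under two assumptions: the input has the shape of the assort output above (a list of groups, each a list of plists sharing one ISBN), and "total sold" means the :|quantity_sold| field. The function name total-sold-per-isbn and the small *groups* sample are hypothetical.

```lisp
;; Hypothetical sample with the same shape as assort's output:
;; a list of groups, each group a list of sale plists sharing one ISBN.
(defparameter *groups*
  '(((:|isbn| "9782290252499" :|quantity_sold| 15)
     (:|isbn| "9782290252499" :|quantity_sold| 3))
    ((:|isbn| "9791034742752" :|quantity_sold| 1))))

(defun total-sold-per-isbn (groups &key (field :|quantity_sold|))
  "Return an alist of (ISBN . total of FIELD), one entry per group, in group order.
FIELD defaults to :|quantity_sold|, an assumption about what \"total sold\" means."
  (loop for group in groups
        ;; all plists in a group share the ISBN, so read it from the first one
        collect (cons (getf (first group) :|isbn|)
                      ;; a plist without FIELD counts as 0
                      (loop for sale in group
                            sum (getf sale field 0)))))

(total-sold-per-isbn *groups*)
;; => (("9782290252499" . 18) ("9791034742752" . 1))
```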

assort's full docstring

(assort seq &key key test start end hash)

Return SEQ assorted by KEY.

 (assort (iota 10)
         :key (lambda (n) (mod n 3)))
 => '((0 3 6 9) (1 4 7) (2 5 8))

Groups are ordered as encountered. This property means you could, in principle, use assort to implement remove-duplicates by taking the first element of each group:

 (mapcar #'first (assort list))
 ≡ (remove-duplicates list :from-end t)

However, if TEST is ambiguous (a partial order), and an element could qualify as a member of more than one group, then it is not guaranteed that it will end up in the leftmost group that it could be a member of.

(assort '(1 2 1 2 1 2) :test #'<=)
=> '((1 1) (2 2 1 2))

The default algorithm used by assort is, in the worst case, O(n) in the number of groups. If HASH is specified, then a hash table is used instead. However TEST must be acceptable as the :test argument to make-hash-table.
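To see why TEST has to be a valid make-hash-table test, here is a plain Common Lisp sketch of the hash-based idea. It is not Serapeum's actual implementation, and group-by-hash is a hypothetical name. Each element's key is looked up in a hash table instead of being compared against every existing group, and a separate list remembers the order in which keys first appear, so groups still come out in encounter order. The default test is equal here because two string keys with the same characters, such as ISBNs read from a database, are not necessarily eql.

```lisp
(defun group-by-hash (list &key (key #'identity) (test 'equal))
  "Group the elements of LIST by KEY, keeping groups in the order their
keys are first encountered. TEST must be a valid MAKE-HASH-TABLE test."
  (let ((table (make-hash-table :test test)) ; key -> its group, newest element first
        (order '()))                          ; distinct keys, newest first
    (dolist (item list)
      (let ((k (funcall key item)))
        (multiple-value-bind (group found) (gethash k table)
          (unless found
            (push k order))
          (setf (gethash k table) (cons item group)))))
    ;; Rebuild groups in first-encounter order, each in original element order.
    (loop for k in (reverse order)
          collect (reverse (gethash k table)))))

(group-by-hash '(0 1 2 3 4 5 6 7 8 9) :key (lambda (n) (mod n 3)))
;; => ((0 3 6 9) (1 4 7) (2 5 8))
```

The trade-off is that in portable Common Lisp, TEST is limited to eq, eql, equal or equalp, so a partial order like #'<= cannot be used this way.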


We also have Serapeum's frequencies to count how many times each EAN-13 (here, each ISBN) appears:

CL-USER> (frequencies *SELLS* :key #'isbn)

 (dict  
  "9782290252499" 2
  "9791034742752" 1
  "9782361936150" 1
  "9782956296348" 1
  "9782846405287" 1
  "9782492939075" 1
  "9782889755462" 1
  "9791034747979" 1
  "9782203226692" 1
  "9791092752953" 1
  "9782874263699" 1 
 ) 
11

Look at this dict representation: it's a hash table, but printed in a user-readable form that can be read back in by the Lisp reader (if you serialize it, for instance), since evaluating the (dict …) form gives back a hash table. You know this already if you've read the CL Cookbook.
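Without Serapeum, a frequency count by key needs only a hash table and incf. This is a minimal sketch, not Serapeum's frequencies: count-by-key is a hypothetical name, it returns just the hash table, and it defaults to an equal test so that string ISBNs are counted together.

```lisp
(defun count-by-key (list &key (key #'identity) (test 'equal))
  "Return a hash table mapping each distinct key of LIST to its number of occurrences."
  (let ((counts (make-hash-table :test test)))
    (dolist (item list counts)
      ;; a missing key starts at 0, then gets incremented
      (incf (gethash (funcall key item) counts 0)))))

(let ((counts (count-by-key
               '((:|isbn| "9782290252499") (:|isbn| "9791034742752")
                 (:|isbn| "9782290252499"))
               :key (lambda (sale) (getf sale :|isbn|)))))
  (gethash "9782290252499" counts))
;; => 2, T
```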

That's all, googlers o/

