Jason Steinhauser

Posted on Feb 25, 2019

Wormhole - Data Collection Cheat Sheet and Library in 4 Languages

#showdev

During the Advent of Code this past year, I was trying to deepen my knowledge of Elixir, as well as of functional programming in general. Sometimes I found a function that had no analogue in most of the other languages I used (a glaringly obvious one being Enum.reduce_while), and other times I was writing functions I'd used often in other languages (Clojure's frequencies would've been mighty handy!). I finally decided to bite the bullet: make a list of the collection manipulation functions I use often in other languages, implement them in the languages I've learned or am currently learning, and discover new functions that I wouldn't want to live without!

What Is Wormhole?

It's this, really:

jdsteinhauser / wormhole

Some of my most used functions implemented in different languages

Wormhole

You ever think, "Hey, I wish this language had the capability of some other language I like?" Enter Wormhole.

Motivation

During the Advent of Code 2018, I found myself writing the same functions in Elixir that I knew I had used in Clojure, F#, or some other language. To keep myself from doing this in the future, I decided to build a library to house all of these helpful functions across the languages that I either knew or wanted to learn.

Desired functions

These are the functions that I've used and that I'd like to have in multiple languages. Some of them already exist in a given language, so I won't reimplement them there. Each implementation will list the functions as implemented in that language, along with links to their documentation.

map
filter
reduce
reduce_while
chunk
chunk_by
juxt
min_by
max_by
frequencies
group_by
scan
inc
dec
zip…


I've got an addiction: I love learning new languages. When you learn new languages, you end up finding functions, classes, and concepts that you wish you had in other languages. Sometimes those functions go by different names, which gets confusing when you switch between languages. I do a lot of data collection manipulation, so I decided to start with what I knew best and branch out from there!

What Functions Am I Looking For?

For a non-exhaustive list, I wanted to have at least the following:

Collection basics: map, filter, reduce, and scan
Chunking data: chunk, chunk_by
Common stats: min_by, max_by, group_by, frequencies
Other hella useful things: reduce_while, juxt, identity

What Languages Am I Targeting?

For now, I have filled the gaps I perceived in C#, Clojure, and Elixir. I have an F# solution that I expect to be comfortable with early this week, and I've started looking at a comprehensive list of Ruby functions as well. After that... well, I'm not entirely sure! I think I'm going to go through Rust, JavaScript, Java, and possibly Kotlin and Python 3 to see what other handy things I can implement across all of those languages.

Will These Be Deployed to Package Managers?

Yes... but not right now. I need to get the documentation into a suitable state first. I've pulled down several packages before, but I've never pushed one of my own! I'm sure that will end up being a blog post in and of itself.

Current Cheat Sheet

Here's a summary of the languages I've targeted so far, with documentation links to each function that either already exists, or that I've implemented in Wormhole.

Function	C#	F#	Clojure	Elixir
`map`	`Enumerable.Select`	`Seq.map`	`clojure.core/map`	`Enum.map/2`, `Stream.map/2`
`filter`	`Enumerable.Where`	`Seq.filter`	`clojure.core/filter`	`Enum.filter/2`, `Stream.filter/2`
`reduce`	`Enumerable.Aggregate`	`Seq.reduce`	`clojure.core/reduce`	`Enum.reduce`
`reduce_while`	`ReduceWhile`	`reduceWhile`	`reduce-while`	`Enum.reduce_while/3`
`scan`	`Scan`	`Seq.scan`	`clojure.core/reductions`	`Enum.scan`, `Stream.scan`
`chunk`	`Chunk`	`chunk`*	`clojure.core/partition`	`Enum.chunk_every/4`, `Stream.chunk_every/4`
`chunk_by`	`ChunkBy`	`chunkBy`	`clojure.core/partition-by`	`Enum.chunk_by/2`, `Stream.chunk_by/2`
`juxt`	`Juxt`	`juxt`, `juxt2`, `juxt3`	`clojure.core/juxt`	`Wormhole.juxt/1`
`min_by`	`MinBy`	`Seq.minBy`	`min-by`	`Enum.min_by/3`
`max_by`	`MaxBy`	`Seq.maxBy`	`max-by`	`Enum.max_by/3`
`frequencies`	`Frequencies`	`freqs`	`clojure.core/frequencies`	`Wormhole.freqs/1`
`group_by`	`Enumerable.GroupBy`	`Seq.groupBy`	`clojure.core/group-by`	`Enum.group_by/3`
`identity`	`Identity`	`Operators.id`	`clojure.core/identity`	`Wormhole.identity/1`

*F# contains a Seq.windowed function, but it only advances the window one element at a time, so it always produces overlapping chunks.
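To make the difference concrete, here is a minimal JavaScript sketch of a chunk with a configurable step, loosely modeled on Clojure's partition. The name chunk and its (items, size, step) parameters are assumptions for illustration, not Wormhole's actual API. Like partition without a pad argument, this sketch drops a trailing chunk that comes up short.

```javascript
// Hypothetical chunk helper with a configurable step, in the spirit of
// Clojure's (partition n step coll). Not Wormhole's actual API.
// A trailing chunk shorter than `size` is dropped, as partition does without a pad.
function chunk(items, size, step = size) {
  if (size <= 0 || step <= 0) throw new RangeError("size and step must be positive");
  const result = [];
  for (let start = 0; start + size <= items.length; start += step) {
    result.push(items.slice(start, start + size));
  }
  return result;
}

// step === size: non-overlapping chunks
const nonOverlapping = chunk([1, 2, 3, 4, 5, 6, 7], 3); // [[1, 2, 3], [4, 5, 6]]
// step === 1: a sliding window, which is what Seq.windowed gives you
const sliding = chunk([1, 2, 3, 4, 5], 3, 1); // [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
// step > size: skip elements between chunks
const skipping = chunk([1, 2, 3, 4, 5, 6, 7, 8], 2, 3); // [[1, 2], [4, 5], [7, 8]]
```

The step parameter is what separates a general chunk from a sliding window: with step equal to size the chunks tile the input, and with step of 1 they overlap.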

Why Is This Stuff Useful?

Well, some of the functions are either self-explanatory or already covered in several other articles. I'll cover some of the lesser-known ones and why I personally found them useful.

Chunking

I've written about chunk and chunk_by before, but in case you missed it, check out my previous article!

Alright, Break It Up! Using Partition/ Chunk

Jason Steinhauser ・ Oct 18 '18

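For a quick refresher on chunk_by, here is a minimal JavaScript sketch in the spirit of Clojure's partition-by and Elixir's Enum.chunk_by/2: a new chunk starts whenever the key function's result changes from the previous element's. The chunkBy name is hypothetical, and comparing keys with strict equality is an assumption of this sketch.

```javascript
// Hypothetical chunkBy, after Clojure's partition-by / Elixir's Enum.chunk_by/2:
// consecutive items with the same key (compared with ===) share a chunk.
function chunkBy(items, keyFn) {
  const chunks = [];
  let currentKey;
  for (const item of items) {
    const k = keyFn(item);
    if (chunks.length === 0 || k !== currentKey) {
      chunks.push([item]); // key changed: start a new chunk
      currentKey = k;
    } else {
      chunks[chunks.length - 1].push(item); // same key: extend the current chunk
    }
  }
  return chunks;
}

const runs = chunkBy([1, 3, 5, 2, 4, 7, 9, 6], (n) => n % 2 === 0);
// [[1, 3, 5], [2, 4], [7, 9], [6]]
```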

Reduce While

I'll admit that this is possibly not a very common use case. Sometimes you don't want to reduce an entire sequence, just up to a certain point. Unfortunately, reduce is typically all or nothing, which doesn't really work when you have a potentially infinite series of data. However, Elixir's reduce_while helped me keep my solution for AoC 2018 Day 1 Part 2 compact. I'm hoping to find more real-world use cases for it... but it's still one of my favorite data processing functions I've found.
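Here is a minimal JavaScript sketch of the idea, modeled on the Enum.reduce_while/3 convention where the reducer signals whether to continue or halt. The reduceWhile, cycle, and firstRepeatedTotal names are hypothetical, and the ["cont", acc] / ["halt", acc] pairs stand in for Elixir's {:cont, acc} and {:halt, acc} tuples. The usage applies it to the Day 1 Part 2 puzzle (find the first running total of a repeating list of frequency changes that shows up twice); it is an illustrative sketch rather than the Elixir solution mentioned above.

```javascript
// Hypothetical reduceWhile in the style of Elixir's Enum.reduce_while/3.
// The reducer returns ["cont", acc] to keep going or ["halt", acc] to stop early,
// so it also works on infinite iterables such as generators.
function reduceWhile(iterable, initial, reducer) {
  let acc = initial;
  for (const item of iterable) {
    const [signal, next] = reducer(item, acc);
    acc = next;
    if (signal === "halt") break; // stop consuming the iterable
  }
  return acc;
}

// Hypothetical helper: an infinite generator that repeats the input forever.
function* cycle(items) {
  if (items.length === 0) return; // nothing to repeat
  while (true) yield* items;
}

// First running total that appears twice while cycling through the changes.
// Assumes some total does repeat, as the puzzle guarantees; otherwise it never halts.
function firstRepeatedTotal(changes) {
  const seen = new Set([0]);
  return reduceWhile(cycle(changes), 0, (change, total) => {
    const next = total + change;
    if (seen.has(next)) return ["halt", next];
    seen.add(next);
    return ["cont", next];
  });
}

firstRepeatedTotal([+3, +3, +4, -2, -4]); // 10
```

Because cycle never ends on its own, a plain reduce over it would never return; the "halt" signal is what makes the computation terminate.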

Juxt

I'll admit that, at first glance, juxt is nothing special. It takes an array of functions that operate on the same parameters and returns a single function that takes those parameters and returns an array of each function's result. Why use that?
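A minimal JavaScript sketch of Clojure-style juxt, assuming a standalone juxt function (Wormhole's own signatures may differ, and F# needs separate juxt2/juxt3 variants for typing reasons):

```javascript
// Clojure-style juxt: returns one function that calls every given function
// with the same arguments and collects the results in order.
function juxt(...fns) {
  return (...args) => fns.map((fn) => fn(...args));
}

// min, max, and argument count of the same inputs, from one call
const stats = juxt(Math.min, Math.max, (...xs) => xs.length);
stats(4, 8, 15, 16, 23, 42); // [4, 42, 6]
```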

I've ported this function from Clojure into other work projects before. For instance, I had a very large collection of data (1MM+ entries!) and I couldn't afford to iterate over them multiple times. I used juxt to compose my analysis functions together so that I only had to iterate over the collection one time.
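One way to get that single pass, sketched here in JavaScript under my own assumptions, is to give juxt's shape to reducing functions: a hypothetical juxtReducers helper advances several accumulators on each element, so one reduce call yields several results. This is an illustration of the technique, not the code from those work projects.

```javascript
// Hypothetical helper: juxt's shape applied to reducers. Each reducer advances
// its own accumulator on every element, so one pass yields several results.
function juxtReducers(...reducers) {
  return (accs, item) => reducers.map((reducer, i) => reducer(accs[i], item));
}

// Example analysis functions, each an ordinary (acc, item) reducer
const count = (acc) => acc + 1;
const sum = (acc, x) => acc + x;
const max = (acc, x) => Math.max(acc, x);

const data = [3, 1, 4, 1, 5, 9, 2, 6];
const [n, total, largest] = data.reduce(
  juxtReducers(count, sum, max),
  [0, 0, -Infinity] // one initial accumulator per reducer
);
// n === 8, total === 31, largest === 9
```

Each analysis stays a small, separately testable reducer, and the collection is still only walked once.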

Similarly, since a keyword in Clojure can be used as a function that retrieves the value for that key from a map (for example, (:foo {:foo 5 :bar 3}) returns 5), you can juxt several keywords together to pull fields out of a collection of maps and return the results in a table-like format. I wrote about that as part of a previous post on dense Clojure code:

A verbose explanation of compact code

Jason Steinhauser ・ May 21 '18

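JavaScript has no keywords-as-functions, so this sketch substitutes a hypothetical key helper that builds a property accessor, then combines accessors with a juxt sketch to turn a list of objects into table-like rows:

```javascript
// Clojure-style juxt: call every function with the same arguments.
function juxt(...fns) {
  return (...args) => fns.map((fn) => fn(...args));
}

// Hypothetical stand-in for Clojure keywords as functions: key("name") acts like :name
const key = (k) => (obj) => obj[k];

const cats = [
  { name: "Aeris", id: 0x00, isFavourite: true },
  { name: "Juri", id: 0x01 },
  { name: "Dante", id: 0x03 },
];

// One row per map, one column per accessor
const rows = cats.map(juxt(key("name"), key("id")));
// [["Aeris", 0], ["Juri", 1], ["Dante", 3]]
```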

Frequencies

Because sometimes, you just need a histogram. Clojure's frequencies gives you one in a single function call: a map from each distinct element to the number of times it appears.
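A minimal JavaScript sketch of frequencies, assuming a Map is an acceptable stand-in for Clojure's hash map (unlike a plain object, it keeps a number key such as 1 distinct from the string "1"):

```javascript
// Hypothetical frequencies, after clojure.core/frequencies:
// counts how many times each distinct value occurs in any iterable.
function frequencies(iterable) {
  const counts = new Map();
  for (const item of iterable) {
    counts.set(item, (counts.get(item) ?? 0) + 1);
  }
  return counts;
}

const letters = frequencies("mississippi");
// Map { "m" => 1, "i" => 4, "s" => 4, "p" => 2 }
```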

Conclusion

Hopefully someone out there will find this useful, either as a cheat sheet or as a library. In the near-term, I will be investigating Ruby and Rust (in that order) to see what other handy functions I could foresee using across multiple languages. I'll also put Wormhole up as a package in your favorite package managers soon, and probably write about the things I do/don't like about each.

Happy coding, and I'd love to hear about other general purpose data manipulation functions you've found useful!


Top comments (6)

Pieter Slabbert • Feb 27 '19

In Clojure you can use reduced to stop before you have gone through the entire sequence

Cameron Desautels • Feb 27 '19

I was going to say the same! Here's what that looks like:

(reduce (fn [acc x]
          (+ acc x))
        (range 11))
;; => 55

(reduce (fn [acc x]
          (if (> x 3)
            (reduced acc)
            (+ acc x)))
        (range 11))
;; => 6

So we already have that capability built in (without introducing a new function). Clojure also already has equivalents of max-by and min-by: max-key and min-key, though perhaps with a slightly different interface than you might have expected.

Jason Steinhauser • Feb 28 '19

I was unaware that this function existed! I'll have to take a look into it to see how it could've helped in a few cases. Thanks for letting me know!

Mihail Malo • Feb 26 '19 • Edited

Do you know if there's anything juxt-like that would help with this case:

Mihail Malostanidis

Feb 24

There is often a case where I have multiple indexes, so I end up doing something like this:

const cats = [
  { name: "Aeris", id: 0x00, isFavourite: true },
  { name: "Juri", id: 0x01 },
  { name: "Dante", id: 0x03 },
  { name: "Frankenstein", id: 0xff }
]
const byName = new Map()
const byId = new Map()
for (const cat of cats) {
  byName.set(cat.name, cat)
  byId.set(cat.id, cat)
}

If this wasn't as common, I'd probably investigate making a function that takes a predicate and an array and makes an iterator of entries that new Map() can consume.
But like this, I only iterate once to populate multiple Maps.

Plus there are the cases where I receive an object (including from JSON), so normal iteration wouldn't work:

const cats = {
  Aeris: { id: 0x00, isFavourite: true },
  Juri: { id: 0x01 },
  Dante: { id: 0x03 },
  Frankenstein: { id: 0xff }
}
const byName = new Map()
const byId = new Map()
for (const name of Object.keys(cats)) {
  const cat = { name, ...cats[name] }
  byName.set(name, cat)
  byId.set(cat.id, cat)
}

Something like "multiple reducers in one iteration"

Jason Steinhauser • Feb 26 '19

That is an interesting case that I hadn't considered before. I will definitely have to look into it while exploring the JavaScript ecosystem more thoroughly!
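A minimal JavaScript sketch of the "multiple reducers in one iteration" idea from this thread: a hypothetical indexBy helper takes juxt-style key functions and fills one Map per key function in a single loop over the items. Object.entries covers the plain-object (for example, parsed JSON) case; the helper name and shape are assumptions, not an existing library API.

```javascript
// Hypothetical helper: build one Map per key function in a single pass,
// so several indexes are populated while iterating the items only once.
function indexBy(items, ...keyFns) {
  const indexes = keyFns.map(() => new Map());
  for (const item of items) {
    keyFns.forEach((keyFn, i) => indexes[i].set(keyFn(item), item));
  }
  return indexes;
}

const cats = {
  Aeris: { id: 0x00, isFavourite: true },
  Juri: { id: 0x01 },
  Dante: { id: 0x03 },
  Frankenstein: { id: 0xff },
};

// Turn the keyed object into a list of cats that carry their own name
const catList = Object.entries(cats).map(([name, cat]) => ({ name, ...cat }));
const [byName, byId] = indexBy(catList, (cat) => cat.name, (cat) => cat.id);

byId.get(0xff).name; // "Frankenstein"
```

Because indexBy only uses for...of, it accepts any iterable, including a generator, if you would rather not build the intermediate catList array.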