During the Advent of Code this past year, I was trying to enhance my knowledge of Elixir, as well as functional programming in general. There were times when I found a function that had no analogue in most other languages I used (a glaringly obvious one being `Enum.reduce_while`), and other times when I was rewriting functions I'd used often in other languages (Clojure's `frequencies` would've been mighty handy!). I finally decided to bite the bullet: create a list of collection manipulation functions I used often in some languages, implement them in others I've learned or am currently trying to learn, and discover new functions that I wouldn't want to live without!
What Is Wormhole?
It's this, really:
jdsteinhauser / wormhole
Some of my most used functions implemented in different languages
Wormhole
You ever think, "Hey, I wish this language had the capability of some other language I like"? Enter Wormhole.
Motivation
During the Advent of Code 2018, I found myself writing the same functions in Elixir that I knew I had used in Clojure, F#, or some other language. In order to prevent myself from doing this in the future, I decided to build a library to house all of these helpful functions across the languages that I either knew or wanted to learn.
Desired functions
These are the functions that I've used and that I'd like to have in multiple languages. Some of them are already implemented in the language, so I won't reimplement them. Each implementation will list the functions as implemented in the language, as well as links to their documentation.
map
filter
reduce
reduce_while
chunk
chunk_by
juxt
min_by
max_by
frequencies
group_by
scan
inc
dec
-
zip
…
I've got an addiction. I love learning new languages. When you learn new languages, you end up finding functions, classes, and concepts that you wish you had in other languages. Sometimes those functions go by different names, and it gets confusing when you switch between languages. I end up doing a lot of data collection manipulation, so I decided to start with what I knew best and branch out from there!
What Functions Am I Looking For?
For a non-exhaustive list, I wanted to have at least the following:
- Collection basics: `map`, `filter`, `reduce`, and `scan`
- Chunking data: `chunk`, `chunk_by`
- Common stats: `min_by`, `max_by`, `group_by`, `frequencies`
- Other hella useful things: `reduce_while`, `juxt`, `identity`
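To make a few of these categories concrete, here is a minimal Python sketch built only on the standard library. It illustrates what the functions do and is not Wormhole's implementation; the helper names `scan`, `chunk_by`, and `group_by` simply mirror the list above, and the sample data is made up.

```python
from itertools import accumulate, groupby


def scan(items, fn, initial):
    """Like reduce, but keep every intermediate accumulator (initial excluded)."""
    return list(accumulate(items, fn, initial=initial))[1:]


def chunk_by(items, key):
    """Split items into runs of consecutive elements that share the same key."""
    return [list(run) for _, run in groupby(items, key=key)]


def group_by(items, key):
    """Collect all items with the same key, regardless of their position."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups


words = ["kiwi", "fig", "plum", "banana", "pear"]  # hypothetical sample data

running_total = scan([1, 2, 3, 4], lambda acc, x: acc + x, 0)
runs = chunk_by([1, 3, 2, 4, 5], lambda n: n % 2 == 0)
by_length = group_by(words, len)
shortest = min(words, key=len)  # Python's spelling of min_by
longest = max(words, key=len)   # Python's spelling of max_by
```

Note the difference between the last two helpers: `chunk_by` only groups neighbours, while `group_by` gathers matching items from anywhere in the collection.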
What Languages Am I Targeting?
For now, I have filled in my perceived gaps in functions in C#, Clojure, and Elixir. I have an F# solution that I'll be comfortable with early this week, and I've started looking at a comprehensive list of Ruby functions as well. After that... well, I'm not entirely sure! I think I'm going to go through Rust, JavaScript, Java, and possibly Kotlin and Python 3 to see what other handy things I can implement across all those languages.
Will These Be Deployed to Package Managers?
Yes... but not right now. I need to get the documentation to a suitable state. I've pulled down several packages before but I've never pushed mine up to any! I'm sure that will end up being a blog post in and of itself.
Current Cheat Sheet
Here's a summary of the languages I've targeted so far, with documentation links to each function that either already exists, or that I've implemented in Wormhole.
Function | C# | F# | Clojure | Elixir |
---|---|---|---|---|
map | Enumerable.Select | Seq.map | clojure.core/map | Enum.map/2, Stream.map/2 |
filter | Enumerable.Where | Seq.filter | clojure.core/filter | Enum.filter/2, Stream.filter/2 |
reduce | Enumerable.Aggregate | Seq.reduce | clojure.core/reduce | Enum.reduce |
reduce_while | ReduceWhile | reduceWhile | reduce-while | Enum.reduce_while/3 |
scan | Scan | Seq.scan | clojure.core/reductions | Enum.scan, Stream.scan |
chunk | Chunk | chunk * | clojure.core/partition | Enum.chunk_every/4, Stream.chunk_every/4 |
chunk_by | ChunkBy | chunkBy | clojure.core/partition-by | Enum.chunk_by/2, Stream.chunk_by/2 |
juxt | Juxt | juxt, juxt2, juxt3 | clojure.core/juxt | Wormhole.juxt/1 |
min_by | MinBy | Seq.minBy | min-by | Enum.min_by/3 |
max_by | MaxBy | Seq.maxBy | max-by | Enum.max_by/3 |
frequencies | Frequencies | freqs | clojure.core/frequencies | Wormhole.freqs/1 |
group_by | Enumerable.GroupBy | Seq.groupBy | clojure.core/group-by | Enum.group_by/3 |
identity | Identity | Operators.id | clojure.core/identity | Wormhole.identity/1 |

\* F# contains a `Seq.windowed` function, but it only moves the chunk one element at a time.
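The difference between chunking and a one-element sliding window is easier to see in code. Below is a rough Python sketch of a `chunk` helper with an optional `step`; the name and signature are my own for illustration, and it assumes that a trailing incomplete chunk is dropped (as Clojure's `partition` does), which may not match every implementation in the table.

```python
def chunk(items, size, step=None):
    """Return lists of `size` items, starting a new one every `step` items.

    With step == size (the default) the chunks do not overlap; with step == 1
    the result is a sliding window. Incomplete trailing chunks are dropped.
    """
    items = list(items)
    step = size if step is None else step
    return [items[i:i + size] for i in range(0, len(items) - size + 1, step)]


pairs = chunk([1, 2, 3, 4, 5, 6], 2)       # non-overlapping chunks
windows = chunk([1, 2, 3, 4, 5], 3, 1)     # sliding window, one element at a time
```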
Why Is This Stuff Useful?
Well, some of the functions are either self-explanatory or already written about in several other articles. I'll cover some of the lesser known ones and why I personally found them useful.
Chunking
I've written about `chunk` and `chunk_by` before, but in case you missed it, check out my previous article!
Alright, Break It Up! Using Partition/ Chunk
Jason Steinhauser ・ Oct 18 '18
Reduce While
I'll admit that this one probably doesn't come up very often. Sometimes you don't want to reduce an entire sequence, just up to a certain point. Unfortunately, `reduce` is typically all or nothing, which doesn't work when you have a potentially infinite series of data. Elixir's `reduce_while`, however, helped me keep my solution for AoC 2018 Day 1 Part 2 compact. I'm hoping to find more real-world use cases for it... but it's still one of my favorite data processing functions I've found.
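As a rough illustration of the idea, here is a Python sketch of a `reduce_while` that stops as soon as the reducing function says so. The `("cont", acc)` / `("halt", acc)` return convention loosely mirrors Elixir's `{:cont, acc}` / `{:halt, acc}` tuples; the function names are my own, and the puzzle logic (find the first running total that appears twice while cycling through the input) is reconstructed from memory of AoC 2018 Day 1 Part 2 rather than taken from the original solution.

```python
from itertools import cycle


def reduce_while(iterable, acc, fn):
    """Fold `iterable` into `acc`, stopping as soon as `fn` returns "halt"."""
    for item in iterable:
        tag, acc = fn(item, acc)
        if tag == "halt":
            break
    return acc


def first_repeated_frequency(changes):
    """Cycle through `changes` forever until a running total repeats."""
    def step(change, state):
        freq, seen = state
        freq += change
        if freq in seen:
            return ("halt", (freq, seen))
        seen.add(freq)  # mutating the set keeps the sketch short
        return ("cont", (freq, seen))

    freq, _ = reduce_while(cycle(changes), (0, {0}), step)
    return freq
```

Because `cycle` never ends, a plain `reduce` would loop forever here; the early exit is what makes the fold usable on an infinite series.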
Juxt
I admit that, at first glance, `juxt` is nothing special. It takes an array of functions that operate on the same parameters and returns a single function that takes those parameters and returns an array holding the result of each function. Why use that?
I've ported this function from Clojure into other work projects before. For instance, I had a very large collection of data (1MM+ entries!) and I couldn't afford to iterate over it multiple times. I used `juxt` to compose my analysis functions together so that I only had to iterate over the collection one time.
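A minimal Python sketch of the idea, assuming nothing beyond the description above: `juxt` bundles several functions into one, so a single pass over the data can apply all of them to each entry. The `readings` data and the two analysis functions are hypothetical stand-ins, not the ones from my work project.

```python
def juxt(*fns):
    """Return a function that applies every fn to the same arguments."""
    def juxtaposed(*args):
        return [fn(*args) for fn in fns]
    return juxtaposed


# Hypothetical sample data and analysis functions.
readings = [{"temp": 21, "humidity": 40}, {"temp": 19, "humidity": 55}]

analyze = juxt(
    lambda r: r["temp"] * 2,
    lambda r: r["humidity"] > 50,
)

# One iteration over the collection, every analysis applied per entry.
results = [analyze(r) for r in readings]
```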
Similarly, since a keyword in Clojure can be treated as a function for retrieving the value stored under that key in a map (`(:foo {:foo 5 :bar 3})` returns `5`), you can compose several keywords to pull data out of a collection of maps and return the results in something like a table format. I wrote about that as part of a previous post on dense Clojure code:
A verbose explanation of compact code
Jason Steinhauser ・ May 21 '18
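For comparison, Python's standard library has a close cousin of the keyword trick described above: `operator.itemgetter` called with several keys returns one function that looks up all of them at once, much like `juxt` over keywords. The `people` data below is made up for illustration.

```python
from operator import itemgetter

# Hypothetical collection of maps.
people = [
    {"name": "Ada", "language": "Clojure", "years": 3},
    {"name": "Lin", "language": "Elixir", "years": 1},
]

# One function that pulls two columns out of each map.
row = itemgetter("name", "language")
table = [row(person) for person in people]
```

Each entry in `table` is a tuple of the selected values, which gives the table-like shape in a single pass.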
Frequencies
Because sometimes, you just need a histogram. `frequencies` provides that in one single function!
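In Python, a sketch of the same idea can lean on `collections.Counter`; the `frequencies` wrapper name here just mirrors the Clojure function and is not part of any library.

```python
from collections import Counter


def frequencies(items):
    """Map each distinct item to the number of times it appears."""
    return dict(Counter(items))


letter_counts = frequencies("mississippi")
```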
Conclusion
Hopefully someone out there will find this useful, either as a cheat sheet or as a library. In the near term, I will be investigating Ruby and Rust (in that order) to see what other handy functions I could foresee using across multiple languages. I'll also put Wormhole up as a package in your favorite package managers soon, and probably write about the things I do/don't like about each.
Happy coding, and I'd love to hear about other general purpose data manipulation functions you've found useful!
Top comments (6)
In Clojure, you can use `reduced` to stop before you have processed the entire sequence.
I was going to say the same! Here's what that looks like:
So we already have that capability built-in (without introducing a new function). Clojure also already has `max-by` and `min-by` in the form of `max-key` and `min-key`, though perhaps with a slightly different interface than what you might have expected.
I was unaware that this function existed! I'll have to take a look into it to see how it could've helped in a few cases. Thanks for letting me know!
Do you know if there's anything `juxt`-like that would help this case:
There is often a case where I have multiple indexes, so I end up doing something like this:
If this wasn't as common, I'd probably investigate making a function that takes a predicate and an array and makes an iterator of entries that `new Map()` can consume.
But like this, I only iterate once to populate multiple `Map`s.
Plus there's the cases where I receive an object (including from JSON), so normal iteration wouldn't work:
Something like "multiple reducers in one iteration"
That is an interesting case that I hadn't considered before. I will definitely have to look into it while exploring JavaScript ecosystem more thoroughly!
In OOP it would probably be this: