loading...
Cover image for Golfing language extensions

Golfing language extensions

tfausak profile image Taylor Fausak ・10 min read

Recently Baldur Blöndal posted an interesting challenge on Twitter:

When I first saw this there were already a few responses. I played with them a little to produce this monstrosity:

x# _=id @(__)mdo if|let?_=(0x0.0,'(),)[0b_0^0e0| !_[]|_[],then\case]

Those 73 characters require 21 language extensions to compile correctly! This blog post aims to quickly explain what that code does by showing how to require various language extensions with the least amount of code.

Before we get started, a couple notes:

  • The challenge was to require the most language extensions with the fewest characters. I would be happy to be proven wrong, but I think you can't do much better than about 10 extensions with 23 characters. Most things have so much syntactic overhead that it screws up the ratio.

  • Because of that, I decided to change the challenge into roughly: "How many extensions can you require from a single 80-character line of code?" This allows showing off more extensions even though they require more than about two characters.

  • I'm doing all of this with GHC 8.10 on Ubuntu. My goal is to produce a program that compiles. I don't care about warnings, runtime errors, or portability. The code should fail to compile without each extension. Duplicate extensions like RankNTypes and Rank2Types don't count.

Alright. Let's get started.

Baseline

Since we're playing some variant of code golf here, the idea is to use the fewest number of characters. We should set a baseline for the smallest program that will compile. Since GHC will compile an empty program, that's a total of zero characters. But that's not very interesting.

The original challenge looked for a high ratio of extensions to characters, so I thought it made sense to use a regular value declaration, like this:

-- shortest non-empty program ghc compiles
x=0

There's almost no overhead with that. You have to come up with a name and a value, but those can probably be used later. The equals sign is true overhead, but it's less than other approaches like types or classes.

-- these ones are a little longer
type T=()
class C a
instance Show()

MagicHash

The first thing we can do is add a hash to the end of our value's name to require the MagicHash extension. Like many of the extensions we'll encounter, this is purely syntactic. It simply allows you to name things with a hash after them.

x#=0

ScopedTypeVariables

Typically you would give a top-level declaration a standalone type signature. Or perhaps you would annotate its value. But did you know that the ScopedTypeVariables extension allows you to provide the signature inline? It's true!

x#::Int=0

Normally this extension is used to add type signatures to polymorphic local bindings, but it works just as well for our purposes. Here's an example of its more typical usage:

-- typical scoped type variable usage
main = do
  let zero :: Int = 0
  line :: String <- getLine
  print (zero, line)

UnicodeSyntax

That pesky type signature is taking up two whole characters! We can do better. The UnicodeSyntax language extension allows :: to be replaced by , which is only a single character.

x#Int=0

Note that code golf usually counts the number of bytes to discourage exploiting Unicode like this.

PartialTypeSignatures

It's kind of annoying that we have Int in there. It takes up three characters without really requiring any language extensions itself. Fortunately PartialTypeSignatures lets us save two characters and require another extension.

x# _=0

Note that we have to put a space after the hash otherwise this would compile without the MagicHash extension because #∷ would be interpreted as an operator.

This gets GHC to infer the type even though we have an explicit type signature. In normal usage this is used to specify a small part of a larger type.

TypeApplications

We've been focused on the left side of the equals sign. Let's turn to the right side. First let's introduce an unnecessary call to id. It doesn't do anything by itself, but we'll expand on it in a second.

x# _=id 0

Since id is polymorphic, we can use TypeApplications to pick a concrete type for it. But just like before we don't want to actually pick a type because that will take too many characters to write out. Instead we'll let GHC infer it.

x# _=id @_ 0

Although the type application looks like it would require PartialTypeSignatures, it doesn't. Note that there must be a space before the type application otherwise it won't compile.

KindSignatures

We have one last type-level trick up our sleeve. In addition to giving type signatures to values, we can give kind signatures to types. Since we have two type signatures, we have to pick which one to decorate with a kind signature. Each case would require parentheses, so we'll add it to the type application since it will allow us to remove the space before the 0.

x# _=id @(__)0

Curiously this kind signature doesn't require PartialTypeSignatures either. Also note that we're still using the Unicode instead of ::.

BlockArguments

Normally if you want to pass a do block to a function you either have to use parentheses (f (do ...)) or some operator (f $ do ...). The recent BlockArguments extension allows you to avoid any decoration around your argument.

x# _=id @(__)do 0

RecursiveDo

I've never used this extension, so I don't have much to say about it. By changing our do to an mdo, we can require another extension. The mechanics of regular versus recursive do binds don't matter since we're not using them anyway.

x# _=id @(__)mdo 0

BinaryLiterals

At long last we come to that 0 at the far right side. It's not requiring any extensions. Let's fix that!

By default Haskell lets you write octal (0o0) and hexadecimal (0x0) literals. With the BinaryLiterals extension you can also write numbers in binary.

x# _=id @(__)mdo 0b0

NumericUnderscores

The NumericUnderscores extension lets you put underscores in numeric literals, as you might have guessed from the name. We can take advantage of this by adding another underscore that will require another extension.

x# _=id @(__)mdo 0b_0

NumDecimals

There's another extension to the numeric literal syntax that allows you to write integral values using scientific notation. This is useful for writing really big numbers without adding a bunch of zeroes.

Unfortunately this one is a little tricky to actually require. Simply writing a number like 1e2 will infer a Fractional constraint and compile just fine without the extension. We need a way to write a number like 1e2 but also forbid it from being Fractional. And we need to do all that without eating up too many characters.

Fortunately there's a handy operator in the Prelude that puts the right constraints on its arguments:

(^) :: (Num a, Integral b) => a -> b -> a

So we can put a number that requires the NumDecimals extension as the second argument to (^) and we should be good.

x# _=id @(__)mdo 0b_0^0e0

ImplicitParams

At this point it starts to feel like we've run out of easy wins. Time to reach for something that adds a bunch of characters but also adds a bunch of flexibility: ImplicitParams.

This is going to add a bunch of characters because we're forced to also add a let ... in ... expression in order to bind the implicit parameter. That's alright though, we'll make up for it later.

x# _=id @(__)mdo let?_=()in 0b_0^0e0

MultiWayIf

It would be nice if we could replace that let ... in ... with something more exotic. Fortunately multi-way ifs don't have a lot of overhead themselves and allow us to exploit a quirk of Haskell's syntax. This will take a little explaining.

When you use guard syntax, the expression must return a Bool. You're allowed to put any expression in there, including something with let ... in ... as part of it. For example:

unit = case "example" of
  string
    | let size = 7 in length string == size
      -> ()

The weird thing about this is that you can have let without in when you're in a guard clause. The following example is completely valid and compiles without warnings:

unit = case "example" of
  string
    | let size = 7
      -> ()

The guard clause is effectively empty and always evaluates to True even though we never produced a Bool from it. We can use this trick to introduce a let binding without paying for the corresponding in.

However that would hardly be worth it if we had to use case because that introduces so many other characters. Thankfully multi-way ifs are more compact. Bringing it all together we can replace our boring let?_=()in with the novel if|let?_=()→.

x# _=id @(__)mdo if|let?_=()0b_0^0e0

Note that we're using the Unicode arrow instead of the ASCII -> to save a character.

HexFloatLiterals

Once again we are going to abuse numeric literals for our game of code golf. This time we'll be writing a floating point literal using hexadecimal syntax, which requires an extension.

x# _=id @(__)mdo if|let?_=0x0.00b_0^0e0

TupleSections

Why stop there? Now that we have an unused binding just sitting there, we can cram a bunch of interesting values into it. We'll start by converting it into a tuple section, which is a concise way of writing a lambda that produces a tuple.

x# _=id @(__)mdo if|let?_=(0x0.0,)0b_0^0e0

TemplateHaskellQuotes

Somehow we've made it this far without introducing any Template Haskell or quasi-quotes. As far as I can tell the easiest way to require some form of TH is to use the quoted name syntax. We'll be grabbing the name of () because it's the shortest constructor name available.

x# _=id @(__)mdo if|let?_=(0x0.0,'(),)0b_0^0e0

BangPatterns

This one requires a bit of a walk. It feels like we should have been able to require a bang pattern by now, but I don't think they're allowed on any of the binders we have. We need to introduce a new binder that can have a bang pattern on it. We're not going to do this in the most efficient way, but hopefully it'll be worth it later.

We're going to use a list comprehension to introduce a new binding. As mentioned, this is a bunch extra characters.

x# _=id @(__)mdo if|let?_=(0x0.0,'(),)[0b_0^0e0| !_<-[]]

Note that we need a space before the exclamation point otherwise |! would be interpreted as an operator.

ParallelListComp

Fortunately for us, list comprehensions have a few associated language extensions. By adding five more characters we can introduce a parallel list comprehension.

x# _=id @(__)mdo if|let?_=(0x0.0,'(),)[0b_0^0e0| !_[]|_[]]

TransformListComp

I've got to say, this is one of the weirder extensions. It augments list comprehensions with a SQL-like syntax for querying data. But that doesn't really matter for us. All we care about is how many characters it takes to require another extension. Answer: eight.

x# _=id @(__)mdo if|let?_=(0x0.0,'(),)[0b_0^0e0| !_[]|_[],then id]

LambdaCase

We're nearly running out of room on our 80-character line. How can we add more language extensions? There's another lightweight syntactic extension to reach for: LambdaCase. It lets us write a short lambda and case together without naming the lambda's variable.

x# _=id @(__)mdo if|let?_=(0x0.0,'(),)[0b_0^0e0| !_[]|_[],then\case xx]

EmptyCase

The actual body of the case is kind of annoying. It's taking up five characters without contributing anything! Wouldn't you know it, the EmptyCase extension lets us get rid of that.

x# _=id @(__)mdo if|let?_=(0x0.0,'(),)[0b_0^0e0| !_[]|_[],then\case]

So there you have it. That's as many language extensions as I can figure out how to require on a single 80-character line of Haskell code. Can you do better? Are there tricks that I'm missing? Let me know!


Here's the whole thing written with slightly more normal formatting:

{-# language BangPatterns #-}
{-# language BinaryLiterals #-}
{-# language BlockArguments #-}
{-# language EmptyCase #-}
{-# language HexFloatLiterals #-}
{-# language ImplicitParams #-}
{-# language KindSignatures #-}
{-# language LambdaCase #-}
{-# language MagicHash #-}
{-# language MultiWayIf #-}
{-# language NumDecimals #-}
{-# language NumericUnderscores #-}
{-# language ParallelListComp #-}
{-# language PartialTypeSignatures #-}
{-# language RecursiveDo #-}
{-# language ScopedTypeVariables #-}
{-# language TemplateHaskellQuotes #-}
{-# language TransformListComp #-}
{-# language TupleSections #-}
{-# language TypeApplications #-}
{-# language UnicodeSyntax #-}

x#  _ = id @(_  _) mdo
  if | let ?_ = ( 0x0.0, '(), ) 
      [ 0b_0 ^ 0e0
      | !_  []
      | _  []
      , then\case
      ]

Posted on by:

Discussion

markdown guide
 

We can eliminate one space by moving the # to another position that doesn't need it, before a

a∷_=id @(_∷_)mdo if|let?_=(0x0.0,'(),)→[0b_0^0e0| !x#←[]|_←[],then\case]

And add two #'s to the TuppleSection so we use UnboxedTuples

b∷_=id @(_∷_)mdo if|let?_=(#0x0.0,'(),#)→[0b_0^0e0| !x#←[]|_←[],then\case]

Since our toplevel binder is now just one character we can get its name
instead of (), saving 1 character

c∷_=id @(_∷_)mdo if|let?_=(#0x0.0,'c,#)→[0b_0^0e0| !x#←[]|_←[],then\case]

Change one of the patterns inside the list comprehension to use ViewPatterns

d∷_=id @(_∷_)mdo if|let?_=(#0x0.0,'d,#)→[0b_0^0e0| !x#←[]|(id→_)←[],then\case]

= 78 characters, requiring 23 language extensions

Use n+1 as pattern inside the ViewPattern with NPlusKPatterns

e∷_=id @(_∷_)mdo if|let?_=(#0x0.0,'e,#)→[0b_0^0e0| !x#←[]|(id->n+1)←[],then\case]

= 81 chars, slightly to long :(

But we can move the BinaryLiteral (with NumericUnderscores) to the NPlusKPattern.
And move the HexFloatLiteral to where the BinaryLiteral used to be.

f∷_=id @(_∷_)mdo if|let?_=(#'f,#)→[0x0.0^0e0| !x#←[]|(id->n+0b_0)←[],then\case]

:) 79 chars, 24 exts

Slightly expanding the challenge to include ghc flags.
We can replace both empty lists with typed holes _, saving to 2 characters.

Typed holes aren't a language feature, and are normally (handy) compile errors.
But we can ask GHC to please defer those errors to runtime
with the flag: -fdefer-typed-holes

g∷_=id @(_∷_)mdo if|let?_=(#'g,#)→[0x0.0^0e0| !x#←_|(id->n+0b_0)←_,then\case]

I think it's important here that those defered errors don't trigger at runtime,
that would be cheating.
But the code runs just as well as before, producing an empty list of integers.

-fdefer-out-of-scope-variables
Allows us to change the id in the ViewPattern to the undefined i,
saving 1 more character.

h∷_=id @(_∷_)mdo if|let?_=(#'h,#)→[0x0.0^0e0| !x#←_|(i->n+0b_0)←_,then\case]

76 chars, 24 language extensions, 2 ghc flags

{-# language UnboxedTuples #-}
{-# language ViewPatterns #-}
{-# language NPlusKPatterns #-}
{-# options_ghc -fdefer-typed-holes #-}
{-# options_ghc -fdefer-out-of-scope-variables #-}
 

Oops, I broke my own rule on not having runtime errors.

The first empty list can't be a hole, or the bangpattern will trigger a deferred type error. +1 char
But I forgot the use the unicode in the view pattern. -1 char

h_=id @(__)mdo if|let?_=(#'h,#)[0x0.0^0e0| !x#[]|(in+0b_0)_,then\case]

So still 76 chars, 24 language extensions, 2 ghc flags

 

Wonderful!

Many tricks like this in the Reddit thread too: reddit.com/r/haskell/comments/hzz8...

 
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE BinaryLiterals #-}
{-# LANGUAGE EmptyCase #-}
{-# LANGUAGE ImplicitParams #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE MonadComprehensions #-}
{-# LANGUAGE MultiWayIf #-}
{-# LANGUAGE NumDecimals #-}
{-# LANGUAGE NumericUnderscores #-}
{-# LANGUAGE PartialTypeSignatures #-}
{-# LANGUAGE RecursiveDo #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE StarIsType #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TransformListComp #-}
{-# LANGUAGE TupleSections #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE UnicodeSyntax #-}
{-# LANGUAGE ViewPatterns #-}
module Play where

c_=(#if|let?_='cid @(_ *)mdo[0b0^0_e0| !(\casex#)id,then id],#)

I couldn't get a ratio better than 3. (This one is slightly bigger than 3, 71/23 = 3.08)