Meks (they/them)

Posted on Apr 6, 2021

Sort a String using a Custom Alphabet in Elixir

#elixir

Why a custom sort with Elixir

Last week I wrote a blog on sorting the characters in a string using a custom alphabet in Javascript. This was the challenge for a live code assessment for an interview. The company I interviewed with primarily works in Elixir and after hearing my interviewer talk about it with such excitement, I decided to try solving the challenge with this programming language! I was successful in sorting a string with the regular alphabet but got very stuck on using a custom alphabet. I sent my solution to part 1 to the interviewer and when asked he shared with me his solution for using the custom sort. Since receiving it I've been a little obsessive in trying to understand exactly how every piece of it works.

This is my in-depth explanation for how to use Elixir to sort a string with a custom alphabet! We'll cover setting up for the problem, solving the problem with the regular alphabet and then a custom one, refactoring for readability and following Elixir convention, and the runtime and space complexity for the solution. And through all that, we will touch on Elixir modules, sigils, module functions, module attributes, default arguments, anonymous functions, the capture operator, and the pipe operator.

Set up

If you want to do this yourself and you're just starting out with Elixir, you can use Replit with Elixir to run the code or if you've installed Elixir on your machine you can make an alphabetize.ex file. To run the file you would cd into the directory containing the file and run $ elixir alphabetize.ex and it will return the results of any IO.puts statements in the file.

Elixir Modules

As a functional programming language, Elixir uses modules to group functions together such as the String or Enum module. In order to create our own modules in Elixir, we use the defmodule macro and we use the def macro to define functions in that module. So in our alphabetize.ex file we can add:

defmodule Alphabetize do 

end

This makes a module where we can write functions and add attributes. The goal is to be able to call a function on Alphabetize that we can pass a string and optionally a custom alphabet into it and it will return the string sorted accordingly. To test that in the file we could use a puts statement to see the return value.

defmodule Alphabetize do 

end

IO.puts Alphabetize.alphabetize("learnelixir") # aeeiillnrrx
alphabet = ~w(n b c d e f g h i j k x l m a o p q r s t u v w y z)
IO.puts Alphabetize.alphabetize("learnelixir", alphabet) # neeiixllarr

Sigils

Here we are testing that it will work with a custom alphabet and the regular alphabet. I've set the variable alphabet to pass into the alphabetize method (which is not yet written). The alphabet needs to be accessible as a list where we can find the index of any given letter. Elixir uses sigils, which are one of the mechanisms provided by the language for working with textual representations. Sigils start with the tilde (~) character which is followed by a letter identifying the sigil and then a parenthesis. The ~w sigil is used to generate lists of words (words are just regular strings), or in our case, a list of letters. Inside the ~w sigil, words(letters) are separated by whitespace.

You can verify this produces the expected result by opening an interactive elixir shell with the command $ iex in your terminal:

// ♥ > iex
Erlang/OTP 23 [erts-11.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe] [dtrace]

Interactive Elixir (1.11.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> alphabet = ~w(n b c d e f g h i j k x l m a o p q r s t u v w y z)
["n", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "x", "l", "m", "a", "o",
 "p", "q", "r", "s", "t", "u", "v", "w", "y", "z"]

To exit ctrl + c twice.

We use the sigil because it is much quicker to have Elixir convert the letters into a list with all those double quotes as opposed to writing it all out manually. Huzzah! A chance to save my fingers some keystrokes!

Next, let's start writing the alphabetize method which takes in a string and an alphabet:

defmodule Alphabetize do 

   def alphabetize(string, alphabet) do 

   end

end

IO.puts Alphabetize.alphabetize("learnelixir") # aeeiillnrrx
alphabet = ~w(n b c d e f g h i j k x l m a o p q r s t u v w y z)
IO.puts Alphabetize.alphabetize("learnelixir", alphabet) # neeiixllarr

Default Arguments

We need to make sure that if a custom alphabet is not passed in as the second argument, the method can still run and sort according to the regular alphabet. Named functions in Elixir support default arguments indicated by using two backward slashes (\). What follows the backslashes will be what the function uses as a default argument if another is not specified when the function is invoked.

Module Attributes

To make the default_alphabet we can take advantage of using the word sigil as we did above when writing our custom alphabet. This time though, because we are inside the Alphabetize module, we will create a module attribute to hold the list which is indicated with an '@' symbol. Elixir developers often use module attributes when they wish to make a value more visible or reusable. If we were to forgo the '@' and try setting it the same way we did outside the module, Elixir will instead of making a variable, it will try to invoke default_alphabet as a function.

defmodule Alphabetize do

  @default_alphabet ~w(a b c d e f g h i j k l m n o p q r s t u v w x y z)

  def alphabetize(string, alphabet \\ @default_alphabet) do

  end

end

IO.puts Alphabetize.alphabetize("learnelixir") # aeeiillnrrx
alphabet = ~w(n b c d e f g h i j k x l m a o p q r s t u v w y z)
IO.puts Alphabetize.alphabetize("learnelixir", alphabet) # neeiixllarr

The first puts statement will use the default alphabet which is the regular English alphabet, and the second puts statement will use the custom alphabet that is passed in as a second argument.

Provided Module Functions: split, sort, join

Next, we want our method to take the string, split it into a list of the characters, sort it, and then join the list back into a string. The String module has a function split/3 that allows us to specify the pattern in which we want our string split. The first argument is the string, the second is the pattern. We want to use empty quotes ("") to indicate it should be split on every character. In the third argument, we can pass in options such as trim which if set to the boolean true will remove all whitespace in a string. We can test this out in the interactive terminal:

iex
Erlang/OTP 23 [erts-11.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe] [dtrace]

Interactive Elixir (1.11.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> string = "elixir"
"elixir"
iex(2)> String.split(string, "")       
["", "e", "l", "i", "x", "i", "r", ""]
iex(3)> String.split(string, "", trim: true)
["e", "l", "i", "x", "i", "r"]
iex(4)>

Now in our alphabetize method we can save the list of characters in a variable.

  def alphabetize(string, alphabet \\ @default_alphabet) do
    list = String.split(string, "", trim: true)
  end

To sort the list we will use the Enum Module which provides a set of algorithms to work with enumerables. In Elixir, an enumerable is any data type that implements the Enumerable protocol. Lists ([1, 2, 3]), Maps (%{foo: 1, bar: 2}) and Ranges (1..3) are common data types used as enumerables. We can use any of the algorithms provided by the Enum Module on our list. If we were only concerned with sorting by the regular alphabet, we could use the sort/1 function which sorts according to Erlang's term ordering. Then we could use the Enum join/2 function which joins the given enumerable into a string using joiner as a separator. If there is no joiner provided as an argument, it defaults to an empty string. Which would give us exactly what we want.

iex(1)> string = "elixir"
"elixir"
iex(2)> list = String.split(string, "", trim: true)
["e", "l", "i", "x", "i", "r"]
iex(3)> list
["e", "l", "i", "x", "i", "r"]
iex(4)> sorted = Enum.sort(list)
["e", "i", "i", "l", "r", "x"]
iex(5)> Enum.join(sorted)
"eiilrx"

So if we were to write this out in our Alphabetize Module, we can solve the first part of the sorting problem using the regular alphabet which is how Erlang sorts letters.

defmodule Alphabetize do

  @default_alphabet ~w(a b c d e f g h i j k l m n o p q r s t u v w x y z)

  def alphabetize(string, alphabet \\ @default_alphabet) do
    list = String.split(string, "", trim: true)
    sorted = Enum.sort(list)
    Enum.join(sorted)
  end
end

IO.puts Alphabetize.alphabetize("learnelixir") # aeeiillnrrx

Build the custom sort

In order to sort with a different function, we can use the sort/2 method which sorts the enumerable (our list of letters) by a given function which is passed in as the second argument. Sort/2 uses the merge sort algorithm and it compares two arguments and it returns true if the first argument precedes or is in the same place as the second argument. If it returns true, it will place that first argument preceding the second argument in the returned list. So we know that we need to write a function, that will get the index of where the given letter is in the alphabet, the index of the next letter we are comparing it to, and determine true or false if the first letter proceeds the second.

Pseudocode

for every pair of letters we need to sort
given letters a, b from string and given an alphabet
get the index of a in the alphabet
get the index of b in alphabet
true or false is a < b ?

Breaking it up this way, we can see that we need the index of an enumerable, and the find_index/2 method does just that. The first argument is the enumerable (list, the alphabet in our case) and the second is an anonymous function that tells it which element to find the index of (in our case which letter).

iex(1)> alphabet = ~w(n b c d e f g h i j k x l m a o p q r s t u v w y z)
["n", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "x", "l", "m", "a", "o",
 "p", "q", "r", "s", "t", "u", "v", "w", "y", "z"]
iex(2)> Enum.find_index(alphabet, fn x -> x == "a" end)
14

We can then compare the indices of two different letters and return true if the first letter has a lower index and false if it is higher by using the less-than comparison operator.

iex>  Enum.find_index(alphabet, fn x -> x == "a" end) < Enum.find_index(alphabet, fn x -> x == "e" end)       
false
iex>  Enum.find_index(alphabet, fn x -> x == "a" end) < Enum.find_index(alphabet, fn x -> x == "r" end)
true

Now that we have a comparison that returns true or false using the indices of the alphabet, we can use that in the sort/2 method. Using sort/2 we can write an anonymous function with the 'fn' and 'end' keywords and an '->' to separate the parameters and the function body to indicate how we want the enumerable sorted. For example:

iex> Enum.sort([1, 2, 3], fn (a, b) -> a >= b end)
[3, 2, 1]

We can test out how this logic might work with our custom alphabet in the interactive shell again:

iex(1)> alphabet = ~w(n b c d e f g h i j k x l m a o p q r s t u v w y z)
["n", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "x", "l", "m", "a", "o",
 "p", "q", "r", "s", "t", "u", "v", "w", "y", "z"]
iex(2)> list = String.split("elixir", "", trim: true)
["e", "l", "i", "x", "i", "r"]
iex(3)> sorted = Enum.sort(list, 
...(3)>   fn (a, b) -> 
...(3)>     Enum.find_index(alphabet, fn x -> x == a end) < Enum.find_index(alphabet, fn x -> x == b end) 
...(3)>    end)
["e", "i", "i", "x", "l", "r"]

Woohoo! It is successfully sorting according to the customized alphabet where "x" appears before "l". Let's implement this in our alphabetize method.

Initial Solution

defmodule Alphabetize do

  @default_alphabet ~w(a b c d e f g h i j k l m n o p q r s t u v w x y z)

  def alphabetize(string, alphabet \\ @default_alphabet) do
    list = String.split(string, "", trim: true)
    sorted = Enum.sort(list, 
      fn (a, b) -> 
        Enum.find_index(alphabet, fn x -> x == a end) < Enum.find_index(alphabet, fn x -> x == b end) 
      end)
    Enum.join(sorted)
  end
end

IO.puts Alphabetize.alphabetize("learnelixir") # aeeiillnrrx
alphabet = ~w(n b c d e f g h i j k x l m a o p q r s t u v w y z)
IO.puts Alphabetize.alphabetize("learnelixir", alphabet) # neeiixllarr

Now if you run in your terminal $ elixir alphabetize.ex it returns the IO.puts results of

aeeiillnrrx
neeiixllarr

which is what we expect for the regular and the custom alphabets!

REFACTOR

Now that it works, let's clean up our code to be more readable and to better follow some Elixir patterns.

Write a new function

First, to make the alphabetize method easier to read, we can extract the comparisons of the indices to their own function which we'll call compare_alpha?. Since this function will return true or false for the comparison of the indices we can follow the Elixir convention to use a '?' for methods that return a boolean. This does not change anything functionally about the method, it's just an easy cue for developers that a boolean will be returned. Into compare_alpha? we will pass the two representations of the letters in the sort function and the custom or default alphabet.

defmodule Alphabetize do

  @default_alphabet ~w(a b c d e f g h i j k l m n o p q r s t u v w x y z)

  def alphabetize(string, alphabet \\ @default_alphabet) do
    list = String.split(string, "", trim: true)
    sorted = Enum.sort(list, 
      fn (a, b) -> 
        compare_alpha?(a, b, alphabet)
      end)
    Enum.join(sorted)
  end

  defp compare_alpha?(a, b, alphabet) do
    Enum.find_index(alphabet, fn x -> x == a end) < Enum.find_index(alphabet, fn x -> x == b end)
  end
end

IO.puts Alphabetize.alphabetize("learnelixir") # aeeiillnrrx
alphabet = ~w(n b c d e f g h i j k x l m a o p q r s t u v w y z)
IO.puts Alphabetize.alphabetize("learnelixir", alphabet) # neeiixllarr

Note that the compare_alpha? method is defined with 'defp' which indicates that it is a private method and not accessible outside of the Alphabetize Module.

Capture Operator

The next way we can clean this up is to take advantage of Elixir's shorthand for anonymous functions which is called the capture operator. One of the uses for the capture operator is to partially apply functions, where &1, &2, and so on can be used as value placeholders. For example:

iex(1)> double = &(&1 * 2)
#Function<44.79398840/1 in :erl_eval.expr/5>
iex(2)> double.(2)
4
iex(3)> dbl = fn x -> x * 2 end
#Function<44.79398840/1 in :erl_eval.expr/5>
iex(4)> dbl.(2)
4

So &(&1 * 2) and fn x -> x * 2 end produce the same results. In the example of our Enum.index_of() method, we can chang: Enum.find_index(alphabet, fn x -> x == a end)
to Enum.find_index(alphabet, &(&1 == a)).

Inside of sort where we call, compare_alpha? it is currently using the anonymous function: fn (a, b) -> compare_alpha?(a, b, alphabet) end) but we can change this to use the capture operator &(compare_alpha?(&1, &2, alphabet)).

Putting all of it together our Alphabetize Module now looks like this:

defmodule Alphabetize do

  @default_alphabet ~w(a b c d e f g h i j k l m n o p q r s t u v w x y z)

  def alphabetize(string, alphabet \\ @default_alphabet) do
    list = String.split(string, "", trim: true)
    sorted = Enum.sort(list, &(compare_alpha?(&1, &2, alphabet)))
    Enum.join(sorted)
  end

  defp compare_alpha?(a, b, alphabet) do
    Enum.find_index(alphabet, &(&1 == a)) < Enum.find_index(alphabet, &(&1 == b))
  end
end

IO.puts Alphabetize.alphabetize("learnelixir") # aeeiillnrrx
alphabet = ~w(n b c d e f g h i j k x l m a o p q r s t u v w y z)
IO.puts Alphabetize.alphabetize("learnelixir", alphabet) # neeiixllarr

Pipe Operator

Another refactor we can think about is our use of setting the result of each change to the string to a variable. We currently set the result of split, sort, and join to variables and pass that variable to the next operation. We are doing this because we can't reference some type of object in the way we might with an object-oriented programming language. In addition, all data structures in Elixir are immutable (they cannot be changed, you can only every make copies that contain the additional or changed information). We need a way to keep passing copies of the changing string. One way we could change this would be to nest the results inside of each other like so:

  def alphabetize(string, alphabet \\ @default_alphabet) do
    Enum.join(Enum.sort(String.split(string, "", trim: true), &(compare_alpha?(&1, &2, alphabet))))
  end

This in my opinion is even harder to understand than setting each result to a new variable. But what you can see here, is that going from inside to outside, the result is always the first argument of the next function that acts upon it. This is such a common pattern in Elixir because of its nature as a functional programming language, that it has its own operator, the pipe operator. This operator introduces the expression on the left-hand side as the first argument to the function call on the right-hand side. The |> operator is mostly useful when there is a desire to execute a series of operations resembling a pipeline such as what we will be trying to accomplish here.

string |> String.split("") is the same as String.split(string, "") because the pipe operator is passing the string as the first argument to the split(). If we apply this to the Alphabetize Module we can keep passing along the result of each function, and do so easily with repeated use of the pipe operator.

defmodule Alphabetize do

  @default_alphabet ~w(a b c d e f g h i j k l m n o p q r s t u v w x y z)

  def alphabetize(string, alphabet \\ @default_alphabet) do
    string |> String.split("") |> Enum.sort(&(compare_alpha?(&1, &2, alphabet))) |> Enum.join()
  end

  defp compare_alpha?(a, b, alphabet) do
    Enum.find_index(alphabet, &(&1 == a)) < Enum.find_index(alphabet, &(&1 == b))
  end
end

IO.puts Alphabetize.alphabetize("learnelixir") # aeeiillnrrx
alphabet = ~w(n b c d e f g h i j k x l m a o p q r s t u v w y z)
IO.puts Alphabetize.alphabetize("learnelixir", alphabet) # neeiixllarr

To make it easier to read, we can then add in new lines to make it look like a true pipeline.

Final Solution

defmodule Alphabetize do

  @default_alphabet ~w(a b c d e f g h i j k l m n o p q r s t u v w x y z)

  def alphabetize(string, alphabet \\ @default_alphabet) do
    string
    |> String.split("")
    |> Enum.sort(&compare_alpha?(&1, &2, alphabet))
    |> Enum.join()
  end

  defp compare_alpha?(a, b, alphabet) do
    Enum.find_index(alphabet, &(&1 == a)) < Enum.find_index(alphabet, &(&1 == b))
  end
end

IO.puts Alphabetize.alphabetize("learnelixir") # aeeiillnrrx
alphabet = ~w(n b c d e f g h i j k x l m a o p q r s t u v w y z)
IO.puts Alphabetize.alphabetize("learnelixir", alphabet) # neeiixllarr

Runtime and Space Complexity

To analyze the runtime of sorting a string with a custom alphabet we need to look at the operation that has the slowest runtime. The sigil, splitting the string, and joining the list back into a string would each have O(n) or linear runtime because, for each element, you have to perform one operation. In the case of finding the index, the best case is that the element is the first one that you look at which would be O(1) but the worst case is that it would be the last element in the alphabet meaning the program had to check each element to find the one it was looking for which would be O(n). We want to always use the worst-case scenario when determining run time so finding the index is also linear runtime.

Enum.sort/2 is a little more complex. The documentation says that it uses a merge sort algorithm to do its work.

Merge Sort is a divide and conquer algorithm. It works by recursively breaking down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem. Here is a visual representation of how it would sort a list of letters alphabetically:

Since the list is divided into two halves at each stage which is an O(logn) runtime, and then you have to make comparisons at each stage which is O(n) the resulting runtime is O(nlogn). So it is faster than O(n^2) but slower than O(n) and this is called a quasilinear runtime which is very common for sorting algorithms. Each time the number of elements to be sorted, the amount of work that has to be done doesn't quite double.

When calculating the runtime of the sort O(nlogn) and the individual O(n) for splitting and joining and finding the index, these are added to the runtime so it would be O(4nlogn) depending on how you count it, be we can drop the coefficient since it doesn't matter as n gets infinitely large.

The space complexity for Merge Sort is O(n) because for every recursive call two lists are created for merging which takes up O(n) space and deleted once the merged array has been made and before the next lists are created in the next recursive call.

Runtime Complexity: O(nlogn)
Space Complexity: O(n)

And that's all folx! By now we should all have a better understanding of Elixir modules, sigils, module functions, module attributes, default arguments, anonymous functions, the capture operator, and the pipe operator. There are many nuances going from an OOP to a functional programming language but I hope you've had as much fun as me discovering some of the conventions and tricks to using Elixir, while also flexing those neurons with a brief discussion of merge sort. For me, the most intriguing part was discovering the pipe operator and the capture operator as they really can enhance the readability of your code once you understand how they operate. There is something quite beautiful and aesthetically pleasing in how they move data.

Happy coding!

Top comments (5)

Linden Chiu • Apr 10 '21

Thanks for the educational article!

I had a question about your compare_alpha function; I think it's possible to map the alphabet to its indices? That way you can look up the index of any letter in the map in close to constant time, rather than iterating over a list with Enum.find_index. I rewrote your module with that implementation, and compared both versions with the benchee library, it seems to run faster with the modification.

Meks (they/them) • Apr 10 '21

Oh, I hadn't considered that! That's a really interesting take on it. Would you mind sharing the code for how you wrote compare_alpha? I'd love to test it out too and I can update the post with the new insight!

Linden Chiu • Apr 10 '21 • Edited

map = alphabet
|> String.split("", trim: true)
|> Enum.with_index()
|> Enum.reduce(%{}, fn({k,v}, acc)-> Map.put(acc, k, v) end)

You can create this map in the main alphabetize function, so you only have to do it once.
Then you can pass it into compare_alpha(a, b, map), and it becomes just:

map[a] < map[b]

I used a regular string for the alphabet instead of a sigil; I'm not familiar with Elixir, so I didn't know how to pass the indices from the list into the reduce function. (EDIT: Enum.with_index() works on the sigil, so you don't need to split an alphabet string)

Lucas Perez • Apr 6 '21

The more Elixir, the better :D

Meks (they/them) • Apr 6 '21

I just started learning it, and it has been so fun!!