## DEV Community is a community of 862,249 amazing developers

We're a place where coders share, stay up-to-date and grow their careers.

# Benchmarking Scala with ScalaMeter, Pt. 1 (Scala DCP #004)

Daily Coding Problem is a website which will send a programming challenge to your inbox every day. I want to show beginners how to solve some of these problems using Scala, so this will be an ongoing series of my solutions. Feel free to pick them apart in the comments!

### Problem

Given an array of integers, find the first missing positive integer in linear time and constant space. In other words, find the lowest positive integer that does not exist in the array. The array can contain duplicates and negative numbers as well.

For example, the input `[3, 4, -1, 1]` should give `2`. The input `[1, 2, 0]` should give `3`.

You can modify the input array in-place.

### Strategy

See my discussion of this problem (and my Java implementation) here!

Spoilers! Don't look below unless you want to see my solution!

The Java solution I wrote for this problem is very imperative. It uses array indices and mutable variables and more and feels very un-Scala-like. But the "constant space" restriction on the problem seems to mandate that we use a mutable array.

Since this mutability sort of flies in the face of Scala's functional style, and since I've already solved the problem with the constant space restriction, I'd like to take a different approach to solving this problem using Scala. I'd like to solve this as many ways as I can and then benchmark the resulting algorithms. Since the arrays can contain duplicates and negative numbers, we have several dimensions over which we might want to search for the optimal algorithm:

1. % of duplicate elements (`pctDup`)
2. % of negative elements (`pctNeg`)
3. length of array (`n`)
4. maximum integer contained (`maxVal`)
5. minimum integer contained (`minVal`)

These five restrictions essentially define a probability density function for an arbitrary random integer generator. Generating integers within a range is easy enough. If you have Scala 2.13 or later, you can use the `Random.between()` method:

``````scala> import scala.util.Random
import scala.util.Random

scala> val rand = new Random()
rand: scala.util.Random = scala.util.Random@5496c165

scala> (1 to 10).map(e => rand.between(-5, 5))
res1: IndexedSeq[Int] = Vector(3, 3, -3, -2, -2, -1, 4, 3, -3, 3)

scala> // or...

scala> List.fill(10)(rand.between(-5,5))
res5: List[Int] = List(1, 0, 4, 2, 1, -5, 4, 1, -3, 4)
``````

If you have an earlier version of Scala, you can implement the `between()` method yourself by just copying-and-pasting the source code from Scala's GitHub repo. Here is the implementation as it exists in the source (comments removed):

``````  def between(minInclusive: Int, maxExclusive: Int): Int = {
require(minInclusive < maxExclusive, "Invalid bounds")

val difference = maxExclusive - minInclusive
if (difference >= 0) {
nextInt(difference) + minInclusive
} else {
@tailrec
def loop(): Int = {
val n = nextInt()
if (n >= minInclusive && n < maxExclusive) n
else loop()
}
loop()
}
}
``````

Once we have a ranged random integer generator, we can simply generate `n` values to get our sample array. Since we ignore non-positive integers, we probably don't care what the smallest integer is, we only really care about the percentage of non-positive integers. That means the dimensions over which we want to benchmark our solutions might actually be:

1. % of duplicate elements (`pctDup`)
2. % of non-positive elements (`pctBad`)
3. length of array (`n`)
4. maximum integer contained (`maxVal`)

...to get a particular % of non-positive elements, we can simply roll a die when the next integer in the sequence is about to be generated, and make that integer a `0` if some predicate passes. The same thing can be done for duplicate elements.

Finally, we likely want `maxVal` to somehow be dependent on the length of the array `n`, and not arbitrary. So I propose that we instead use a maximum value which is relative to the length of the array:

maximum integer contained, relative to length of array (`maxPct`)

We might naively chose values for these parameters similar to the following

``````pctDup = 0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 75, 90, 99, 99.9
pctBad = 0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 75, 90, 99, 99.9
n      = 0, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000
maxPct = 25, 50, 75, 100, 125, 150, 175, 200
``````

Looking at the above, we can see that some of these values may be contradictory. For example, we can't have an array of length `n = 5` with a `maxPct = 25` (meaning the maximum allowable value in the array is `0.25 * 5 = 1.25` or `1` when rounded to an integer, and require that that array have a `pctDup` any less than 100%, which is not one of our allowable values. Also, it might be difficult to implement a "duplicate value" generator, because that requires us to look over the already-generated values and pick one at random before duplicating that one in particular.

Instead, we can inspect `pctDup` as an emergent property of the generated arrays. If `maxPct` is small, then `pctDup` will naturally become large, since we're restricting the range of possible values relative to the length of the array. This simplifies our search space further. Let's drop `pctDup` entirely and instead focus on only three tunable parameters:

1. the length of the array (`n`)
2. the maximum allowable value in the array, as a percentage of the length of the array (`maxPct`)
3. the percentage of non-positive ("bad") elements (`pctBad`)

``````pctBad = 0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 75, 90, 99, 99.9
n      = 0, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000
maxPct = 10, 25, 50, 75, 100, 125, 150, 175, 200, 500, 1000
``````

This gives us `17 * 11 * 11 = 2057` possible combinations of parameters to iterate over, multiplied by the number of algorithms. Hopefully most of these should be very fast, because in the majority of cases, we're generating arrays with less than 100 elements. If these tests don't take too long, we can add additional tests with greater values of `n`.

We need to be careful to only benchmark the algorithm defined in the prompt and not to benchmark the array-generating code itself. Since we're not benchmarking that code, we don't need to pick apart its efficiency (though, ideally, it will be as efficient as possible so that the tests run quickly).

A quick and dirty implementation of this random array generator could look something like:

``````scala> import scala.util.Random
import scala.util.Random

scala> val rand = new Random()
rand: scala.util.Random = scala.util.Random@5496c165

scala> val n = 10
n: Int = 10

scala> val maxPct = 1000
maxPct: Int = 1000

scala> (1 to n).map(i => if (rand.nextDouble() < pctBad/100) 0 else rand.between(1, n*maxPct/100 + 1))
res0: IndexedSeq[Int] = Vector(52, 0, 77, 84, 0, 0, 16, 87, 8, 30)
``````

The above values of `n`, `pctBad`, and `maxPct` generate an array of `10` elements, where approximately `20` percent of them are "bad" (non-positive), and the values in the array range from `1` to `1000` percent of the length of the array (i.e. 10 times the length of the array or `100`), inclusive on both ends.

Note that, because we're using percentages and not fractions, we need to divide both percentages by `100`. Instead of doing that, let's just move to fractional values instead:

``````pctBad = 0, 0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005, .01, .02, .05, 0.1, 0.2, 0.5, 0.75, 0.9, 0.99, 0.999
n      = 0, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000
maxPct = 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 5, 10
``````

And take another shot at our random array generator...

``````scala> def randArray (n: Int, pctBad: Double, maxPct: Double): Array[Int] =
|   (1 to n).map(i => if (rand.nextDouble() < pctBad) 0
|                     else rand.between(1, (n*maxPct).toInt + 1)).toArray
randArray: (n: Int, pctBad: Double, maxPct: Double)Array[Int]

scala> randArray(10, 0.5, 1.0)
res9: Array[Int] = Array(6, 0, 4, 3, 0, 0, 0, 9, 7, 0)

scala> randArray(15, 0.9, 10.0)
res12: Array[Int] = Array(0, 0, 0, 0, 0, 0, 0, 0, 66, 0, 0, 0, 148, 0, 0)

scala> randArray(5, 0, 100.0)
res13: Array[Int] = Array(194, 45, 113, 33, 460)
``````

This seems okay until we realise that it fails for certain combinations of parameters. For instance, if `n` is `1` (allowed) and `maxPct` is `0.1` (allowed), we will try to call `rand.between(1, 1)`, which will throw an `IllegalArgumentException` (`requirement failed: Invalid bounds`). So we can `require` `maxPct` to always be positive, and use

``````Math.ceil(n*maxPct).toInt
``````

``````ceil(n*maxPct).toInt
``````

This ensures that the range sent to `rand.between()` is always valid. One more small fix might be to reduce our number of random values generated. Since `between()` and `nextDouble()` both return uniformly random values over a range, we could simplify the algorithm by simply allowing the range to be negative for a percentage equivalent to `pctBad`.

For instance, if `n` is `20`, `pctBad` is `0.0909` (9.09%), and `maxPct` is `1.5` (150%), it means we want to generate an array of `20` integer values on the range `[1, 30]`, where `1.8` of those values are negative, on average (`0.0909 * 20`).

Instead of throwing two random numbers -- one corresponding to `pctBad`, and one corresponding to generating the positive value if `pctBad` is greater than `0.0909`, we could instead just extend our range into the negative integers, such that 9.09% of the total range is non-positive. In other words, we change our range from `[1, 30]` to `[-2, 30]` and generate integers randomly uniformly over that range. In this case, our range includes `33` integers, `3` of which (`-2`, `-1`, and `0`) are "bad".

This ensures that, on average, 9.09% of the values (`3` out of `33` possible integer values) will be non-positive ("bad"), and that the maximum allowable generated value is 150% of the length of the array. And we only have to generate one random value!

A modified `randArray()` might then look like:

``````// random array generator
def randArray (n: Int, fracBad: Double, maxScaling: Double): Array[Int] = {
require(fracBad >= 0 && fracBad <= 1, "fraction of \"bad\" elements must be between 0 and 1")
require(maxScaling >= 0, "maxScaling must be >= 0")
require(n >= 0, "array length must be >= 0")

if (n == 0) Array()
else if (fracBad == 1.0) Array.fill(n)(0)
else {
val maxVal = n * maxScaling
val minVal = 0.5 - fracBad / (1 - fracBad) * maxVal

val range = f"range [\$minVal%.3f, \$maxVal%.3f]"
require(n == 0 || minVal <= maxVal, s"invalid \$range")

if (minVal == maxVal) Array.fill(n)(minVal.toInt)
else (1 to n).map(_ =>
rand.between(minVal, maxVal).round.toInt).toArray
}
}
``````

...you could do away with the `require()`s if you're sure that you're only passing valid parameters. With this implementation, we'll have some "wiggle" on the percentage of non-positive values in an array, as well as the range of values contained in the array. We can look at all of these things statistically. In summary, we'll use the above `randArray()` implementation and the problem solutions below, benchmarking them as a function of

1. `n`, the number of elements in the array
2. the fraction of non-positive elements in the array
3. the maximum value present in the array, relative to the length of the array
4. the average duplication of given elements in the array...?
5. ...?

Let's go!

### Code

There are three methods I'd like to try and benchmark. The first is my Java-like solution (rewritten in (ugly) Scala, of course). I've debugged this a bit and I think the performance is more or less optimal at this point. It's constant space because it only uses the given mutable array, and I think it performs the fewest number of value swaps and array traversals to solve the problem:

``````scala> :paste
// Entering paste mode (ctrl-D to finish)

def findMissingMutable (array: Array[Int]): Int = {
if (array.length < 1) 1
else {
val length = array.length
var smallestMissing = 1

for (index <- 1 to length) {
var value = array(index - 1)

// if this value is valid, and not at the appropriate index...
if ((value != index) && (value > 0) && (value <= length)) {
breakable {

// ...we will try to swap it
while (value != index) {

// get the value at this value's index
val swapIndex = value
val swapValue = array(swapIndex - 1)

// if value of target index is equal to this value, skip
if (value == swapValue) break

// swap the values at the indices
array(swapIndex - 1) = value
array(index - 1) = swapValue

// re-apply conditions in outer if loop
if (swapValue < 1 || swapValue > length) break
if (swapValue == swapIndex) break

// get newly-swapped value
value = array(index - 1)
}
}
}
}

for (ii <- 0 until length) {
if (array(ii) >= 1) {
if (array(ii) != smallestMissing) return smallestMissing
else smallestMissing += 1
}
}

smallestMissing
}
}

// Exiting paste mode, now interpreting.

findMissingMutable: (array: Array[Int])Int

scala> findMissingMutable(Array(3, 4, -1, 1))
res102: Int = 2

scala> findMissingMutable(Array(1, 2, 0))
res103: Int = 3
``````

The second uses Scala's `Array.sortInPlace()` method (available from 2.13 onward), which should allow us to still satisfy the prompt's constant space complexity requirement:

``````scala> def findMissingInPlace (arr: Array[Int]): Int = {
|   val goodVals = arr.filter(_ > 0).distinct.sortInPlace
|   val missing = goodVals.zipWithIndex.dropWhile({ case (x,i) => x == i+1 }).headOption
|   missing match {
|     case Some((_, i)) => i + 1
|     case None => goodVals.length + 1
|   }
| }
findMissingInPlace: (arr: Array[Int])Int

scala> findMissingInPlace(Array(3, 4, -1, 1))
res46: Int = 2

scala> findMissingInPlace(Array(1, 2, 0))
res47: Int = 3
``````

The third method is linear in space, but might be faster because I'll be converting the `Array` to a `Set`, then performing only `Set` operations, which are, in general, faster than operations on Scala's sequences:

``````scala> def findMissingSet (arr: Array[Int]): Int = {
|   val goodVals = arr.filter(_ > 0).toSet
|   val nVals = goodVals.size
|   val missing = (1 to nVals).dropWhile(goodVals.contains).headOption
|   missing match {
|     case Some(x) => x
|     case None => nVals + 1
|   }
| }
findMissingSet: (arr: Array[Int])Int

scala> findMissingSet(Array(3, 4, -1, 1))
res71: Int = 2

scala> findMissingSet(Array(1, 2, 0))
res72: Int = 3
``````

After testing these solutions a few hundred times and ensuring that they all arrive at the same results (and squashing some bugs), I'm happy enough to benchmark them. Note that, because `findMissingMutable()` and `findMissingInPlace()` both mutate the array passed to them, we must first generate an array, then copy it three times to pass to each of the three implementations above.

So, which one will be fastest? Under which conditions? Place your bets now because I'm about to reveal the results!

Results coming soon!

All the code for my Daily Coding Problems solutions is available at github.com/awwsmm/daily.

Suggestions? Let me know in the comments.

If you enjoyed this post, please consider supporting my work by buying me a coffee!