dev.to staff

Posted on Apr 23, 2020

Daily Challenge #232 - Regex Pattern

#challenge

Implement a function that should return a valid regular expression. This is a pattern which is normally used to match parts of a string. In this case, it'll be used to check if all the characters given in the input appear in a string.

Your input will be a string of upper and lower case alphabetical characters. You should output a regex pattern as a string.

Example

abc = 'abc'
pattern = regex_contains_all(abc)
st = 'zzzaaacccbbbzzz'
bool(re.match(pattern, st), f"Testing if {st} contains all characters in {abc} with your pattern {pattern}")

will return True

Tests

abc = 'abc'
pattern = regex_contains_all(abc)
bool(re.match(pattern, 'bca'))
bool(re.match(pattern, 'baczzz'))
bool(re.match(pattern, 'ac'))
bool(re.match(pattern, 'cb'))

Good luck!

This challenge comes from albertogcmr on CodeWars. Thank you to CodeWars, who has licensed redistribution of this challenge under the 2-Clause BSD License!

Want to propose a challenge idea for a future post? Email yo+challenge@dev.to with your suggestions!

Top comments (11)

Mihail Malo • Apr 23 '20 • Edited

// Javscript

const regex_contains_all = chars =>
  new RegExp(
    chars
      .split("")
      .map(c => `(?=.*${c})`)
      .join("")
  )

// Test

const abc = 'abc'
const re = regex_contains_all(abc)
;[
  re.test('bca'),
  re.test('baczzz'),
  re.test('ac'),
  re.test('cb')
]

This uses "positive lookahead" patterns to match multiple patterns to the same spans.
This allows me to avoid listing all the permutations.
That is not to say that the runtime will be anything but atrocious.

Truly a oneliner

const regex_contains_all = chars => new RegExp(chars.replace(/./g,'(?=.*$&)'))

Craig McIlwrath • Apr 23 '20

I was interested in how the positive lookahead compares to a list of all permutations. I did some basic benchmarking and determined that your solution gives what appears to be logarithmic time, which seems strange. This could be an error of my testing. Whatever is the case with that, listing all permutation is certainly much, much slower. Here's a chart: (y axis is logarithmic scale and in nanoseconds):

The data for "All Permutations" isn't complete. 7 characters is all the node.js regex engine can handle (any longer and it the regex is too large). I only did 6 because I was getting impatient waiting.

Mihail Malo • Apr 23 '20

Wow!
How long are the strings you test against?
I'd be interested in ridiculously long strings which include a lot of some of the letters, but not others, and how finally including one of them at the beginning affects the runtime.

Craig McIlwrath • Apr 24 '20

Each test was on 100k strings of length [30..39] composed of random lowercase letters. The patterns were slices of increasing length from the beginning of the alphabet ('a', 'ab', 'abc', etc.). The same test cases were used for all data points.

If I have some time tomorrow, I may do some more tests with longer strings/placement of the required letters. I'll update you!

Maybe I'll add in a regex-less implementation for comparison too.

Ken Bellows • Apr 23 '20 • Edited

Winner right here. Lookaheads are the one regex feature I have not spent nearly enough time investigating. From what I've seen they dramatically broaden the scope of what you can do with regexes

Paula Gearon • Apr 24 '20

I don't consider myself a regex expert, but this is a feature I've never used and had forgotten about. Very nice, thank you!

Craig McIlwrath • Apr 23 '20

I'm not sure why you would want to implement this as a regular expression. There's no compact way (that I could think of) to represent those rules as a regular language. Here's my solution, in javascript.

function permutations(list) {
  if (list.length === 1) return [list];
  const xs = [];
  for (let i = 0; i < list.length; i++) {
    const tails  = permutations(list.filter((y, j) => j !== i));
    xs.push(...tails.map(ys => [list[i], ...ys]));
  }
  return xs;
}

function regexContainsAll(chars) {
  const regex = permutations(chars.split(''))
    .map(pattern => `^.*${pattern.join('.*')}.*$`)
    .join('|');
  return regex;
}

For example, input 'abc' produces the string 'a.*b.*c|a.*c.*b|b.*a.*c|b.*c.*a|c.*a.*b|c.*b.*a'.

As a side note, why do all of these challenges have "tests" that don't include expected results? It doesn't help me understand the question if they don't tell me what the function is supposed to do...

Davyd McColl • Apr 23 '20

function regex_contains_all(chars, caseSensitive) {
  return new RegExp(`^[${chars}]*$`, caseSensitive ? undefined : "i");
}

it's case-insensitive by default -- the question isn't clear about case sensitivity
the stated tests look like they are meant for Python, so here's a self-contained test suite:

function assert(condition) {
    if (!condition) {
        throw new Error("FAIL");
    }
}

var abc = 'abc'
var pattern = regex_contains_all(abc)
assert("bca".match(pattern) !== null);
assert("baczzz".match(pattern) === null);
assert("ac".match(pattern) !== null);
assert("cb".match(pattern) !== null);

Mihail Malo • Apr 23 '20

You did "string consists only of", not "string includes all of".
The intended result matrix is:

assert(pattern.test("zbca"))
assert(pattern.test("zbaczzz"))
assert(!pattern.test("zac"))
assert(!pattern.test("zcb"))

Paula Gearon • Apr 23 '20

Can I just check...
This looks for every character in the requirement string must be there, and it doesn't matter which order, right? So my understanding is that the tests should give:

abc = 'abc'
pattern = regex_contains_all(abc)

bool(re.match(pattern, 'bca')) => true
bool(re.match(pattern, 'baczzz')) => true
bool(re.match(pattern, 'ac')) => false
bool(re.match(pattern, 'cb')) => false

Paula Gearon • Apr 23 '20

Assuming the above is OK, then here's my Clojure solution:

(require '[clojure.string :as s])
(defn permute [a]
  (or (seq (mapcat #(map (partial cons %) (permute (remove #{%} a))) a)) [[]]))

(defn regex-contains-all [input]
  (str "(" (s/join ")|(" (map (partial s/join ".*") (permute input))) ")"))

Apologies for writing code that looks like line noise.

Given an input of "abc" then it will generate:

(regex-contains-all "abc")
"(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)"

This gives the test results that I describe in the previous comment.

View full discussion (11 comments)