Regex golf at regex.alf.nu 2014-02-02
In this blog post I will summarize some cool and interesting "solutions" to the regular expression challenges at http://www.regex.alf.nu/. I write solutions inside quotation marks because to some of the levels there are better solutions found, but there may also be even better unknown solutions.
I have found many of the solutions in the comments to this Github Gist: https://gist.github.com/jpsim/8057500. It's a fun game, and I enjoy reading what tricks other people have come up with. I'm not a regex l33t by any means, but I think this game is a fun way to see what nifty things you can accomplish with regex. There is also a discussion thread at Hacker News about the game.
If you need to brush up your regular expression skills, check out Mozilla Developer Network's article on the subject.
I also found a cool regex visualizer on http://regexp.quaxio.com/.
And by the way, if you can't get enough of golf, there is also a very well-made game called Vim Golf.
1. Plain Strings
The solution to this level is simply foo
, which earns you 207 points.
2. Anchors
All words in the left column end with the letter k. Therefore the level is
simply won with k$
, giving you 208 points.
3. Ranges
The left column's words only contain the letters a to f. Writing this with a
regex yields ^[a-f]+$
and scores us 202 points.
4. Backrefs
All words in the left column follow a pattern where three letters in sequence appear two times within the word. For example barbary and heavyheaded.
Selecting strings following this pattern can be made with the regex
(...).*\1
, worth 201 points.
5. Abba
Here we want to select words that don't contain letter sequences looking like
anallagmatic, hippogriffin and unclassably. Therefore we will use
?!
in regex to negate our selection.
Here is a solution using negation: ^(?!.*(.)(.)\2\1)
, worth 193 points. I
found this solution in this Stack Overflow
answer.
As a side note, we can't simply write (?!(.)(.)\2\1)
, because x(?!y)
really
means "select x only if it isn't followed by y." In our case: "select the start
^
only if isn't followed by the pattern inside the parenthesises."
6. A man, a plan
Here the mission is to match palindromes, even if it isn't possible as described in this Stack Overflow answer.
However, it's possible to match palindromes of limited length, in this case
with ^(.?)(.?)(.?).?\3\2\1$
, worth 168 points.
An even better solution is ^(.)[^p].*\1$
, worth 177 points. The main part to
understand is ^(.).*\1$
which matches words that start and end with the same
letter, as palindromes obviously do. Since only the word sporous from the right
column starts and ends with the same letter, we filter it out with [^p]
.
7. Prime
A very cool solution to this level is ^(?!(..+?)\1+$)
, worth 285 points. What
it will do is to match a number of x:es that is a multiple of two numbers, i.e.
not a prime number. Then it negates the result, yielding us a selection of all
strings from the left column.
8. Four
The pattern here is a letter repeated four times with one other letter between
each of the pairs. For example Makaraka and
odontonosology. Regex to match this pattern: (.).\1.\1.\1
,
worth 198 points.
To save a letter and score an additional point, use (.)(.\1){3}
.
9. Order
When you think about it, a fairly obvious solution to this level is
^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$
as answered on Stack Overflow, giving us 156 points.
A slightly trimmed down version, without the unused letters, is worth 166 points and looks like:
^a*b*c*d*e*f*g*h*i*k*l*m*n*o*p*r*s*t*w*y*z*$
A cool cheat I found to this level is ^[^o].....?$
, worth 198 points. This
works because the words in the left column are only 5 or 6 letters long in
constrast to the right column's words which are at least 7 letters long except
for oriole. oriole is filtered out with [^o]
.
10. Triples
This is probably the hardest level. The goal is to write a regular expression that matches multiples of 3. I found a blog post describing how this problem can be written as a finite state machine. The huge solution the author ends up with is
^([0369]|[258][0369]*[147]|[147]([0369]|[147][0369]*[258])*[258]|[258][0369]*[258]([0369]|[147][0369]*[258])*[258]|[147]([0369]|[147][0369]*[258])*[147][0369]*[147]|[258][0369]*[258]([0369]|[147][0369]*[258])*[147][0369]*[147])*$
This yields us 401 points.
But there is apparently a short, little cheat
00($|3|6|9|12|15)|4.2|.1.+4|55|.17
that is worth 596 points.
11. Glob
Short little 397 points cheat: [bncrw][bporn]|^p|c$|ta
12. Balance
When a tag is opened with <
it needs to be properly closed with >
. A regex
for this is:
^(<(<(<(<(<(<<>>)*>)*>)*>)*>)*>)*$`
286 points will this give us.
13. Powers
A first solution, worth 45 points, is:
^(x|xx|xxxx|x{8}|x{16}|x{32}|x{64}|x{128}|x{256}|x{512}|x{1024})$
With the following clever solution we will get 56 points:
^((((((((((x)\10?)\9?)\8?)\7?)\6?)\5?)\4?)\3?)\2?)\1?$
However, I saw an even more clever and nifty solution worth 80 points:
^(x|(xx){1,8}|x{32}|(x{64})+)$
14. Long count & 15. Long count v2
To match the complete string in the left column, we can use the regex worth 253
points ((.+)0 ?\2[1]){8}
. However, we can skip the space and question mark in
the middle of the regex, which will score us two additional points but won't
match the complete left string. This yields us ((.+)0\2[1]){8}
. The regex
will basically match pairs of 4 digit sequences where only the last digit
changes, like in the case of 1010 1011.
16. Alphabetical
My solution, worth only 50 points:
^(ae\w+ )*(as\w+ )*(ar\w+ )*(ea\w+ )*(en\w+ )*(er\w+ )*(es\w+ )*(rea\w+ )*(ren\w+ )*(rer\w+ )*(rese\w+ )*(rest\w+ )*(ret\w+ )*(sea\w+ )*(sen\w+ ?)*(ser\w+ )*(snarer ?)*(sta\w+ )*(strata )*(street)*(tan\w+ )*(tar\w+ ?)*(tas\w+ )*(teaser ?)*(tester)*(ten\w+ ?)*(tester )*(testes ?)*(tsetse)*$
It only works for this case, it's very long and it's ugly.
A pretty nifty and complex cheat, worth 286 points, is:
^(?!.* ((.*)t.* \2[es]|(.*)s.* \3[nr]|(.*)r.* \4[en]))
However, AlanDeSmet on Github has this solution:
^(?!.*\b((.*)b.*\b\2[a]|(.*)c.*\b\3[ab]|(.*)d.*\b\4[a-c]|(.*)e.*\b\5[a-d]|(.*)f.*\b\6[a-e]|(.*)g.*\b\7[a-f]|(.*)h.*\b\8[a-g]|(.*)i.*\b\9[a-h]|(.*)j.*\b\10[a-i]|(.*)k.*\b\11[a-j]|(.*)l.*\b\12[a-k]|(.*)m.*\b\13[a-l]|(.*)n.*\b\14[a-m]|(.*)o.*\b\15[a-n]|(.*)p.*\b\16[a-o]|(.*)q.*\b\17[a-p]|(.*)r.*\b\18[a-q]|(.*)s.*\b\19[a-r]|(.*)t.*\b\20[a-s]|(.*)u.*\b\21[a-t]|(.*)v.*\b\22[a-u]|(.*)w.*\b\23[a-v]|(.*)x.*\b\24[a-w]|(.*)y.*\b\25[a-x]|(.*)z.*\b\26[a-y]))
He writes:
Alphabetical (-109) "Pure" solution: Should work for any number of words of any length limited to characters [a-z]. I haven't even considered the case where the words are of varying lengths, but my brain is done for tonight. But by my own standards, this is a cheat-free solution, and I haven't seen any others that meet that standard. Pity that it scores so abysmally.