A post at Coding Horror got me thinking about how the experience of using regular expressions throughout our code can be improved. I use them frequently in a variety of cases, but many programmers think of them as some kind of superhuman ability:

XKCD comic

The syntax of regular expressions isn’t very welcoming, and some (really) ugly examples can be found on the web. Creating a single expression can also be a long and error-prone process, even for experienced programmers, and testing it for correctness isn’t painless either. What options do we have, then?

Regular expressions pocket reference book

Well, for creating my own regular expressions, I often fall back on an external tool. I use an online tester: Rubular. Although it’s said to be for the Ruby language, the rules and characters are pretty generic across most regular expression engines. After getting the expression right on Rubular, I just integrate it into my code. Most of the time I abstract it into its own method, to help with code reuse and testing. The pocket reference that I acquired also comes in handy, with all the rules explained and implementations for all the popular languages.
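To make that last step concrete, here is a minimal sketch in Python of an expression wrapped in its own function. The pattern, the constant name `ISO_DATE` and the function name `find_iso_dates` are made up for illustration; the idea is only that the expression lives in one place and can be tested on its own.

```python
import re

# Hypothetical pattern, assumed to have been tuned in an online tester first.
# It matches dates written as YYYY-MM-DD; it does not check month or day ranges.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def find_iso_dates(text):
    """Return every YYYY-MM-DD date found in text as a (year, month, day) tuple of ints."""
    return [
        tuple(int(part) for part in match.groups())
        for match in ISO_DATE.finditer(text)
    ]
```

With the expression isolated like this, a handful of plain assertions on `find_iso_dates` is enough to notice when a later tweak to the pattern breaks something.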

Unfortunately, IDEs don’t offer much IntelliSense-style support for regular expressions. Here I’m talking about identifying them (an easier task in languages like Ruby and Perl, which give them a dedicated literal syntax), checking that the syntax is valid, automatically offering a box where you can throw in several test cases, and other nice features you can probably think of.

Finally, some experiments have been made towards making regular expressions a bit more programmer friendly. Most notably, a readable version that builds the regular expression string through an object-oriented interface. A random example of what it looks like:

Pattern.With.Literal(@"<div")
    .WhiteSpace.Repeat.ZeroOrMore
    .Literal(@"class=""game""").WhiteSpace.Repeat.ZeroOrMore.Literal(@"id=""")
    .NamedGroup("gameId", Pattern.With.Digit.Repeat.OneOrMore)
    .Literal(@"-game""")...

It’s arguable whether this version is actually an improvement, not only in readability but also in maintainability (it takes more lines, for example). A lot of arguments have been thrown around in the comments section of the Coding Horror post, so I suggest you read through them. In my personal opinion, the syntax is a bit harder to remember and would only increase the overhead of making casual changes. However, I also agree that for some folks it might be a simpler and more familiar way to start using regular expressions, especially if they’re dead set against learning a new language.
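For comparison, here is roughly what the visible part of the fluent example looks like as a plain expression, sketched in Python. This assumes that `WhiteSpace.Repeat.ZeroOrMore` maps to `\s*` and `Digit.Repeat.OneOrMore` to `\d+`; the original ends in "...", so anything after that point is left out. The names `GAME_DIV` and `sample` and the sample input are made up.

```python
import re

# Rough equivalent of the visible part of the fluent pattern (the rest is elided
# in the original and is not guessed at here). Python spells named groups
# (?P<name>...) where .NET uses (?<name>...).
GAME_DIV = re.compile(r'<div\s*class="game"\s*id="(?P<gameId>\d+)-game"')

# Made-up sample input, for illustration only.
sample = '<div class="game" id="42-game">'
match = GAME_DIV.search(sample)
game_id = match.group("gameId") if match else None
```

The raw version fits on one line, which is exactly the trade-off being argued about: shorter to write and change, harder to read at a glance.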


2 Comments

Nice post. For testing I use http://gskinner.com/RegExr/. It's pretty good, and it also has a desktop version. As for the "readable" version, I think verbose regular expressions in Python seem much closer to that goal, both in readability and maintainability.
The tester seems really nice, maybe I'll switch.

About the verbose regular expressions in Python, that's actually a common feature in many regexp implementations. In most of those, adding the trailing 'x' option lets you spread the expression over multiple lines and embed comments using #. And yes, when your expression is getting a little out of hand, it sure helps to leave some pointers along the way...
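As a sketch of what that looks like, here is the visible part of the fluent example from the post written as a verbose expression in Python, using the `re.VERBOSE` flag (Python's spelling of the 'x' option). The name `GAME_DIV_VERBOSE` is made up, and the mapping from the fluent calls to `\s*` and `\d+` is an assumption.

```python
import re

# In verbose mode, whitespace inside the pattern is ignored and "#" starts a
# comment that runs to the end of the line.
GAME_DIV_VERBOSE = re.compile(
    r"""
    <div              # start of the opening tag
    \s*               # optional whitespace
    class="game"      # the class attribute, exactly as written
    \s*               # optional whitespace
    id="              # start of the id attribute
    (?P<gameId>\d+)   # the numeric game id, captured by name
    -game"            # fixed suffix that closes the attribute
    """,
    re.VERBOSE,
)
```

It matches the same text as the one-line version would; only the layout changes.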
