« Fabricated Complexity | Main | NStatic Presentation »

December 28, 2006


TrackBack URL for this entry:

Listed below are links to weblogs that reference Hard Problems, Simple Solutions:

» Functional Style Regex Engine in C# from Cook Computing
Wesner Moise’s recent post Hard Problems, Simple Solutions speculates about implementing regular expression matching using a functional style. He points to a regular expression engine in 14 lines of Python and suggests this could be ported to C# using... [Read More]


Charles Cook

I've posted a C# version of the Python regex engine here:


Not quite as elegant as the Python code!

Chris W.

See this article on SureLogic:

The company is a spin-off of Carnegie Mellon’s Fluid Project.


See this recent paper on regexp perf

Jonathan Allen

> I contend that, with this approach, we can get a reasonable implementation of regular expressions over lists of arbitrary types in less than 200 lines of code, which is two orders of magnitude more concise than Microsoft’s more specific implementation over strings. I’ll produce a port of this one day and post it in my blog.

I don't believe you.

If it really takes about 200 lines, you should be able to do it over a lunch break or too. Hell, it should take less time than you spent writing this post and the follow-up.

Wesner Moise

Your time estimate is off.

I wrote a port of the original Python before. It does take more time than a lunch break.

It takes a few hours because I do have to think about it and test it. I'll get around to doing it later.

Cook has the slow version in the link in the first comment, which is less than 200 lines.

When I actually do the post, I'll refer to your comment.


Well this looks like the counter example for your basic premise. If people would be writing 3*(13-7) as Multiply(Three, Minus(Thirteen, Seven)) we would still be waiting for the appearance of calculus in this part of the universe :-)

Not to mention that these I-can-do-regex-as-function examples are so artificial that it hurts to read them. If you really, really don't know what kind of work is a regex expected to do these days and with what kind of real-world requirements, well there's a book ... :-)

One could analogously claim that "LALR(1) is too complicated" and "see how it can be done all with just functions" - except that it's LL(1), but hey, it's all "parser", right? :-))

So I'll leave you with a quote form an author that is known :-) : Things should be made as simple as possible, but not any simpler.

Oh and academic articles like that one bellow go claiming "perf issues" and bitching against NFA since they have found one case where it chokes, but only if you refuse to use non-greedy quantifiers - in order to publish an article for example :-)) Kind of like claiming that exp(n) is terribly slow while refusing to use anything but e+e+e+..., pardon plus(e,plus(e,plus(e, ...))) :-)

Wesner Moise

Operators are functions with special syntax.

a + b is really operator+(a,b).


Actually not :-) Quite a dangerous misconception though. The operator doesn't have just the function but also algebraic properties - and doesn't have to be reducible to a function at all. Operators are actually one of the major ways how a problem, once solved becomes simple.

That's why the attempt at manual function chaining to simulate your grandma's regex is such a countexample. Especially for .NET which compiles full scale Perl regex to the bare metal binary that runs thread-safe without locking.

For a further reaching example - compare the largely unreported beauty with which C# compiler generates the most efficient string concatenation out of a+b+c, by treating + as a genuine operator, while C++ compilers stumble and force people into using string buffers by clinging to the "opearators are just functions" dogma.

Functional representation may work for things that don't have a known efficient solution. When pushed on everything it just obfuscates and brings constant risk of exponential explosion. Like that innocent little n^n, being a constant time operation when written well, but turning exponential when written in terms of all pure add(a,b) functions, all the way down ..., or was it turtles? :-)


Verify your Comment

Previewing your Comment

This is only a preview. Your comment has not yet been posted.

Your comment could not be posted. Error type:
Your comment has been posted. Post another comment

The letters and numbers you entered did not match the image. Please try again.

As a final step before posting your comment, enter the letters and numbers you see in the image below. This prevents automated programs from posting comments.

Having trouble reading this image? View an alternate.


Post a comment

My name is Wesner Moise. I am a software entrepreneur developing revolutionary AI desktop applications. I worked as a software engineer in Microsoft Excel group for six years during the 1990s. I worked on PivotTables and wrote the most lines of code in Excel 97-- about 10 times the median developer. I have a Harvard BA in applied math/computer science and a UCLA MBA in technology entrepreneurship. I am a member of the Triple Nine Society, a 99.9 percentile high-IQ society.

December 2013

Sun Mon Tue Wed Thu Fri Sat
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31