
December 28, 2006

Hard Problems, Simple Solutions

In my previous post on Fabricated Complexity, I wrote about a quote that I found myself repeatedly agreeing with:

The solution to a hard problem, when solved, is simple.

A commenter remarked:

I disagree, though - certainly many things are overcomplicated due to reasons you state, but this doesn't mean every hard problem has a seemingly simple solution.

Two things I want to point out: (1) Notice that I didn’t say every hard problem has a simple solution, since not every hard problem is solvable. (2) This, by the way, is not a firmly held belief of mine.

I think the basis of my “belief” comes from experience with functional-style programming, where the solution to a problem often turns out to be its specification. I have found that the new functional features in C#, iterators (essentially, an efficient implementation of functions that return sequences) and anonymous functions, allow me to uncover more of these simple solutions.
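
As a small illustration of these two features, here is a hypothetical sketch in C# 2.0 (Filter is my own helper, not a framework method): the iterator lazily yields matching items, and an anonymous delegate supplies the predicate inline.

using System;
using System.Collections.Generic;

static class Sketch
{
    // An iterator method: it yields matching items lazily, so nothing
    // executes until the caller actually enumerates the result.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> items, Predicate<T> test)
    {
        foreach (T item in items)
            if (test(item))
                yield return item;
    }

    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6 };
        // An anonymous delegate (C# 2.0) supplies the predicate inline.
        foreach (int n in Filter<int>(numbers, delegate(int x) { return x % 2 == 0; }))
            Console.WriteLine(n);   // prints 2, 4, 6
    }
}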

What’s a moderately hard problem to use as an example? Perhaps building a generalized regular expression matcher. Microsoft’s implementation of regular expression matching over strings is spread across 24 files and 14,455 lines of code, including comments and whitespace.

Let’s see what happens if we switch to a functional style. We could use the lazy-list implementation, LazyChain, from the blog of Wes Dyer (a developer on the C# team) and then port the 14-line Python implementation of a regular expression engine to C# using iterators and anonymous functions. The regular expression would be composed functionally rather than through a string:

re = Sequence(term1, Plus(term2, Alternate(term3, term4)), term5)

and the results would be returned as a collection by applying the regular expression to the list to be searched:

results = re( list_to_search )

I contend that, with this approach, we can get a reasonable implementation of regular expressions over lists of arbitrary types in less than 200 lines of code, nearly two orders of magnitude more concise than Microsoft’s string-specific implementation. I’ll produce a port of this one day and post it on my blog.
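
Until that port exists, here is a minimal sketch of the combinator idea, under assumptions of my own (a matcher maps a start position in the input to every position where a match could end, and an empty result means failure). Plus, the one-or-more combinator, is omitted for brevity; it can be built from Sequence and Alternate with recursion.

using System;
using System.Collections.Generic;

// A matcher maps a start position to each position where a match could end.
delegate IEnumerable<int> Matcher<T>(IList<T> input, int pos);

static class Re
{
    // Match a single item equal to the given value.
    public static Matcher<T> Term<T>(T value)
    {
        return delegate(IList<T> input, int pos) { return TermMatch(value, input, pos); };
    }

    static IEnumerable<int> TermMatch<T>(T value, IList<T> input, int pos)
    {
        if (pos < input.Count && Equals(input[pos], value))
            yield return pos + 1;
    }

    // Match a, then b, continuing from wherever a could end.
    public static Matcher<T> Sequence<T>(Matcher<T> a, Matcher<T> b)
    {
        return delegate(IList<T> input, int pos) { return SeqMatch(a, b, input, pos); };
    }

    static IEnumerable<int> SeqMatch<T>(Matcher<T> a, Matcher<T> b, IList<T> input, int pos)
    {
        foreach (int mid in a(input, pos))
            foreach (int end in b(input, mid))
                yield return end;
    }

    // Match either a or b.
    public static Matcher<T> Alternate<T>(Matcher<T> a, Matcher<T> b)
    {
        return delegate(IList<T> input, int pos) { return AltMatch(a, b, input, pos); };
    }

    static IEnumerable<int> AltMatch<T>(Matcher<T> a, Matcher<T> b, IList<T> input, int pos)
    {
        foreach (int end in a(input, pos)) yield return end;
        foreach (int end in b(input, pos)) yield return end;
    }
}

class Demo
{
    static void Main()
    {
        // 'a' followed by ('b' or 'c'), applied to a list of chars:
        Matcher<char> re = Re.Sequence(Re.Term('a'), Re.Alternate(Re.Term('b'), Re.Term('c')));
        foreach (int end in re(new List<char>("abz"), 0))
            Console.WriteLine("match ends at position " + end);   // prints 2
    }
}

Because every combinator is an iterator over an iterator, the alternatives are explored lazily, and matching over lists of any element type falls out of the generics for free.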

On another note, when a human outperforms a computer on a given programming task, I take it as a cue that the computer’s algorithm may be poorly written. In such cases, I look for an alternative functional implementation, which tends to model more accurately how we actually think.

Let’s look at resolution-based theorem proving, for example. For a large problem, I wonder if it’s faster for the prover to extract from the clauses in the knowledge base a functional expression equivalent in truth value to the user’s query, rather than actually performing the proof. Quantified expressions get turned into lambda expressions, and Boolean operations into conditionals. This extraction might be possible in polynomial time, after which we could reduce the returned expression to normal form, which would be far more efficient than performing a lengthy nondeterministic search.
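
To make the speculation concrete, here is a toy illustration (all names are hypothetical, and this is nothing like a real prover): clauses become anonymous delegates, implications and the disjunction between clauses become ordinary Boolean operations, and answering the query becomes evaluation rather than search.

using System;

class ProverSketch
{
    delegate bool Pred(object x);

    static void Main()
    {
        // Knowledge-base facts, as predicates.
        Pred human = delegate(object x) { return "socrates".Equals(x); };
        Pred dog   = delegate(object x) { return "fido".Equals(x); };

        // The clauses "Human(x) -> Mortal(x)" and "Dog(x) -> Mortal(x)",
        // extracted as a single lambda over the bodies of both clauses.
        Pred mortal = delegate(object x) { return human(x) || dog(x); };

        // The query "Mortal(socrates)?" reduces by evaluation, not search.
        Console.WriteLine(mortal("socrates"));   // True
    }
}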

TrackBack

Listed below are links to weblogs that reference Hard Problems, Simple Solutions:

» Functional Style Regex Engine in C# from Cook Computing
Wesner Moise’s recent post Hard Problems, Simple Solutions speculates about implementing regular expression matching using a functional style. He points to a regular expression engine in 14 lines of Python and suggests this could be ported to C# using... [Read More]

Comments

I've posted a C# version of the Python regex engine here:

http://www.cookcomputing.com/blog/archives/000564.html

Not quite as elegant as the Python code!

See this article on SureLogic:
http://thetartan.org/2007/2/5/news/computer

The company is a spin-off of Carnegie Mellon’s Fluid Project.

See this recent paper on regexp perf
http://swtch.com/~rsc/regexp/regexp1.html

> I contend that, with this approach, we can get a reasonable implementation of regular expressions over lists of arbitrary types in less than 200 lines of code, nearly two orders of magnitude more concise than Microsoft’s string-specific implementation. I’ll produce a port of this one day and post it on my blog.


I don't believe you.

If it really takes about 200 lines, you should be able to do it over a lunch break or two. Hell, it should take less time than you spent writing this post and the follow-up.

Your time estimate is off.

I've written a port of the original Python before. It does take more time than a lunch break.

It takes a few hours because I do have to think about it and test it. I'll get around to doing it later.

Cook has the slow version in the link in the first comment, which is less than 200 lines.

When I actually do the post, I'll refer to your comment.

Well, this looks like a counterexample to your basic premise. If people wrote 3*(13-7) as Multiply(Three, Minus(Thirteen, Seven)), we would still be waiting for the appearance of calculus in this part of the universe :-)

Not to mention that these I-can-do-regex-as-a-function examples are so artificial that it hurts to read them. If you really, really don't know what kind of work a regex is expected to do these days, and with what kind of real-world requirements, well, there's a book ... :-)

One could analogously claim that "LALR(1) is too complicated" and "see how it can be done all with just functions" - except that it's LL(1), but hey, it's all "parser", right? :-))

So I'll leave you with a quote from an author who is known :-) : Things should be made as simple as possible, but not any simpler.

Oh, and academic articles like that one below go claiming "perf issues" and bitching against NFAs because they have found one case where it chokes, but only if you refuse to use non-greedy quantifiers - in order to publish an article, for example :-)) Kind of like claiming that exp(n) is terribly slow while refusing to use anything but e+e+e+..., pardon, plus(e,plus(e,plus(e, ...))) :-)

Operators are functions with special syntax.

a + b is really operator+(a,b).
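
A minimal C# sketch of this view (the Vec type is hypothetical):

struct Vec
{
    public readonly double X, Y;
    public Vec(double x, double y) { X = x; Y = y; }

    // A user-defined "+": the compiler turns v1 + v2 into a call to this
    // static method (emitted as op_Addition in the IL).
    public static Vec operator +(Vec a, Vec b)
    {
        return new Vec(a.X + b.X, a.Y + b.Y);
    }
}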

Actually not :-) Quite a dangerous misconception, though. An operator doesn't have just the function but also algebraic properties - and doesn't have to be reducible to a function at all. Operators are actually one of the major ways that a problem, once solved, becomes simple.

That's why the attempt at manual function chaining to simulate your grandma's regex is such a counterexample. Especially for .NET, which compiles full-scale Perl regexes to bare-metal binary that runs thread-safe without locking.

For a further-reaching example, compare the largely unreported beauty with which the C# compiler generates the most efficient string concatenation out of a+b+c, by treating + as a genuine operator, while C++ compilers stumble and force people into using string buffers by clinging to the "operators are just functions" dogma.
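
(Concretely, for strings a, b, and c:)

string s = a + b + c;   // the C# compiler emits a single String.Concat(a, b, c) call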

Functional representation may work for things that don't have a known efficient solution. When pushed onto everything, it just obfuscates and brings a constant risk of exponential explosion. Like that innocent little n^n: a constant-time operation when written well, but exponential when written in terms of pure add(a,b) functions, all the way down ... or was it turtles? :-)

