About

I am a software developer in Seattle, building a new AI software company.

Ads

April 2009

Sun Mon Tue Wed Thu Fri Sat
      1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30    

Categories

Ads


November 11, 2007

The Awkwardness of Functional Programming

Both Reddit’s main page and programming subreddit includes a popular post “Admitting that functional programming can be awkward.” Each of these subreddits have elicited numerous interesting responses.

In it, James Hague recounts how a semi-successful Mac game he wrote called Bumbler is trivial to write in C, but that a purely functional approach appeared to be barely doable and may well be the wrong paradigm for these types of problems:

What's interesting is that it would be trivial to write this in C. Some incrementing, some conditions, direct calls to sound playing routines and insect spawning functions, reading and writing from a pool of global counters and state variables. For a purely functional approach, I'm sure the data flow could be puzzled out...assuming that everything was all perfectly planned and all the behaviors were defined ahead of time. It's much more difficult to take a pure movement function and say "okay, what I'd like is for this object to gravitationally influence other objects once it has bounced off of the screen edges three times." Doable, yes. As directly implementable as the C equivalent? No way.

That's one option: to admit that functional programming is the wrong paradigm for some types of problems. Fair enough. I'd put money on that. But it also may be that almost no one has been thinking about problems like this, that functional programming attracts purists and enamored students. In the game example above, some of the issues are solvable, they just need different approaches. Other issues I don't know how to solve, or at least I don't have solutions that are as straightforward as writing sequential C. And there you go...I'm admitting that functional programming is awkward in some cases. It's also extremely useful in others.

It’s not always obvious, perhaps because people often continue to think imperatively when finding an functional approach. Rather than thinking that functional programming is not very compatible with game development, there might be a radically different way of thinking about a problem. I have encountered one computer engineering thesis called “Functional Programming and 3D Games” which specifically deals with the topic.

I have been in a similar situation a few times, but I have always been able to come up with an elegant, often unexpected, solution after thinking about the problem from a high level.

In a functional pearl paper on the Zipper data structure, Gerard Huet writes about the apparent unsuitability of functional approach to many data structures, but then he proposed a new functional data structure called the Zipper to rectify these problems:

The main drawback to the purely applicative paradigm of programming is that many efficient algorithms use destructive operations in data structures such as bit vectors or character arrays or other mutable hierarchical classification structures, which are not immediately modeled as purely applicative data structures. A well known solution to this problem is called functional arrays (Paulson, 1991). For trees, this amounts to modifying an occurrence in a tree non-destructively by copying its path from the root of the tree. This is considered tolerable when the data structure is just an object local to some algorithm, the cost being logarithmic compared to the naive solution which copies all the tree. But when the data structure represents some global context, such as the buffer of a text editor, or the database of axioms and lemmas in a proof system, this technique is prohibitive. In this note, we explain a simple solution where tree editing is completely local, the handle on the data not being the original root of the tree, but rather the current position in the tree.

One of his examples of a tricky data structure—the database of axioms and lemmas in a proof system—hit close to home. I was trying to move my proof system to a more functional approach, which would offer me numerous benefits including immutability, automatic support for backtracking and shared copies as well as easy inclusion inside any other data structure. I did come up with a solution by utilizing a binary tree implementation with filters at each node, aggregated from each of the child nodes. These filters allow me to perform an arbitrary search in an efficient divide-and-conquer manner, whereby a subtree in which no child contains a desired property is left unexplored. It was faster and more light-weight than my previous imperative attempts, and also simpler, because I no longer needed to maintain and synchronize a separate data structure for indexing.

October 08, 2007

Immutable Data Structures in C#

In the post on Path Finding using A*, Eric Lippert, programmer in the Visual C# team, writes that immutable data structures are the way of the future in C#. As immutable data structures have been a frequent posting topic of mine, I am really happy about this direction towards immutable data and functional programming that C# is taking. My only quibble is that the last two words, "in C#," are not really necessary.

The frequent complaint of functional programming is that it is slower than procedural programming. However, I discovered something else: As you know, the immutable data structures for maps may be slower for some operations than their imperative cousins, but the running times for all the operations are typically acceptable and well-balanced. (Even then, I did bring up in a recent post a version of persistent hash-tables and array used by COQ that offers nearly identical performance). Plus, these data structures are very flexible--able to be used freely anywhere with no constraints.

Most of my data structures in NStatic and elsewhere are immutable. In my data structures, the equality and the copy operation are constant-time, even for large object trees. The immutability and balanced performance allows many otherwise exponential algorithms to be done efficiently. A good example is the use of dynamic programming in which function values are "memoized" (sic); this technique is absolutely essential for natural language parsing.

Wes Dyer, another developer in the C# team, wrote early this year that "imperative programming is sometimes reminiscent of a Rube Goldberg machine. Both require meticulous thought to ensure that a process works correctly despite a myriad of state transitions and interdependencies. It is amazing these complicated programs work at all." Hmm... could he be referring to any popular application software that we know of?

That Microsoft and the rest of the world programs imperatively and suffers the consequence of their decisions is the main reason that I can even consider developing the software that may indirectly compete with their own.

October 05, 2007

Type Names

I picked up a lot of interesting knowledge about C# in the course of developing and testing a parser for the language. I'll try to post a few tidbits as they come to mind again.

One nice tidbit is how to create new short type names without having to specify the full name of every non-primitive type, especially inside of generic type names.

Normally, using "using" to create a new type name, even in the presence of the appropriate namespace import, one has to fully qualify every type name that is not a keyword.

using System.Collections.Generic;
using T = System.Collections.Generic.List<string>

However, if the "using" is inside a namespace, then it will utilize any imports located in an outer namespace.

using System.Collections.Generic;

namespace MyCompany
{
    using T = List<string>;
}

August 22, 2007

Source Code

Scott Hanselman's post on Weekly Source Code 2 reminded me of my desire to post some of my source code. I'll probably do just that, but, with me and my short-term memory loss, I often forget to follow up until after a few years go by.

A couple of my classes come to mind:

  • Rational number class. This is an implementation of a general-purpose 16-byte data type that can precisely represent rational numbers (up to a certain precision), long (both signed and unsigned), and doubles. The code is written in IL to take advantage of extended-precision floating point. More info...
  • Fast, persistent hash-tables and array. This is one of my two sets of persistent collection classes. Similar to persistent arrays used by the COQ theorem prover for fast, functional union-find data structures, these data types are as fast as their imperative cousins and are ideal for backtracking. Unfortunately, my implementation is not thread-safe. My second set of persistent data structures is fully "functional" and thread-safe, but is more connected to my code-base.

If I haven't posted any source code by the end of September, give me a ring.


I have been meaning to post some regular expression matching implementations from my post in December: one using closures and another less efficient implementation using iterators. Both are functional, but imperative implementations can be compact too. I won't get around to doing it until after I ship NStatic, though.

Several years ago, I wrote a fast pattern-matcher and a slower unification engine (bidirectional pattern-matching) for expressions. Both algorithms provided a superset of regular-expression capabilities including wildcards and disjunctions.  (For unification, one regular expression pattern might be matched against another pattern.) They also handled other phenomena such as associativity and commutativity. By going completely functional, I was able to merge both algorithms into one, fast algorithm and dramatically reduce the total number of lines down to about 500.

I had written a regular expression engine for a college class based on NDFAs, the code for which was substantial. I also know that Microsoft's Rotor implementation includes a large number of files, so I was surprised by the compactness and directness of my functional pattern matcher implementation.

Since then, I have been (re-)writing most of my code in the functional style with the same consistent results. Programming in a functional manner entails heavy use of closures (anonymous methods), which is only really practical in C# 2.0+, and greatly eliminates the need for temporary or accidental classes.

July 26, 2007

Orcas Beta 2

Orcas Beta 2 is here, and it includes a go-live license. I plan a switch over (plus a clean reinstall of Vista) by the end of this month. I do prefer to use the latest tools to keep on top of technology, and, despite the installation being a distraction from my current work, I do feel rejuvenated by a sense of progress and change of scenery.

I have been developing in Orcas Beta 1 for about a month now, but haven't been using any of the new compiler features and libraries. Both Refactor Pro and Resharper have latest versions that will run on Orcas, though I don't think Resharper 3.0 supports the new language syntax yet. With the go-live license, it might even be possible to use the LINQ APIs even though the product has RTMed, but I'll wait.

It is hard for me to recall a specific benefit of using Orcas, but it is very stable and doesn't require Administrator privileges to run. Most of the benefits from using the IDE come are related to .NET 3.0/3.5 (which include WPF and LINQ).

I am not planning on using WPF in the near term, but it is something I am currently looking into the future. And if I were, it would initially be from inside a WinForms application using its interoperability features. I also don't see a compelling reason now to use it. I have seen a couple reports that WPF is not ready for prime time use, Rod Paddock, Eric Sink and Lhotka. I have seen ribbon and other components from ActiproSoftware, DevComponents and Infragistics, but the component libraries are still minimalistic and not fully developed. The runtime, required for Windows XP, also adds 2.8 Mbs to my setup package, and forecloses my ability to run in Windows 2000 (though, I must admit, I can't guarantee NStatic even runs in 2000 in its present state).

June 22, 2007

Jumping to Orcas and WPF

After releasing the first version of my product, I would most likely jump directly to using Orcas and WPF targeting 3.0, at which point I will begin blogging about my experiences and the new feature set. I firmly believe in moving to the latest released languages and technologies as soon as possible. I don’t want to invest in moribund technologies, and I do want the most productive tools as I have limited resources.

I wonder how practical it is for me to move to the Orcas beta right now. There will be probably be a number of problems that I encounter, but will it be more than made up by the expanded feature set of Visual Studio or will there be show-stoppers? Is compiler and IDE performance going to take a hit as it did in Whidbey? Both productivity tools, Resharper 3.0 and Refactor Pro/CodeRush, already support Orcas.

For now, I will play it safe and jump into Orcas after I released NStatic. Every change seems to drag my release ever farther away.

As for WPF, I halted my user interface development work in anticipation for WPF, since it would obsolete some of my existing work and also includes support for animations, 3D, and a declarative programming model. My 3D effects library that I spent some time developing, including perspective warping for images, will probably go to waste.

My plan is to switch over to WPF as the primary programming model and utilize its WinForms compatibility controls. I will purchase the WPF editions of the existing control library as there are already many WPF-based user interface components available now. I purchased Chris Sells and Chris Anderson’s books on WPF; I will probably also buy Adam Nathan’s based on Jeff Atwood’s recommendation. The WPF programming model is just so clean and beautiful, a sharp contrast from MFC.

March 07, 2007

Anonymous Recursion

I was planning on writing about anonymous recursion relating to work in NStatic, but Wes Dyer beat me to the punch with his post Anonymous Recursion in C#. In addition to my name, Wes Dyer shares my desire to push a more functional style of programming in C# (and he's also a member of the C# team, so we may win).

 

Loops and recursive functions in NStatic are converted to recursive lambda expressions, and then through various techniques like unrolling, recurrence solving, inductive proofs, are simplified to a closed form. The expressions above are intentionally unsimplified for illustrative purposes. 

Each of the lambda expressions displayed above are recursive. Nonrecursive lambda expressions are applied over their arguments, so they aren't normally seen unless standalone without any arguments.

I am not currently sure how I will eventually display lambda expressions in C#; the current display is attractive to my eyes, personally, but may turn off users unfamiliar with lambda calculus. I probably will ultimately go with a more C#-like syntax. My previous attempt was quite unreadable or quite verbose because the lambda operator => looks like the inequality operator and the use of parenthesis was quite excessive. Another possibility is to use C# 2.0 syntax with delegate and return keywords.

For those with sharp eyes, I introduced a fix operator to concisely display recursive lambda expressions. The fix function takes as its argument any lambda expression that includes a first parameter, representing the name of the function itself. That parameter can be used inside the lambda expression to recursively call itself.

In a language that does not explicitly have recursion operators, lambda expressions can be used to introduce recursion. It's not very pretty, but one such operator to do this is called the Y fixed-point combinator.

    public delegate Func<X,T> YFunc<X,T>(YFunc<X,T> f);
    public static Func<X, T> YCombinator<X, T>(Func<Func<X,T>, Func<X,T>> f)
    {
        YFunc<X,T> lambda = delegate(YFunc<X,T> y)
        {
            // return f(y(y)); -- only works with call by name
            return f(delegate(X x) { return y(y)(x); });
        };
        return lambda(lambda);
    }

The combinator does not allowed to use recursion, only function application, in its definition, since it's purpose is to introduce a form of recursion in the first place.

The Y combinator is possible in a typed language like C# because C# allows us to recursively define types--that is, YFunc was defined in terms of itself. A slight modification, an additional lambda expression, was also needed to allow f(y(y)) to be called by name.

A more efficient function for anonymous recursion takes advantage of assignment in the C# to avoid the creation of more than one closure is the following function.

    public static Func<X, T> Fix<X, T>(Func<Func<X,T>, Func<X,T>> func)
    {
        Func<X,T> fixfunc = null;
        fixfunc = func(delegate(X x) { return fixfunc(x); });
        return fixfunc;
    }

An example of its use in C# is the following:

    Func<int,int> func = Fix<int,int>(
        delegate(Func<int,int> f)
        {
            return delegate(int n)
            {
                return n<=1 ? 1 : n + f(n-1);
            };
        });
    
    Console.WriteLine(func(6));
kick it on DotNetKicks.com

Continuation Passing Style & Anonymous Methods

I thought that I would briefly explain the concept of continuation-passing style from my previous post. CPS is a way of simulating continuations in a language that doesn’t support the feature. Basically, we making a function call, one passes to the call an additional argument, which is another function that represents the continuation of the calling function.

Anonymous methods (closures) in C# 2.0 enables CPS with a very convenient syntax. Closures are frames constructed on the heap—allowing us to bypass limitations on function calls imposed by the CLR stack-based implementation.

public void Method()
{
     // Code before call
     T result = Call(a, b, c)
     // Code after call
}

becomes

public void Method()
{
     // Code before call
     Call(a, b, c,
     delegate(T result)
     {
           // Code after call
     });
}

The callee function returns a value by calling the continuation function passed in with the return value.

public void Call(int a, int b, int c, Action<T> func)
{
      // ...
      if (...)
      {
          func(result);
          return;
      }
      // ...
}

If a method itself returns a result, then we might return the result of the call like so among many possible ways.

public T Method()
{
     // Code before call
     return Call(a, b, c,
     delegate(T result)
     {
           T result2;
           // Code after call
           return result2;
     });
}

Unlike a stack-based implementation, the continuation can be called multiple times with different return values or not at all. This technique can be used for backtracking, multithreading, and many other uses. Richter previously used this for a Fire-and-Forget approach to locking. I found this technique useful with dialog boxes to eliminate modality.

 

kick it on DotNetKicks.com

March 01, 2007

Larry O'Brien

I have not been blogging frequently because I am attempting to deliver a product, but I decide to respond to one of Larry O’Brien’s two posts critiquing my own blog posts. In the other post, he puts my name on the title and links me to a strawman argument that I didn’t actually make: “Wesner Moise Claims IDEs Disprove ‘No Silver Bullets’: I Say ‘Are You Kidding?’”

In Bad Comparison: 14 Line Python RegEx evaluator versus Microsoft’s 14K lines, Larry writes.

Wesner Moise points to "generalized regular expression matching" as a moderately hard problem that might serve as the basis for comparing programming languages and approaches. He says "Microsoft's implementation of regular expression matching over strings is spread across 24 files and 14,455 lines of code including comments and whitespace." (I'm not sure how he'd know that -- I assume he's talking about Rotor source code.)

He wonders if a functional approach could be more efficient and points to a 14-line Python program.

No, no, no: they are two incredibly different capabilities.

The Python program implements something like the original definition of a regular expression --restricted to that which can be expressed in a single line of Extended Backus-Naur Form without recursion. "Regular expression support" for today's languages means something very, very different, starting with compatibility with Perl5 and going from there. Backslashes, named groups, etc. are complex features that require, in any language, something in excess of 14 lines.

Larry seems to miss the fact that the Python implementation could easily incorporate all of the features he described easily. The implementation is based on functions not finite state machines, so we aren’t limited by the expressiveness of regular languages. The Microsoft implementation does include compilation and parsing, so it’s not directly comparable, but if one wanted to create a reasonably performing regular expression engine for list structures besides strings with the same sets of capabilities, it could involve dramatically less code implemented in a functional style. I’ll eventually post source code for such an implementation, but right now I can’t justify the time when I still need to deliver a product and generate income.

Implementing a regular expression engine is not really a hard problem. My first implementation was difficult when I tried in college to extend the DFAs and NDFAs to accommodate features introduced in Perl, while at the same retaining the minimal processing that FAs offer. When I look at Spencer’s open source UNIX implementation of regular expressions, I noticed that he eschewed all the academic models for what I considered at the time “a quick and dirty” implementation, which while must faster in practice had theoretically inferior performance. 

Larry continues about functional programming and silver bullets:

Having said that, regular expressions are a good fit for functional approaches. But, just to point out the lack of silver bullets, attempting to parse a left-recursive grammar (a grammar with a production of the form A->AB) will hang a simplistic recursive-descent parser.  

Larry, whose job is to be an expert on programming languages, seems to lack any imagination. Larry takes a false premise about recursive-descent parsing and makes a false inference about the lack of silver bullets. He’s making a statement about a programming style that, being Turing-equivalent, is maximally expressive. It is possible to implement a highly readable, simple, table-free, recursive-descent parser for left-recursive grammars using continuation-passing style that performs as well as traditional table-driven approaches. I myself use continuation-passing style for implementing backtracking algorithms. One good paper on this is Derivation of a Typed Functional LR Parser. There are other approaches I have seen such as using data representations of functions in parser combinators.

Larry’s heavily wedded to the idea that there is no silver bullet. It has a Godelian ring that makes him look wise. I don’t buy into Brooks’ argument that there is no silver bullet to put down the software beast. It rests on assumptions such as that existing program languages have become high-level enough. He wrote this back in 1986, when the dominant languages of the time, assembly, C, Basic, Fortran pre-90, were quite primitive compared to now. Program languages can still become far more conceptual and declarative. I personally see the incorporation of functional techniques and symbolic manipulation as necessary steps towards that goals. My own belief in silver bullets, probably reflects my own push for fundamental advances in software that are happening too slowly in the big companies.

Lastly, Larry O’Brien claims in one post that he is nice during the day, but becomes a werewolf when he writes his blog post.

I think I'm quite nice. You might not know that if you just know me from my blog and articles. I try to always be fair when I write, but that's not the same as being nice, which I think I am in person.

Hopefully, someone like Charles Simonyi comes along and shoots a silver bullet through the beast, whose collapsing hairy body morphs back to the naked, pale-skinned body of a Hawaiian geek.

December 01, 2006

No Meeting Monday

I didn’t receive any confirmation for the .NETDA meeting on Monday night as I originally arranged for over a month ago, so I cancelled. It appears that I contacted the wrong people, the organization president and the events coordinator, when I should have contacted the meeting leader. It’s possible that my content did not fit well with the Web Developer meeting. I plan on scheduling a talk on January as well as one during the MVP summit.

This also gives me time to fix up my support for IL interpretation of API calls, which I am currently adding. This gives me automatic parameter validation and automatic extraction of algebraic properties of API calls.

Special thanks to the following companies for free components

I made a blanket statement in my last post about large component companies based on experience with one vendors, which is probably not entirely true. For example, what I have seen from DevExpress is very impressive.

Almost as if he was responding to my last post, Mark Miller, the chief architect at DevExpress, announced yesterday that the DXperience Enterprise, containing all DevExpress software, is being made freely available to all MVPs.