About

I am a software developer in Seattle, building a new AI software company.

Ads

April 2009

Sun Mon Tue Wed Thu Fri Sat
      1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30    

Categories

Ads


« Office Open XML | Main | MS Development Process »

December 06, 2005

Expression Trees

IanG writes about the new Expression Trees feature in C# and indicates that it’s his favorite new feature. I also liked this new feature.

A few developers, however, like Mike Gunderloy complained that C# is introducing features that he will never ever use and is slowly acquiring the complexity of C++. I disagree with the faulty comparison, since this and other features are lightweight syntactic sugar with localized impact. Contrast this with 2.0 features like generics, which touched every part of the language, or with C++’s multiple inheritance, const modifiers, templates, member pointers, and overloading of any operator.

The language designers clearly opted to minimize the impact of the new changes. For example, extension methods require few syntax changes and kick in only when conventional method resolution fails. (Despite this, I do see some issues with reflection and IntelliSense.) Likewise, automatic construction of expression trees appears to kick in only when conventional conversion fails and the toplevel operator is a lambda expression.

The new C# features also require no additional support from the CLR, yet I do suspect that new language features may use more efficient implementations when compiled against the Orcas runtime. C#, for instance, switched to using constrained method calls when generating IL in Whidbey. Some future bets include anonymous types and VB’s new dynamic interfaces.

One thing that I agree with Mike Gunderloy is that “expression trees” is clearly a feature intended for advanced programmers and library designers.

My initial thoughts on hearing about expression trees was that the C# language was becoming more expressive, declarative, and Lisp-like by bridging the notions of code and data. I thought about possibilities like adding logic programming and indeed the Microsoft guys anticipated with a sample included with the LINQ preview bits.

Then, I wondered whether the benefits of expression trees were already available through the facilities provided by operator overloading. I currently use operator overloading for my own expression language, but have also seen other attempts like Quasi-LINQ.

However, expression trees offer greater generality than operator overloading.

  • Support for all operators (except for side-effects and comma operations) including ternary expressions, logical operators, casts, allocations, array literals, and member access.
  • Support for function calls and arguments.
  • Support for inline functions via lambda expressions.
  • Support for type information.
  • Capturing of local variables (a la closures) through the new FuncletExpression object.

In the LINQ Preview, expression trees came with a few limitations of their own.

  • The toplevel expression must be a lambda expression. I didn’t see a need for this restriction other than language simplicity, but there is a workaround by using the lambda expressions with no parameters syntax: ( ) => x.
  • No support for side-effects (increment/decrement operations or  assignments)

I also wondered whether constructed expression trees were semantically equivalently to their corresponding lambda expressions, especially regarding captured locals. The FuncletExpression object in the System.Query library relieved my doubts.

However, I realized another potential problem: The same operators behave differently in each language as well as inside the database. Some operators perform short-circuiting in C# (&&, ||, ?), but not in VB (And, Or, IIf ). Comparisons can differ based on notions of identity versus equivalence or on case-sensitivity. There are also differences in treatment of nullable operators and overflow arithmetic.

One last issue for me is that expression tree objects are skeletal objects, intended only for description. These objects provide no additional services such as interpretation, compilation to IL, or conversion to/from CodeDom. While it’s possible to work directly with these objects, I suspect, in practice, an additional conversion will be needed to convert these framework objects to a private representation with more capabilities. The additional conversion is less than ideal in terms of allocations and performance.

Comments

Unless I miss something, the operator short-cutting only is a concern when you're working with compiled form of the lambda(that is, the Linq delegate), not the tree form (which is what you get in DLinq). Ideally, of course, the tree would contain little more than the parsed form of the lambda, without any attempt to optimize. Keep in mind that the tree is intended to be passed to something that will chew on it, so that is where any optimization or transformation logic will (should) take place.

Looking at the extension methods for Linq and DLinq, the Linq all expect delegates of some form, while those for DLinq expect trees. That's the criterion the compiler is going to pick up on: delegate and tree are two possible implicit pseudo-casts for a lambda which are handled by the compiler as needed.

I agree that short-cutting and side-effects are of less concern with expression trees and make little sense when passed to database or other engine in which the expression is intended to be optimized, transformed or executed multiple times.

There are a few situations where being able to shortcut is useful... If one wanted to simplify the runtime code generation of code by offering a C# expression instead of a series of IL statements that one currently must do when using Reflection Emit today. If one wanted to make the building of CodeDOM trees in a much less verbose way... Still, it's hard to support the use of side-effects within expressions in these cases as well; there's likely to be confusion in the code in which an assignment or increment operation is deferred.

Also, expressions trees without a toplevel lambda expression aren't very useful in many situations, since there is little ability for the user to parametrize the expression.

The point is that any such effects would be dependent on the consumer of the tree -- not the generator.

And I agree, building a CodeDOM tree from a lambda would be nice. In that case, take the preview compiler, create a function that expects an expression tree of some signature, and call it using a lambda. Inside the function, if I'm correct, you should then be able to transform the tree into a CodeDOM structure. Almost directly.

For what it's worth, there's a gentleman on the Linq forums who's doing the reverse (IL to ExpressionTree) to make his own implementation of DLinq.

The comments to this entry are closed.