« Lang.NET Symposium | Main | Research Pipeline »

August 01, 2006

Spec#

At the Lang.NET Symposium, I met up a member of the Spec# research team, Mike Barnett, shared with him information about the tool I was working on, and gave him my contact information. As I suspected earlier, Rustan Leino, who previously worked at HP/Compaq on ESC/Java, joined Microsoft to work on the Spec# programming system.

I watched an impressive presentation on the tool that included real-time IDE integration and syntax highlighting. Spec# works off of IL with the implication that eventually all the .NET languages will eventually support specifications. Indeed, the presentation was called “Why Every Language Should (Will) have Specifications.”

Spec# requires that all calls to functions with specifications be provably true either through existing conditions within code or by using explicit invariants. In this sense, specifications act like an extended type system and can result in code growth by a substantial percentage; however, the benefit is the decreased need to add in unit tests for boundary conditions. This is quite different from my approach in which any assertions are simply checked not to be violated; as a result, no new syntax is required.

I referred Mike to a paper on ESC/Java that detailed prior experience with using ESC. One issue was annotation overhead. The authors indicated that up to 10% of code would need to include some kind of specification.

ESC/Java is an annotation-based checker, which relies on the programmer to provide annotations giving lightweight specifications for each routine. Thus, one of the costs of using ESC/Java is the overhead of writing the necessary annotations. In most cases, these annotations are straightforward
and they document basic design decisions, for example, that a method argument should never be null, or that an integer variable should be a valid index to a particular array.

Our experience in annotating Javafe and Mercator indicates that roughly 40–100 annotations are required per thousand lines of code…

For both of these programs, the annotations were inserted after the program was written using an iterative process: The program was first annotated based on an inspection of the program code and a rough understanding of its behavior;
this initial set of annotations was subsequently refined based on feedback produced by ESC/Java when checking the annotated program. Typically, we found that a programmer could annotate 300 to 600 lines of code per hour of an existing, unannotated program. This overhead is expensive for larger programs and is an obstacle to using ESC/Java to catch defects in large, unannotated programs. While it is possible to use ESC/Java only on selected modules, it is still necessary to annotate all the routines called in other
modules. We are investigating annotation inference techniques [17, 16] that help reduce the annotation burden on legacy code.

Instead of writing annotations after the program has been developed, a better strategy for using ESC/Java is to write appropriate annotations and run the checker as early as possible in the development cycle, perhaps after writing each method or class. Used in this manner, ESC/Java has the potential to catch errors much earlier than testing, which can only catch errors after the entire program or an appropriate test harness has been written. Since catching errors
earlier makes them cheaper to fix, we suggest that using ESC/Java in this manner may reduce software development costs in addition to increasing program reliability.

The experience is familiar to anyone who changes parameters types in an existing interface to incorporate C++ const modifiers or .NET generic constraints.

I did comment about about some of the limits of Spec# with exceptional phenomenon like iterators, closures, virtual functions, etc. According to the Spec# website, the tool is quite sophisticated, being able to deal with callbacks, threads, and inter-object relationships. The Spec# team does not use a closed-world assumption for handling virtual functions, but which actually does make sense for a stand-alone application versus a class library that whose classes could be extended.

I asked him about loop handling and indicated to him how I was using recurrence relations to deal with the problem. Early papers on Spec# indicated that assigned variables within a loop were assigned a havoc value, which essentially means that variables are assumed to contained anything. ESC/Java attempts a different approach.

Therefore, ESC/Java approximates the semantics of loops by unrolling them a
fixed number of times and replacing the remaining iterations by code that terminates without ever producing an error. This misses errors that occur only in or after later iterations of a loop…. By default, we unroll loops one and a half times (the half refers to an additional execution of the loop guard):

He seemed to indicate some complicated analysis being done in this area.

Mike entertained questions about decidability from another attendee. Spec# can verify all linear assertions. But nonlinear expressions, like x*y == 5, are not completely handled by the system and therefore could require explicit invariants. For undecidable cases, Spec# includes a timeout and conservatively issues errors in such case.

I also inquired about patents on Spec# mentioning Patent #7,058,925. He wasn’t actually aware of any patents and stated that all the work is actually published. The patent I mentioned was more likely filed by Microsoft research group that produces the SLAM tool, used in Office and Windows. Anyway, he mentioned that Microsoft’s use of patents is primarily defensive (although, Microsoft’s recently announced Windows 12 Principles does talk about reasonable terms in licensing patents.) 

Comments

I've been subscribed to the Spec# mailing list for quite some time. In that time, I've seen some real world issues that come in to play.

The Spec# verifier tool, Boogie, is sophisticated, but doesn't go too deep in some situations. IIRC, the something like the following code passes verification:

object foo = new object();
SetFooToNull(foo);
foo.Bar();

Some seemingly simple things like that caught my eye and made me wonder how useful such a tool really is.

Spec# also introduces non-null types, using the exclamation point syntax:

string! nonNullString;

While this seems like a good idea, there's a whole host of problems related to static initilization of non-null types, not to mention a lot of extra effort and wide code changes to change something from a regular reference type to a non-null type. There's so many extra pieces of code required and so many gotchas to verify these things work, I almost question whether it's truely worth the effort.

I also noticed whilst watching the mailing list that many people hit gotchas--for example, simple arithmetic operations--that that the Spec# complier doesn't or can't verify, so consumers have to fudge it by telling the compiler, "assume this is gonna be less than 10" or whatever happens to be your assumption. In other words, often times, your Spec# annotations will contain annotations basically reverting you back to regular C#.

Another things that raises eyebrows for me is that the pre- and post- conditions introduced by Spec# can call arbitrary methods, dubbed "Pure" methods. These "pure" methods are decorated with the Pure attribute...all that sounds fine. But the developer must guarantee that Pure methods don't modify any state. While that sounds reasonable, aren't we running in circles? Here we're introducing all sorts of burdens on developers in order to force them to write verifiable programs, but then in many parts of the software we are crossing our fingers and hoping they don't make false assumptions about some result (in order to "trick" the Spec# compiler into allowing something to happen, as described earlier), or by hoping the developer doesn't modify state in pure functions. Seems like we're doing a bit of backtracking here.

I suppose, though, that if Spec# results in your code having fewer bugs, it may be worth it in the long run. More time investment writing code up front, but less time fixing it in the long run.

"We are investigating annotation inference techniques"

I believe Spec# has something similar to "annotate" the entire .NET framework class library, giving it non-null types, pre- and post-conditions, etc.

The comments to this entry are closed.