« NStatic Presentation | Main | Continuation Passing Style & Anonymous Methods »

February 07, 2007

NStatic Presentation, II

This is a continuation of my previous post NStatic Presentation.

This half of the presentation was more about technical details. Many in the audience didn't really seem to understand what I was trying to say.

Slide# 11 -- Smarter Analysis

I stated in my presentation that my goal in this product is the illusion of true understanding. What do I mean by true understanding? Well, I think it's somewhat like the Turing test, which claims that a machine could be said to think, if it is indistinguishable in behavior from a human being.

Can I achieve some semblance of a human-like intelligence in my tool, with performance that is indistinguishable from a code reviewer with no prior experience with the code base? I am still far away from that point by my reckoning, but there are three important ingredients that get me closer to the dream: Symbolic manipulation, interprocedural analysis, and the application of general abstract principles rather than specific rules.

Slide #12 - 13 - Symbolic Manipulation

  • Everything is represented by expressions, manipulated algebraically
  • Expressions are reduced to canonical form
    • Fast, but requires many transformation rules
    • May contain lambda and conditional operators
    • Higher-order functions are naturally supported
  • Advantages of thinking symbolically
    • Analyze code fragments in isolation
    • Effectively deal with unknowns and ambiguities

The data structure that I used store and represent values is an expression in a symbol functional language, based on lambda calculus. Since the language is Turing-equivalent, we can represent and evaluate the value of any variable in the program. What's important is that operations in this language can work directly with symbols rather than numbers. (Numbers by the way are just accidents. While numbers in C-based language seems to be omnipresent, programs are regularly written in sizable functional languages like Lisp and Haskell without any use of numbers.)

Slide #14 - Symbolic Manipulation: Arithmetic

Through the magic of functional pattern matching, we evaluate expressions by replacing patterns with the expression with simpler forms that known to be equivalent. When we evaluate expressions in this way, the result is often a simplified version in a canonical normal form. The benefit of reducing expressions to normal forms is that we can quickly compare them two expressions for equality. 

The examples below illustrate some basic simplifications of arithmetic and comparison operations.

Slide #15 - 17 - Symbolic Manipulation: Conditionals

Our symbolic manipulation also extends to conditionals as depicted in the following example.

Traditionally, instead of constructing a functional if expression, a static analyzer might either use ternary values (true, false, don't-know), in which case data would be marked with "don't-know," or the analyzer might insert into a knowledge base a disjunctive clause such as p < 1 && data == "hello" || p >= 1 && data == null.

Here we see the value of data in an extended version of the previous code. Because we know the value of p, we actually can simplify the value of data to "hi."

The conditional operator arises naturally in other contexts such as when a field accessor is being modified.

Slide #18 - Symbolic Manipulation: Loops

  • Denotational Semantics

Each procedure is converted to side-effect- free functions

    • Completely general, precise translation
    • No more dichotomy: control flow becomes data
      • Imperative constructs converted to equivalent functional form in lambda calculus
      • Control statements are modeled using recursive lambda expressions instead of a control flow graph

We use denotational semantics to represent our code. A functional representation can be analyzed and simplified much more easily.

Changed loop variables are assigned the result of an applied recursive lambda expression or a closed form thereof.

This gives us the ability to relate the results of loops with each other and with other functions. In this example, if fact is a recursive method that computes the factorial function, then f==fact(n) would be true, since they have the same canonical form.

Slide #19 - 22 - General Principles

I spoke about General Principles versus Specific Rules in a prior blog post on "Intelligence versus Intellisense." I wrote of a basic philosophical goal of mine, which is to replicate the actual human reasoning process as faithfully as possible, often choosing less efficient methods for better fidelity.

One of the first types of analysis is Exception Analysis:

  • Exception Analysis
    • System exceptions
    • User-defined exceptions, unequivocally fired
    • Failed asserts
    • Framework parameter validation via IL interpretation

The tricky part is determining whether an exception is fired.

Another type of analysis relies on notions of redundancy and effectiveness.

  • Notions of Redundancy and Effectiveness
    • Compares expressions and state of the world before and after evaluation
  • Examples
    • Complex expressions (including function calls) that evaluate to constants
    • Assignment to a variables is same as current value
    • Redundant parameter - parameter is a function of other parameters/globals
    • Infinite loops, no side effects

Again the difference here from other products is how I actually compare expressions and the state of the world before and after evaluation. Nothing is based on low-level analysis of syntax but rather a higher-level semantic analysis.

Slide #23 - Interprocedural Analysis

An important part of understanding code is knowing the relationships between each of the methods. Analyzing each method in isolation doesn't lead to intelligent analysis, if all the callee functions are not examined.

Below is a call stack window showing a stack trace in Rotor, depicted an interprocedural call.

I also brought up the point of the "Ghetto," which is how existing tools focus only on primitive types and low-level system exceptions like null references.

I do want to move beyond that point and get out of the ghetto, finding errors on complex, user-defined types and functions, not just primitive types. I also want to effect manipulate higher-order functions.

Slide #24 - IL Interpretation

Closely connected with interprocedural analysis is IL interpretation. We could analyze our codebase, but still miss a big part of code that is executed. The framework and third-party libraries are just as important for intelligent static analysis.

Since I discovered a way to perform interprocedural analysis very quickly, it made sense for me to pursue IL interpretation of system frameworks, rather than hardcode parameter validation.

In the examples above, I show how IL interpretation can occur symbolically with the results from Math.Max(a,b) in mscorlib.dll.

Slide #25 - Performance

As I mentioned earlier, I used a functional approach, that corresponds to directed higher-order equational logic (aka universal algebra). This functional approach is a quite different from the logical approach used by existing advanced static analyzers. The functional approach is faster, because it has the efficiency and determinism of program execution and not the wastefulness of nondeterministic search. The logical approach is not just slower, but also more limited being a decision procedure, which can only answer yes/no questions.

Slide #26- Possible Future Features

I listed a set of possible future features.

Half of these related to integrating with a process--command-line execution, custom rules and transformations, XML export. The other items include specifications and enhanced debugging capabilities like immediate window and friendlier breakpoints. I am also contemplating VB.NET support and IL-level analysis for other unsupported .NET languages.

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a00d8345242f069e200d83433c85853ef

Listed below are links to weblogs that reference NStatic Presentation, II:

» Can a .NET code analysis tool approach human intelligence? from DotNetKicks.com
You've been kicked (a good thing) - Trackback from DotNetKicks.com [Read More]

» Wesner Moise presents NStatic from Anatoly Lubarsky
[Read More]

Comments

Hello!

My question is:
Did you consider analyzing multi threaded applications?
I mean statically checking deadlocks / race conditions / missing memory barriers?
Or is it outside the theoretically possible static verification?

"command-line execution"

Wes, that would be a big plus for us. We could integrate that into a continuous integration (CI) server so that after each code commit, NStatic is run via command line by the automatic CI server. Right now we can have a CI server run all our unit tests, run FxCop...it would be nice to have it run NStatic as well. Obviously, for that to be possible, NStatic would need to be command-line operable.

Wes, I suspect that if this presentation part 2 didn't go over well with the audience, it is because it is very technical, getting into AI and deep functional programming concepts. Most devs I know have very little understanding of AI (myself included), and very few know what functional programming is; only have heard of it a little in college.

FWIW, I found the post very interesting despite my limited AI knowledge. It will be interesting to see where this focus on AI and functional techniques lands you and how it will separate your tool from other static analysis tools.

I haven't seriously thought about threads; I need to see how other products like FindBugs, TeamSuite deal with threading first. I suspect any bugs found in those products would be possible to implement inside my product.

If any error manifests itself as a recognizable pattern within the code, this would be something easy for me to catch.

However, I dont think I would be able to capture any errors that require "lane" analysis, which I think in used in RaceTrack and Spec#.

When I do command-line execution, I want to do it well and get a lot of feedback, but it isn't a priority in the initial release.

Dumping rich information into the command line is problematic. There are issues that I am need to look into.

If I dump an XML file, should I be able to open it again and view the contents in NStatic?

Should I dump graphical snapshots of each error with all the highlighting? Would each bug have it's own folder?


Wes, I'd say go for some XML file. For example, one could open up NStatic through the UI, setup all the settings, then save the project to this file. NStatic could then later open this file through the UI, but we could also send this file to the command line NStatic.

I know of at least 2 other dev tools we use that do something similar. For example, XenoCode obfuscator does this, where their XML files are yourProject.postbuild which you can save and load from the UI, but can also can supply to the command line.

From command line, errors are simply output to standard output as text. I'm not sure how well this works for NStatic, but it's a thought.

Regarding multi-threading, seems to me a first step is to get threaded code correct and bug free before even looking at potential multi-thread bugs. Once in place, perhaps another layer could be added to NStatic to check for potential multi-threading errors.

I'll look at the continuous integration features once I release the first version, and take into account your desires...

It's too early for me to consider them, when I have to get the base technology out.

As for multithreading, I also look at that later. It turns out that there is a way to represent concurrency denotationally, but it necessarily introduces complexity and performance issues, such that there would probably have to be a separate mode to turn it on.

where can i get this tool

Verify your Comment

Previewing your Comment

This is only a preview. Your comment has not yet been posted.

Working...
Your comment could not be posted. Error type:
Your comment has been posted. Post another comment

The letters and numbers you entered did not match the image. Please try again.

As a final step before posting your comment, enter the letters and numbers you see in the image below. This prevents automated programs from posting comments.

Having trouble reading this image? View an alternate.

Working...

Post a comment