This is a continuation of my previous post NStatic Presentation.
This half of the presentation was more about technical details. Many in the audience didn't really seem to understand what I was trying to say.
Slide# 11 -- Smarter Analysis
I stated in my presentation that my goal in this product is the illusion of true understanding. What do I mean by true understanding? Well, I think it's somewhat like the Turing test, which claims that a machine could be said to think, if it is indistinguishable in behavior from a human being.
Can I achieve some semblance of a human-like intelligence in my tool, with performance that is indistinguishable from a code reviewer with no prior experience with the code base? I am still far away from that point by my reckoning, but there are three important ingredients that get me closer to the dream: Symbolic manipulation, interprocedural analysis, and the application of general abstract principles rather than specific rules.
Slide #12 - 13 - Symbolic Manipulation
- Everything is represented by expressions, manipulated algebraically
- Expressions are reduced to canonical form
- Fast, but requires many transformation rules
- May contain lambda and conditional operators
- Higher-order functions are naturally supported
- Advantages of thinking symbolically
- Analyze code fragments in isolation
- Effectively deal with unknowns and ambiguities
The data structure that I used store and represent values is an expression in a symbol functional language, based on lambda calculus. Since the language is Turing-equivalent, we can represent and evaluate the value of any variable in the program. What's important is that operations in this language can work directly with symbols rather than numbers. (Numbers by the way are just accidents. While numbers in C-based language seems to be omnipresent, programs are regularly written in sizable functional languages like Lisp and Haskell without any use of numbers.)
Slide #14 - Symbolic Manipulation: Arithmetic
Through the magic of functional pattern matching, we evaluate expressions by replacing patterns with the expression with simpler forms that known to be equivalent. When we evaluate expressions in this way, the result is often a simplified version in a canonical normal form. The benefit of reducing expressions to normal forms is that we can quickly compare them two expressions for equality.
The examples below illustrate some basic simplifications of arithmetic and comparison operations.
Slide #15 - 17 - Symbolic Manipulation: Conditionals
Our symbolic manipulation also extends to conditionals as depicted in the following example.
Traditionally, instead of constructing a functional if expression, a static analyzer might either use ternary values (true, false, don't-know), in which case data would be marked with "don't-know," or the analyzer might insert into a knowledge base a disjunctive clause such as p < 1 && data == "hello" || p >= 1 && data == null.
Here we see the value of data in an extended version of the previous code. Because we know the value of p, we actually can simplify the value of data to "hi."
The conditional operator arises naturally in other contexts such as when a field accessor is being modified.
Slide #18 - Symbolic Manipulation: Loops
- Denotational Semantics
Each procedure is converted to side-effect- free functions
- Completely general, precise translation
- No more dichotomy: control flow becomes data
- Imperative constructs converted to equivalent functional form in lambda calculus
- Control statements are modeled using recursive lambda expressions instead of a control flow graph
We use denotational semantics to represent our code. A functional representation can be analyzed and simplified much more easily.
Changed loop variables are assigned the result of an applied recursive lambda expression or a closed form thereof.
This gives us the ability to relate the results of loops with each other and with other functions. In this example, if fact is a recursive method that computes the factorial function, then f==fact(n) would be true, since they have the same canonical form.
Slide #19 - 22 - General Principles
I spoke about General Principles versus Specific Rules in a prior blog post on "Intelligence versus Intellisense." I wrote of a basic philosophical goal of mine, which is to replicate the actual human reasoning process as faithfully as possible, often choosing less efficient methods for better fidelity.
One of the first types of analysis is Exception Analysis:
- Exception Analysis
- System exceptions
- User-defined exceptions, unequivocally fired
- Failed asserts
- Framework parameter validation via IL interpretation
The tricky part is determining whether an exception is fired.
Another type of analysis relies on notions of redundancy and effectiveness.
- Notions of Redundancy and Effectiveness
- Compares expressions and state of the world before and after evaluation
- Complex expressions (including function calls) that evaluate to constants
- Assignment to a variables is same as current value
- Redundant parameter - parameter is a function of other parameters/globals
- Infinite loops, no side effects
Again the difference here from other products is how I actually compare expressions and the state of the world before and after evaluation. Nothing is based on low-level analysis of syntax but rather a higher-level semantic analysis.
Slide #23 - Interprocedural Analysis
An important part of understanding code is knowing the relationships between each of the methods. Analyzing each method in isolation doesn't lead to intelligent analysis, if all the callee functions are not examined.
Below is a call stack window showing a stack trace in Rotor, depicted an interprocedural call.
I also brought up the point of the "Ghetto," which is how existing tools focus only on primitive types and low-level system exceptions like null references.
I do want to move beyond that point and get out of the ghetto, finding errors on complex, user-defined types and functions, not just primitive types. I also want to effect manipulate higher-order functions.
Slide #24 - IL Interpretation
Closely connected with interprocedural analysis is IL interpretation. We could analyze our codebase, but still miss a big part of code that is executed. The framework and third-party libraries are just as important for intelligent static analysis.
Since I discovered a way to perform interprocedural analysis very quickly, it made sense for me to pursue IL interpretation of system frameworks, rather than hardcode parameter validation.
In the examples above, I show how IL interpretation can occur symbolically with the results from Math.Max(a,b) in mscorlib.dll.
Slide #25 - Performance
As I mentioned earlier, I used a functional approach, that corresponds to directed higher-order equational logic (aka universal algebra). This functional approach is a quite different from the logical approach used by existing advanced static analyzers. The functional approach is faster, because it has the efficiency and determinism of program execution and not the wastefulness of nondeterministic search. The logical approach is not just slower, but also more limited being a decision procedure, which can only answer yes/no questions.
Slide #26- Possible Future Features
I listed a set of possible future features.
Half of these related to integrating with a process--command-line execution, custom rules and transformations, XML export. The other items include specifications and enhanced debugging capabilities like immediate window and friendlier breakpoints. I am also contemplating VB.NET support and IL-level analysis for other unsupported .NET languages.