About

I am a software developer in Seattle, building a new AI software company.

Ads

May 2008

Sun Mon Tue Wed Thu Fri Sat
        1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31

Recent Posts

Ads


January 11, 2008

Principle of Most Power

Turing Completeness

Yesterday was Donald Knuth's seventieth birthday. I used to own the three volumes of his Art of Computer Programming, and have even quoted the book once in my blog regarding corountines. Although the content was valuable, I could not tolerate the use of the MIX assembly language with which Knuth used to implement his algorithms.

Donald Knuth also invented TeX, a markup language for typesetting. As proof of its greatness, Mark Chu-Carroll notes TeX remains dominant 30 years after Knuth wrote the original version. It also happens to be Turing complete. There are other widely used languages, which are Turing-complete, such as PostScript for generating graphical images. The template facility in C++ is Turing-complete by accident, and is used for metaprogramming and performing all sorts of optimizations that used to be done by the compiler.

Principle of Least Power

I encountered a few years ago, Tim Berners-Lee "Principles of Design," which lists a number of design principles for the web architecture.  One of the principles listed in the "Principle of Least Power," which states that we should use a low power language that is easy to implement and use.

In choosing computer languages, there are classes of program which range from the plainly descriptive (...HTML) though logical languages of limited power ...which include limited propositional logic, though declarative languages which verge on the Turing Complete (PDF) through those which are in fact Turing Complete though one is led not to use them that way (XSLT, SQL) to those which are unashamedly procedural (Java, C).

The choice of language is a common design choice. The low power end of the scale is typically simpler to design, implement and use, but the high power end of the scale has all the attraction of being an open-ended hook into which anything can be placed: a door to uses bounded only by the imagination of the programmer.

Computer Science in the 1960s to 80s spent a lot of effort making languages which were as powerful as possible. Nowadays we have to appreciate the reasons for picking not the most powerful solution but the least powerful. The reason for this is that the less powerful the language, the more you can do with the data stored in that language. If you write it in a simple declarative from, anyone can write a program to analyze it in many ways. The Semantic Web is an attempt, largely, to map large quantities of existing data onto a common language so that the data can be analyzed in ways never dreamed of by its creators. If, for example, a web page with weather data has RDF describing that data, a user can retrieve it as a table, perhaps average it, plot it, deduce things from it in combination with other information. At the other end of the scale is the weather information portrayed by the cunning Java applet. While this might allow a very cool user interface, it cannot be analyzed at all. The search engine finding the page will have no idea of what the data is or what it is about. This the only way to find out what a Java applet means is to set it running in front of a person.

The principle makes sense for the Web in which a new device or a new application with unforeseeable constraints may need to interact with or browse the Internet. However, I do not think that we should automatically apply this principle to other domains, yet I have seen this particular design principle crop up again in a few blog posts since then. There are widely used "standard" languages like PostScript, mentioned earlier, that are Turing-complete. When I look at XAML, I see a language crying for Turing power, because I see a lot of half-measures like resources, triggers, property expressions, templates, etc.

Another View -- Principle of Most Power

In "Software Factories and the modeling problem," Steve Maine posits that a triangular tradeoff between efficiency, generality, and precision exists for any modeling language, for which I don't think is necessarily true. I think that his examples of languages favoring generality but sacrificing efficiency involve procedural languages like C# and Cw, which are more complicated that purely functional languages.

My own work over the years have focused on designing a language that is Turing-complete in some respects and provides many of the advantages of functional and logical languages. My concern was that some of these declarative languages used in academia were not practical or fast enough for building applications.

The primary language is C#, but programs in the secondary language are constructed and executed at runtime. Being functional, the language consists of immutable symbolic expressions, which are much easier to analyze that procedural programs. The language also incorporates various forms of logical reasoning. The language is used for representing all types of data and knowledge including documents, mathematical expressions and natural language.

My other concern was non-termination. Execution of the language is through partial simplification, always terminating in a reasonable amount of time, enabling its use within an interactive application. I satisfy myself with a notion of Turing-expressiveness in which the partially simplified expression is extensionally equivalent to its fully simplified value.

Many Different Languages

Usually, advocates of the "Principle of Least Power" also believe that, in the future, people will be using all sorts of languages, because no one language can serve every purpose.

I do not agree with the idea of a need for a dozen different languages. I guess I would prefer just one. With all the new LINQ features, I am seriously considering moving all my scripts from various dynamic languages to C# 3.0 with Dot Net Script or Code Runner.Net.

One of the original goals of C++ was to end the use of many specialized languages by supporting the creation of libraries for custom data types through templates, operator overloading, and object orientation. One problem with operator overloading in the "C"-based languages was operator "overloading," because there were very few standard operators offered in the language.

Emerging trends include metaprogramming and integrated support for DSLs within the languages, allowing these DSLs to use the symbols and rich computational capabilities of the host language.

I think the various advantages and disadvantages of dynamic and static languages (and other language classifications) are based on existing implementations, and they should disappear over time.

This is not to say there won't be many languages in the future. We now have a common language runtime and a dynamic language runtime that raises the base level of new languages with automatic support for garbage collection and reflection. Microsoft is also working on the Phoenix project that promises to a common compiler infrastructure to enable many different languages to benefit from advances in parsing, compiler, and optimization technologies at the same time.

January 02, 2008

Human-like Reasoning

My goal in my static analysis work is not to solve general problems like the Four Color Theorem, but rather to simulate closely the human reasoning process to solve problems that are generated, comprehendible, and solvable by normal people. Practical and deep, rather than complete.

Humans write programs to be understood by other humans. Human understanding comes about instantaneously or with relatively little effort compared to the work performed by many analysis tools. We typically don't unroll loops and recursion, but instead "get" the meaning of programs by recognizing a few standard patterns. We look at a program and instantly see a sorting operation taking place, for instance. Programs typically consist of these recurring patterns. In many programs, the only complex data structures used are a standard few like arrays, hashtables, linked lists and stacks. My approach to reasoning adheres closely to the way humans think, even when more efficient computational approaches exist, so as to align my software with human performance; the one exception is when non-human-like approach is strictly better in every way, yet yields identical results.

Stephen Wolfram, inventor of Mathematica, wrote a post Mathematics, Mathematica and Uncertainty touching on some of these themes. Even with the more advanced symbolic capabilities of Mathematica--virtually infinite compared to more than my own software package, Wolfram encountered limits trying to use his own system to verify itself.

Sometimes we use the symbolic capabilities of Mathematica to analyze the raw code of Mathematica. But pretty quickly we tend to run right up against undecidability: there's a theoretical limit to what we can automatically verify.

Yet, I think that our approaches deviate somewhat in that Mathematica isn't geared to testing programs. Also, the Mathematica code base is not typical. Being a program for doing mathematics, it naturally consists of advanced mathematics, difficult for mortals to understand and involving non-linear arithmetic, which is theoretically undecidable.

The code was pretty clean. But it was mathematically very sophisticated. And only a very small number of experts (quite a few of whom had actually worked on the code for us) could understand it.

Another point that Wolfram makes is that, contrary to my approach, Mathematica typically avoids imitating humans but instead uses complicated, non-human-friendly procedures extending hundreds of pages in length, roughly the size of the entire analysis portion of my code base.

Think of almost any mathematical operation. Multiplying numbers. Factoring polynomials. Doing integrals. There are traditional human algorithms for doing these things.

And if the code of Mathematica just used those algorithms, it'd probably be quite easy to read. But it'd also be really slow, and fragile.

And in fact one of the things that's happened in Mathematica over the past decade or so is that almost all its internal algorithms have become very broad "polyalgorithms"---with lots of separate algorithms automatically being selected when they're needed. And routinely making use of algorithms from very different parts of the system.

So even if there's some algorithm for doing something that's written out as a page of pseudocode in a paper or a book, the chances are that the real production algorithm in Mathematica is hundreds of pages long---and makes use of perhaps hundreds of other sophisticated algorithms in other parts of the system.

But such algorithms were not constructed to be understandable---and much like things we see in physics or biology, we can see that they work efficiently, but they're really difficult for us humans to understand.

And in general, the way Mathematica works inside just isn't very "human friendly"... Like when Mathematica does an integral. It doesn't use all those little tricks people learn in calculus classes. It uses very systematic and powerful methods that involve elaborate higher mathematics.

Computation as the Ultimate Metaphor

Rodney Brook, an AI professor at MIT and author of Flesh and Machine, wrote this response "Computation as the Ultimate Metaphor" to a question posed by the Edge Foundation, "What have you changed your mind about during the 2007?"

The Edge Foundation is a site that I have long subscribed to by email. [No RSS feed is available]. The Edge promotes discussion by the "third culture," which, according to its words, consists

of those scientists and other thinkers in the empirical world, who through their work and expository writing, are taking the place of the traditional intellectual in rendering visible the deeper meanings of our lives, redefining who and what we are.

Rodney recently questioned his belief in computation as the ultimate metaphor. Computer scientists, he writes, are ultimately biased towards their discipline when they attempt to explain human behavior in terms of computers, just as molecular biologists see their "their level of mechanistic explanation as being ultimately adequate for high level mechanistic descriptions such as physiology and neuroscience to build on as a foundation."

Those of us who are computer scientists by training, and I'm afraid many collaterally damaged scientists of other stripes, tend to use computation as the mechanistic level of explanation for how living systems behave and "think".  I originally gleefully embraced the computational metaphor.

Such a pattern, he argues, in fact recurs regularly at different stages in the history of technological advancements.

If we look back over recent centuries we will see the brain described as a hydrodynamic machine, clockwork, and as a steam engine.  When I was a child in the 1950's I read that the human brain was a telephone switching network.  Later it became a digital computer, and then a massively parallel digital computer.  A few years ago someone put up their hand after a talk I had given at the University of Utah and asked a question I had been waiting for for a couple of years: "Isn't the human brain just like the world wide web?".  The brain always seems to be one of the most advanced technologies that we humans currently have.

I must admit that I do subscribe to the view of computation as the ultimate metaphor, at least in its applicability to the human thought process. While I admit that I may be biased as a software developer, I still retain that belief despite his arguments. It may even be true that the both the computer scientist and the molecular biologist are both right in their thinking, but that each one is exploring a different facet of human behavior that is more appropriate to each discipline.

The universe is a computer, and everything that happens in it--everything produced--is the result of a computation. Everything is a program. By the principle of universality, we can construct a functional expression for it all. My static analysis tool, for example, instantly produces a functional expression for every variable in the program. Each expression is basically a program showing how the value could be computed, but the expression can also be viewed as the value itself, possibly unsimplified.

The same applies to our thoughts and all our natural language utterances. My own work focuses on representing and manipulating natural language as little programs or functional expressions rather than using traditional forms of knowledge representation. Words are "evaluated" by applying their definitions. This has the nice benefit of a tighter relationship between syntax and semantics, better handling of ambiguity, and ease of use similar to other primitive data types. I only became aware of other research manipulating natural language in a functional way through this paper entitled "Realization of Natural Language Interfaces Using Lazy Functional Programming" mentioned in Lambda the Ultimate. "Oh, my god! they stole my idea," I thought until I discovered that functional Montague grammars were invented way back in the 1960s.

November 14, 2007

Semantic Computing

Walter Stiers from the Academic Relations Team at Microsoft wrote that Microsoft Research is accepting proposals for Semantic Computing. His post caught my eye, because of what I am working on, which can basically be described as semantic computing.

I like the phrase used, "semantic computing." It reminded me of the another phrase coined by the Data Access Team "conceptual layer" to refer to the higher level of abstraction offered by LINQ and expression trees.

My main concern was whether Microsoft invading my turf and whether I would have to put up a fight. It was just a false alarm as Microsoft seems preoccupied with understanding the web to aid in search requests: The program is entitled "Beyond Search" and has two tracks, Semantic Computing and Internet Economics. Microsoft's intent is laid out explicitly in the actual program description, which is to improve search and then milk the advertising revenue: "The total spent by Internet advertisers in 2005 is estimated at $8.3 billion, a growth of 13.3% from 2004."

I noticed that the industry's natural language and other "semantic" efforts seem focused on narrow range of uses such as search engines and command-and-control. Search seems to be in Microsoft's vision because it is demonstrably monetizable and currently dominated by mortal enemy Google. The company seems wary appropriately of putting money on risky black hole projects with poor business prospects.

It's going to be a while before we see deep understanding in every application. Well, it already takes forever for new features to be added to existing applications, where fifty developers add less than fifty features over a couple years.

May 15, 2007

Symbolic Computing

I picked up from Slashdot that Mathematica 6, a program for doing computer algebra, was released. Among the features are equational theorem proving, which is similar to the work that I am doing.

More than any other product, Mathematica embodies symbolic computing, and this recent post on the Wolfram blog, Symbolic Programming Visualized hints why. The Mathematica language is based entirely on repeated transformations on symbolic expressions through pattern-matched rules.

I have never seen any mention of symbolic computing in anyone's dreams for future programming languages. Instead, I see a laundry list of incremental features for the next "big" programming language. Why can't the language be small, incorporating only the most expressive ideas.

Symbolic computing is the blind spot of the technology community. Even Turing thought of both brains and computers as manipulating symbols. Methinks, several years from now, we will probably see again the next revenge of LISP, which are operations on symbols.

According to Microsoft researchers in their roadmap  "towards 2020 science," symbolic computation will not be integrated into programming languages until 2012. (We do see a precursor in LINQ expression trees, being introduced in Visual Studio Orcas. There are also glimpses in some of the newer functional languages.) This gives me five years headstart.

If we want intelligent software, symbolic computing is the way to get there.

March 16, 2007

Code and Data

Usually, when I mention functional programming to other developers, they will unconsciously close their arms and legs—a sure sign of resistance. These developers are naturally suspicious of another new, better “programming” paradigm, so I have learned to zip my mouth. They don’t see the possibility of code becoming more declarative and looking exactly like specification. My blog traffic has diminished after I have started talked about programming in a more functional way.

In the MVP summit, I talked to a developer, who was a known functional programming advocate, about how I merge the notions of code and data.

I mentioned my static analysis tool, NStatic, and how I use everywhere an “expressions” data structure, which is like an immutable abstract syntax tree, but also a functional language in which I can evaluate any expression to its normal form. Expressions are used to represent such things are traditional algebraic expressions, code, natural language, and all sorts of other documents.

Document Operations & Transforms

In the case of a wordprocessor that I am also developing, documents are represented as expressions. In addition, document operations are constructed as functions in my expression language which are then applied to the document expression to return an entirely new expression.

The advantage of this approach is that I can support application-wide functionality by applying a transform to all document operations in my application. Transforms are functions that convert any function into a new function by recursively walking through an expression and applying changes based on pattern-matching (very  much like a derivative or Fourier transform). Control-flow operators like lambdas and conditionals themselves have higher-order transformations that work with any function, and thus any transform.

Transforms can only easily be implemented over code written in a functional way. Code in other paradigms would need to be converted to a functional representation first, as I have done with NStatic, and then converted back (or retained) after the transformation. A common refrain among FP enthusiasts is that functional code is easier to prove, but what the real benefit is that functional code is easier for a computer to analyze and manipulate--ie, easier to transform.

Have you ever written an application in which the same transformation needed to be manually coded into a number of different classes? The transform is stored inside our head and then mentally applied to an each method.

Such transformations couldn’t be previously automated because functions were opaque and the language mechanisms like virtual functions are too coarse. With transforms, new application-wide functionality can be written just once, not interleaved within code in several other unrelated classes. I can also easily integrate thirty-party functionality or externalize features this way, because the transformations are not burned in at compile-time.

In Microsoft Secrets, Michael Cusumano wrote about Excel's "Am I Done Yet" list in page 319, a list of concerns that must be considered for every new application feature added:

Microsoft projects have also been compiling metric-based checklists to help determine feature and product completion. For example, Excel has a six-page list of criteria entitled "Am I Done Yet?" This groups completion criteria into twenty-six categories, such as menu commands, printing, interaction with the operating system, and application interoperability (see Table 5.4). A program manager, developer, or test uses the checklist to help evaluate whether a feature is complete.

I wonder how many of these items in the entire "Am I Done Yet" list would just disappear, if program features were written using transforms. Probably a lot. Transforms would make things like revision marking, selection tracking, simultaneously editing a lot easier to write. The application developer could just focus on writing the essential feature and be rest assured that transforms would take care of much of the rest. 

Code & Data And Lisp

The developer whom I spoke to in the summit then brought up that Lisp supports treating code as data, simply by quoting code and performing explicit evaluation using the “EVAL” operator. Having coded in the language in the late ‘80s, I was already familiar with Lisp.

The problem with LISP is that it is not as tight and clean as it could be. Unfortunately, the way it mixes code and data is seen as the correct approach, and thus people look no further at alternative cleaner approaches.  LISP still distinguishes between code and data: Functions are opaque (their definitions are inaccessible) just as in imperative programming languages. Also, code stored as data can only be executed once all free symbols are removed. For example, (+ x 1 2) would evaluate to undefined symbol error or, if the symbol x were quoted, a type-mismatch error rather than the correct approach (+ x 3).

However, if the standard LISP functions could already operate on symbols, then both quoting and calling the EVAL function, would become unnecessary. In addition, if execution proceeded lazily, LISP macros would also become unnecessary as every function would also be a macro.

December 28, 2006

Hard Problems, Simple Solutions

In my previous post on Fabricated Complexity, I wrote about a quote that I found myself repeatedly agreeing with:

The solution to a hard problem, when solved, is simple.

A commenter remarked:

I disagree, though - certainly many things are overcomplicated due to reasons you state, but this doesn't mean every hard problem has a seemingly simple solution.

Two things I want to point out: (1) Notice that I didn’t say every hard problem has a simple solution, since not every hard problem is solvable. (2) This, by the way, is not a hard belief of mine.

I think the basis of my “belief” comes from experiences with functional style programming, where the solution of the problem often turns out to be its specification. I found that the new functional features in C#, iterators (essentially, an efficient implementation of functions that return sets) and anonymous functions, allow me to uncover more simple solutions.

What’s a moderately hard problem to use as an example? Let’s see. Perhaps, building a generalized regular expression matcher might be. Microsoft’s implementation of regular expression matching over strings is spread across 24 files and 14,455 lines of code including comments and whitespace.

Let’s see what happens if we switch to a functional style. We could use the implementation of lazy lists, LazyChain, from the blog of Wes Dyer (developer in C# team) and then port the 14 line Python implementation of a regular expression engine to C# using iterators and anonymous functions. The regularly expression would be composed functionally rather than through a string:

re = Sequence( term1, Plus( term2, Alternate( term3, term4 )), term 5)

and the results would be returned as a collection by applying the regular expression to the list to be searched:

results = re( list_to_search )

I contend that, with this approach, we can get a reasonable implementation of regular expressions over lists of arbitrary types in less than 200 lines of code, which is two orders of magnitude more concise than Microsoft’s more specific implementation over strings. I’ll produce a port of this one day and post it in my blog.

In another point, the fact that a human outperforms a computer in a given programming task serves as a cue to me that maybe the computer algorithm is poorly written. In such cases, I look for an alternative functional implementation which tends to more accurately model how we really think.

Let’s look at resolution-based theorem proving, for example. For a large problem, I wonder if its just faster for the prover to extract from the clauses in the knowledgebase a functional expression equivalent in truth value to the user’s query rather than actually performing the proof. Quantified expressions get turned into lambda expressions, and boolean operations into conditionals. This extraction might be possible in polynomial time, after which we could go about reducing the returned expression to normal form, which is far more efficient than performing a lengthy nondeterministic search.

December 20, 2006

Deduction versus Reduction

In my last post, Software Verification, I criticized the field’s heavy reliance of theorem provers based on predicate logic. I mentioned some obvious problems such as exponential proof procedures, but a Coverity presentation “Selling an Idea or a Product” pointed me to other nonobvious problems. A slide (#17) titled “Myth: more analysis is always better” hinted that more analysis does not always improve results; it can sometimes produce worse results.

  • Ease of Diagnosis – The longer the chain of reasoning, the harder the bug is to reason about, since the user has to manually emulate each step, but, unfortunately, long chains of reasoning are also needed to find some high-level bugs.

    Another related example is the Simplify theorem prover, which can determine if linear arithmetic constraints are contradictory using the Simplex Method. However, such a general procedure may not be very advantageous, since people will typically write simple inequalities that they can easily reason about. A general analysis consumes more resources than a narrow solver. Secondly, a human being may not be able to understand a complicated error (an empty intersection of several inequalities) reported by the Simplex solver and simply dismiss it as a false positive. Once again, such errors are not likely, because humans do not typically write such code in the first place. [I believe that the Simplify prover actually uses a more optimized algorithm, Nelson-Oppen, for simple inequalities.]
  • False Errors – The greater the number of steps, the more likely that the analysis contains an error. If there is false information in the knowledge base, then any assertion can be proved via a refutation-based proof procedure.

The alternative to logical deduction is functional reduction in which expressions are continually reduced to normal forms, so that equivalent expressions will eventually look the same. In other words, algebra not logic. The benefit is greater efficiency due to the elimination of much nondeterminism, deeper inferences, and more opportunities for dynamic programming to tame the unwieldy beast. This approach is also referred to as equational logic and is very similar to high-school algebraic proofs.

Last year I wrote a post, On Intelligence, which is relevant to this post. In it, I quote the words of Stephen Wolfram, on the subject of logic and rules…

But what about capabilities like logical reasoning? Do these perhaps correspond to a higher-level of human thinking?

In the past it was often thought that logical might be an appropriate idealization for all of human thinking. And largely as a result of this, practical computer systems have always treated logic as something quite fundamental. But it is my strong suspicion that in fact logic is very far from fundamental, particularly in human thinking.

For among other things, whereas in the process of thinking we routinely manage to retrieve remarkable connections almost instantaneously from memory, we tend to be able to carry out logical reasoning only by laboriously going from one step to the next. And my strong suspicions is that when we do this we are in effect again just using memory and retrieving patterns of logical argument that we have learned from experience.

The primacy of reduction rules (patterns stored in memory) over logical inference in human thought shouldn’t be surprising given the efficiency of the former; the latter is equally as “laborious” for humans as it is for computers.

December 06, 2006

Bullets Flying My Way

Larry O'Brien has been questioning my critique of Fred Brook's essay, "No Silver Bullets" in my post Lego Programming. Actually, he turned my post into a strawman argument and tied my name to an argument that I didn’t really make—that IDEs are a silver bullet. IDEs certainly have introduced a number of tenfold improvements in area; in most cases, however, IDE improvements are not the bottlenecks in software development. The ability to build an interface in a few hours instead of a few days doesn’t significantly impact a six-month project. Support for third-party controls and object-oriented frameworks, on the otherhand, have eliminated bottlenecks and resulted in dramatic improvements in delivery times. Lotus was able to develop Improv in six months on the NeXT, because of its object-oriented capabilities.

I wrote some facetious comments, which he took up on a second post "Bullets Over Wrong Ways." I'll answer those posts with a more persuasive arguments in another month. My silence now does not indicate agreement or resignation.

I don't want to be drawn into a debate right now, while I am trying to bring a product into a beta. I'll just reprint an edited version of my comment to his last post:

To some extent, I was humoring you in my comments; but I don't think the functional paradigm has been fully fleshed out, but will most likely become the dominant approach decades from now. Much of the Brook's thoughts on silver bullets rests on artifacts of imperative programming like state and side effects. If we eliminate state, for example, our program becomes more like a specification, which I feel also impacts some of the other costs Brook's mentioned such as testing and communication.

I am going to make my point in an upcoming post.

I don't think object-oriented programming is entirely distinct from functional programming. To the extent that object-orientation supports compositional programming, to the extent that an object can be interpreted as an inert function whose arguments correspond to the object's members, an argument can be made that it is close to a functional style.

That's a subject of a future post.

October 01, 2006

Turing Test and Loebner Prize Competition

A couple of weeks ago, the 2006 Loebner Prize competition was held. Back when I was at Harvard in the early 1990s,  the annual Loebner Prize Competition was created as the first real-life version of the "Turing Test," described in Turing's article "Computing Machinery and Intelligence" to answer the question "Can Computers Think?" Turing wrote:

It is proposed that a machine may be deemed intelligent, if it can act in such a manner that a human cannot distinguish the machine from another human merely by asking questions via a mechanical link.

The test is a natural extension of his earlier work on Universal Turing Machines, which can simulate any other machine, to the human mind. The test, however, is controversial. Searle famously counterargued with the Chinese Room experiment. Also, Turing himself made an unsuccessful prediction that the Turing test would be passed by 2000, but tech visionary Ray Kurzweil has willingly bet that Turing Test will be passed by 2029.

The first competition made big news on campus, especially since it was held locally, and I followed the results of the initial competition closely. Since I had a prior interest in AI and natural language processing, I envisioned that one day that I might be the one to actually win the $100,000 prize; unfortunately, the terms of the grand prize has since been expanded to include audio and visual input. The direction of my work has been increasingly intersecting with the aims of the competition, and so maybe one day (perhaps in ten years) it might actually compete.

Instead of summarizing the competition, I'll refer to the text in the Loebner Prize website:

The Loebner Prize for artificial intelligence ( AI ) is the first formal instantiation of a Turing Test. The test is named after Alan Turing the brilliant British mathematician. Among his many accomplishments was basic research in computing science. In 1950, in the article Computing Machinery and Intelligence which appeared in the philosophy journal Mind, Alan Turing asked the question "Can a Machine Think?" He answered in the affirmative, but a central question was: "If a computer could think, how could we tell?" Turing's suggestion was, that if the responses from the computer were indistinguishable from that of a human,the computer could be said to be thinking. This field is generally known as natural language processing.

In 1990 Hugh Loebner agreed with The Cambridge Center for Behavioral Studies to underwrite a contest designed to implement the Turing Test. Dr. Loebner pledged a Grand Prize of $100,000 and a Gold Medal (pictured above) for the first computer whose responses were indistinguishable from a human's. Such a computer can be said "to think." Each year an annual prize of $2000 and a bronze medal is awarded to the most human-like computer. The winner of the annual contest is the best entry relative to other entries that year, irrespective of how good it is in an absolute sense.

Here's a sample transcript from 1996 contest winner. There are many other transcripts available from the prize website. A conversation with some of these contestants can be had online: [TuringHub] [ALICE] [Jabberwacky] [Others]

My computer science professor Stuart Shieber, who, by the way, stoked my interest in natural language, wrote an critique of the contest, "Lessons from the Restricted Turing Test," in the Communications of the ACM journal.

Stuart noted the technology used by most, if not all, of the contestants are still very primitive. They are basically variants of the 1966 computer program ELIZA, just with a larger database of responses. These programs don't do any natural language parsing--relying instead of simple string searches and manipulation--and don't do any logical inferencing.

For example, a long time winner of the annual competition was the ALICE (ArtificiaL IntelligenCE) chatterbot. I downloaded the open-source software six years ago and was dismayed to see how primitive its technology was. Essentially, the program consisted of a database of common questions (with some wildcard support) and canned answers. Another winner, MegaHal, uses statistical methods (Markov models) to generate responses based on prior data such as movie dialogues, encyclopedias, popular quotatons, and hand-crafted sentences. 

Despite the simplistic technology, a few judges over the history of the competition have been fooled into thinking a computer to be a person, and, interestingly, some humans have been thought to be computers. Curiously, the original ELIZA did fool the assistant of its creator, Weizenbaum, into revealing personal information as did another low-tech program in this webpage detailing "How my program passed the Turing Test!"

The Loebner Prize Competition has mostly relied on tricks, such as simulating typing speed and entering non-sequitors, rather than smarts. If I ever had some free time in the distant future and joined the competition, I would rely on genuine attempt to replicate intelligence. It's the thought that counts.

Although Turing's 2000 prediction of a successful Turing Test failed to come true, Turing did rightly predict that computer programs would eventually defeat men in chess. The ability to play chess was Turing's hallmark example of human intelligence. Interestingly, "Turing Test and Intelligence," claims that in chess the Turing Test has already been passed.

Kasparov claims to be able to distinguish computer play from human play. During a simultaneous event over the Internet, he once stopped playing a certain game, saying that he was up against a computer (it was supposed to be human opposition). He was losing at the time! Kasparov has also claimed, while losing against a computer, that the computer was being given human assistance!!

It's interesting to see modern variations of the Turing Test include CAPTCHA's for preventing robots from posting spam, and Amazon's mechanical Turk. (Turk, by the way, was a chess playing machine centuries ago that was powered by a hidden human being.) 

September 08, 2006

Great Developers

Joel have been talking about finding great developers. Every bloggers seems to have their own ideas about what makes a great developer. [1] [2] [3] [4] Maybe I do too.

We could ask widely-acknowledged great programmers, like Stiff did, what makes a great programmer. Ironically, one of these great developers, Linus Torvald, has terrible habits that violate the criteria of each blogger.

Paul Graham, who wrote about Great Hackers, also notes that is difficult to recognize a great hacker without working with one.

So who are the great hackers? How do you know when you meet one? That turns out to be very hard. Even hackers can't tell. I'm pretty sure now that my friend Trevor Blackwell is a great hacker... But when I first met him, I thought he was a complete idiot.

Paul says its also difficult to recognize one's self as a great hacker. Naturally, every one thinks himself to be at least a very good developer. How do you know for sure?

Someone at Reddit asked "How do you know if you're a good programmer?" Interestingly, one replied that only good programmers ask themselves that question. Another commenter, BigTech replied,

To me, a good programmer optimizes code--they can get more done with less code, for example:

If X = 0, then X = 1 else X = 0 End If

becomes X=ABS(X-1)

By BigTech's own definition, he doesn't seem like a good programmer. (He also uses VB--a warning sign.)  His expression could be optimized down to 1 - x or even !x. Apparently, BigTech mistakenly believes he's a good programmer; he didn't realize that he defined a function over two points, which could be directly expressed as a linear function. I also disagree with his optimization theory, but then again Steve Wozniak spent "his whole life trying to optimize things."

BigTech's overassessment of himself reminded me of Chris Sell's post on how to recognize expertise. (Apparently, Chris didn't have a mirror at home.) Chris later mentioned an article called "Unskilled and Unaware of It. How Difficulties in Recognizing One's Own Incompetence Lead to Inflated Self-Assessments," which Alan Bellows has a good summary on:

When asked, most individuals will describe themselves as better-than-average in areas such as leadership, social skills, written expression, or just about any flavor of savvy where the individual has an interest. This tendency of the average person to believe he or she is better-than-average is known as the "above-average effect," and it flies in the face of logic… by definition, descriptive statistics says that it is impossible absurdly improbable for a majority of people to be above average. Clearly a large number of the self-described "above average" individuals are actually below average in those areas, and they are simply unaware of their incompetence.

The inability to recognize one's incompetence is not unlike the experience of being color-blind. [2] [3]

Just as we might overestimate ourselves, so too we might overestimate the abilities of others we regard as great. John Rozewicki argues that we turn great people into superhumans, but in reality no one is a genius. (How else can three housewives outsmart computer engineers, rocket scientists and NSA?) Guardian Unlimited writes how Einstein struggled with his grand theory - and the maths. Einstein also had a smaller than average brain, which probably doesn't mean anything. Another research article, "The Expert Mind," suggests that anyone can become a genius. Even for child prodigies like Mozart, Gauss, and Bobby Fischer, heavy labor spanning over a decade was required to prepare each genius.

Stevey, who equates greatness with fame, writes "get famous by not programming." His programming heroes, he realized, become heroes either because they were more vocal by writing a popular book or blog or were associated with a popular product, neither of which necessarily equate to being a good programmer.

Do you have any programming heroes? I do! Oddly enough, though, I've never really seen much of their code. Most of the famous-ish programmers I respect have actually made their impact on me through writing, and it's usually just prose, with maybe a little code interspersed....

.... But when someone builds a framework — any environment that we live in and actually enjoy programming in — and there's one person who's chiefly identifiable as the primary author of that framework, then I think we tend to admire that person, and unlike other programmers, the person starts to become famous...

Even if they're a crappy programmer.

Joel, I am pretty sure, considers himself a great programmer.  The post suggests that he may just be famous because of the articles he's written and how he leverage that into book sales, product sales, and forum activity. As for whether he's a great programmer, he's has the same natural tendency to inflate his worth as any other developer. He could simply be tooting his own horn, which is larger than everyone else's. Is FogBugz really that technologically impressive? Well, Joel does have one attribute of genius mentioned by Bertrand Russell.

One of the most important elements of success in becoming a man of genius is to learn the art of denunciation. You must always denounce in such a way that your reader thinks that it is the other fellow who is being denounced and not himself; in that case he will be impressed by your noble scorn, whereas if he thinks that it is himself that you are denouncing, he will consider that you are guilty of ill-bred peevishness. Carlyle remarked: ``The population of England is twenty millions, mostly fools.'' Everybody who read this considered himself one of the exceptions, and therefore enjoyed the remark.

In another ironic twist, Bertrand Russell also says that one must delude himself (a common refrain of mine) in order to become a genius.

Ignore fact and reason, live entirely in the world of your own fantastic and myth-producing passions; do this whole-heartedly and with conviction, and you will become one of the prophets of your age.

If we are to delude ourselves, how do tell if we're geniuses or simply incompetents with inflated views of ourselves.

September 07, 2006

High Level Languages & Performance

I suspect that the performance advantage of procedural languages does not necessarily exist over newer exotic languages. Over time, this should the performance gap should lessen. An ecosystem of processor features, developer tools, and algorithms have evolved over the past quarter century that favor the Algol-based languages like C and C++. Fortran once regularly outperform C, probably in part, because it was developed and popularized much earlier.

A large part of performance advantages in a language is due to the quality of the compiler and tools , which is usually dictated by the popularity of a language, its commercialization, and the resulting positive feedback of revenue into the compiler's continued development. New languages are perpetually at a disadvantage as a result.

Logic programming (primarily, Prolog) has always been regard as slower, but there have been a few efforts to make improvements in this area. The Aquarius project aimed to implement a fast compiler for Prolog; the researchers presented their results in Can Logic Programming Execute as Fast as Imperative Programming? Researchers of the Mercury language attempted an even faster industrial strength implementation with a more general logical programming language that supports full first-order predicate logic, not just the Horn subset, and recently added constraint logic programming. Developers can provide determinism modes for each predicate defined, and the compiler will reorder terms appropriately. The compiler automatically recognizes loops and performs aggressive inlining.

New chips are typically targeted at the prevailing language of the day. One article, Next Generation Stack Computing, describes how stack-based chip architectures are superior in performance to conventional register-based cousins in a number of important areas such as faster procedure calls and reduced complexity (short pipeline and simpler compilation) but was accidentally missed in an expensive detour by the computer industry. The approach reminds me of Intel's original design for its 64-bit chip, which Intel forwent for greater 32-bit compatibility. I wonder if such an approach might actually have favored functional languages.

David Chisnall of InformIT debunks the Myth of High-level Languages being slower. Nate criticizes Dave's assertion that high-level languages like Java provide more information to the compiler to do its magic. I think that Dave might have been closer to the truth if he had expanded his discussion to declarative languages with much more optimization opportunities.

Even with the advantages of being first, low-overhead and low-level languages like C and C++ are still being challenged by languages that ought to be slower (see the Computer Language Shootout). C# and Java use the combination of dynamic JIT compiler with runtime information and a very efficient garbage collector with a superfast allocator optimized for locality. Strict functional compilers, like OCAML, sometimes outperform C++ in benchmarks. Recently, Haskell are able to use its laziness to avoid wasted work. New multicore processors tend to favor declarative languages which are more easily parallelizable.

I haven't even touch on the debate of whether relative language performance is important any more or not. Dynamic languages like Ruby and Python are proving to be especially popular and acceptable performance-wise for delivering commercial GUI applications.

Model-View

Last April, I wrote about a profound design change I was making into my document-based applications, which was to make all my document data structures immutable--standard practice in some functional programming languages.

My original motivation was to make each of my model objects an expression that I could work with in a functional way and apply transformation rules to. (My first attempt a few years ago failed because I earlier assumed that mutability was required for editable documents.) The result is substantially less code and tight integration between my document and my AI framework. From the user's point of view, the application would appear more intelligent, permitting more complex document analysis and manipulations than seen in the past.

The document essentially becomes a value. The ease of changing a document in this fashion is comparable to using a TextBox control, which is simply accessed and modified by getting and setting the Text property of the TextBox control.

There is an additional cost of path copying from the root of the document tree to the changed node, but this is more than offset by a number of other direct benefits such as trivial multithreading, elimination of undo work, and other simplified algorithms (eg, easier document comparison/merging). Maintaining an undo log is actually expensive, when the document changes are high during an operation. My algorithms tend to be simpler because of the immutability variant and the access to the before and after state of the operation. My data structures became simpler, because they only need to be descriptive; anything extra is typically caching computed information. I no longer needed to store parent or other information, which enables extensive sharing.

There are some complications. I have lost object identity and each node has to be the document tree now needs to be referred to by its path, but the path can be encapsulated inside its own object and it turns out to be rarely need. Object identity presents problems when persisting, deleting, copying or pasting data (especially multiple times). If identity is really important, the node should contain an ID.

I also don't store an observers collection; I don't need to. After an entire operation is completed, I present the root view with the document before and after an operation. The view proceeds recursively downward to update its own nested views.

Intuitively, this approach makes sense. The model should be pure data, just as a string is in a TextBox ; it shouldn't containing any references to any view. The view should be able to rebuild itself based on the new and old states of the data. If a recently document node contains many children, there is a logarithmic time approach to compare before and after state of document node and find the range of children that were changed that I mentioned in a previous post.

I recently also came to the conclusion that my view data structures should also be immutable. In the course of researching the impact of changing the view, I noticed that a number of special purpose classes would no longer be needed resulting in a substantial decrease in code. I also discovered that I could dispense with a parent pointer and a reference to the model object in the view; these two could be externalized and provided by the parent. This also enables flyweights as well as virtual views, where a rendered model node is not actually backed by its own view node.

The changes are likely to improve performance because the modest costs of path copying are offset by the elimination of additional other processing (and accompanying allocations) and the impact of heavy sharing. The other processing includes layout, notifications between the model and view, and initialization/disposal code. Perhaps immutable objects could make Avalon less complex; however, Avalon might need to forgo the naturalness of property assignment.

This reminds me a blogger post that design-patterns aren't needed in functional programming, because they exist to make up for deficiencies in procedural languages.

Software Design Philosophy

 A year ago, I wrote that I would soon describe my software design philosophy in developing commercial desktop applications. I never actually did write directly on my philosophy. I still had unresolved design issues, which I have since resolved through making document and view objects immutable. I also felt that it was perhaps too early to disclose some information, especially after recalling Joel Spolsky's article "Mouth Wide Shut."

On the other hand, I did offer glimpses of my approach through various seemingly random posts like Silver Bullet and Software Reliability. I also did mention a preliminary wordprocessor I was working on, which, in reality, is a universal document-processing framework that can simply and efficiently store, manipulate, and display all kinds of documents. Despite Avalon architect Chris Anderson's wishes, don't count on Microsoft delivering full-featured application frameworks that could potentially make it easier for competing products to be developed; the MFC framework included a special restriction against developing wordprocessors and spreadsheet software.

My philosophy includes programming in a more declarative and functional programming style. My software is built on the unifying concept of immutable expressions (rather than objects) for representing data, be it mathematical, linguistic, or document. The term-rewriting system for manipulating expressions was inspired by a similar system in Mathematica. The system, which is rooted on the simple, powerful, and well-understood theory of lambda calculus, leads to surprising possibilities by removing the distinction between code and data. (More on that shortly.)

I noticed this article "Out of the Tar Pit," which makes similar points to the ones I raised in my Silver Bullet post. A Turing Tar Pit is an overly-simplified, Turing-Complete language in "which everything is possible but nothing interesting is easy." The classic example is the taped-based Turing Machine, but the authors of the article also implicate imperative programming languages, which are still modeled on a Random Access Turing Machine, which introduces accidental complexity due to state. The authors instead advocate functional relational programming.

I left as a software developer in the applications group at Microsoft in 2000, but I never actually stopped developing desktop applications. I did go through a period of unlearning development approaches used at Microsoft and much of the rest of the computer industry. Imperative programming as well as the evolutionary programming development are fundamentally flawed ways of developing document-processing applications; their continued use hinders the development of intelligent applications.

Programmers in 60's discovered maintainability issues with "goto," and came up with structured programming with decorated gotos with names like "while" and "switch;" this made programming more readable but not any more expressive. We still also live with the word-by-word "assignment bottleneck." As a result, most of the operations in today's applications can be summed up as fundamentally property assignments and vector operations (insert, delete, map) rather than more complex tree transformations.

I noticed that some thinkers at Microsoft are contemplating some of the same issues that I am addressing. Tony Williams, co-inventor of COM, in a recent Channel 9 interview mentioned his interest in the declarative construction of applications. He also referred to Charles Simonyi's obsession with tree transforms, which mirrors my use of "expressions" and transformation rules as unifying concepts in my application. The data access team also refers to a "conceptual layer" made possible by the LINQ technology. (Tony mentioned he was currently working on a new component application framework, debuting in Office 2007. Googling reveals that the framework is probably built on top of COM and going by the name "Octarine." If so, I am pessimistic of any real improvements outside of some reuse. )

June 21, 2006

Direct UI

Direct UI is the notion of treating a document window as more than just a WYSIWYG display. This includes treating the window as a control surface, inserting various markup, treating document elements as controls, and applying transforms/animations that allow the elements to momentarily transcend their document layout. It’s about making the document come alive.

It’s something that I thought heavily about in building my next document-based application. It’s also one of the reasons why I opted out of using any of Microsoft’s text-editing solutions, because of poor layout extensibility. Avalon text may support render transforms and animations of text, but those options are still too limiting. My NStatic tool also includes some direct UI, which has yet to be revealed, in order to present a more effortless user experience.

In Office, we see documents as a limited control surface with form controls, smart tags, Excel’s drag handle and Office 2007’s floatie. Visual Studio offers a variety of Intellisense controls. Office also includes markup for spelling errors, comments, and change tracking. With the possible exception of PivotTables and Table structures in Excel, document elements, however, are still bound to their layout and don’t typically exhibit control-like behavior. Such a feature typically requires a good understanding of the underlying document.

Refactor Pro probably makes more extensive use of direct UI than any other non-image-processing application. It includes extensive use of markup, in-document controls, and animations.

621refactor

Designed for refactoring, Refactor Pro needs to acquire a fuller understanding of the user’s documents in order to relieve the user of tedious brain work. This also describes the smart applications that I am building. As applications become smarter, they will make greater use of markup and in-document controls.

Why do controls have to be alongside documents, you ask? It’s about moving the operation closer to the operand, eliminating the multi-step process of selection and command searching, interacting in an instantaneous and anticipative way.

I have also written posts about 3D user interfaces as well as a 3D framework that I built on top of GDI+.  In addition to increasing realism, 3D interfaces also serve a major function. As more controls accompany documents over time, three-dimensional graphics will allow these controls to produce less clutter as rotation and perspective transformations tend to reduce the amount of screen real estate used.

June 13, 2006

Software Reliability

Based on a manifesto by Kirk Glerum, Office started thinking about reliability around 2000. As a result, a new improved Dr. Watson identified application crashes and hangs and relayed errors back to Microsoft, so developers could fix the most common errors. One statistic has that 50% of crashes were caused by 1% of the bugs. Office also added document recovery, in which documents are periodically auto-saved. Documents may actually be saved in the midst of a crash, risking potentially corruption. More recently with the open XML formats, these new file formats were designed to ease recoverability and limit corruption. Additional features in Windows Vista now support shadow copies, etc. 

This approach to addressing reliability is rather crude. A more thorough approach is to address the issue in a more fundamental way like using a different programming style. Such a thinking has lead me to the use of a more functional style of programming, characterized by onetime assignments and persistent data structures. With mutable data structures, any single misplaced or incorrect assignment could introduce inconsistency into the system. While it is also possible that functions can also return incorrect values, the lack of side effects keeps errors isolated from the rest of the system; proper sequencing is not an issue for function calls as is for assignments. When an exception occurs, I do not have to undo any state changes. I can simply forget about the work that lead to the exception and revert to the last known good copy of my data. My application no longer needs to abort, since I traveled instantly back in time.

Erlang is one of those hot new languages right now. It’s declarative, functional, and concurrent. Damien Katz says that reliability is why he chose Erlang, citing an article on Erlang in Byte.com.

Meanwhile, back at Ericsson, some Erlang-based products that were already in progress when the "ban" went into effect came to market, including the AXD 301, an ATM switch with 99.9999999 percent reliability (9 nines, or 31 ms. downtime a year!), which has captured 11% of the world market. The AXD 301 system includes 1.7 million lines of Erlang: This isn't just some academic language.

Reliability is a major design philosophy as revealed in this thesis on Erlang, “making reliable distributed systems in the presence of software errors.” (The powerpoint version.) The programming model espoused by Erlang makes extensive use of “fail-fast process,” also known as “Let it crash!” When a function encounters an error, it cannot handle, it should simply fail, in which case the caller, which is using the original copy of the state can decide whether to restart the function, attempt a simpler routine, or itself fail.

The process approach to fault isolation advocates that the process software be fail-fast, it should either function correctly or it should detect the fault, signal failure and stop operating. Processes are made fail-fast by defensive programming. They check all their inputs, intermediate results and data structures as a matter of course. If any error is detected, they signal a failure and stop.

Damien Katz more fully expounded on the Erlang approach to handling exceptions in his popular post on “Error codes or Exceptions? Why is Reliable Software so Hard?” He writes about the dangers of “Reverse the Flow of Time” error handling, made necessary by the use of mutable data or “destructive update.” This problem is more insidious if an exception occurs during the exception handling.

The better approach is not to “undo your actions, [but] just forget them,” an approach not natively supported by any of the popular languages. By eliminating destructive updates, actions do not need to be undone, just forgotten. Damien first points to two approaches to achieving this are to (1) make a copy of everything up-front (low-tech but expensive) or (2) make objects immutable. Alternative, one can keep keep object mutation to a single operation in isolation—atomic update; here, undoing is limited to a single operation.

The ideal approach is to use a functional programming language, for which Damien recommends Erlang:

Erlang, which started me thinking about these issues, is a functional programming language that gets reliability right in a simple and elegant way that I think is fairly easy for an experienced OO programmer to pick up (you don't even have to learn about monads). Erlang is different in that it's far more dynamic and "scripty" than other functional languages (it even has an interactive command line mode), making the development process more incremental and approachable.

Erlang is marvelously beautiful in the way it meshes the concepts of immutability, messaging, pattern matching, processes and process hierarchy to create a language and runtime where extreme concurrency and reliability means adhering to a few simple design principles. Someday I'm going to explain the whole Erlang development philosophy and why it's so damn awesome.

Finally Damien ends with the same conclusions I made about software reliability.

The bigger problem in software reliability isn't how we communicate errors, it's the state we are in when the error happens. Any attempts to "Reverse the Flow of Time" in code are bad. Avoid it. Instead convert your code to avoid mutations and use "Get the Hell out Of Dodge" error handling instead. You'll thank me later.

If we look at how Office does document-recovery, it is essentially through copying the data by “Auto Saving” at regular, short intervals. In the threading space, a newly proposed method for replacing locks, called “software transactional memory,” works by copying the original values of the data structure to be changed; despite the overhead of copying, this lock-less approach actually outperforms and scales better than locks on multiprocessors.

May 26, 2006

Software Built to Replace a Human

I just noticed this blog post on “software built to replace a human” referred to by someone in Joel’s forms. That’s the whole raison d’etre for my software company, SoftPerson — hence the name.

Currently at work I'm writing some EDI (electronic data interchange) software that I know for a fact will perform a job currently done by a human being. Whether this person will be assigned to a new duty or laid-off I don't know, though it is a little disconcerting to think about the later.

There must be many, many developers out there working on software that will replace the job of a human being; how do they feel about it?

personally I'm not going to lose sleep over it (I don't get enough as it is!), though I can't help but feel a slight "moral twinge". I am firm believer in survival-of-the-fittest and I'm not about to give up my own job so someone can keep theirs, but at the same time I still have a sense of morality.

I don’t feel any moral twinge. If I do my job well, perhaps some people will lose their jobs, but many more people will be able to do things they never reasonably expected to before.

(Personally, in Excel, when I was helping develop new OLAP functionality into Excel PivotTables, I was well aware and feeling somewhat guilty knowing that many OLAP companies would lose business with free functionality that came with the combination of Excel 2000 and SQL Server Analysis Services. I am over it now, though.)

May 13, 2006

Functional Programming

In my quest to incorporate more declarative programming techniques (especially, from functional and logical programming) in my applications and eliminate all the cruft left behind from traditional imperative programming, I found a number of items on the web.

There are some in the blogosphere that are newly converted to functional programming. Larry O’Brien wrote “Premature Non-Functional Programming the Root of All Evil?” Bruce Lewis wrote that functional programming is “the Programming Style that Saved my Marriage.”

For those who miss their beloved assignment operator, I found this quote from a Lambda the Ultimate archive.

What if an object deep in the hierarchy must be updated?
For example, when processing an XML DOM, an attribute of an XML node many levels deep may be changed. How is that supposed to be handled with referential transparency? does the whole tree need to be copied?

It's one thing that I have not understood how to handle in functional languages. Does anybody have an answer on this?

Yes. You copy the whole tree, but because referential transparency allows you to share implementations, you only reconstruct the parts of the tree which have changed. This means that the path from the root to the changed node is the only thing that's different, which means changes cost you about log N in the size of the tree. Still more expensive than in-place update in terms of space cost, but I believe that SVN is more or less implemented this way, and it seems to work just fine for that.

The reason why I like functional programming.

  • Conciseness. Functional programs tend to be a fraction of the size of imperative programs.
  • More Reliable. Functional program more closely resemble the specification, are more understandable, and tend to work the first time. Values are always internally consistent. Subtle bugs are rare.
  • Persistence. I always have access to previous versions of my data from which to compare with my more recent data.
  • Optimization. There are more opportunities to eliminate infrastructure taxes, allowing the computer to do more of the work that traditionally programmers would do by hand.
  • Lazy Evaluation & Lazy Data. This is the ability to delay the evaluation of arguments of a function until they are needed.

With functional programming, suddenly all the cruft I use to deal with disappears. I also think that functional programming is much better for reuse, which is the theme of this article on the Rise and Fall of Object Orientation. Composition’s impact on reuse is multiplicative, whereas inheritance’s is merely additive, which is why features like generics (which are functions on types) and LINQ.are so expressive.

Member functions (or methods, as they are called in other programming languages) are a fundamental aspect of OOP. Think for minute: What could be more logical and properly designed than a member function that is familiar with the intimate details of its object?

Obvious advantages not withstanding, member functions are disfavored in state-of-the-art C++ libraries. The problem is that a member function is useful only for a single class. This is the bane of code reuse. Think, for example, of a library of containers. If each container class defines its own size() member function, you will end up writing numerous size() functions that basically do the same thing.

Don Box captured my sentiment yesterday, “if only I could wish away a lot of the stuff that distracted/stunted the industry back in the late 80's and 1990's, but I can't.” All the new innovations we are seeing in languages herald back to the earlier era of functional programming.

The naysayers say that functional programs are less efficient. I don’t think functional programs need to be slower or require more memory. A functional representation of a document may be more compact than a traditional one, because recurring elements within the document are shared. For example, in my application, text is represented in terms of sentences, words, and paragraphs, better suited for functional manipulation. The variable cost in memory of entering a new word in such a document is 4 bytes (a shared object reference) whereas it would be about 12 bytes in a document organized as a character stream (where an average word consumes 5 characters and one space).

I picked up on a few resources on the Web on efficient functional data structures. I noticed this thesis by Chris Okasaki on Purely Functional Data Structures, who also has a book at Amazon of the same name.

I also discovered treaps for efficiently implement large lists—immutable or not. Treaps which combine trees and heap to create randomized balanced binary trees. Treaps are simpler to write and understand than other structures I use like red-black trees, splay trees and skip lists and run just as fast. Why haven’t I noticed them before? I looked at my Cormen’s Introduction to Algorithms text and realized that they were hidden in the problems section at the end of chapter on Red-Black trees. They are my new favorite data structure.

May 11, 2006

Wordprocessing

I can’t spend too much time on a problem, or I become bored and unproductive. Sometimes, I alleviate this tendency by frequently switching from development to testing mode. Periodically, I switch temporarily to some other long-term project of mine. I don’t engage in non-software recreational activities, since I am not currently earning an income and would not be making any progress on my long term-goals.

So, I briefly turned my attention to an old wordprocessing project of mine. As I mentioned before, I am looking at building a graphical code editor on top of a wordprocessor backend. The goal of the code editor exercise is to simultaneously build and sell an interesting product while bringing my wordprocessing engine to maturity.

511wordpro

I am not trying to compete directly with Microsoft, although, if I were and could only muster “just” a few million dollars in annual sales, that’s still enough income to support a very comfortable lifestyle for me.

I refused to believe that current set of desktop applications put out by Microsoft defines the whole space of possible, highly useful applications. Ten times more money is being poured in the building games like Oblivion instead of new general-purpose applications, because companies don’t want to be Microsoft’s next meal, so games are where all the innovation is happening.

Text handling is likely important in many future killer applications; current applications today rely on the limited capabilities provided by MSHTML and the RTF control because building a wordprocessor is too hard.

I need advanced text-editing functionality to build the type of application that I envision with advanced natural-language processing and AI capabilities.  I find existing controls too limiting, even the upcoming Avalon RichTextBox control, which will have better support for international text, accessibility, etc than I can ever hope to provide, but is not extensible enough to provide for my very specific and advanced needs. Internationalization is less important for me, since my natural language processor only works with English.

In building my text engine, I must be careful to avoid the beaten path that Microsoft and other companies have exhaustively followed in developing their various text controls and applications. By trying another angle and focusing on disruptively innovative capabilities, that are architecturally prohibitive to replicate in today’s software, I am more likely to be successful. Such techniques include employing a more functional programming style and representing text in terms of natural language elements rather than raw characters.

 

April 29, 2006

A Modest Design Change

I have recently had a lot of success programming in a certain nontraditional style at a small scale. Programming in this style has numerous advantages, such as better support for multithreading, less bookkeeping, simpler code. 

This programming style, though, is completely impractical at a global level. Or is it? Functional programming languages are actually based on this style.

The idea is that all document data structures are immutable; data is instead copied or composed. Copying data in an immutable world is very cheap, because deep copies are equivalent to shallow copies since all references are sharable. Equality tests are also cheap, due to a couple of quick comparison tests on identical references and cached hashcodes.

The simplicity of this approach carries over to advanced algorithms built on top of the immutable data. For example, a trivial example of an undo implementation is to stash away the former copy of the document.

While immutable data solves or alleviates a lot of problems, it introduces one huge problem (reminiscent of trade-offs introduced by garbage collection) by eliminating the efficient assignment from the programmer’s toolbox. I do think that this problem is solvable, and that the collective advantages of immutable data may actually outweigh the inability to directly modify data in-place.

February 26, 2006

Use Patterns

The next version of C++0x introduces a new feature called “concepts” to improve template metaprogramming. Concepts are based on “use patterns,” a powerful yet simple technique for compilers writers to enhance a programming language easily without adding new syntax.

concept Mutable_fw<typename Iter, typename T> {
    Var<Iter> p; // a variable of type Iter.
    Var<const T> v; // a variable of type const T.
    Iter q = p; // an Iter must be copy-able 
    bool b = (p != q); // must support ‘‘!=’’ operation,
                       // and the resulting expression
                       // must be convertible to ‘‘bool’’
    ++p; // must support pre-increment, no
         // requirements on the result type
    *p = v; // must be able to dereference p,
            // and assign a ‘‘const T’’ to the
            // result of that dereference; no
            // requirements on the result type
};

A C++ concept indicates what operations must be supported by a type simply by its use within the body of the concept. The body of a concept is syntactically identical to that of a method. The compiler reuses the same rules for parsing methods to parse concepts and automatically infer constraints on types. The compiler defers processing of steps such as type checking until the type deriving from the concept is known. Once the type is known, the compiler confirms the type matches the concept by applying type-specific conversions (both inherited and custom) and method resolution.

The alternative of constructing explicit signatures leads to combinatorial explosion in the possible ways a compiler can match an expression. The C++ paper refers specifically to the various ways in which the plus operator can be invoked by a compiler.

+ // built-in operation for X
X operator+(X,X);
X operator+(const X&, const X&);
X X::operator+(const X&) const;
X X::operator+(X);
const X& X::operator+(const X&);
const N& operator+(const& N, const N&); // X converts to N
X operator+(X, N); // N converts to X

“Use patterns” reduces the complexity of compiler writing, but also benefits language consumers by providing a compact, natural and direct syntax with no new syntax or learning involved.

I have been relying on the technique of “use patterns” in my AI and natural language work, including my rules language in NStatic.

Indeed, it’s a common technique used in symbolic programming, which templates somewhat resembles because of the way functions contained within templates are invoked by name. Unification and pattern matching in Prolog(and other languages are fundamentally based on the notions of “use patterns.”

In my natural language work, the definitions of new words are automatically inferred from usage in a few different ways.

  1. From context. The meaning is automatically inferred from its use within a sentence by analyzing the semantic dependency graph generated from a parse of the sentence.
  2. From natural-language definition. Alternatively, a person (usually a template designer) can define a word using a plain, simple natural-language definition of the word, and the relationships and properties of the word are automatically inferred from the possible usage of the definition itself.

“Use patterns” in natural language programming means that the user never needs to know a special syntax, just English.

January 27, 2006

AI Hubris

Chris McKinstry, a researcher in artificial intelligence, recently committed suicide, after posting suicide notes in his personal blog and a discussion board on Joel on Software. (Ryan Park has more details.)

Joel’s discussion board, which I read regularly, provides very useful information for software development and marketing. The suicide notes Chris left were in Joel’s little known off-topic discussion board ?off. This action directly led Joel to shut down the uncensored board to squelch “psychopathic” behavior as he wrote:

I used to have, hidden so deeply that almost nobody found it, a discussion forum on this site colloquially called ?off. It was the official off-topic forum and was virtually uncensored. It was foul, free-wheeling, and Not Safe For Work. It generated quite a few severely antisocial posts, which were sometimes funny.

Anyway, it wasn't really appropriate. It didn't have anything to do with software, I didn't participate, and there was no compelling reason to host it on my servers. Over time, the number of reasons to shut it down increased. Today, the last straw was broken when one of my actual friends (you know, a real-life person) told me that the discussion group was getting downright disturbing. Some of the participants in the group had probably crossed the line from common obnoxious online behavior to downright psychopathic behavior. In a discussion group which prides itself on "anything goes," this was impossible to control.

At 6 pm today, I closed that discussion group, having learned an important lesson about anarchy.

What made an impression on me was that Chris founded Mindpixel, a Web-based collaborative artificial intelligence project that accumulated a database of human facts, similar to both Cyc and ConceptNet.

Chris was even interviewed by Slashdot for his ambitious project. In the interview, he made some remarkable statements.

My primary inspiration for the project comes from observation: I observed that computers are stupid and know nothing of human existence. I