About

I am a software developer in Seattle, building a new AI software company.

Ads

May 2008

Sun Mon Tue Wed Thu Fri Sat
        1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31

Recent Posts

Ads


« October 2007 | Main | December 2007 »

November 24, 2007

Triple Nine

Sorry for this self-indulgent post. It will probably be of interest to only 0.1% of you.

I have been interviewing high schools students applying to Harvard College for the past three years. There are twice as many applications today as there were in my day, and the admittance rate has correspondingly dropped by half. None of the applicants that I previously interviewed were accepted, and only one, the top student in a class of 600, was wait-listed. All of the applicants though were very talented and qualified as if the weak students self-selected themselves out. In my past MBA life, I have also read applications for the business school with a similarly low acceptance rate and only one of the 15 applications I examined was accepted.

A new student I just interviewed is promising... She has all the right ingredients and knows how to market herself. I googled her on the Internet and found a web trail of achievement starting from middle school. I probably connected with her because of her perfect ACT score, which she claimed only one in the state and 22 in the nation. I checked this statistic on the ACT website and there are actually 500 perfect ACT scores (or 1 in 4,000) in the nation, so the rank was probably just on the instance of the test.

I scored the equivalent of a perfect 1600 in today's SAT, which bests Bill Gates's own 1590. Before the 1995 recentering of the test from an average score of about 900 to 1000, the SAT was scored more stringently with an average of seven 1600s out of over a million in the nation per year (getting a perfect score then made the news); nowadays, it is closer to 700. I also scored a 790 out of 800 in the GMAT, which was the single highest score of my MBA program in my year and subsequent years (except for the most recent year in which an 800 was recorded).

My high school maintained anonymous scores for the past four years, and my score was the highest among a total of 1,600 student's across all those years--an outlier among outlier scores-- despite my school admitting the top third based on a competitive examination. It also was substantially above my Harvard class average.

With a 99+ percentile ranking for both sections of the test, it was clear to me that I made a triple nine (99.9% or 1 in 1000 in my composite score and probably in my individual ones as well).  A triple nine is equivalent to a IQ of 149 (std 16); a double nine, 137.

I had taken a college statistics course at Columbia University during my last year of high school, and attempted to see if I made a quadruple nine (99.99% or 1 in 10,000; corresponding to 160 IQ) by computing a percentile from my composite score by assuming a normal distribution and estimating the variance and correlation of math and verbal scores. A normal distribution was a fair assumption because of how questions are "normed" from similarly distributed populations from past tests. I learned that was right on the threshold of a quad, but that the result was extremely sensitive to my estimates.

I decided after the interview to use the web to conduct my research, which was not at my disposal in 1990. Unfortunately, the recentered perfect score tops out at 99.98% or 2 in 10,000 (the original SAT topped out at 99.9995%), so I have to use my original scores.

I decided to look up the qualifying scores for various intelligence societies for the elusive 1 in 10,000 indicator. I was never really impressed with Mensa, because it used scores at the 98th percentile, which are below the average scores of the top public and private schools in the nation, but there are several intelligence societies with more stringent limits.

The intelligence society, Mega Society, which takes one in a million, obtained from an ETS statistician an actual histogram of SAT scores from the five year period between 1984-1988, of which one of the scores is my own, from which I can calculate my actual exact percentile. The process unfortunately is a bit tedious and not worth that much to my ego.

Fortunately, I can let others do the work for me. The Triple Nine Society publishes qualifying scores for various tests. I meet the bar for triple nines for both the SAT and GMAT, each by a wide margin. There is no society which admit members exactly at the quadruple-nine level. The Prometheus Society admits those who meet 1 in 30,000 or roughly 4 sigmas. I am lower than their cutoff but still within the range of statistical insignificance. (Fig 8.3.3)

It doesn't matter that my scores are high, since people still assume I am an idiot. It's also not really satisfying knowing that the tests are inherently flawed and not just for the limited material tested. For instance, I know many non-native English speakers, generally intelligent and gifted in math, feeding the bottom with dismal verbal scores.

Distractions, II

I just finished my move and am developing again. I was away from my computer and the Internet for over a week as I focused on getting the move behind me.

Right now, I have to make up for lost time. This move was unplanned and forced upon me by my rather unusual living situation and some opportunism. I lived in a house, which I sold to my ex-wife, for seven years.

I lost almost two months worth of significant productivity in anticipation and completion of this unnecessary move ahead of my product release.

November 14, 2007

Smart Machines

Recently, I got a email from the reader...

You are only about money, yes? You want to start a new company for research and development into AI technology.. For the money.

What else? How else could anyone be so ignorant of consciousness? It is money the blinds.

I searched my posts related to consciousness, but only uncovered this one on Will Machines Become Conscious?

We are naturally resistant to the idea that machines can think because of society, religion, or maybe our natural desire to be special--to be center of our universe. A good illustration is the AI effect, where people discount advances in AI as AI after they have been accomplished. "AI is whatever hasn't been done yet." Wikipedia sums up:

This change of perception can be traced to the mystery being removed from the system: that being able to trace the cause of events implies that it's a form of automation rather than intelligence. Michael Kearns suggests that "people subconsciously are trying to preserve for themselves some special role in the universe".[1]. By discounting artificial intelligence people can continue to feel unique and special.

This tendency occurs with our views of animals, women, and other races as well. Psychologists joke, that throughout history they have often tried to defined mankind as the "only animal that can..." feel, think, reason, plan, reflect, and so on--only to narrow their assertions in the wake of new evidence such as animals with human-like abilities like the sign language monkey or the super smart parrot. (We also know that children raised in the wild lose many of these same abilities).

A related effect has been noted in the history of animal cognition and in consciousness studies, where every time a capacity formerly thought as uniquely human is discovered in animals (e.g. the ability to make tools, or passing the mirror test), the overall importance of that capacity is deprecated.

I think, in order to make advances in technology such as producing smart software as I am doing, one must abandoned preconceived human-centric notions of the world.

I sometimes think that AI research in the large companies are compromised by coworkers and executives (who control funding and project goals) who fundamentally don't believe in computer intelligence. One female pioneer, who invented COBOL, was criticized for her attempt by an executive and warned that “computer programs could not understand English." Nowadays, it's companies diverting natural language research to search technology.

In order for Alan Turing to envision computing, he had to believe that a machine could think. It may have been easier for him to take such an outsider’s perspective because his homosexuality at the time was taboo and considered a mental illness. He may also had to contemplate that the mind was also machine. As Jenna Levin  illustrates in her book A Madman Dreams of Turing Machines, Turing had a controversial mechanistic view of the human mind: (Via Lispmeister)

The human mind can also be reduced to a machine. This idea drives all the others as he runs on grass, past trees, over bridges, through cattle. States of mind can be replaced by states of the machine. Human thought can be broken down into simple rules, instructions a machine can follow. Thought can be mechanized. The connection isn't perfectly clear, but it is there, the catalyst of a great crystal. It is not just that thought can be mechanized. It is mechanized. The brain is a machine. A biological machine. The idea cools him from head to toe, a wave of understanding washing clean his confusion, his muddled notions, and his breath. Shock feels like this: There is no sky or earth. No time, no meaning. It's a throb—a hard silence, a pulse. It is colorless, tasteless, senseless. A white-hot explosion[…]

At the age of twenty-three and for the rest of his life he embraces, without reservation, a mathematics that exists independently of us—although we, by contrast, do not live independently of it. We are biological machines…. bound to mathematics and mathematics is flawless. This has to be true.

While the above was a fictional example based on facts, we do have an AI pioneer, Marvin Minsky, who shares thoughts more explicitly on whether computers can think, understand or even be conscious; he even questions whether humans are self-aware.

But if mind is machine, where does that leave free will? Like another AI pioneer, John McCarthy, I am a compatibilist determinist, which I think all determinists really are anyway. I am a firm believer that free will is an illusion as is Scott Adams. One of my readers, after having lunch with me, asked if I believed in determinism as if he able to glean this from my many blogs posts. I hesistantly said yes, not wanting to stir up a debate. In fact, my very first college paper was on determinism for my English composition course; as if predetermined, another classmate also wrote on determinism but from a biochemical rather than physical angle.

The New York Times article, “Free Will: Now You Have It, Now You Don’t,” reported early this year of experiments suggesting conscious choice to be an illusion, even as some philosophers and physicists continue to disagree. The New Scientists also chimed in soon afterwards with “Free Will – you only think you have it.” Hofstadter has an interesting video, Victim of the Brain, containing thought experiments on identity, consciousness, and determinism.

Semantic Computing

Walter Stiers from the Academic Relations Team at Microsoft wrote that Microsoft Research is accepting proposals for Semantic Computing. His post caught my eye, because of what I am working on, which can basically be described as semantic computing.

I like the phrase used, "semantic computing." It reminded me of the another phrase coined by the Data Access Team "conceptual layer" to refer to the higher level of abstraction offered by LINQ and expression trees.

My main concern was whether Microsoft invading my turf and whether I would have to put up a fight. It was just a false alarm as Microsoft seems preoccupied with understanding the web to aid in search requests: The program is entitled "Beyond Search" and has two tracks, Semantic Computing and Internet Economics. Microsoft's intent is laid out explicitly in the actual program description, which is to improve search and then milk the advertising revenue: "The total spent by Internet advertisers in 2005 is estimated at $8.3 billion, a growth of 13.3% from 2004."

I noticed that the industry's natural language and other "semantic" efforts seem focused on narrow range of uses such as search engines and command-and-control. Search seems to be in Microsoft's vision because it is demonstrably monetizable and currently dominated by mortal enemy Google. The company seems wary appropriately of putting money on risky black hole projects with poor business prospects.

It's going to be a while before we see deep understanding in every application. Well, it already takes forever for new features to be added to existing applications, where fifty developers add less than fifty features over a couple years.

Supercompilation

I noticed work by SuperCompilers, LLC on supercompilation that takes on the myth of the sufficiently smart compiler (which I previously wrote in my post on Humans vs Computers). Their technology seems very similar to my work with its reliance on partial evaluation.

The company deals with the same sort of criticisms about NP-completeness and undecidability in their white paper "Supercompiling Java Programs."

The problem of fully optimizing an arbitrary computer program, when mathematically formalized, turns out be an incomputable problem, we do not need to optimize arbitrary mathematically given computer programs, we only need to optimize programs written by humans. And humanly-written programs tend to have a lot of special structure too them, including both obvious and subtle redundancies that lead to inefficiencies.(this is related to algorithmic information theory; see, Chaitin, 1987 and Bennett, 1986). There is no possibility of ever creating an algorithm that can fully optimize a general computer program. Fortunately, though, the practical problem of program optimization -- though still very difficult -- is a lot easier than the general mathematical problem to which it corresponds. Because, in reality

Supercompilation does not try to solve the impossibly hard problem of fully optimizing an arbitrary computer program. Rather, it is an extremely valuable heuristic technique, which seeks to find ways of significantly optimizing a wide variety of computer programs. Most real-world computer programs are chock full of inefficiencies, which can be removed via the supercompilation process. These inefficiencies are often hard to see in the source code, but they are highly transparent when the code is translated into a more mathematical form, as is done inside the supercompiler. Some of these inefficiencies are introduced by sloppy programming; others are placed there intentionally, by programmers interested in goals besides efficiency, such as maintainability and comprehensibility of code, or speed of software development. In many cases the most efficient way to implement a given piece of code is a truly humanly infeasible way, involving literally hundreds of times more lines of code than the ways that occur naturally to humans.

Another post "Illogical Arguments in the Name of Alan Turing" in O'Reilly's website criticizes bad arguments by people, that static code analyzers are not useful at all because they are limited by the halting problem, or that logic flags can never be found because of the halting problem.

Humans are subject to the same theoretical constraints as computers, but yet they do not perform the same laborious, brute-force search. Instead, humans employ a strategy of recursive pattern matching and replacement when thinking about and checking their programs. The kinds of programs that people write are those that they can easily reason about and review later; being maintainable, these programs belong to small subset of possible programs and are inherently tractable for analysis.

One paper "Solving difficult SAT instances in the presence of symmetry" suggests that the structure of a problem can affect its performance. That, in practice, a technique such as SAT-solving which is known to be NP-complete, often doesn't break down as it theoretically should. The authors point to the presence of large number of symmetries in the problem as somewhat of a requirement for explosive behavior.

Research in algorithms for Boolean satisfiability and their implementations [23, 6] has recently outpaced benchmarking efforts. Most of the classic DIMACS benchmarks [10] can be solved in seconds on commodity PCs. More recent benchmarks take longer to solve because of their large size, but are still solved in minutes [25]. Yet, small and difficult SAT instances must exist because Boolean satisfiability is NP-complete.

A 1994 Infoworld article quoted a researcher saying that deep static analysis requires years of analysis time. Upon reading that quote, I immediately thought that the researcher's approach might be very inefficient, because it implied that, after some depth, a human could outperform a computer on the same job. Perhaps, if we mimicked actual human reasoning, we could produce more effective tools.

November 13, 2007

What's Wrong With Reason?

Earlier this year, Slashdot pointed to a set of Flickr photos of someone’s visit to a newly built creationist museum in Kentucky. I have often assumed that many creationists live in households and communities, where access to information was heavily regulated. After looking through the photos, I discovered that many creationists actually do see the same things that non-creationists see, but simply reached different conclusions. Below are three photos from a set of about a half dozen juxtaposing human reason as a faulty device in opposition to God's Word.

image image  image

I guess it make sense, if one takes scripture as absolute, literal truth to think that reason, despite its necessity for understanding scripture, could also lead us astray if it contradicts scripture.

Roman Catholicism (the faith in which I was raised) regards the stories of creation as metaphorical and assures us that reason is fully compatible with truth. Other Christian denominations, lacking a centralized organization and authority, are more willing to accept a literal interpretation. The Koran, the holy book of Islam, even proclaims itself to be free of imperfection.

There were other museum photos touching on topics such as Noah's flood or reconciling apparent contradictions between the world and scripture. I was a bit stunned by the directness of the comparisons, which could potentially introduce doubt to faithful visitors by exposing them to disturbing arguments alongside their creationist explanations.

Maybe, I shouldn't be surprised as I have seen such direct confrontation before. I have, for some time, been following the Uncommon Descent weblog, which promotes William Dembski's ideas on Intelligent Design attacks the "materialist" beliefs of its principal antagonist and promoter of evolution, Richard Dawkins. One of Dembski's frequent claims is the impossibility of computer intelligence, or, indirectly, of my software. I don't agree with most of Dembski's arguments, but reading his posts does help me to recognize any of my own biases and flaws in reasoning.

November 11, 2007

The Awkwardness of Functional Programming

Both Reddit’s main page and programming subreddit includes a popular post “Admitting that functional programming can be awkward.” Each of these subreddits have elicited numerous interesting responses.

In it, James Hague recounts how a semi-successful Mac game he wrote called Bumbler is trivial to write in C, but that a purely functional approach appeared to be barely doable and may well be the wrong paradigm for these types of problems:

What's interesting is that it would be trivial to write this in C. Some incrementing, some conditions, direct calls to sound playing routines and insect spawning functions, reading and writing from a pool of global counters and state variables. For a purely functional approach, I'm sure the data flow could be puzzled out...assuming that everything was all perfectly planned and all the behaviors were defined ahead of time. It's much more difficult to take a pure movement function and say "okay, what I'd like is for this object to gravitationally influence other objects once it has bounced off of the screen edges three times." Doable, yes. As directly implementable as the C equivalent? No way.

That's one option: to admit that functional programming is the wrong paradigm for some types of problems. Fair enough. I'd put money on that. But it also may be that almost no one has been thinking about problems like this, that functional programming attracts purists and enamored students. In the game example above, some of the issues are solvable, they just need different approaches. Other issues I don't know how to solve, or at least I don't have solutions that are as straightforward as writing sequential C. And there you go...I'm admitting that functional programming is awkward in some cases. It's also extremely useful in others.

It’s not always obvious, perhaps because people often continue to think imperatively when finding an functional approach. Rather than thinking that functional programming is not very compatible with game development, there might be a radically different way of thinking about a problem. I have encountered one computer engineering thesis called “Functional Programming and 3D Games” which specifically deals with the topic.

I have been in a similar situation a few times, but I have always been able to come up with an elegant, often unexpected, solution after thinking about the problem from a high level.

In a functional pearl paper on the Zipper data structure, Gerard Huet writes about the apparent unsuitability of functional approach to many data structures, but then he proposed a new functional data structure called the Zipper to rectify these problems:

The main drawback to the purely applicative paradigm of programming is that many efficient algorithms use destructive operations in data structures such as bit vectors or character arrays or other mutable hierarchical classification structures, which are not immediately modeled as purely applicative data structures. A well known solution to this problem is called functional arrays (Paulson, 1991). For trees, this amounts to modifying an occurrence in a tree non-destructively by copying its path from the root of the tree. This is considered tolerable when the data structure is just an object local to some algorithm, the cost being logarithmic compared to the naive solution which copies all the tree. But when the data structure represents some global context, such as the buffer of a text editor, or the database of axioms and lemmas in a proof system, this technique is prohibitive. In this note, we explain a simple solution where tree editing is completely local, the handle on the data not being the original root of the tree, but rather the current position in the tree.

One of his examples of a tricky data structure—the database of axioms and lemmas in a proof system—hit close to home. I was trying to move my proof system to a more functional approach, which would offer me numerous benefits including immutability, automatic support for backtracking and shared copies as well as easy inclusion inside any other data structure. I did come up with a solution by utilizing a binary tree implementation with filters at each node, aggregated from each of the child nodes. These filters allow me to perform an arbitrary search in an efficient divide-and-conquer manner, whereby a subtree in which no child contains a desired property is left unexplored. It was faster and more light-weight than my previous imperative attempts, and also simpler, because I no longer needed to maintain and synchronize a separate data structure for indexing.