In the past, I have written about conversational interfaces in posts such as “Turing Test and the Loebner Prize Competition.” My interest is not purely theoretical: I have actively explored integrating natural language deeply into applications, for example by interpreting all text inside documents and code files and by presenting a conversation stream.
The company I founded, SoftPerson, LLC, develops “smart software”: desktop applications that use mostly symbolic artificial intelligence, including natural language processing. The overarching design criterion for my software is to capture human thought processes, that is, human-like reasoning, in the codebase, so that the software ultimately acts as an intelligent agent: a “soft person,” or a virtual replacement for a human. In a sense, the applications are similar to Siri, which describes itself as a virtual assistant.
(The desktop market may seem mature, but there are many more undiscovered document types and, even in existing categories, there are additional ways to differentiate. Software applications are a high-margin business. Despite Microsoft having a monopoly on some desktop applications, niche versions of existing types of applications make considerable sums for their owners, and many successful companies are based on a single product: WhiteSmoke, FinalDraft, Moos ProjectViewer, SmartDraw, Quantrix, and Ventuz.)
In a sense, computers have historically been conversational: a command-line console is a form of conversation in which the user communicates with the computer in the computer’s own language, with its limited grammar. The trend is toward the other end of the spectrum, where the computer understands more and more of the user’s language.
The business plan I wrote for SoftPerson in 2002 featured a natural language writing product that incorporated a conversational interface in one of three different writing modes. Rather than feeling forced, the conversational interface was more natural than the one it would replace. The software would become one’s own personal ghostwriter. The plan was a finalist in two national business plan competitions and won prize money in the 2002 New Venture Championship competition held in Portland, Oregon. I have been working on this product for some time. It includes a natural language parser that improves on the Link parser from Carnegie Mellon University; however, the parser may be switched to the Stanford parser, which is more accurate, for a licensing fee.
I won’t talk much about the aforementioned product for intellectual property reasons, but I did look into the possibility of creating a graphics program in which one simply describes the desired image through words, sketches, or other forms of input, much as one would direct a human artist. In effect, the computer becomes one’s own personal artist.
Wouldn’t it be great if a computer could dynamically produce any image a person desired? How much more versatile could software be if it could render arbitrary scenes depending on the context as part of its operation, instead of relying on canned photographs? In addition to the complexity of determining meaning from words, a large library of graphical assets would be needed in mathematical form. A recent TED video of PhotoSynth (at 3:40) actually suggests that these assets can be data-mined from tagged Flickr images using computer vision techniques.
In the course of posing this question and researching its practicality in 2002, I discovered an existing research project called WordsEye that uses a conversational interface to construct images. The technology is described in the research paper “WordsEye: An Automatic Text-to-Scene Conversion System,” but more accessible descriptions and examples are available in the Creators Project blog post “Wordseye Is An Artistic Software Designed To Depict Language.”
The above is an example of an uncanny scene that WordsEye rendered from a text description. The graphics were licensed from a library of 3D models and were transformed to fit into the scene. This can be taken a step further to produce non-photorealistic rendering effects.