
I. Introduction
Andrew Stern and Michael Mateas’ “Facade” is a work of art that anyone interested in interactive media should play at least once in their life. Although its authors describe their work as “interactive drama”, and I would call it a video game, I think there are many reasons to discuss it here, in the context of interactive cinema. Facade was an attempt at a revolutionary change in interactive media. Its authors set out to build an interactive experience combining the virtues of Aristotelian drama with the concept of player agency found in video games.
The player of Facade is cast as the friend of a thirty-something couple whose marriage is on the rocks. All of the action takes place in one room of the couple’s small apartment. The player has been invited over for cocktails, and the scenario takes about twenty minutes to play out from start to finish. The environment is rendered in 3D from a first-person point of view, and the player communicates with Grace and Trip (the couple in question) by typing text on the keyboard. The player can also use the arrow keys and the mouse to move about the room and click on objects of interest.
Facade is an ambitious and complicated piece of software that defies examination from any one framework, so I think it will be most illuminating to discuss it from multiple points of view, beginning with a description of Facade as a game or series of games.
II. Facade as Game
Although the player is never given any explicit goals in Facade, the authors have designed the conversation with Grace and Trip as a sequence of two simple “social games”. The first is a game of affinity. Grace and Trip will speak about whatever they wish to speak about, or respond to the actions of the player. Their dialogue and actions are scripted but randomized, and each character has more potential actions and statements than could be played back in one session. Every time the player says something during the “affinity game” phase, the artificial intelligence will decide whether their statement should be interpreted as being in favor of Grace or of Trip. If the player attempts to take a neutral position, the system will keep the affinity game going indefinitely until the player shows a preference for one of the characters.
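The loop described above can be sketched in a few lines of Python. This is only my own toy illustration, not Facade's actual code: the phrase lists and the names `classify` and `play_affinity_game` are assumptions made up for the example, and the real interpreter is certainly richer than a substring match.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical phrases that count as siding with a character (invented for this sketch).
GRACE_PHRASES = ("agree with grace", "grace is right")
TRIP_PHRASES = ("agree with trip", "trip is right")


def classify(statement: str) -> Optional[str]:
    """Return 'Grace' or 'Trip' if the statement favors one of them, else None."""
    text = statement.lower()
    favors_grace = any(phrase in text for phrase in GRACE_PHRASES)
    favors_trip = any(phrase in text for phrase in TRIP_PHRASES)
    if favors_grace == favors_trip:
        # Neither side, or both at once: treat the statement as neutral.
        return None
    return "Grace" if favors_grace else "Trip"


def play_affinity_game(statements: Iterable[str]) -> Tuple[Optional[str], int]:
    """Consume statements until one shows a preference.

    Returns the favored character (or None if the player stayed neutral
    throughout) and the number of turns consumed.
    """
    turns = 0
    for statement in statements:
        turns += 1
        side = classify(statement)
        if side is not None:
            return side, turns
    return None, turns


if __name__ == "__main__":
    session = ["Nice apartment.", "You both have a point.", "I think Grace is right."]
    print(play_affinity_game(session))
```

The point of the sketch is the control flow: neutral statements never end the phase, so the game only moves on once the player commits to a side.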
The affinity game is followed by a “therapy game” where Grace and Trip get into a serious argument about their problems with each other. Depending on what the player says, they can cause Grace and Trip to either break up or patch up their relationship. This conclusion is not binary; rather, there are about five different degrees of therapeutic success that can be obtained. If they wish, the player can decide to be antisocial and aim for a breakup rather than a reconciliation.
While the fundamental choices being made here are simple, the interface is rather sophisticated. Most games would resolve a question, such as a character asking whether you think they’re in the right, through one entry in a fixed dialogue menu. Facade instead mediates these choices through a series of verbal statements by the player that aggregate to determine a final outcome. To achieve their desired ends, the player must have the right idea, and say it in such a way that it is understood by the system. The nature of this challenge can be described as “rhetorical” and perhaps also “empathetic”.
III. Speech Recognition Technology
An interactive conversation that uses a text parser as an interface requires some degree of natural language recognition. Getting a machine to interpret human language is incredibly difficult and has always been a shaky area of artificial intelligence research. Although I am not privy to all of the details behind Facade’s speech recognition system, I feel comfortable describing Grace and Trip as goal-oriented descendants of “Eliza” and other chat-bots. Although the player is free to type in anything they want, the game characters can only guess at what the player meant by responding to certain key words that are known to be relevant to the current conversation state. Like its Turing Test predecessors, Facade’s speech recognition system disguises its limited interpretive ability through clever character design. Grace and Trip both have personalities that are incredibly self-absorbed. When they don’t have a good idea of what the player is trying to tell them, they will simply change the subject, giving the impression that they are insensitive and aloof human characters rather than machines that lack the ability to comprehend basic language.
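As a rough illustration of the Eliza-style approach I am describing, here is a minimal Python sketch of keyword matching that depends on the conversation state and falls back to a change of subject. Since I don't know the internals of Facade's parser, everything here is an assumption: the state names, the keywords, and the functions `interpret` and `respond` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical conversation states and keywords, invented for illustration only.
KEYWORDS_BY_STATE = {
    "decorating": {
        "couch": "praise_decor",
        "painting": "praise_decor",
        "ugly": "criticize_decor",
    },
    "drinks": {
        "wine": "accept_drink",
        "no": "refuse_drink",
    },
}


def interpret(text: str, state: str) -> Optional[str]:
    """Return the meaning of the first keyword relevant to this state, or None."""
    keywords = KEYWORDS_BY_STATE.get(state, {})
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in keywords:
            return keywords[word]
    return None


def respond(text: str, state: str) -> str:
    """Pick a reaction; when nothing is understood, the character changes the subject."""
    meaning = interpret(text, state)
    if meaning is None:
        # The self-absorbed fallback: hide the parser's failure behind personality.
        return "change_subject"
    return meaning


if __name__ == "__main__":
    print(respond("I love that couch!", "decorating"))
    print(respond("What about quantum physics?", "decorating"))
```

Note that a keyword only counts in the state where it is relevant: in this sketch, "wine" means something while drinks are being discussed and is ignored otherwise.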
IV. Drama Management
Facade has one other massive piece of technology under the hood: the drama manager. As far as I am aware, this was a completely novel software tool at the time of Facade’s release in 2005. The drama manager is a piece of software that tries to make the player experience feel like a well-crafted dramatic story, regardless of what the player says or does. It works by modifying the things that Grace and Trip say in response to the player’s dialogue based upon a position along the narrative trajectory, which advances when certain dialogue events occur. The overall goal of the drama manager is to make sure the player has a good time while preserving their agency, that is, their ability to act as they see fit.
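To make the idea of a "position along the narrative trajectory" concrete, here is a small Python sketch of a drama manager as I understand the concept. It is not a reconstruction of Stern and Mateas' system: the phase names, the advancing events, the response labels, and the `DramaManager` class are all hypothetical, chosen only to show a position that advances on certain dialogue events and colors how the characters respond.

```python
from dataclasses import dataclass

# Hypothetical story phases, in the order the drama is meant to unfold.
PHASES = ["small_talk", "rising_tension", "confrontation", "resolution"]

# Hypothetical dialogue events that push the story from one phase to the next.
ADVANCING_EVENTS = {
    "small_talk": "takes_side",
    "rising_tension": "mentions_marriage",
    "confrontation": "offers_advice",
}

# Hypothetical tone of the characters' reply in each phase.
RESPONSE_TONE = {
    "small_talk": "polite_deflection",
    "rising_tension": "passive_aggressive_jab",
    "confrontation": "open_argument",
    "resolution": "final_decision",
}


@dataclass
class DramaManager:
    position: int = 0  # index into PHASES

    @property
    def phase(self) -> str:
        return PHASES[self.position]

    def handle(self, event: str) -> str:
        """Advance the story if the event warrants it, then pick a reply tone."""
        if ADVANCING_EVENTS.get(self.phase) == event:
            self.position += 1
        # The same player input gets a different reaction depending on the phase.
        return RESPONSE_TONE[self.phase]


if __name__ == "__main__":
    manager = DramaManager()
    for event in ["compliment", "takes_side", "compliment", "mentions_marriage"]:
        print(event, "->", manager.handle(event))
```

In this sketch the player can say anything at any time, but only certain events move the story forward, which is one simple way to trade a little agency for a reliable dramatic arc.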
V. Impressions of Facade
I have a ludist perspective on games and interactive art, so it’s difficult for me to evaluate Facade on the terms specified by its designers. While I think that games have a lot of elements in common with stories, I don’t believe them to be a form of narrative. Stern and Mateas describe drama as a subset of narrative, so I can’t say whether they have succeeded in creating “interactive drama”, because I don’t believe that “interactive drama” is a possible concept. Nonetheless, playing through Facade can definitely feel “dramatic” in an everyday sense, as indicated by this simulated personal ad:
SWM. Looking for friends, maybe more. No Drama Please!
While Facade’s drama manager is pretty good at making interesting things happen to the player, I’m not as sure that it preserves agency. If agency means being able to achieve a certain end through our actions, then it succeeds. If the player’s influence on the experience as a whole matters, then the whole concept is flawed to begin with. Perfect agency means accepting the possibility of a boring play session if that is the logical result of a player’s actions.
Agency or not, Facade does at times feel like you’re speaking with real characters, with some purpose in mind, and to that end I salute it. The natural language recognition is good, but not a massive leap forward. If the player is trying to “fit in” and say things they think Grace and Trip will understand, then a lot of the time the system works, but sometimes the interpreter will choke. This happens more often when the player is trying to break the AI, but anyone who is playing “in good faith” will encounter a lapse in language interpretation at least once per game session.
Facade delivers a purposeful and believable conversation with artificial beings in a more convincing package than any previous interactive experience I have had, but is it a watershed in interactive art? With regard to its drama manager, this is a matter of taste. If you value player agency over reliable drama at any margin, then Facade is a dead end. If some curtailing of agency is permissible or desirable, then the drama manager concept is definitely something to pursue in the future.
On the conversation front, I have to say that while I would welcome more games about talking to people, I’m less certain that natural language recognition is the way forward. It took a lot of labor on the part of Stern and Mateas to get a partly functional interpreter, and more work could end in diminishing returns. Although this month’s Jeopardy exhibition match featuring IBM’s “Watson” AI has demonstrated the possibility of a truly next-generation human language reading machine, it’s not obvious how such a technology would integrate with the art and practice of interactive drama authoring or game design.
Do Stern and Mateas, or IBM, believe that they could package a speech parser in API form, to be licensed by developers the way they would buy a physics system or a graphics engine? Pursuing this possibility would represent a massive shift in the way labor is allocated in interactive work. We are used to spending millions of dollars to realize detailed environments. Are we ready to spend the same amount on the implementation of language recognition? Even if mainstream studios were willing to do so, I don’t see this as a possibility for independent developers like me. I have to declare natural language recognition to be incompatible with any projects that I might embark upon in the foreseeable future.