
November 5, 2008

Feedback For Better IF Parsing

Over on Twenty Sided, Shamus Young posted a blog entry today on, of all things, interactive fiction. It seems he's been playing "Phantom of the Arcade" lately, an Inform-based text adventure recently written by Susan Arendt, his editor at The Escapist, and made available online. This spurred him to ponder the familiar issues and frustrations surrounding the IF parser, particularly how it handles unknown or unacceptable input. "Feedback itself," he states, "is a reward" -- feedback that shows that, even though the command entered is invalid, the author and/or parser has anticipated it well enough to provide useful information rather than a generic "I beg your pardon?" response.

One idea he came up with for this involves modifying the IF environment (particularly one that is web-based) to create what he calls a "feedback parser":

To do this you’d just need a bit of functionality added to the parser: Whenever it encounters something it doesn’t understand, it needs to submit something to a database on the website with the subject, verb, and room. (And maybe a couple of other tidbits for housekeeping purposes.) Within a few hours of going live, the author should have a very clear picture of where the rough spots in the game are (what rooms had the most dud entries) and what the commonly attempted player actions are in those rooms. This would be much smoother and more seamless than simple playtesting, and would include the input of all players instead of just a handful of dedicated testers. It would make the designer better at their job, and help focus effort onto the most likely responses.


I posted a long, drawn-out response on his blog, which I thought might be worth reproducing here for discussion. Shamus touches on one of the more interesting arguments in the IF world, one that has been batted around for some time: Do we need a better IF parser?

I actually started writing a blog entry with this title a while back, but never finished it. I was thinking more along the lines of a parser that can recognize a broader range of input. The gist of the discussion was going to be No, we don't -- you can argue that the tools are already out there; it just requires more work and dedication on the part of the author to build a system that is better able to handle unrecognized or unacceptable input and provide useful feedback, as Shamus suggests. The issue is not necessarily that we need parsers that can recognize and handle more player actions, but ones that can recognize and gracefully handle more unacceptable input.

Players usually cite the parser as the main problem with IF, as Shamus describes -- players want to try different things, but when the parser just responds with a generic "I don't know what you mean," they get turned off, particularly when it happens over and over. Players get frustrated when the parser doesn't provide enough feedback to understand why a particular action didn't work or wasn't understood. They also get frustrated when the parser doesn't recognize input it probably should understand (as when players refer to an item described in the narrative but not implemented as an object in the game world). And they get frustrated when the parser doesn't accept a wide range of input, such as when they try to use adverbs (quickly, carefully, angrily, etc.).

Some of these are issues the author needs to address, and the community has long discussed the need for adequate testing to uncover most of them. Obviously, there's only so much testing one can do -- and authors often use other IF authors or veteran IF players as their testers, which doesn't always reveal the problems that come up when less experienced players try the game. Still, after playing enough games you can generally spot the ones that have undergone thorough testing and those that haven't. Many IF games and engines can capture the full text transcript of a game from start to finish, which the author can then review to see what players did (or tried to do) and learn from it. A good example is Aaron Reed, whose 2005 IF game "Whom the Telling Changed" was a finalist at Slamdance in Park City. He saved all of the transcripts from when the game was on display at the competition, and eventually compiled, analyzed, and published the data on his web site. It was a very revealing look at how many (mostly newbie) players approach IF.

Some of the other issues mentioned above are less author-dependent, but there are still things authors can do to prepare for them. The Inform system, for instance, supports a huge library of extensions that can make the parser work better in some cases -- particularly for newbies who don't necessarily understand how parsers typically work. The aforementioned Aaron Reed has written a customizable Inform extension called "Smarter Parser," which allows the parser to understand a broader range of input and can direct newer players toward proper parser syntax. There are also extensions that perform basic typo correction, so players don't have to re-type commands just because of a simple mistype. And there are plenty of others.
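To illustrate the typo-correction idea -- this is not the actual extension code, just a sketch of the general technique using Python's difflib against a small, hypothetical game vocabulary:

```python
from difflib import get_close_matches

# Hypothetical vocabulary; a real game would build this from its verb and
# object dictionaries.
VOCABULARY = {"take", "drop", "examine", "open", "close", "lantern", "door"}

def correct(word, cutoff=0.8):
    """Return the closest known word to a mistyped one, or None if nothing
    in the vocabulary is similar enough to be a safe guess."""
    matches = get_close_matches(word.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(correct("examien"))  # → examine
print(correct("lanturn"))  # → lantern
print(correct("xyzzy"))    # → None (genuinely unknown, not a typo)
```

The `cutoff` threshold is the interesting design knob: set it too low and the parser starts "correcting" words the player actually meant, which is worse than rejecting them.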

Part of the problem with IF parsers is that they need to operate under a defined set of rules, and it's important for players to learn those rules. But I think it's a valid argument that it's largely the author's responsibility to help teach the player those rules (as well as the rules of that particular game world). Those rules are taught through sophisticated, comprehensive error trapping: when the player tries to do something that breaks either the parser rules or the game-world rules, the player is given a good explanation of why he or she cannot perform the desired action. When that happens, players are generally more accepting and willing to continue; in its absence, they are more likely to just say "screw it" and get back to blowing things up in an FPS.

This is not to say the solution Shamus describes isn't useful -- I think it's a great idea, in fact. What it represents is a new way of expanding the "test base" for a game in a dynamic fashion. It probably hasn't been done up to this point because playing IF within a web browser is still a very recent development. As the technology matures (there are still a number of issues to resolve), I suspect we will see more solutions like this.

I think unrecognized input would have to be handled carefully, though, because the list of errors could potentially be huge. What would probably be more useful is a system that saves every game transcript to a database and sends it to the author, with any unrecognized input (for instance) highlighted in red so the author can spot it easily. That way, the author has the full context of each error on display, without having to figure out when, where, and why it happened.
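A rough sketch of that transcript idea -- all names here (`Transcript`, `log`, `render`) are hypothetical, and the `[!]` marker stands in for the red highlighting a web view would use:

```python
class Transcript:
    """Accumulates every command/response pair, flagging parse failures."""

    def __init__(self):
        self.entries = []  # list of (command, response, understood)

    def log(self, command, response, understood):
        self.entries.append((command, response, understood))

    def render(self):
        """Plain-text author view: failed commands are marked with [!] so
        they stand out while keeping their surrounding context."""
        lines = []
        for command, response, understood in self.entries:
            marker = "   " if understood else "[!]"
            lines.append(f"{marker} > {command}\n     {response}")
        return "\n".join(lines)

t = Transcript()
t.log("look", "You are in the arcade lobby.", True)
t.log("polish cabinet", "I beg your pardon?", False)
print(t.render())
```

The point of keeping the whole transcript, rather than just the failures, is exactly what the paragraph above argues: the author sees the commands leading up to each failure, not an isolated error out of context.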

But essentially, I think what Shamus is talking about is a different, more comprehensive approach to (continually) testing and refining the games that we make. Which is definitely a good idea worth pursuing.

3 comments:

Bruce said...

I've been working on a web-based IF engine based on this very concept, where the parser uses standard NL corpus tagging, hinted by statistical inputs based on gameplay (globally or specific to a game). It's a simple idea that applies best in collaborative environments.

In addition to statistical methods for noun or verb extensions, you can build up libraries of sentence structures, helpful for properly dissecting colloquialisms and shorthands that arise within given games or age groups of users.

The approach I've taken with this problem is to collect the transcripts for each game, presenting them in the author view alongside the chapter/location/object definitions as missed cases (or events). A summary of the common problems is available globally as well, to give an author an idea of both the sticking points in the game and opportunities for better immersion.

Anonymous said...

grrrr it lost my comment (last time I try the openid option).

Basically, I don't think this kind of after-release testing is really needed; beta testing is. Three things:

1) Why not implement tool support to make beta-testing easier, e.g. a command that marks the last response as inadequate?

2) Are there already sites to help authors looking for beta testers? If so, can these be promoted inside authoring software?

3) Can there be community encouragement of testing, e.g. restricting competition entries or reducing the prominence of IF Archive listings for games that aren't sufficiently beta-tested?

'New to IF' users pose a different problem; I think that is better addressed by interpreter/system-provided tutorials.

El Clérigo Urbatain said...

Curiously enough, I've dropped a suggestion about that kind of system (a tester interface) on Emily Short's blog. I don't mean to press her to build that kind of thing, but I find the "collaborator interface" of her project "Alabaster" to be a first step in the direction everyone is talking about here. You can take a look at it at:
http://emshort.wordpress.com/2008/11/09/alabaster-interim-status-report-release-18/