As the developer of an "indie" game (meaning I don't have the backing of a big publisher, though Google is nice enough to give me 1 day per week of work time to spend on this), I have a problem that a lot of us indie developers share: play testing is hard. Play testing is the tried-and-true method of determining the quality of a given game design by watching users who are not experts play. When I worked in the game industry we'd bring kids in to the office, plunk them down in front of a TV that had a game machine and VCR hooked up, hand them a controller, and record their entire play session. This is hugely valuable information because problematic areas in the design are almost immediately identifiable. Nothing settles a design question faster than watching a real user fail to understand what you are trying to get them to do over and over again.
But as an indie developer, I don't have an office, I don't have a pool of play testers to draw from, and I can't easily connect my Android device to a VCR. My experience has been that play testing can vastly improve the quality of a game, so I really don't want to skimp on it. And the common refrain repeated by developers of universally loved, amazingly high-quality games is that they tested and iterated and tested and iterated and tested and iterated again until their game felt right.
For example, Naughty Dog, the rock-star developers behind the Crash Bandicoot series, the Jak and Daxter series, and most recently, Drake's Fortune, has been talking about the value of play testing for years. Back in the PlayStation 1 days they pioneered methods for recording metrics about player style so that their levels could be tuned to increase in difficulty gradually. On top of that, their Crash Bandicoot series is a poster child for Dynamic Difficulty Adjustment (DDA), an automated system for dialing a game's difficulty up or down as the player plays to keep the level of challenge constant (full disclosure: I worked on a couple of Crash games for GBA at Vicarious Visions). Their goal is for the user to never see the game over screen; if the player dies repeatedly in the same spot, that indicates a failure of the design. With DDA and play testing, Naughty Dog has been extremely successful in smoothing out these "difficulty cliffs" and, as a result, their games get a lot of love.
So with these lessons in mind I released a build of Replica Island to Google employees about two weeks ago. Since I can't stand over every player's shoulder and watch them play, I asked for people to tell me about bugs that they encountered and provided a very short survey form at the end. Though a lot of people downloaded the game and tried it out, very few bothered to fill out the survey and even fewer wrote me directly about their thoughts. I learned a few things about my game, but I did not collect nearly the amount of information I need to tune my level design and game play decisions for a mass audience.
Fortunately, I also built some automated metric recording functionality into the game. Taking a page from Naughty Dog's book, I decided to focus on death locations: every time a player dies, I want to know about it. Looking at the aggregate results over a number of players should reveal hotspot areas where lots of people died; these areas are the first candidates for polish and tuning. So I added a very simple function to ping a web server every time the player dies and restarts the level. The ping carries no information about the user himself: just which level he is playing, where he died, and a session identifier so I can see if people died multiple times in the same place. After I released the game to Google employees the data began to roll in, and after a while I started to have enough information to be statistically meaningful.
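To give a sense of how little data such a ping needs to carry, here's a minimal sketch of building one. The endpoint URL and parameter names here are hypothetical (the post doesn't specify them), and the real game does this in Java on the device, but the shape of the request is the same: level, position, and an anonymous session ID, nothing more.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint; the game's actual server URL is not given here.
REPORT_URL = "http://example.com/report_death"

def build_death_ping(level_name, x, y, session_id):
    """Build the URL for a single anonymous death event.

    The ping carries only the level, the death position, and a
    session identifier -- nothing about the user themselves.
    """
    params = {
        "level": level_name,
        "x": int(x),
        "y": int(y),
        "session": session_id,
    }
    return REPORT_URL + "?" + urlencode(params)
```

The device just fires this URL at the server and forgets about it; the server logs each hit, and all the analysis happens later on the aggregated log.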
Once I had the data, I needed to decide how to use it. This is still a work in progress, but here's how I'm using it so far.
This is an image of an early level in Replica Island. The little colored dots represent death events from various users (each user is a unique color). As you can see, there are a couple of obvious spots where a lot of users died. (Actually, this is only part of the data I have received so far--the quality of the results increases as more players respond.)
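Finding those obvious spots programmatically is straightforward. A minimal sketch, assuming death events arrive as (level, x, y) tuples in world coordinates: bucket them into coarse grid cells so deaths in roughly the same place count together, then surface the cells that exceed some threshold. The cell size and threshold here are illustrative, not values from the game.

```python
from collections import Counter

CELL_SIZE = 64  # coarse bucket size in world units; the game's tile size may differ

def find_hotspots(events, min_deaths=3):
    """Return (cell, count) pairs for grid cells where at least
    min_deaths death events landed, busiest cells first."""
    cells = Counter(
        (level, x // CELL_SIZE, y // CELL_SIZE) for level, x, y in events
    )
    return [(cell, n) for cell, n in cells.most_common() if n >= min_deaths]
```

Everything below the threshold is probably noise (everybody dies somewhere once); the cells that clear it are the candidates worth pulling the level file up for.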
This spot is where the player encounters an enemy robot, one of the first in the game. This is a significant example because it takes place so early in the level; deaths in this location mean that the user got hit here at least three times, as there's nothing earlier in this level that can reduce the player's life. Also, now that the data has singled this location out, I can see why it's problematic: the low ceiling makes the Android's butt-stomp move (which is the only attack he has in the game) hard to pull off for novice players, and the slope likely causes people who are not used to the physics to roll down right into the enemy. This is a failure of the level design: I really don't want first-time players to die just a few seconds into the first or second level. Fortunately, the solution is simple: in this case I can just remove the enemy from that location.
Though this type of metric reporting and visualization is pretty rudimentary, it has already proven extremely valuable in identifying problematic level segments. It's also confirmed one of the conclusions that I drew from the survey feedback: nobody reads on-screen text, so in order to teach the player how to play I need to use big, visual indicators and design levels to require specific moves.
I am so happy with these results that I'm thinking of expanding the scope and frequency of the events that I report; it would be great to know where players take the most damage, or which pits they fall down the most, or how long players spend traversing specific level segments. Now that the framework for event reporting and visualization is in place, it is pretty easy to add new types of events to it.
In fact, I could even release the game with this system turned on. It's probably a good idea to let the user opt out if they want (even though I don't transmit anything personal), and I'd rather do my testing and polish before the game is released, but leaving these metric reporting systems in place for the full release is very tempting. It would give me real, concrete data to tune and improve Replica Island even after it has been released.