I'm sure everyone who plays War Worlds regularly will have noticed that there was a bit of an "incident" over the last couple of weeks. This post is basically a quick postmortem of what went down, and what I've learnt from the experience (tl;dr: a lot!).
Before we get started, first a little background about how the game actually works. The server side is basically just a web server that receives HTTP requests from clients. Everything is stored in a PostgreSQL database, and the web server provides a layer of business logic on top of that database. The actual request/response is encoded using Protocol Buffers, which are an efficient binary encoding scheme (at some point, I'm thinking of switching from HTTP to SPDY as well, for extra efficiency).
So for example, when you go to build a fleet of ships, say, the details of the request (such as the design you want to build, number of ships and so on) are serialized in a protocol buffer, and the request is posted to https://game.war-worlds.com/realms/beta/buildqueue.
Since those requests can come from anywhere, the server has to check each one before acting on it. Generally, I think I do a pretty reasonable job of that, but I've been somewhat lucky to date, since it's slightly more difficult to directly post requests to the server outside of the game client compared with a web form. Slightly more difficult, though not impossible!
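To make the idea concrete, here is a minimal sketch of checking a build request on the server before touching the database. It is not the game's actual code: the `BuildRequest` fields, the design names and the `MAX_SHIPS_PER_BUILD` limit are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical values, for illustration only; the real game's designs and
# limits are not described in this post.
KNOWN_DESIGNS = {"scout", "fighter", "colonyship"}
MAX_SHIPS_PER_BUILD = 1_000_000


@dataclass
class BuildRequest:
    design_id: str
    count: int


def validate_build_request(req: BuildRequest) -> list[str]:
    """Return a list of problems with the request; an empty list means it is OK."""
    errors = []
    if req.design_id not in KNOWN_DESIGNS:
        errors.append(f"unknown design: {req.design_id!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(req.count, int) or isinstance(req.count, bool):
        errors.append("count must be an integer")
    elif not 1 <= req.count <= MAX_SHIPS_PER_BUILD:
        errors.append(f"count out of range: {req.count}")
    return errors
```

The point is simply that zero, negative and absurdly large counts get rejected up front, no matter what the client claims.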
Enter Proto Baggins
Proto Baggins is a game developer himself and works for a major game company, so reverse-engineering the protocol buffer-based protocol was a pretty straightforward affair for him. Initially, he was just using it to automate some tasks (presumably builds and expansions).
In addition to the automation, he also built some interesting tools to export visualizations of the game. For example, here's a visualization of his empire:
This is actually a pretty interesting visualization, since it shows the Poisson distribution of stars pretty well (that's the algorithm I use to ensure stars are placed randomly, but never too close or too far apart). I go into the details of that algorithm a bit in this old post from a few years ago.
Here's another visualization Proto made, this time showing the entire War Worlds universe:
This time, you can see the distribution of empires is roughly circular. This is because when a new player signs up, I first try to slot them into a star somewhere in the middle (usually where another empire joined, but then abandoned). If there's nowhere in the middle to add them, then new signups get added to a piece of empty space as close as possible to the "centre", leading to a roughly circular distribution. Also interesting here: you can see all of the early players clustered around the middle, with some of the newer players' large empires around the outside. There are roughly 1.7 million stars represented here.
Now, somehow, during Proto's probing of the game (which, I will say, is something I am not necessarily opposed to), he managed to get a fleet with a very large negative number of ships. I'm not entirely sure how this happened, but I suspect there was an integer overflow somewhere. An integer overflow happens when you have a very large integer value (> 2 billion) and it wraps around to a very large negative number instead. The issue here is that now, instead of costing money to send ships to other stars, you would be earning money to send ships to other stars!
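For anyone who hasn't seen an integer overflow before, here's a small sketch of the wrap-around. Python's own integers never overflow, so the hypothetical `wrap_int32` helper below imitates a 32-bit signed integer; the cost per ship is also made up. This only illustrates the general mechanism, not the actual bug, which I haven't pinned down.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647: the largest 32-bit signed value


def wrap_int32(value: int) -> int:
    """Imitate 32-bit signed wrap-around (two's complement)."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    if value >= 0x80000000:  # top bit set means the value is negative
        value -= 0x100000000
    return value


# One ship more than the maximum wraps to a very large negative number.
num_ships = wrap_int32(INT32_MAX + 1)

COST_PER_SHIP = 10.0  # hypothetical move cost per ship
move_cost = num_ships * COST_PER_SHIP  # negative: the "cost" is now a payout
```

With a negative ship count, any cost that is computed as "ships times something" comes out negative too, which is why moving the fleet paid out instead of charging.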
Now, this doesn't actually get us infinite money, but it's quite easy to generate as much as you'd like. However, there was another glitch which did allow you to get infinite money, which I'm dubbing the Infinite Money glitch.
This glitch was actually much more serious, in that it allowed you to just generate money on demand (by depositing it into an alliance, and then having someone else take it out). I'm also not entirely sure how it was triggered, but it seems to have something to do with adding upgrades to fleets with "0" ships in them. Accelerating these builds seems to have caused the empire's "cash" value to go to NaN, which is a rather special number that tends to "infect" every calculation you make with it.
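Here's a small sketch of why NaN is so nasty. The `accelerate` function and the way the bad cost is produced are hypothetical stand-ins, not the game's code; the behaviour of NaN itself is standard floating-point.

```python
import math


def accelerate(cash: float, cost: float) -> float:
    """Naive version: charge the cost if the empire can afford it."""
    if cost > cash:  # every comparison with NaN is False, so NaN slips through
        raise ValueError("not enough cash")
    return cash - cost


def accelerate_checked(cash: float, cost: float) -> float:
    """Same thing, but refuses costs that are NaN, infinite or negative."""
    if not math.isfinite(cost) or cost < 0:
        raise ValueError("invalid cost")
    return accelerate(cash, cost)


# inf - inf yields NaN; a stand-in for a cost calculation that went wrong.
bad_cost = math.inf - math.inf

cash = accelerate(1000.0, bad_cost)  # the affordability check passes
cash_later = cash + 500.0            # still NaN: it infects later arithmetic
```

Once the balance is NaN, "do you have enough cash?" checks written like the one above never fail, so an explicit `math.isfinite` guard is the kind of check that would stop it.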
What Went Down
The exploits of Proto Baggins were what actually tipped people off to something going awry. People noticed his negative ship count, and that was quite suspicious. Next, they noticed that players who had only been in the game for a relatively short time were somehow able to amass massive armies of millions of ships, which they would launch against unsuspecting empires, who then had to scramble to deflect them.
What I discovered is that two empires, Carnage and Fatcow240, had hit the "Infinite Money" glitch and were depositing vast sums of cash into the alliance "Central Bank". They would then let their friends join that alliance, withdraw cash, and the cycle would continue.
Here's an example of what players would've seen before possibly getting completely wiped out:
After I figured out what was going on, I decided to ban the players who were generating money. So that's Proto Baggins, Carnage and Fatcow240. Not everyone has been happy with that decision, but I felt it was warranted for two reasons:
- While there are glitches from time to time, and glitches can sometimes last for a while before I am able to fix them, this particular glitch was especially bad in that it was only available to a few players who were "lucky" enough to have triggered the edge condition needed. Other glitches, such as the one that allowed players to create thousands of wormhole generators, were available to everyone: nobody was unfairly disadvantaged (at least, not once they were made aware of the glitch's existence), and
- The advantage gained from the exploit was disproportionately large. Essentially infinite. The wormhole generator glitch was relatively harmless, in that once you've created the wormhole generators, there's not much you can do with them, since they're so expensive to move anyway. This exploit allowed a player to launch massive armadas of ships, which were basically impossible to defend against for anyone who was not also in control of an infinite supply of cash.
In addition, I went through the audit history of the "Central Bank" alliance, and anybody who withdrew cash from that alliance has had all their cash confiscated. I thought about also removing any ships that they accelerated the build of after getting their cash, but decided that it would be quite a lot of work to go through everybody's fleet list and remove the ones that seemed to be affected. I figure if you have no cash to actually move those ships, then it's not as bad.
What Went Well
There were a couple of things that helped resolve the situation.
Firstly, every single cash transaction in the game is stored in a special "audit" table. Every time you accelerate a build, move a fleet of ships, or deposit or withdraw to/from an alliance, a record is added to the "audit" table. This allowed me to easily go back and find everyone who had withdrawn cash from the "Central Bank" alliance. It also meant I could pinpoint when Carnage and Fatcow240 had managed to trigger the "infinite cash" bug, which then allowed me to narrow down the cause to the acceleration of a "0-ship" build.
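As a rough idea of what that kind of lookup involves, here's a sketch using an in-memory SQLite database so it runs on its own. The real table lives in PostgreSQL and its schema isn't shown in this post, so the column names, the `withdrawals_from` helper and the sample rows below are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per cash transaction.
conn.execute(
    """CREATE TABLE audit (
           id INTEGER PRIMARY KEY,
           empire TEXT NOT NULL,
           alliance TEXT,
           kind TEXT NOT NULL,
           amount REAL NOT NULL
       )"""
)
# Made-up sample rows; these are not real players or real amounts.
conn.executemany(
    "INSERT INTO audit (empire, alliance, kind, amount) VALUES (?, ?, ?, ?)",
    [
        ("glitcher", "Central Bank", "deposit", 1_000_000.0),
        ("friend_a", "Central Bank", "withdraw", 2000.0),
        ("friend_b", "Central Bank", "withdraw", 5000.0),
        ("friend_b", "Central Bank", "withdraw", 2500.0),
        ("bystander", "Other Alliance", "withdraw", 300.0),
    ],
)


def withdrawals_from(conn: sqlite3.Connection, alliance: str) -> list:
    """Total withdrawn per empire from one alliance, largest first."""
    cur = conn.execute(
        """SELECT empire, SUM(amount) AS total
             FROM audit
            WHERE alliance = ? AND kind = 'withdraw'
            GROUP BY empire
            ORDER BY total DESC""",
        (alliance,),
    )
    return cur.fetchall()


suspects = withdrawals_from(conn, "Central Bank")
```

Because every transaction is a row, "who took money out of this alliance?" is a single grouped query rather than guesswork.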
Secondly, while my server-side code isn't perfect at filtering out bad requests, I think it's done a fairly decent job overall so that Proto Baggins, even though he was making requests directly to the server, wasn't able to do too much damage (apart from triggering the negative ship thing, as I noted above, which I consider to simply be a bug somewhere). This could still be somewhat improved, though.
The bugs themselves have now all been fixed, and I managed to fix a couple of other glitches which have been bugging people as well (such as the aforementioned wormhole generator bug, and another one which let you build boosters and such for cheap).
Finally, the players themselves have been quite supportive of some of the tough decisions I had to make, and also great at reporting the issues. People sent me screenshots, snippets of conversations, and all sorts of details which greatly helped to track down the issues. Thanks to everyone who has reported an issue in the past!
What Went Wrong
Not everything was perfect, of course. The biggest problem is that I didn't realise anything was wrong until quite a long time after it started happening. This is mostly my own fault, because my email inbox receives quite a bit of mail from not just players, but also app store marketers (spammers, essentially), and all sorts of people, so things tend to slip through the cracks.
Also, my "audit" table is literally just a table in the database. So actually trawling through it looking for evidence is a little time-consuming: concocting SQL queries, parsing the output into CSV files and so on.
Finally, I'm just one person. I wish I was able to dedicate more time to community moderation and monitoring, but I generally only have a couple of hours per day that I am able to dedicate to War Worlds (usually after the kids are in bed), and I'd much rather be working on new features or fixing bugs than tracking down people exploiting bugs...
Now, all of the actual issues should have already been fixed (i.e. the bugs that caused these glitches in the first place), but a couple of additional action items have come out of all this:
- Spend more time in the chat. Keeping up with all of the email I get is a little tricky, but simply jumping on the in-game chat every now and then seems like a great way to keep on top of things (I do play the game as well, but I usually don't have time throughout the day to read chat as well),
- Get a proper issue-tracking website. This one seems like a no-brainer, but some issues can linger for weeks or months in my inbox because I don't have a central way of tracking everything that people report. This has the added advantage that the marketers (spammers, really) won't be contacting me there.
- Monitoring and alerts! I need better monitoring for suspicious activity. If I'd been monitoring for things like spikes in requests per second, 400-errors per second and so on, I would have easily caught Proto Baggins before he'd been able to amass his large army.
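For the monitoring item, the simplest version of "alert on a spike" is a sliding-window counter. The sketch below is only that: the `RateMonitor` class, the one-second window and the threshold are assumptions for illustration, not something that exists in the server today.

```python
from collections import deque


class RateMonitor:
    """Flags when more than `threshold` events land within `window_seconds`."""

    def __init__(self, window_seconds: float, threshold: int):
        self.window = window_seconds
        self.threshold = threshold
        self.events = deque()  # timestamps of recent events, oldest first

    def record(self, timestamp: float) -> bool:
        """Record one event; return True if the rate is now over the threshold."""
        self.events.append(timestamp)
        # Drop events that have fallen out of the window.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()
        return len(self.events) > self.threshold


# Ten requests 0.1s apart against a limit of 5 per second.
monitor = RateMonitor(window_seconds=1.0, threshold=5)
alerts = [monitor.record(i * 0.1) for i in range(10)]
# A single request much later starts from a clean window again.
quiet = monitor.record(5.0)
```

One monitor per empire for requests, and another for 400-errors, would be enough to flag a client that suddenly starts hammering the server.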
So look for some of these changes coming in the next few weeks!