Negative Ships and Unlimited Money Postmortem

I'm sure everyone who plays War Worlds regularly will have noticed that there was a bit of an "incident" over the last couple of weeks. This post is basically a quick postmortem of what went down, and what I've learnt from the experience (tl;dr: a lot!).

Background

Before we get started, first a little background about how the game actually works. The server side is basically just a web server that receives HTTP requests from clients. Everything is stored in a PostgreSQL database and the web server provides a layer of business logic on top of that database. The actual request/response is encoded using Protocol Buffers, which are an efficient binary encoding scheme (at some point, I'm also thinking of switching from HTTP to SPDY for extra efficiency).

So for example, when you go to build a fleet of ships, say, the details of the request (such as the design you want to build, number of ships and so on) are serialized in a protocol buffer, and the request is posted to https://game.war-worlds.com/realms/beta/buildqueue.

Now, as anyone who's ever built a web form will know, it doesn't matter whether you validate all your inputs in JavaScript in the form: you still need to validate them again on the server side, because anyone can build their own form to post to your server. Well, the same is true here. Even though all inputs are validated in the game client, they also need to be validated by the server.
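To make that concrete, here's a minimal sketch in Python of the kind of check the server needs to repeat. It's not the actual War Worlds code: the BuildRequest fields, the design names and the MAX_SHIPS_PER_BUILD limit are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical design names and limit, for illustration only.
KNOWN_DESIGNS = {"fighter", "scout", "colonyship"}
MAX_SHIPS_PER_BUILD = 1_000_000


@dataclass
class BuildRequest:
    design_id: str
    count: int


def validate_build_request(req):
    """Return a list of problems; an empty list means the request is acceptable."""
    errors = []
    if req.design_id not in KNOWN_DESIGNS:
        errors.append("unknown design")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(req.count, int) or isinstance(req.count, bool):
        errors.append("count must be an integer")
    elif req.count <= 0:
        errors.append("count must be positive")
    elif req.count > MAX_SHIPS_PER_BUILD:
        errors.append("count too large")
    return errors
```

The point is that the server never assumes the client already did this: a zero, negative or absurdly large count gets rejected before it reaches the build queue.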

Generally, I think I do a pretty reasonable job of that, but I've been somewhat lucky to date, since it's slightly more difficult to post requests directly to the server from outside the game client than it is with a web form. Slightly more difficult, though not impossible!

Enter Proto Baggins

Proto Baggins is a game developer himself; he works for a major game company. So reverse-engineering the protocol buffer-based protocol was a pretty straightforward affair for him. Initially, he was just using it to automate some tasks (presumably builds and expansions).

In addition to the automation, he also built some interesting tools to export visualizations of the game. For example, here's a visualization of his empire:

This is actually a pretty interesting visualization, since it shows the Poisson disk distribution of stars pretty well (that's the algorithm I use to ensure stars are placed randomly, but never too close together or too far apart). I go into the details of that algorithm a bit in this old post from a few years ago.
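For the curious, here's a rough Python sketch of the simplest way to get that kind of spacing: keep picking random points and throw away any that land too close to one already placed. This isn't the game's actual generator, and the function name and parameters are invented for this example. It only enforces the "never too close" half directly; with enough attempts the gaps tend to fill in, so nothing ends up very far from its neighbours either.

```python
import math
import random


def scatter_points(width, height, min_dist, attempts=2000, seed=0):
    """Place random points so that no two are closer than min_dist."""
    rng = random.Random(seed)
    points = []
    for _ in range(attempts):
        candidate = (rng.uniform(0, width), rng.uniform(0, height))
        # Accept the candidate only if it keeps its distance from every
        # point accepted so far.
        if all(math.dist(candidate, p) >= min_dist for p in points):
            points.append(candidate)
    return points
```

Checking every existing point for every candidate is slow once there are a lot of points, so a real generator would normally use a grid or similar lookup to test only nearby neighbours.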

Here's another visualization Proto made, this time showing the entire War Worlds universe:

This time, you can see the distribution of empires is roughly circular. This is because when a new player signs up, I first try to slot them into a star somewhere in the middle (usually where another empire joined, but then abandoned). If there's nowhere in the middle to add them, then new signups get added to a piece of empty space as close as possible to the "centre", leading to a roughly circular distribution. Also interesting here is that you can see all of the early players clustered around the middle, with some of the newer players' large empires around the outside. There are roughly 1.7 million stars represented here.

Negative Ships

Now, somehow, during Proto's probing of the game (which, I will say, is something I am not necessarily opposed to), he managed to get a fleet with a very large negative number of ships. I'm not entirely sure how this happened, but I suspect there was an integer overflow somewhere. An integer overflow happens when a value grows past the largest number its type can hold (a little over 2 billion for a 32-bit signed integer) and wraps around to a very large negative number instead. The issue here is that now, instead of costing money to send ships to other stars, you would be earning money to send ships to other stars!
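Here's a small Python sketch of that wrap-around. Python's own integers never overflow, so to_int32 below simulates what a 32-bit signed integer (like Java's int) does. The move_cost function and its per-ship price are hypothetical, just to show why a negative ship count turns a cost into a payout.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647


def to_int32(value):
    """Interpret value as a 32-bit two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def move_cost(num_ships, cost_per_ship=1.0):
    """Hypothetical cost formula: a flat price per ship moved."""
    return num_ships * cost_per_ship


wrapped = to_int32(INT32_MAX + 1)
print(wrapped)             # -2147483648
print(move_cost(wrapped))  # negative, i.e. the move pays you
```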

Now, this doesn't actually get us infinite money, but it's quite easy to generate as much as you'd like. However, there was another glitch which did allow you to get infinite money, which I'm dubbing the Infinite Money glitch.

Infinite Money

This glitch was actually much more serious, in that it allowed you to just generate money on demand (by depositing it into an alliance, and then having someone else take it out). I'm also not entirely sure how it was triggered, but it seems to have something to do with adding upgrades to fleets with "0" ships in them. Accelerating these builds seems to have caused the empire's "cash" value to go to NaN, which is a rather special number that tends to "infect" every calculation you make with it.
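Here's a short Python sketch of how NaN "infects" things. The two can_afford functions are hypothetical and not taken from the server code, but they show one plausible way a NaN balance could slip past a spending check: every ordered comparison against NaN is false, so a check written as "reject if cash < cost" never rejects.

```python
import math


def can_afford_naive(cash, cost):
    # Rejects only when cash < cost, which is never true for NaN.
    return not (cash < cost)


def can_afford_safe(cash, cost):
    # Refuse anything that isn't a real, finite number before comparing.
    return math.isfinite(cash) and math.isfinite(cost) and cash >= cost


cash = float("nan")
print(math.isnan(cash - 1000.0))       # True: arithmetic on NaN stays NaN
print(can_afford_naive(cash, 1e12))    # True: the naive check is fooled
print(can_afford_safe(cash, 1e12))     # False
```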

What Went Down

The exploits of Proto Baggins were what actually tipped people off to something going awry. People noticed his negative ship count, and that was quite suspicious. Next, they noticed that players who had only been in the game for a relatively short time were somehow able to amass massive armies of millions of ships, which they would launch against unsuspecting empires, who then had to scramble to deflect them.

What I discovered is that two empires, Carnage and Fatcow240, had hit the "Infinite Money" glitch and were depositing vast sums of cash into the alliance "Central Bank". They would then let their friends join that alliance, withdraw cash, and the cycle would continue.

Here's an example of what players would've seen before possibly getting completely wiped out:

Fallout

After I figured out what was going on, I decided to ban the players who were generating money. So that's Proto Baggins, Carnage and Fatcow240. Not everyone has been happy with that decision, but I felt it was warranted for two reasons:

  1. While there are glitches from time to time, and glitches can sometimes last for a while before I am able to fix them, this particular glitch was especially bad in that it was only available to a few players who were "lucky" enough to have triggered the edge condition needed. Other glitches, such as the one that allowed players to create thousands of wormhole generators, were available to everyone: nobody was unfairly disadvantaged (at least, not once they were made aware of the glitch's existence), and
  2. The advantage gained from the exploit was disproportionately large. Essentially infinite. The wormhole generator glitch was relatively harmless, in that once you've created the wormhole generators, there's not much you can do with them, since they're so expensive to move anyway. This exploit allowed a player to launch massive armadas of ships, which were basically impossible to defend against for anyone who was not also in control of an infinite supply of cash.

In addition, I went through the audit history of the "Central Bank" alliance, and anybody who withdrew cash from that alliance has had all their cash confiscated. I thought about also removing any ships that they accelerated the build of after getting their cash, but decided that it would be quite a lot of work to go through everybody's fleet list and remove the ones that seemed to be affected. I figure if you have no cash to actually move those ships, then it's not as bad.

What Went Well

There were a couple of things that helped resolve the situation.

Firstly, every single cash transaction in the game is stored in a special "audit" table. Every time you accelerate a build, move a fleet of ships, or deposit to or withdraw from an alliance, a record is added to the "audit" table. This allowed me to easily go back and find everyone who had withdrawn cash from the "Central Bank" alliance. It also meant I could pinpoint when Carnage and Fatcow240 had managed to trigger the "infinite cash" bug, which then allowed me to narrow down the cause to accelerating a "0-ship" build.
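To give a feel for the kind of query involved, here's a small Python sketch. Everything in it is an assumption for illustration: the real table lives in PostgreSQL and its columns aren't described here, so this uses an in-memory SQLite database as a stand-in, with an invented schema and made-up rows.

```python
import sqlite3


def total_withdrawals(conn, alliance):
    """Total withdrawn from one alliance, per empire, largest first."""
    cursor = conn.execute(
        "SELECT empire, SUM(amount) AS total FROM audit "
        "WHERE alliance = ? AND kind = 'withdraw' "
        "GROUP BY empire ORDER BY total DESC",
        (alliance,),
    )
    return cursor.fetchall()


# Hypothetical schema and made-up rows, purely to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit (empire TEXT, alliance TEXT, kind TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO audit VALUES (?, ?, ?, ?)",
    [
        ("empire_a", "Example Bank", "deposit", 5000.0),
        ("empire_b", "Example Bank", "withdraw", 500.0),
        ("empire_b", "Example Bank", "withdraw", 200.0),
        ("empire_a", "Example Bank", "withdraw", 100.0),
        ("empire_c", "Other Alliance", "withdraw", 900.0),
    ],
)
print(total_withdrawals(conn, "Example Bank"))
# [('empire_b', 700.0), ('empire_a', 100.0)]
```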

Secondly, while my server-side code isn't perfect at filtering out bad requests, I think it's done a fairly decent job overall so that Proto Baggins, even though he was making requests directly to the server, wasn't able to do too much damage (apart from triggering the negative ship thing, as I noted above, which I consider to simply be a bug somewhere). This could still be somewhat improved, though.

The bugs themselves have now all been fixed, and I managed to fix a couple of other glitches which have been bugging people as well (such as the aforementioned wormhole generator bug, and another one which let you build boosters and such for cheap).

Finally, the players themselves have been quite supportive of some of the tough decisions I had to make, and also very good about reporting the issues. People sent me screenshots, snippets of conversations, and all sorts of details which greatly helped to track down the issues. Thanks to everyone who has reported an issue in the past!

What Went Wrong

Not everything was perfect, of course. The biggest problem is that I didn't realise anything was wrong until quite a long time after it started happening. This is mostly my own fault, because my email inbox receives quite a bit of mail from not just players, but also app store marketers (spammers, essentially), and all sorts of people, so things tend to slip through the cracks.

Also, my "audit" table is literally just a table in the database. So actually trawling through it looking for evidence is a little time-consuming: concocting SQL queries, parsing the output into CSV files and so on.

Finally, I'm just one person. I wish I was able to dedicate more time to community moderation and monitoring, but I generally only have a couple of hours per day that I am able to dedicate to War Worlds (usually after the kids are in bed), and I'd much rather be working on new features or fixing bugs than tracking down people exploiting bugs...

Action Items

Now, all of the actual issues should have already been fixed (i.e. the bugs that caused these glitches in the first place), but a couple of additional action items have come out of all this:

  1. Spend more time in the chat. Keeping up with all of the email I get is a little tricky, but simply jumping on the in-game chat every now and then seems like a great way to keep on top of things (I do play the game too, but I usually don't have time throughout the day to read chat),
  2. Get a proper issue-tracking website. This one seems like a no-brainer, but some issues can linger for weeks or months in my inbox because I don't have a central way of tracking everything that people report. This has the added advantage that the marketers (spammers) won't be contacting me there.
  3. Monitoring and alerts! I need better monitoring for suspicious activity. If I'd been monitoring for things like spikes in requests per second, 400-errors per second and so on, I would have easily caught Proto Baggins before he'd been able to amass his large army.
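As a rough idea of what the monitoring item could look like, here's a minimal Python sketch of a sliding-window counter that flags when too many events (requests, or 400 errors, say) arrive in a short period. The class name, window and threshold are all invented for this example; nothing like this exists in the server yet.

```python
from collections import deque


class RateAlert:
    """Flag when more than `threshold` events fall within `window_seconds`."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = deque()

    def record(self, timestamp):
        """Record one event; return True if the window is over the threshold."""
        self.events.append(timestamp)
        # Drop events that have fallen out of the window.
        while self.events and self.events[0] <= timestamp - self.window_seconds:
            self.events.popleft()
        return len(self.events) > self.threshold


alert = RateAlert(window_seconds=60, threshold=3)
print([alert.record(t) for t in (0, 10, 20, 30)])
# [False, False, False, True]
```

In practice you'd keep one of these per empire or per IP address, so that one client hammering the server stands out from normal traffic.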

So look for some of these changes coming in the next few weeks!

Android in your car!

So I promised to explain why I've been a little absent from War Worlds over the last few weeks, and now that Google I/O is on, I can finally reveal! At work, I work on the Google Maps for Mobile team, and we've been super busy getting Google Maps ready for the big reveal at I/O this year!

It's been a long couple of weeks, working hard, but now the big demo has gone off without a hitch, so hopefully I will be able to return to my regularly-scheduled role as benevolent overlord and bringer-of-updates :)

If you want to learn more about Android Auto, here's the bit from the keynote where we got introduced. I'm so proud of my team!

(if it doesn't jump straight to the right second, you can fast-forward to 1h 35m in)

I'll be happy to answer any questions you have, though I'm not entirely sure how much I'm allowed to talk about just yet :)

Anyway, it's back to work on War Worlds for me! There have been a few updates over the last few days where I've tried to start paring down some of the major crashes. I hope to have another update in the next few days that'll make a really big difference, but I think it'll be better if I spend a bit more time testing... we've had a few false starts over the last few days and I don't want to repeat that!

Finally got so sick of MySQL I ditched it for PostgreSQL

So after my last post dissing MySQL, I've had a few more issues with MySQL that have made me decide it's far more trouble than it's worth to keep it, so I'm ditching it for PostgreSQL.

The final straw was partially my own fault. I added a new table to the game's database, but forgot to add it to the Blitz database. So my Blitz database started filling up with error messages (error messages get added to the database so I can analyze them later). It only took a couple of days for the ibdata file to fill the entire disk. Now, you'd think this wasn't all that big of a deal, right? Just truncate the table, delete all the rows and it'll shrink back down again, right? Wrong, there is no way to shrink the ibdata file! What you have to do, if you want to shrink your ibdata file, is export your data, blow the whole thing away, and re-import it again!

Worse, I can't just fix the table and leave the big file, because table definitions are stored in the same directory, and the disk was already full!

(There are other reasons for not using MySQL as well, but the general feeling of distrust towards Oracle also helps).

But in the end, since blowing the whole database away and starting again was my only option anyway, I decided that I'd finally migrate off MySQL.

Migrating the actual data

This was actually the simplest part. Using pgloader, you just set up the mapping and it runs in no time. The biggest issue I faced was that the tool is written in Lisp and the version of Lisp that Debian ships is quite old, so I had to compile the latest version by hand. But with that out of the way, the migration itself was very quick (about 5 minutes for 4GB of data).

Migrating the code

When developing the server code, I was careful to limit the number of MySQL-isms, but it wasn't entirely possible to eliminate all of them. The first step in porting the code was to switch to PostgreSQL's JDBC driver, which is luckily not that difficult. It took me only a couple of hours one evening to port the rest of the code away from MySQL-isms to PostgreSQL-isms, and another couple of evenings testing it all (this is where unit tests would have come in handy...)

Conclusion

I've only been running PostgreSQL for a few hours, but so far the performance has been pretty good. We'll see how it goes in the future.

Game Server status monitor page

After the various outages and problems we've had in the last week or so (which seem to have improved after moving to a faster server, yay!) I decided to add a small status page to the website where you can go and check whether the server is down for everybody, or if maybe it's just something local to you. So now if you visit www.war-worlds.com/status, you'll see something like this:

Here you can see we had a brief outage last night when I accidentally detached the disk which had the database on it (whoops!) but the good thing about this new status page is that it also emails me when it can't contact the server (or when the server returns an error).

The way it works is, every 5 minutes, I do a request to the game. The request does a bit of database reads, a bit of CPU and a bit of database writes. This will hopefully test everything that normal requests will do, so that it gives a good indication of what you can expect from the actual game server. Then it displays the last 24 hours in this nice graph format. At some point, I might start doing more historical data as well, but for now at least, the last 24 hours is it.
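In outline, a check like that boils down to something like the following Python sketch. It's an illustration rather than the code behind the status page: run_check, recent and the sample format are hypothetical names, and the probe is passed in as a function so the sketch doesn't depend on any particular URL or HTTP library.

```python
import time


def run_check(probe, clock=time.monotonic):
    """Call probe() once; return (ok, elapsed_seconds)."""
    start = clock()
    try:
        probe()
        ok = True
    except Exception:
        # Any failure (timeout, error response, ...) counts as "down".
        ok = False
    return ok, clock() - start


def recent(samples, now, window_seconds=24 * 60 * 60):
    """Keep only (timestamp, ok, elapsed) samples from the last window."""
    return [s for s in samples if now - s[0] <= window_seconds]
```

A scheduler would call run_check every 5 minutes with a probe that makes the real request, store the result with a timestamp, send an email whenever ok is false, and graph whatever recent returns.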

One interesting point here is the spikiness of the response times. It seems like there's a bit of a spike in activity around midnight (Sydney time: all the times are local, and I'm in Sydney), which is OK, but I'm not sure what's causing the spikes at the other times, which all seem to happen fairly regularly... something to investigate!

Wormholes

The wormholes have landed!

There are a couple of things that have changed to support Wormholes. Firstly, you'll notice you need to upgrade your Shipyard to level 2 before you can build the Wormhole Generator ship:

Once you've upgraded your shipyard, you can build a Wormhole Generator:

As you can see, it's pretty expensive, but not prohibitively so. Once built, you need to move the generator to a piece of empty space that's not too close to an existing star. If you try to select a spot that's too close, you'll get a red circle like below indicating you need to choose a different area:

 

If there's no red circle, you're good to go. Click move and wait.

Managing your wormhole

Once the wormhole is in place, it doesn't actually do anything until you "tune" it to a destination wormhole. Open up the new wormhole and click on "Tune Destination". You'll see a list of all the other wormholes your alliance owns:

At the bottom of this dialog you can see an indicator of the time it will take to tune. The very first time you tune a wormhole, the tuning time is instant. After that, tune time is 2 hours for the next tuning, then 4 hours, 9 hours, 16 hours and so on. After two weeks, the tuning time resets to 2 hours.

If the wormhole is tuning, you cannot send fleets through it, and you'll get an indicator of how long is left before the tuning completes.

Once the wormhole is tuned to a destination, you can send fleets through it by selecting them at the bottom and clicking "Enter". Travel through a wormhole is instant and you can use the "View Destination" button to switch to the destination wormhole.

Behaviour of fleets with allies

Another important change in this update is the one you can see in the picture above. Fleets which belong to empires in your alliance will not attack each other. This means you can now send troops to reinforce your buddies, but it also means you can send fleets through your wormholes without worrying about whether your allies will attack you or not.

Note that if you leave an alliance, your wormholes may still be tuned to their wormholes (and vice versa), so that's definitely something to be aware of!