Using custom properties to reduce the number of entities you need to store in App Engine

Posted 24 Jul 2012 by Dean Harding

One of the issues I've run into with the backend of War Worlds is the sheer number of entities. The universe is made up of sectors, each sector contains around 30 stars and each star has (on average) 5 planets. Up until now, I'd been storing each of those things as a separate entity (so one entity for the "sector", one for the "star" and one for the "planet").

Proliferation of Entities

The problem with that is it means there are around 1,629 entities in a 3x3 grid of sectors (the game works on a 3x3 grid of sectors and as you scroll around, we try to keep your view centred on that 3x3 grid). This is a pretty major cost not only in terms of performance, but also in terms of actual dollar cost (App Engine charges you per entity read/written).

Everything is kept in memcache, of course, but even so, just doing some testing by myself, I could easily use up App Engine's "free" quota for data store reads in a single day (you get 50,000 entity reads for free).

I wanted to find some way to reduce the number of entities without overcomplicating the code (i.e. without totally re-inventing the data store myself). Thinking about it, I came up with two solutions:

1. Since the random number generator used to generate the sectors was fixed and predictable, I could simply not store the stars and planets at all, and just re-generate them on demand from a fixed random number seed, or
2. Somehow "consolidate" the disparate entities into a binary blob stored in one container entity.

#1 is quite tempting, but I eventually decided against it because it means once a sector has been generated, not only do I need to know the seed used to re-generate it, I need to know the exact algorithm I used. That is, if I ever decided to change the generation algorithm (to tweak parameters or whatever), I would need to keep the old algorithm around just to keep re-generating the same stars/planets as before -- you can't have a sector where a player has colonies and so on change out from under him just because I tweak a few parameters!
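To make that trade-off concrete, here is a minimal sketch of seed-based regeneration. The names generate_sector and sector_seed, the seed formula, and the planet counts and types are all hypothetical; this is not the game's real generator.

```python
import random

def sector_seed(x, y, universe_seed=12345):
    # Hypothetical way of deriving one fixed seed per sector coordinate.
    return universe_seed * 1000003 + x * 10007 + y

def generate_sector(x, y):
    """Rebuilds a sector's stars and planets purely from its coordinates."""
    rng = random.Random(sector_seed(x, y))
    stars = []
    for _ in range(30):
        num_planets = rng.randint(3, 7)  # averages out at 5 planets
        planets = [rng.choice(["terran", "gas giant", "desert"])
                   for _ in range(num_planets)]
        stars.append(planets)
    return stars

# The same coordinates always produce the same sector...
assert generate_sector(4, 2) == generate_sector(4, 2)
```

...but only for as long as generate_sector itself never changes. Alter the randint range or the list of planet types and every existing sector would come out differently.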

The other problem with #1 is that if I ever decide to add a new "kind" of planet (say), I wouldn't be able to retroactively go through existing sectors and just add/update some of the planets.

So that leaves #2, and it turns out it's actually not that difficult to do.

Custom Properties on your model

I decided that the most bang-for-buck could be had by storing the planets as a collection of objects directly in the "star" entity. Making just that change would reduce the number of entities in a 3x3 grid of sectors from 1,629 to 279.
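Both figures can be reproduced from the averages quoted earlier. The sketch below assumes the grid holds nine sectors in total, which is what the numbers themselves imply.

```python
# Assumption: nine sectors, at the averages quoted earlier
# (30 stars per sector, 5 planets per star).
SECTORS = 9
STARS_PER_SECTOR = 30
PLANETS_PER_STAR = 5

stars = SECTORS * STARS_PER_SECTOR    # 270
planets = stars * PLANETS_PER_STAR    # 1350

before = SECTORS + stars + planets    # one entity per sector, star and planet
after = SECTORS + stars               # planets now live inside their star

assert (before, after) == (1629, 279)
```

Planets make up the bulk of the entities, which is why folding them into the star gives the biggest saving for the least work.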

So to begin, App Engine only has a few built-in property types. If we want to store a new kind of object as a property, then we need to come up with our own property type. Our planets are already described by a protocol buffer, pb.Planet, so I just hijack that type to store in the data store as well:

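from google.appengine.ext import db

# "pb" is the project's own protocol buffer module (import not shown); it
# provides pb.Planet and the pb.Planets wrapper message.

class PlanetsProperty(db.Property):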
  def get_value_for_datastore(self, model_instance):
    """Takes the list of pb.Planet objects that we have in the model and
    converts it to a db.Blob for storing in the data store."""
    planets = super(PlanetsProperty, self).get_value_for_datastore(
                                               model_instance)
    protobuf = pb.Planets()
    # The property may never have been set, in which case it's None
    protobuf.planets.extend(planets or [])
    return db.Blob(protobuf.SerializeToString())

  def make_value_from_datastore(self, value):
    """Converts the stored blob back into a list of pb.Planet
    objects."""
    if value is None:
      # Nothing was stored for this entity, so there are no planets
      return []
    protobuf = pb.Planets()
    protobuf.ParseFromString(value)
    return list(protobuf.planets)

  def validate(self, value):
    """Ensures the given value is valid for this property."""
    value = super(PlanetsProperty, self).validate(value)
    if value is None:
      return value
    if isinstance(value, list):
      for elem in value:
        if not isinstance(elem, pb.Planet):
          raise db.BadValueError(
              "All elements of %s must be of type pb.Planet" % self.name)
      return value

    raise db.BadValueError("Property %s must be a list of pb.Planet"
                           % self.name)

App Engine only requires get_value_for_datastore and make_value_from_datastore, but it's a good idea to implement validate as well. That way, you get a nice exception as soon as you populate the property with the wrong type, rather than some cryptic serialization error later on.

As you can see, the code will convert the list of pb.Planet objects into a serialized protocol buffer value, and then return that as a db.Blob (which is a subclass of str that doesn't try to do any encoding magic). On the way back out from the data store, we deserialize the protocol buffer back into a list of pb.Planet objects again.
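The same pack-and-unpack round trip can be tried outside App Engine. This sketch swaps in json for protocol buffers and a plain dict for the star entity, so everything in it (pack_planets, unpack_planets, the planet fields) is hypothetical and only illustrates the shape of the idea.

```python
import json

def pack_planets(planets):
    """Serializes a list of planet dicts into one byte string (the "blob")."""
    return json.dumps({"planets": planets}).encode("utf-8")

def unpack_planets(blob):
    """Turns the stored blob back into a list of planet dicts."""
    if blob is None:
        return []
    return json.loads(blob.decode("utf-8"))["planets"]

planets = [{"index": 1, "planet_type": "terran"},
           {"index": 2, "planet_type": "gas giant"}]

# One "entity" holds the star and all of its planets.
star = {"name": "Star Name", "planets_blob": pack_planets(planets)}

assert unpack_planets(star["planets_blob"]) == planets
```

Reading or writing the star now touches a single record, however many planets it has.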

One advantage of using protocol buffers here is that it lets us modify the planet type in the future without having to re-store all of the data again (for example, if we add a new optional field, planets stored before the change will simply deserialize with that field unset, and reading it gives the field's default value).

Using this new property type is just like using any other model property in App Engine:

class Star(db.Model):
  sector = db.ReferenceProperty(Sector)
  name = db.StringProperty()
  size = db.IntegerProperty()
  . . .
  planets = PlanetsProperty()


def generateSector(x, y):
  sector = mdl.Sector()
  sector.x = x
  sector.y = y
  . . .
  sector.put()

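  for _ in range(num_stars):
    star = mdl.Star()
    star.sector = sector
    star.name = "Star Name"
    . . .

    # star.planets starts out as None, so build the list up separately
    planets = []
    for index in range(num_planets):
      planet = pb.Planet()
      planet.planet_type = . . .  # whatever
      . . .

      planets.append(planet)

    # Assigning the list runs it through PlanetsProperty.validate
    star.planets = planets
    star.put()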

This is just a snippet, of course, but it gives you an idea. Using the planets property of the star is quite simple, and the changes required throughout my code to support it were surprisingly minor.

Looking Further

There are actually quite a few entities that hang off the star which could potentially benefit from this kind of optimization. For example, the "colony" entity is bound to a planet, so why not serialize the colony itself with the planet?

The problem I found with taking this model too far is that it makes it impossible to fetch the entity outside of the container itself. For example, one of the queries we do (for the "Empire Overview" screen I talked about just yesterday) is a "fetch all colonies belonging to this empire".

If the colonies were serialized as part of the planet (which itself is serialized with the star), then I would have to modify that query to "fetch all stars which have a colony belonging to this empire", which in turn means I would need to store, as part of the star, the fact that a colony exists for one or more empires.
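Here is a rough sketch of what that extra bookkeeping would look like, using plain dicts in place of entities. The empire_keys field and the helper functions are hypothetical; in the data store, the list would have to be a separate, queryable property on the star.

```python
def new_star(num_planets):
    return {"empire_keys": [],
            "planets": [{"colonies": []} for _ in range(num_planets)]}

def add_colony(star, planet_index, empire_key):
    """Stores a colony inside the star and records its owner on the star."""
    star["planets"][planet_index]["colonies"].append({"empire": empire_key})
    # Without this extra list there is nothing on the star to query by.
    if empire_key not in star["empire_keys"]:
        star["empire_keys"].append(empire_key)

def colonies_for_empire(stars, empire_key):
    """'Fetch all colonies belonging to this empire', the long way round."""
    found = []
    for star in stars:
        if empire_key not in star["empire_keys"]:
            continue  # stands in for the data store filtering on the list
        for planet in star["planets"]:
            found.extend(c for c in planet["colonies"]
                         if c["empire"] == empire_key)
    return found

stars = [new_star(5), new_star(5)]
add_colony(stars[0], 2, "empire-a")
add_colony(stars[1], 0, "empire-b")
add_colony(stars[1], 4, "empire-a")

assert len(colonies_for_empire(stars, "empire-a")) == 2
```

Every colony that is added or removed now has to keep empire_keys in sync as well, and each query has to unpack whole stars just to reach a handful of colonies.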

It's doable, but in the end, I think it would make the code too complicated overall. Just reducing the planets to serialized blobs has significantly reduced the number of entities in the game, and I think that's good enough for this rather early stage of development. Of course, we could come back and revisit this at a later date.

Screenshots

Just because I hate to have a wall of text and no pictures, here's some random screenshots for you: