I mentioned a while back that I wasn't all that happy with my random name generator. So I've been working on it a little, trying to tweak the algorithm to generate names that are "good". I define "good" as:
- Easily pronounceable – unpronounceable names are hard to remember. You need to know where you have your big armada of ships, and you're not going to be able to do that if you can't even say the name of the star.
- Interesting – this goes without saying, really
- Drawn from a large pool of possible names – it's no good if every second star is called "Sol" or whatever
The algorithm I had been using before was one that I'd just taken from somewhere on the internet. It wasn't that great, because the input files were really hard to generate and maintain. The old generator would take files that looked like this:
-a
-au +c
-bi
-br +v
bu
nul +v
au +c -c
+tor
(Obviously, this is just a sample of the file.) Basically, lines that start with "-" represent the beginning of words. Lines that start with "+" represent the end of words. A "+c" means "can only be followed by a consonant", and "-c" means "can only be preceded by a consonant" (similarly, "v" for "vowel").
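To make the format a bit more concrete, here is a rough sketch of how one line of that file could be parsed. This is not the original generator's code: the Syllable class and the parse_rule helper are names I've made up for illustration, and treating a line with no leading "-" or "+" (like "bu") as a mid-word syllable is my reading of the sample rather than something the format spells out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Syllable:
    text: str                 # the letters themselves, e.g. "au"
    position: str             # "start", "middle" or "end"
    next_kind: Optional[str]  # "c" or "v" if a "+c"/"+v" flag was given
    prev_kind: Optional[str]  # "c" or "v" if a "-c"/"-v" flag was given


def parse_rule(line):
    """Parse one rule line such as "-au +c" (hypothetical helper)."""
    head, *flags = line.split()
    # Assumption: no prefix on the syllable means it goes mid-word.
    position = "middle"
    if head.startswith("-"):
        position, head = "start", head[1:]
    elif head.startswith("+"):
        position, head = "end", head[1:]
    next_kind = prev_kind = None
    for flag in flags:
        if flag in ("+c", "+v"):
            next_kind = flag[1]
        elif flag in ("-c", "-v"):
            prev_kind = flag[1]
    return Syllable(head, position, next_kind, prev_kind)


for line in ["-au +c", "au +c -c", "+tor"]:
    print(parse_rule(line))
```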
Now, this is actually quite powerful. To generate a name, you just randomly choose a start syllable, then start filling in the syllables from the rest of the file, based on whether they start with a consonant or vowel, etc. There are three main problems with this:
- Coming up with the rules and tweaking them is really hard
- There's no real way, other than duplicating rules, of saying "this particular combination is really common" or "this particular combination is really rare", and
- Coming up with the rules and tweaking them is really, really hard
So I decided to scrap that and think of something myself.
A simpler solution
I figured an easier algorithm would just be to take a whole big corpus of existing names, and mix them up. If the size of the corpus is big enough, we should be able to generate names that follow the same basic rules of the corpus, without having to manually tweak them. And we'd get the frequency of combinations about right, too.
The code to load up a corpus file is below:
def _parseCorpus(file_name):
    """Parses a corpus file.

    The corpus file is really just a list of words in a
    certain "vocabulary" (e.g. Viking names, English names,
    etc). We parse the words from that file, and for each
    pair of adjacent letters we count which letters most
    commonly come next.

    Returns:
      A mapping of letter-pairs to a dict of "subsequent"
      letters and the number of times we saw that letter
      after the letter pair.
    """
    letter_frequencies = {}

    def addLetter(last_letters, letter):
        if last_letters not in letter_frequencies:
            frequency = {}
            letter_frequencies[last_letters] = frequency
        else:
            frequency = letter_frequencies[last_letters]
        if letter not in frequency:
            frequency[letter] = 1
        else:
            frequency[letter] += 1

    # Assumes Python 3 and a UTF-8 corpus file, so that accented
    # letters like "û" are read as single letters.
    with open(file_name, 'r', encoding='utf-8') as inf:
        for line in inf:
            for word in line.split():
                # Two spaces mark the start of a word.
                last_letters = '  '
                for letter in word:
                    addLetter(last_letters, letter)
                    last_letters = last_letters[-1] + letter
                # A single space marks the end of a word. Using
                # last_letters keeps the key two characters long
                # even for one-letter words.
                addLetter(last_letters, ' ')
    return letter_frequencies
This code will take a file which contains a bunch of words, and then generate a "corpus". The corpus consists of a mapping of letter pairs to a list of subsequent letters and the number of times we saw that subsequent letter in the corpus.
For example, if you have the word "Dûrion" from my "Elvish" corpus, the letter pairs and subsequent letters are:
'  ' -> 'D'
' D' -> 'û'
'Dû' -> 'r'
'ûr' -> 'i'
'ri' -> 'o'
'io' -> 'n'
'on' -> ' '
We start by mapping "  " (two spaces) to the first letter in a word, then " D" (a space followed by a letter) to the second letter in the word, and so on, so that "ri" is followed by an "o". The last pair, "on", maps to a space, which marks the end of the word.
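If you want to poke at the mapping without a corpus file, something like the following should build the same kind of table from a list of words in memory. It's only a sketch: build_frequencies is a made-up name, it leans on collections.Counter rather than hand-rolled dicts, and "Aindion" is just a second sample word picked for illustration.

```python
from collections import Counter, defaultdict


def build_frequencies(words):
    """Map each letter pair to a Counter of the letters that follow it."""
    frequencies = defaultdict(Counter)
    for word in words:
        last_letters = "  "  # two spaces: start-of-word marker
        for letter in word:
            frequencies[last_letters][letter] += 1
            last_letters = last_letters[-1] + letter
        frequencies[last_letters][" "] += 1  # one space: end-of-word marker
    return frequencies


table = build_frequencies(["Dûrion", "Aindion"])
for pair in ("  ", "io", "on"):
    print(repr(pair), dict(table[pair]))
```

Both sample words end in "ion", so "io" and "on" each end up with a count of 2, which is exactly the frequency information the rule files had no good way to express.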
Actually using this to generate words can be done as follows:
import random

def _generateName(corpus):
    """Generates a single name from the given vocabulary data."""
    def getLetter(last_letters):
        if last_letters not in corpus:
            raise ValueError(last_letters)
        frequencies = corpus[last_letters]
        total = sum(frequencies.values())
        # randint includes both ends, so pick from 1..total. Starting
        # at 0 would give the first letter one extra chance.
        index = random.randint(1, total)
        for letter, count in frequencies.items():
            index -= count
            if index <= 0:
                return letter
        return ' '

    # Two spaces mark the start of a word.
    word = '  ' + getLetter('  ')
    # Stop at the end-of-word marker, or once the word gets too long.
    while word[-1] != ' ' and len(word) < 10:
        word += getLetter(word[-2:])
    return word.strip()
We pass a pair of letters, representing the last two letters generated, to getLetter, which looks up that pair in the corpus to get the list of subsequent letters and their frequencies. We then generate a random number and use it to choose the next letter from that list, so letters that were seen more often after that pair are proportionally more likely to be picked.
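For what it's worth, the same weighted pick can probably be done with the standard library's random.choices, assuming Python 3.6 or later. The pick_next_letter name and the optional rng parameter are mine, added so the function can be seeded in tests. It's an alternative sketch, not what the generator above actually does.

```python
import random


def pick_next_letter(frequencies, rng=random):
    """Pick one letter, weighted by how often it was seen."""
    letters = list(frequencies)
    weights = [frequencies[letter] for letter in letters]
    # choices() returns a list of k items; we only want one.
    return rng.choices(letters, weights=weights, k=1)[0]


# "n" was seen three times as often as "r", so it should come up
# roughly three times as often.
print(pick_next_letter({"n": 3, "r": 1}))
```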
To get the ball rolling, we need to pass in "  " (two spaces). We also stop when getLetter returns a space (which represents an "end-of-word" in our corpus).
I found I had to put in a maximum word length, otherwise you end up with mega-names like "Thamorthamollassëa" which, while still kind of pronounceable, are way too long to be memorable.
Corpus
I had a look around the internet for lists of names that I could use as my corpus. I found a list of "Nordic" names, which lets me generate names like:
Hun Jonarraf Jora Keti Egi Sigrir Erid Hjora Bikki
Fenjar Thorhall Atlif Skulif Tjorstric Sverda
A list of "Elvish" names let me generate names like this:
Verlassë Adan Maithredhi Mair Manwë Caldon Tawartion Thaldorcon
Aindion Morcionwë Alya Maidhion Ainion Glassien Urúvion
And so on. I think these names are generally much better than what I had before.
So now, when generating star names, I choose a random corpus (Elvish, or Nordic, or Latin, etc), then a random name from that corpus. That gives a nice mix of naming styles for the stars, and hopefully makes the world a little more usable.
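Roughly, that selection step might look like the sketch below. The random_star_name function, the shape of the corpora dict and the stand-in generator in the usage example are all assumptions for illustration. In the real thing the values would be the parsed frequency tables and the callback would be the name generator.

```python
import random


def random_star_name(corpora, generate_name, rng=random):
    """Pick a naming style at random, then generate a name in that style.

    corpora maps a style name (e.g. "Elvish") to its parsed corpus and
    generate_name is whatever function turns a corpus into a name.
    """
    style = rng.choice(sorted(corpora))
    return style, generate_name(corpora[style])


# Stand-in data and generator, just to show the call.
demo_corpora = {"Elvish": ["Manwë", "Alya"], "Nordic": ["Keti", "Egi"]}
print(random_star_name(demo_corpora, lambda corpus: random.choice(corpus)))
```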
Customizing your star names
While better than before, I don't think it's really possible for a randomly-generated name to have quite the same impact as a human-generated one. I'm considering allowing players to rename the stars they colonize to be whatever they like. I think that would give players a much greater sense of "ownership" if they choose the names of their stars.
Maybe there just need to be some simple rules: you can't rename a star more than once per x days, you can't rename a star until you've had a colony there for y days without any other players colonizing, etc. I'd also like to be able to stop horrible names like "-}[GJH]{- Poo" or something that'll just throw you right out of the immersion.
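As a first stab, those rules could be sketched something like this. None of it exists yet: the function names, the 20-character limit and the idea of passing x and y as timedelta values are placeholders, and the "no other players colonizing" condition is left out entirely.

```python
import re
from datetime import datetime, timedelta

# Words made only of letters (accented ones included), joined by
# single spaces, apostrophes or hyphens.
NAME_PATTERN = re.compile(r"[^\W\d_]+(?:[ '\-][^\W\d_]+)*")


def is_acceptable_name(name, max_length=20):
    """Reject names that are too long or full of symbols."""
    return len(name) <= max_length and NAME_PATTERN.fullmatch(name) is not None


def can_rename(now, colonized_at, last_renamed_at, cooldown, min_tenure):
    """Check the "once per x days" and "colony for y days" rules."""
    if now - colonized_at < min_tenure:
        return False
    if last_renamed_at is not None and now - last_renamed_at < cooldown:
        return False
    return True


print(is_acceptable_name("Thorhall"))        # letters only
print(is_acceptable_name("-}[GJH]{- Poo"))   # brackets and braces
```

A pattern like this only catches the symbol soup. Something like "Poo" on its own would still get through, so rude words would need a separate check.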