Last night, I had a major breakthrough with my thesis work: I was able to implement a Prisoner’s Dilemma game with an exit option. At first, I decided to just pick a strategy for the opponent and have GP develop an effective strategy to defeat it. The strategy that I chose was C D C D X, repeated three more times for 20 moves in total. So on the 1st, 3rd, 6th, 8th, 11th, 13th, 16th, and 18th moves, the opponent cooperates. On the 5th, 10th, 15th, and 20th, the opponent exits. On the rest, the opponent defects.
The matrix itself is set up so that the following payoffs occur:
I cooperate and you defect: 1 point for me.
I cooperate and you cooperate: 7 points for me.
I defect and you defect: 3 points for me.
I defect and you cooperate: 9 points for me.
Either of us exits: 5 points for me.
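The payoff logic above is simple enough to sketch directly. Here is a minimal version in Python (the thesis code itself runs in ECJ, which is Java; this is purely for illustration):

```python
# Payoff matrix from the post: C = cooperate, D = defect, X = exit.
# Either player exiting yields 5 regardless of the other move.

def payoff(mine, theirs):
    """My payoff for one round of the game."""
    if 'X' in (mine, theirs):
        return 5
    return {('C', 'C'): 7, ('C', 'D'): 1,
            ('D', 'D'): 3, ('D', 'C'): 9}[(mine, theirs)]
```

Note that against a cooperator or a defector, defecting always beats cooperating (9 > 7 and 3 > 1), while exit guarantees the middle payoff of 5; that guaranteed 5 is what makes the exit option interesting.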
Using this system, the genetic programming technique quickly developed an almost-perfect strategy, which you will be able to see on the right. The way to interpret this image is to understand that “OwnPrev2” means “If I defected 2 turns ago” and “OppPrev2” means “If my opponent defected 2 turns ago.” D is to defect and X is to exit. Note that there is no cooperation in this case, which makes sense from a game theoretic standpoint: my opponent is not acting upon a dynamic strategy, so I don’t have to take any threats into account. Instead, I just exit where he defects (netting 5 points) and defect where he cooperates (netting 9), and do whatever when he exits (because I’ll get 5 no matter what). This particular strategy ends up being 8 points off of the ideal. The reason is pretty simple:
Programmed strategy chart: C D C D X C D C D X
My strategy chart:         D D D X D D D D X D
Ideal strategy:            D X D X ? D X D X ?
So you can see that in the second round, my strategy defects when it should exit, and this is responsible for 2 points. Do that 4 times a game and you’ll end up 8 points shy of perfection. But it did prove to me that my setup is pretty sound, so I decided to take it a step further and try to make the game truly dynamic.
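As a sanity check, the 8-point gap can be reproduced in a few lines of Python (the strategies are transcribed from the charts above; the real runs use ECJ in Java):

```python
# Score a 20-move game under the post's payoff matrix and confirm that
# the evolved strategy finishes exactly 8 points behind the ideal one.

PAYOFF = {('C', 'C'): 7, ('C', 'D'): 1, ('D', 'D'): 3, ('D', 'C'): 9}

def score(mine, theirs):
    """My total payoff; any exit by either player is worth 5 that round."""
    return sum(5 if 'X' in (m, t) else PAYOFF[(m, t)]
               for m, t in zip(mine, theirs))

opponent = list('CDCDX') * 4        # the fixed programmed opponent
evolved  = list('DDDXDDDDXD') * 2   # the GP-evolved strategy
ideal    = list('DXDXD') * 4        # best response ('?' slots filled with D)

print(score(evolved, opponent))     # 124
print(score(ideal, opponent))       # 132, i.e., 8 points higher
```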
To make the game truly dynamic, I have to have each agent face off against other agents in order to test their strategies. The way that I did it was pretty simple: I have access to the entire population, so each individual plays one game against every individual in the population. Unfortunately, this has a problem: each agent plays itself once, and that might lead toward a pro-cooperation bias. How did this turn out in actuality? Well, the best overall strategy (developed roughly around generation 49 of 50) was very complex and nasty, but after eliminating logical inconsistencies (for example, if your opponent has never defected, it doesn’t make sense to have a branch for if your opponent defected in the previous round), it looks a lot like the chart that you see on your right. In this case, if my agent defected in the previous round, it will exit in this round (perhaps to avoid the wrath of a tit-for-tat player). If it did not, then it checks to see if the opponent ever defected. If the opponent never defected, then it would just cooperate—a very nice start. But if the opponent did defect? Well, then it actually checks again to see if the opponent ever defected, and will cooperate if it did. Huh? In effect, this agent cooperates all of the time, which means that it would be horrible against agents that just defect all of the time. And in fact, even though it was the best one in generation 49 (and had the best overall score, as it secured cooperation in all but 8 situations), it did not fare as well in generation 50 because more defecting agents were created.
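In rough Python (again, the actual implementation lives in ECJ/Java), the round-robin looks something like the sketch below. `play_game` is a hypothetical stand-in for running one 20-move game and returning the first agent's score, and this version already includes the guard against self-play:

```python
# Round-robin fitness evaluation: each individual plays one game against
# every other individual. itertools.permutations yields ordered pairs of
# distinct positions, which skips the self-pairings that caused the
# pro-cooperation worry above.
import itertools

def evaluate_population(population, play_game):
    """Total each agent's score over one full round-robin."""
    fitness = {id(agent): 0 for agent in population}
    for a, b in itertools.permutations(population, 2):
        fitness[id(a)] += play_game(a, b)
    return fitness
```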
After doing all of this, I tried to figure out a way to add more statistics. Right now, I have one file which gives me the best individual in each generation—obviously a very helpful thing. I have another file which gives me every individual in each generation, which is a bit too much to go through (given that I have 100 individuals per generation and 51 generations), especially since I would have to analyze each one to figure out its “true logical form” by taking out contradictions. And, just for laughs, I have the worst individual in each generation. The worst individual of generation 50 exits on every turn, which actually turns out to be a common occurrence for the worst agents near the end of the 50 generations. This implies that the existence of the game is actually Pareto superior to not having the game in most of the later generations, and that is also an important point that I am going to look at. But what I really want is game-by-game details, to figure out what each agent did against every other agent.
And for that, I received a very fortuitous e-mail from Mehdi Khoury, stating that his site has a tutorial up on printing out statistics in ECJ. This lets me print out each game, which I need because I overwrite the arrays containing the moves before my statistical functions have a chance to observe what’s going on. In addition to adding this functionality, I threw in a check to make sure that an individual is not playing itself, so my new output has lines that look like:
----------------------
GENERATION 0, individual HashCode 229258280
Going up against... Individual HashCode 229258280
SKIPPING--THIS IS THE SAME INDIVIDUAL.
----------------------
GENERATION 0, individual HashCode 229258280
Going up against... Individual HashCode 229251536
Own choices: C D C C C C C C C C C C C C C C C C C C
Opponent choices: D C X X X X X X X X X X X X X X X X X X
----------------------
GENERATION 0, individual HashCode 229258280
Going up against... Individual HashCode 25402584
Own choices: C X C C C C C C C C C C C C C C C C C C
Opponent choices: C C C C C C C C C C C C C C C C C C C C
----------------------
So now I have a well-defined function which tells me the generation, gives a unique identifier, tells the opponent, and spits out the arrays of information. The only thing that I might want to add is the final result after each game, just to make sure that it adds up correctly (though I’m 99.9% sure that it does). By the way, the size of that text file? Oh, 100 MB…
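Since the file is that large, eyeballing it won't scale, so a small parser can pull each game back out of the format shown above. This is a hypothetical sketch keyed to the exact field labels in the output:

```python
# Parse the per-game log format shown above into structured records.
# The field labels match the sample output; skipped self-play entries
# have no "Own choices" line and therefore never match.
import re

GAME = re.compile(
    r"GENERATION (\d+), individual HashCode (\d+)\s*"
    r"Going up against\.\.\. Individual HashCode (\d+)\s*"
    r"Own choices: ([CDX ]+)\s*"
    r"Opponent choices: ([CDX ]+)")

def parse_games(text):
    """Yield (generation, own_hash, opp_hash, own_moves, opp_moves)."""
    for m in GAME.finditer(text):
        yield (int(m.group(1)), m.group(2), m.group(3),
               m.group(4).split(), m.group(5).split())
```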
I would consider this huge progress on my thesis, and considering that it all occurred over roughly 3 days, I’m quite happy with my pace. Yeah, I slacked off during most of May, but now that I’m back into things, life is going pretty swimmingly.
My next steps:
– Do I need to shrink the size of strategies? I don’t know if I want to do this, as I would like potentially very complex strategies to be developed, but in practice most of them are 12 or 13 levels deep and a lot of that depth is contradictory (e.g., if the opponent has never defected, then you should not throw in checks to see if the opponent defected).
– How do I seed the generation 0 members? Right now, they are being selected pseudo-randomly and that is fine and necessary for one portion of my thesis work. But I also want to put in a group of initial strategies and run based off of those, and that requires seeding. I have to figure out how to do this, and once I do, I can do more testing on that front.
– Is the PD game always a Pareto improvement for all agents? As I said, if the worst agent has a payoff which is at least as high as if he just exited every time (i.e., as if he did not participate in the game at all), then having the game is a Pareto improvement, even if it is not Pareto optimal and even without anything like external enforcement (government, police, the courts, etc.) or reputation. Although I did not really focus on this aspect when talking about my thesis goals, I think it is an important moral consideration.
– How many of these agents are moral? This is a major consideration for me and the point of my thesis. I want to see if moral agents can thrive with an exit option available, and one of the easier ways of seeing this is to check how many moral agents there are in each generation. Ideally, I’d like to see the number start off fairly low but increase as time goes on and end up dominating by the end of the run. This would help to show that morality can be an emergent phenomenon (i.e., not needing an external force to define or enforce), even without reputation. But in order to analyze this appropriately, I need to look at the next question.
– Can I write a program to generate a “true logical form” or at least narrow it down? Right now, for some of the more complex agent strategies, it can take me roughly 5 minutes to figure out exactly what’s going on. In the example that I gave above and to the right, I actually skipped 18 functions because they were logically impossible. Before skipping these functions, there were three instances in which the agent would defect, so it would not have been considered a “moral” agent. After eliminating logical contradictions, though, it was no longer capable of defection, so I would put it in the ranks of moral agents. I liken it to this: suppose that I say that if and only if you are unmarried, then I will rape your wife. On the face, it is immoral: I’m saying that there are cases in which I will rape your wife. But when you look at the condition, it is a logical impossibility (because having a wife implies being married), so the circumstance could never occur. I am willing to consider this moral behavior, just because the immoral behavior branch will and can never be achieved. At any rate, I want to see if I can develop a program that will take a big tree and prune it down some, taking advantage of two circumstances:
- My trees are set up in such a way that a program could logically traverse them.
- If I look at it from a level-first approach, I can potentially eliminate vast swaths of logic easily. As I said, in my strategy above and on the right, I was able to eliminate a lot. In one case, I had OppEverDef, and if true, it checks again to see if OppEverDef. In this case, the false branch had 14 child functions, none of which I had to care about at all.
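To make that last point concrete, here is a toy version of the pruner I have in mind, under an assumed encoding where an internal node is a (condition, true_branch, false_branch) tuple and a leaf is 'C', 'D', or 'X'. The encoding is illustrative rather than ECJ's actual representation, and this version only catches repeated identical conditions; implications between different conditions (e.g., OppEverDef being false ruling out OppPrev1Def) would need extra rules:

```python
# Prune a strategy tree by tracking which conditions are already decided
# along the current path; a repeated check collapses to the implied branch.

def prune(node, known=None):
    known = known or {}
    if isinstance(node, str):                 # leaf: 'C', 'D', or 'X'
        return node
    cond, if_true, if_false = node
    if cond in known:                         # already decided on this path
        return prune(if_true if known[cond] else if_false, known)
    return (cond,
            prune(if_true, {**known, cond: True}),
            prune(if_false, {**known, cond: False}))

# The always-cooperate agent from above: OppEverDef checked twice.
tree = ('OppEverDef', ('OppEverDef', 'C', 'D'), 'C')
print(prune(tree))   # ('OppEverDef', 'C', 'C'): it can never defect
```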
I’m going to try to work on some of these, probably on Wednesday. I’m excited about the level of progress that I’ve made, and if I can get a tree pruner program set up, then it will make things a lot easier. I have the main program complete, so now it’s all about developing analysis programs and doing some robustness testing.