*This is part of a series entitled Genetics In Action.*

In today’s post, we are going to look at the GA package in R, which implements genetic algorithms. This package allows us to build different types of genetic algorithms in just a few lines of code. Let’s start with an easy example.

### Maximizing A Mathematical Function

Our first example involves finding the global maximum for a mathematical function. Here’s the setup code:

```r
if(!require(GA)) {
  install.packages("GA", repos = "http://cran.us.r-project.org")
  library(GA)
}

f <- function(x) { (x^2 + x) * cos(x) }

min <- -10; max <- 10
```

The first block installs and loads the GA package if it is not already available. Then, we define a fitness function called f. This fitness function tells us how well we’re doing for any value of x. We also have a couple of constraints, specifically that x must be in the range [-10, 10].
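Before handing f to the genetic algorithm, it can be worth spot-checking it by hand. A quick sketch (the commented values are approximate):

```r
# Quick sanity check of the fitness function at a few points.
f <- function(x) { (x^2 + x) * cos(x) }

f(0)    # 0: the origin scores nothing
f(6.5)  # about 47.6, near the global maximum
f(-6.5) # about 34.9, near the local maximum on the left
```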

Let’s view the curve that this fitness function generates by using the curve function like so: `curve(f, min, max, n = 1000)`


Our goal is to find the value x which maximizes f(x) over the relevant range. Eyeballing the problem, we can see that the maximum value is somewhere around 6.5, but note that there’s a local maximum somewhere around -6.5. Many hill-climbing algorithms will get stuck around -6.5 if you start out near that point, so we want to make sure that our genetic algorithm doesn’t get stuck there.
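To see that stickiness for ourselves, we can run a quick sketch with base R’s optim, a local gradient-based optimizer; optim minimizes by default, so fnscale = -1 turns it into a maximizer. This comparison is illustrative only and is not part of the GA package:

```r
# A local hill climber started near the local maximum stays there.
f <- function(x) { (x^2 + x) * cos(x) }

stuck <- optim(par = -6.5, fn = f, method = "L-BFGS-B",
               lower = -10, upper = 10, control = list(fnscale = -1))
stuck$par    # around -6.6: the local maximum, not the global one
stuck$value  # about 35, well short of the global maximum near 47.7
```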

To build and run the genetic algorithm, call the ga function:

```r
GA <- ga(type = "real-valued", fitness = f, min = min, max = max, monitor = FALSE)
summary(GA)
```

We selected a type of real-valued, which means that our genes are numeric values with decimals. Our fitness function is the function f, which we defined above, and the constraint is that we must keep our genes between min and max. After the algorithm runs, we get back a summary:

```
+-----------------------------------+
|         Genetic Algorithm         |
+-----------------------------------+

GA settings:
Type                  =  real-valued
Population size       =  50
Number of generations =  100
Elitism               =  2
Crossover probability =  0.8
Mutation probability  =  0.1
Search domain =
     x1
Min -10
Max  10

GA results:
Iterations             = 100
Fitness function value = 47.70562
Solution               =
           x1
[1,] 6.560509
```

This tells us all about the genetic algorithm. We can see that the defaults are a population size of 50, 100 generations, a crossover probability of 80%, and a mutation probability of 10%. We can see that we went through 100 iterations and found that the point 6.56 has a fitness of 47.70562, which is the highest possible score.
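These defaults can all be overridden through arguments to ga. A sketch, using the parameter names popSize, maxiter, pcrossover, and pmutation from the GA package (and the same min/max arguments as above, which newer versions of the package spell lower and upper):

```r
library(GA)
f <- function(x) { (x^2 + x) * cos(x) }

set.seed(42)  # for a reproducible run
GA2 <- ga(type = "real-valued", fitness = f, min = -10, max = 10,
          popSize = 100, maxiter = 200, pcrossover = 0.8, pmutation = 0.2,
          monitor = FALSE)
summary(GA2)  # should again find the maximum near x = 6.56
```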

If you want to visualize what the algorithm did over time, calling `plot(GA)`

will generate a plot for you:

There are three things measured here: the best result, the mean result, and the median result. In this case, we hit the best result in generation 1, and that best result stuck around until the end of the simulation. Normally, we don’t hit the global maximum right from the beginning, but we got lucky this time.

Notice that the median population fitness score eventually converges on the global maximum, but the mean fitness score never quite makes it that high. This is actually a good thing for us: it shows that at 100 generations, there is still enough genetic diversity that we are building population members which search across the valid space, just in case our current best isn’t really the best. In a non-trivial problem, it could take dozens or hundreds of generations to converge, and if you lose that diversity too early, you may never get past a local maximum. Once the simulation is over, though, we really only care about the single most fit solution.
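If you want the numbers behind that plot rather than the picture, the fitted object also stores a per-generation summary. A sketch, assuming the GA@summary slot from the GA package (a matrix with one row per generation, whose columns include the per-generation max, mean, and median fitness):

```r
library(GA)
f <- function(x) { (x^2 + x) * cos(x) }

GA <- ga(type = "real-valued", fitness = f, min = -10, max = 10, monitor = FALSE)

head(GA@summary)                 # one row per generation
best_per_gen <- GA@summary[, "max"]
which.max(best_per_gen)          # first generation that hit the best fitness
```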

Let’s plot our best solution now:

```r
curve(f, min, max, n = 1000)
points(GA@solution, GA@fitnessValue, col = 2, pch = 19)
```

We were able to find the global maximum, just as we wanted.

### Conclusion

In this post, we were able to use a genetic algorithm to solve a single-variable maximization problem. It’s not the most exciting problem to solve, but it does give us an introduction to the mechanics of the GA package. In the next post, we will look at another classic optimization problem.