The Basics Of Genetics

This is part of a series entitled Genetics In Action.

In today’s post, I want to cover some extremely basic genetic concepts.  We use these concepts in genetic algorithms and genetic programming, and understanding the biological explanations will help us apply the metaphor to software development.

Evolutionary Biology In 400 Words

We have a population of self-contained entities called organisms.  Each organism is entirely independent from other organisms, in the sense that one organism in our population does not depend upon another organism for survival.  In evolutionary algorithms, organisms are our candidate solutions to problems:  we have a number of solutions in our population, and each solution is independent of the other solutions (although they will interact in complex ways, as we’ll talk about later).

Each organism has at least one chromosome.  The purpose of a chromosome is to carry genes.  In evolutionary algorithms, we often simplify the problem by saying that an organism has one chromosome, and it can become easy to conflate organisms and chromosomes for that reason.

Digging deeper, each gene has a number of alternate forms, called alleles.  Combining genes and alleles gives us DNA sequences called genotypes.  A genotype is a possible genetic structure.  Suppose that we have 32 genes, each of which can have 2 alleles.  This would give us 2^32 possible combinations, or a total of 4,294,967,296 possible genotypes.

Genotypes determine phenotypes.  Phenotypes are observable physical characteristics.  Classic examples of phenotypes include eye color, hair color, height, size, and beak structure.  But it’s important to understand that there is no 1:1 correspondence between a genotype and a phenotype; some number of genotypes can end up generating the same phenotype.  Phenotypes depend upon a certain combination of alleles but not the entire genotype.

Also, this goes the other way as well:  a phenotype may only appear when a particular combination of alleles hold a particular value.  Close may count in horseshoes and hand grenades, but it doesn’t count with genetics:  if there are seven alleles which combine to cause a phenotype and only six are set, you will not see the physical characteristics.  Think of this like a password:  unlike in TV and movies, if you have 18 of the 19 characters in a password correct, you shouldn’t get any indicator that you’re ever-so-close.  If we’re using an evolutionary process to find that password (which probably isn’t a good use for an evolutionary algorithm), we won’t get positive feedback until we get it absolutely correct.

The last aspect of genetics that I want to cover today is an environmental niche.  A niche is a set of features which certain phenotypes can exploit.  Think about life on the Arctic:  features like thick fur and layers of fat can help animals survive in these frigid climes.

Applying These To Evolutionary Algorithms

To apply biological concepts to evolutionary algorithms, I’m going to flop around a little bit and start at the end.

First, think about an environmental niche.  The programming concept we’re describing here is a fitness function.  An example of a fitness function may be a set of (x, y) coordinates, and our goal is to find something which maps as closely as possible to those coordinates.


An example of a fitness function

To solve the fitness function, we want to build up a population of organisms.  Our organisms are actually candidate solutions.  In the example above, these would be mathematical functions which fit the fitness function to a greater or lesser extent.

For the sake of simplicity, each organism will have one chromosome.  In its simplest form, a chromosome is an array which contains a set of elements.  Those elements are akin to genes in our biological nomenclature.  Each element has a set of alleles, that is, possible values.  Therefore, the genotype is the particular combination of elements in the chromosomal array.

A trivial example of a chromosome is a binary array:  [ 0, 0, 1, 0, 1, 1, 0, 1 ] would be a sample chromosome.  Here we have 8 genes, each with 2 alleles, for a total of 256 genotypes.  We apply this to the fitness function and get back a score, and that score tells us how fit the organism is.

In the next post, we’ll go deeper into this metaphor, diving into genetic algorithms.  I’ll explain more about fitness functions and explain what makes these evolutionary algorithms rather than biological algorithms.