Required theoretical background
Prefix Notation and S-Expressions
The most common way of writing down a function with two arguments, for example addition, is the infix notation. That is, the two arguments are connected with the operation symbol between them.
A + B
A different method is the so-called prefix (or polish) notation. Here the operation symbol is written down first, followed by its required arguments.
+ A B
While this may be a bit more difficult or just unusual for human eyes, it opens some advantages for computational uses.
The computer language LISP uses so-called symbolic expressions (or S-expressions) composed in prefix notation. Then a simple S-expression could be
where operator is the name of a function (here with one argument). Argument can be either an atom (i.e. constant or variable) or either another symbolic expression (e.g. S-expressions can be nested).
(operator argument (operator argument) (operator argument))
In LISP, programs and data have the same structure providing an easy means of manipulation and evaluation. This is why LISP is popularly used for implementing genetic programming.
Program trees and Subtrees
A tree in this context is a hierarchical structure consisting of nodes and leaves. The simplest tree consists only of a single leaf. Nodes are inner points of trees and may have more nodes or just leaves connected to them. The topmost point of a tree is called the root of this tree.
A program tree is describes a computer program. Functions are written down as nodes, their arguments as leaves. There is a close relationship between these program trees and S-expression; in fact these trees are just another way of writing down the expressions.
Figure 1 : A program tree representing the expression (+ 1 2)
Figure 2 : A program tree representing the expression (* 3 (+ 1 2))
A subtree is the part of a tree that is under an inner node of this tree. If this tree is cut out from its parent, the inner node becomes a root node and the subtree is a valid tree of its own. Subtrees are equivalent to inner S-expressions in other S-expressions.
A method to measure the complexity of these trees is the depth, meaning the maximum number of parent nodes a leave might have, or the complete size, meaning the total sum of leaves and nodes a tree contains.
Functions and Terminals
While the term function is self-explanatory, the meaning of terminal is not so obvious. While functions will be the nodes of the trees (or the operators in the S-expressions) and can have other functions as their arguments, the leaves will be formed by terminals, that is symbols that may not be further expanded.
In this work, a variety of functions will be used. Simple arithmetic functions like addition or multiplication, more complex ones like sinus or highly specific functions like IF-FOOD-AHEAD (a conditional branch). Terminals can be variables, constants or specific actions that are to be performed.
The process of selecting the functions and terminals that are needed or useful for finding a solution to a given problem is one of the key steps in the preparation of genetic programming.
Evaluation of these structures is straightforward. Beginning at the root node, the values of all sub-expressions (or subtrees) are computed, descending the tree down to the leaves. Their return values will serve as arguments for the upper node.
(+ 3 ( - 4 2)) = (+ 3 2) = 5 (- (* 2 3) (+ 2 1)) = (- 6 (+ 2 1)) = (-6 3) = 3
If the function represents a conditional branch, it is imperative that just the designated argument is evaluated; the other(s) must be ignored.
The most difficult and most important concept of genetic programming is the fitness function. The fitness function determines how well a program is able to solve the problem. It varies greatly from one type of program to the next. For example, if one were to create a genetic program to set the time of a clock, the fitness function would simply be the amount of time that the clock is wrong. Unfortunately, few problems have such an easy fitness function; most cases require a slight modification of the problem in order to find the fitness.