Evolutionary Algorithms and Genetic Programming [3]
Required theoretical background
Prefix Notation and S-Expressions
The most common way of writing down a function with two arguments, for example addition, is the infix notation. That is, the two arguments are connected with the operation symbol between them.
A + B
A different method is the so-called prefix (or polish) notation. Here the operation symbol is written down first, followed by its required arguments.
+ A B
While this may be a bit more difficult or just unusual for human eyes, it opens some advantages for computational uses.
The computer language LISP uses so-called symbolic expressions (or S-expressions) composed in prefix notation. Then a simple S-expression could be
(operator argument)
where operator is the name of a function (here with one argument). Argument can be either an atom (i.e. constant or variable) or either another symbolic expression (e.g. S-expressions can be nested).
(operator argument (operator argument) (operator argument))
In LISP, programs and data have the same structure providing an easy means of manipulation and evaluation. This is why LISP is popularly used for implementing genetic programming.
Program trees and Subtrees
A tree in this context is a hierarchical structure consisting of nodes and leaves. The simplest tree consists only of a single leaf. Nodes are inner points of trees and may have more nodes or just leaves connected to them. The topmost point of a tree is called the root of this tree.

A program tree is describes a computer program. Functions are written down as nodes, their arguments as leaves. There is a close relationship between these program trees and S-expression; in fact these trees are just another way of writing down the expressions.

Figure 1 : A program tree representing the expression (+ 1 2)

Figure 2 : A program tree representing the expression (* 3 (+ 1 2))
A subtree is the part of a tree that is under an inner node of this tree. If this tree is cut out from its parent, the inner node becomes a root node and the subtree is a valid tree of its own. Subtrees are equivalent to inner S-expressions in other S-expressions.
is a valid subtree of 
A method to measure the complexity of these trees is the depth, meaning the maximum number of parent nodes a leave might have, or the complete size, meaning the total sum of leaves and nodes a tree contains.
Functions and Terminals
While the term function is self-explanatory, the meaning of terminal is not so obvious. While functions will be the nodes of the trees (or the operators in the S-expressions) and can have other functions as their arguments, the leaves will be formed by terminals, that is symbols that may not be further expanded.
In this work, a variety of functions will be used. Simple arithmetic functions like addition or multiplication, more complex ones like sinus or highly specific functions like IF-FOOD-AHEAD (a conditional branch). Terminals can be variables, constants or specific actions that are to be performed.
The process of selecting the functions and terminals that are needed or useful for finding a solution to a given problem is one of the key steps in the preparation of genetic programming.
Evaluation
Evaluation of these structures is straightforward. Beginning at the root node, the values of all sub-expressions (or subtrees) are computed, descending the tree down to the leaves. Their return values will serve as arguments for the upper node.
(+ 3 ( - 4 2)) = (+ 3 2) = 5 (- (* 2 3) (+ 2 1)) = (- 6 (+ 2 1)) = (-6 3) = 3
If the function represents a conditional branch, it is imperative that just the designated argument is evaluated; the other(s) must be ignored.
Fitness Function
The most difficult and most important concept of genetic programming is the fitness function. The fitness function determines how well a program is able to solve the problem. It varies greatly from one type of program to the next. For example, if one were to create a genetic program to set the time of a clock, the fitness function would simply be the amount of time that the clock is wrong. Unfortunately, few problems have such an easy fitness function; most cases require a slight modification of the problem in order to find the fitness.
Evolutionary Algorithms and Genetic Programming [2]
Genetic Algorithm
Genetic algorithms are a variation of evolutionary algorithms. Genetic algorithms model of machine learning that derives its behavior from a metaphor of the processes of evolution in nature. This is done by the creation within a machine of a population of individuals represented by chromosomes, in essence a set of character strings that are analogous to the base-4 chromosomes that we see in our own DNA.
The individuals in the population then go through a process of evolution. The processes of nature boil down to different individuals competing for resources in the environment. Some are better than others. Those that are better are more likely to survive and propagate their genetic material.
Genetic algorithms are used for a number of different application areas. An example of this would be multidimensional optimization problems in which the character string of the chromosome can be used to encode the values for the different parameters being optimized. In practice, therefore, we can implement this genetic model of computation by having arrays of bits or characters to represent the chromosomes. Simple bit manipulation operations allow the implementation of crossover, mutation and other operations.
Although a substantial amount of research has been performed on variable-length strings and other structures, the majority of work with genetic algorithms is focused on fixed-length character strings. We should focus on both this aspect of fixed-lengthness and the need to encode the representation of the solution being sought as a character string, since these are crucial aspects that distinguish genetic programming, which does not have a fixed length representation and there is typically no encoding of the problem.
When the genetic algorithm is implemented it is usually done in a manner that involves the following cycle:
1. Evaluate the fitness of all of the individuals in the population.
2. Create a new population by performing operations such as crossover, fitness-proportionate reproduction and mutation on the individuals whose fitness has just been measured.
3. Discard the old population and iterate using the new population.
One iteration of this loop is referred to as a generation. There is no theoretical reason for this as an implementation model. Indeed, we do not see this punctuated behavior in populations in nature as a whole, but it is a convenient implementation model. The first generation (generation 0) of this process operates on a population of randomly generated individuals. From there on, the genetic operations, in concert with the fitness measure, operate to improve the population.
Genetic Programming
Genetic programming is a branch of genetic algorithms. The main difference between genetic programming and genetic algorithms is the representation of the solution – genetic algorithms create a string of numbers that represent the solution while genetic programming creates computer programs as the solution.
The objects in genetic programming that constitute the population are not fixed-length character strings that encode possible solutions to the problem at hand, they are programs that, when executed, “are” the candidate solutions to the problem. These programs are expressed in genetic programming as program trees, rather than as lines of code. Thus, for example, the simple program “a + b * c” would be represented as:

Or, to be precise, as suitable data structures linked together to achieve this effect. Because this is a very simple thing to do in the programming language LISP, many GP programmers tend to use LISP. However, this is simply an implementation detail. There are straightforward methods to implement GP using a non-LISP programming environment. The programs in the population are composed of elements from the function set and the terminal set, which are typically fixed sets of symbols selected to be appropriate to the solution of problems in the domain of interest. In GP the crossover operation is implemented by taking randomly selected subtrees in the individuals (selected according to fitness) and exchanging them. It should be pointed out that GP usually does not use any mutation as a genetic operator.
Further explanation of how genetic programming works is given in the section below.
Evolutionary Algorithms and Genetic Programming [1]
Introduction
Evolution in general means change and adaptation to the changes by a population and not individuals, changes that are passed on to the next generations. The idea of applying the biological principle of natural evolution to artificial systems, introduced more than three decades ago, has seen impressive growth in the past few years. Usually grouped under the term evolutionary algorithms or evolutionary computation, we find the domains of genetic algorithms, evolution strategies, evolutionary programming, and genetic programming.
Evolutionary algorithms (EAs) are ubiquitous nowadays, having been successfully applied to numerous problems from different domains including optimization, automatic programming, machine learning, economics, operations research, ecology, studies of evolution and learning, and social systems. In this article, we consider genetic algorithms in general and genetic programming to be specific.
Evolutionary Algorithms
Evolutionary algorithm is an overall term used to describe computer-based problem solving systems that use computational models of some of the known mechanisms of evolution as key elements in their design and implementation.
A variety of evolutionary algorithms have been proposed. The major ones are: genetic algorithms, evolutionary programming, evolution strategies, classifier systems and genetic programming. They all share a common conceptual base of simulating the evolution of individual structures via processes of selection, mutation, and reproduction. The topic of discussion in this article concerns genetic programming, but to understand genetic programming it is necessary to first understand the basic concepts of evolutionary algorithms.
EAs maintain a population of structures that evolve according to rules of selection and other operators that are referred to as genetic operators, such as recombination and mutation. Each individual in the population is measured for its fitness in the environment. Reproduction focuses on high fitness individuals and recombination and mutation agitate those individuals, providing general heuristics for exploration.
To understand EAs, it is necessary to have some appreciation of the biological processes on which they are based. In nature, the encoding for genetic information (genome) admits asexual reproduction. Asexual reproduction typically results in offspring that are genetically identical to the parent. (large numbers of organisms reproduce asexually; this includes most bacteria which some biologists hold to be the most successful species known.) Sexual reproduction allows some shuffling of chromosomes, producing offspring that contain a combination of information from each parent.
At the molecular level what occurs (wild oversimplification alert!) is that a pair of almost identical chromosomes bump into one another, exchange chunks of genetic information and drift apart. This is the recombination operation, which is often referred to as crossover because of the way that biologists have observed strands of chromosomes crossing over during the exchange.
Recombination happens in an environment where the selection of who gets to mate is largely a function of the fitness of the individual, i.e. how good the individual is at competing in its environment. Some “luck” (random effect) is usually involved too.
Some EAs use a simple function of the fitness measure to select individuals (probabilistically) to undergo genetic operations such as crossover or asexual reproduction (the propagation of genetic material unaltered). This is fitness-proportionate selection. Other implementations use a model in which certain randomly selected individuals in a subgroup compete and the fittest is selected. This is called tournament selection and is the form of selection we see in nature when stags rut to vie for the privilege of mating with a herd of hinds. Much EA research has assumed that the two processes that most contribute to evolution are crossover and fitness based selection/reproduction.
Evolution, by definition, absolutely requires diversity in order to work. In nature, an important source of diversity is mutation. In an EA, a large amount of diversity is usually introduced at the start of the algorithm, by randomizing the genes in the population. The importance of mutation, which introduces further diversity while the algorithm is running, therefore continues to be a matter of debate. Some refer to it as a background operator, simply replacing some of the original diversity that may have been lost, while others view it as playing the dominant role in the evolutionary process.
It cannot be stressed too strongly that an evolutionary algorithm (as a simulation of a genetic process) is not a random search for a solution to a problem (highly fit individual). EAs use stochastic processes, but the result is distinctly non-random (better than random).
Algorithm EA is // start with an initial time t := 0; // initialize a usually random population of individuals initpopulation P (t); // evaluate fitness of all initial individuals in population evaluate P (t); // test for termination criterion (time, fitness, etc.) while not done do // increase the time counter t := t + 1; // select sub-population for offspring production P' := selectparents P (t); // recombine the "genes" of selected parents recombine P' (t); // perturb the mated population stochastically mutate P' (t); // evaluate its new fitness evaluate P' (t); // select the survivors from actual fitness P := survive P,P' (t); od end EA.
Using YAML to transport data in ActionWebService
Here’s a handy tip for ActionWebService. Sometimes it could be a bit tedious to create the correct SOAP or XML messages to pass back and forth the webservices. If you’re using a Ruby 3rd party client (or even another Rails app as a client) you might want to try out YAML. It’s not entirely conventional but it works like a breeze and it sure beats the hell out of all that XML tagging.
Here’s the code to zip it up to as a YAML object:
yaml_obj = YAML::dump(your_object)
Even quicker — you can use a Hash and stuff everything inside the Hash. Then at the other end:
original_obj = YAML::load(yaml_obj)
There’s YAML for Java, Python, Perl and even Javascript. Check it out here
Don’t come after me if something doesn’t work though — it’s quite experimental at this moment, but it’s a pretty neat alternative to XML tagging.
1 comment