From Grokking Artificial Intelligence Algorithms by Rishal Hurbans
What you’ll learn in this article:
§ The lifecycle of a genetic algorithm.
§ Designing and developing a genetic algorithm to solve problems.
§ The parameters for configuring a genetic algorithm lifecycle based on different scenarios, problems, and data sets.
Genetic Algorithm – Life Cycle
The genetic algorithm is a specific algorithm in the family of evolutionary algorithms. Each algorithm works on the same premise of evolution but have small “tweaks” in the different parts of the lifecycle to cater for different problems.
Genetic algorithms are used to evaluate large search spaces for a good solution. It’s important to note that a genetic algorithm isn’t guaranteed to find the absolute best solution. It attempts to find the global best while avoiding local best solutions.
The global best is the best possible solution and local bests are solutions which are less optimal. In the diagram below represents the possible best solutions if the solution must be minimized—this means that the smaller the value, the better. If the goal was to maximize a solution, then the larger the value, the better. Optimization algorithms like a genetic algorithm aim to incrementally find local best solutions in search of the global best solution.
Local best vs. global best
Careful attention is needed when configuring the parameters of the algorithm such that it strives for diversity in solutions at the start and gradually gravitates towards iteratively better solutions through each generation to eventually converge on an optimal solution. This means that at the start, potential solutions should vary drastically in individual genetic attributes. Without divergence at the start, the risk of getting stuck in a local best is increased.
Diversity to convergence
The configuration for a genetic algorithm is based on the problem space. Each problem has a unique context and a different domain in which data is represented and solutions are evaluated differently.
The general lifecycle of a genetic algorithm is as follows:
- Creation of a population: This involves creating a random population of potential solutions.
- Measuring fitness of individuals in the population: This involves determining the efficacy of a specific solution. This is accomplished by using a fitness function which scores solutions to determine their worth.
- Selecting parents based on their fitness: This involves selecting a number of pairs of parents who will reproduce offspring.
- Reproducing individuals from parents: This involves creating offspring from their respective parents by mixing genetic information and applying slight mutations to the offspring.
- Populating the next generation: This involves selecting individuals and offspring from the population who will survive to the next generation.
A number of steps are involved in implementing a genetic algorithm. These steps encompass the stages of the algorithm lifecycle.
Genetic algorithm lifecycle
With the Knapsack Problem in mind, how would you use a genetic algorithm to find solutions to the problem? Let’s dive into the process.
Encoding the Solution Space
When using a genetic algorithm, it’s paramount that the encoding step is done correctly for it to work. This requires careful design of the representation of possible states. The state is a data structure with specific rules that represents possible solutions to a problem. Furthermore, a collection of states comprises a population.
Terminology of the data structures representing a population of solutions
The Knapsack Problem has number of items which can be placed into the bag. A simple way to describe a possible solution which contains some items but not others is binary encoding. Binary encoding represents excluded items with 0s and included items with 1s. If for example at gene index 3, the value is 1, that item is marked to be included. The complete binary string will always be the same size – this is the total number of items available for selection. A number of alternative encoding schemes exist.
Binary encoding the Knapsack Problem
Binary Encoding – Represent possible solutions with zeros and ones
Binary encoding represents a gene in terms of 0 or 1, and a chromosome is represented by a string of binary bits. Binary encoding can be used in versatile ways to express the presence of a specific element, or even encoding numeric values as binary numbers. The advantage of binary encoding is that it’s usually more performant due to the use of the primitive type used. This has less demand on working memory and depending on the language used, binary operations are computationally faster. Critical thought must be used to ensure that the encoding makes sense for the respective problem, and represents potential solutions well, or the algorithm may perform poorly.
Binary encoding the larger data set for the Knapsack Problem
Given the Knapsack problem with a data set that consists of twenty-six items of varying weight and value, a binary string can be used to represent the inclusion of each item. This results in a 26-character string where for each index, 0 means that the respective item is excluded, and 1 means that the respective item is included.
Exercise: What is a possible encoding for the following problem?
Because the number of possible words are always the same, and the words are always in the same position, binary encoding can be used to describe which words are included and which are excluded. The chromosome consist of nine genes, each gene indicating a word in the phrase.
That’s all for this article.