
For the past few months I have been trying to understand genetic algorithms (GA), and most of the material available on the web was not easy for me to follow. Then I came across an article by Ahmed Gad, Genetic Algorithm Implementation in Python, which implemented a GA with numpy. Using it as a guide, I wrote my first GA in Python with numpy. All of the code was written by me except cal_pop_fitness.

The problem the GA needs to solve is to find the parameters (a, b) in an equation of the form y = a*x1 + b*x2, where x1, x2 and y are given as numpy arrays. The equation I chose to solve is y = 2*x1 + 3*x2. Because there are two parameters to find, I chose two genes per chromosome. In every GA we have to choose a fitness function, and I chose the mean squared error (MSE) as the fitness function for selecting the best parents; MSE works here because we already have the real output y. The lower the MSE, the better the selected parents. This is also the main difference between my GA and Ahmed's: where he used a maximisation fitness function, I used a minimisation one. From the selected parents we generate offspring by crossover and mutation. All offspring go through crossover, but only a few offspring receive mutations.
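To illustrate the minimisation fitness described above: for a candidate chromosome (a, b), the predicted output is a*x1 + b*x2, and the MSE against the true y measures how far the candidate is from the target (2, 3). A minimal sketch (the names here are illustrative, not the ones used in the code below):

```python
import numpy as np

x = np.array([[1, 2], [2, 4], [3, 6]])   # columns: x1, x2
y_true = 2 * x[:, 0] + 3 * x[:, 1]       # target formula y = 2*x1 + 3*x2

def mse_fitness(chromosome, x, y_true):
    # Predicted y for a candidate (a, b), then mean squared error
    y_pred = x @ chromosome
    return np.mean((y_true - y_pred) ** 2)

print(mse_fitness(np.array([2.0, 3.0]), x, y_true))  # perfect candidate -> 0.0
print(mse_fitness(np.array([1.0, 1.0]), x, y_true))  # worse candidate -> larger MSE
```

A candidate with lower MSE is "fitter", which is why the population is sorted ascending by MSE before selecting parents.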

import numpy as np

np.set_printoptions(formatter={'float': '{: 0.3f}'.format})
np.random.seed(1)

def generate_data(x_range):
    # Formula = '2*x1 + 3*x2'
    x_range = range(-x_range, x_range)
    x = np.vstack((np.array(x_range), np.array(x_range)*2)).T
    y = [2*i[0] + 3*i[1] for i in x]
    return x, np.array(y)

def cal_pop_fitness(equation_inputs, pop):
    fitness = np.sum(pop*equation_inputs, axis=1)
    return fitness

def select_best_parents(total_population, top_parents):
    arr = []
    for i in total_population:
        pred_y = cal_pop_fitness(i, X)
        mse = (np.square(y - pred_y)).mean(axis=None)  # Mean squared error
        # Append the mse to the chromosome values
        row = np.append(i, mse).tolist()
        arr.append(row)
    arr = np.array(arr)
    # Sorting the chromosomes with respect to mse
    # The lower the mse, the better the individuals
    arr = arr[arr[:, 2].argsort()]
    error = arr[:, 2]
    arr = arr[:, 0:2]  # removing mse column
    return arr[0:top_parents, :], error[0:top_parents]

def crossover(sub_population):
    children = []
    for i in range(population_size):
        # Selecting two random parents
        parent1 = np.random.randint(0, sub_population.shape[0])
        parent2 = np.random.randint(0, sub_population.shape[0])
        # A child is created from parent1's first gene and parent2's second gene
        child = [sub_population[parent1][0], sub_population[parent2][1]]
        children.append(child)
    return np.array(children)

def mutation(population):
    for i, row in enumerate(population):
        if np.random.rand() < mutation_rate:
            # Random mutations
            population[i][np.random.randint(genes)] = np.random.uniform(low=-4.0, high=4.0)
    return population

if __name__ == "__main__":
    # Generate input/output data
    X, y = generate_data(150)
    genes = X.shape[1]  # Total genes in a chromosome
    population_size = 50  # Number of individuals in the population
    top_parents = int(0.25*population_size)  # Select top parents from the population for mating
    mutation_rate = 0.1  # Mutation rate
    generations = 1000  # Number of generations

    # Step 1: Population
    total_population = np.random.uniform(low=-4.0, high=4.0, size=(population_size, genes))
    for i in range(generations):
        # Step 2: Fitness calculation
        # Step 3: Mating pool
        # Step 4: Parents selection
        # Choosing best parents with their corresponding mse
        sub_population, mse = select_best_parents(total_population, top_parents)
        # Step 5: Mating
        # Crossover
        new_population = crossover(sub_population)
        # Mutation
        new_population = mutation(new_population)
        print("Best Parameters: ", np.round(sub_population[0], 3), "Best MSE: ", round(mse[0], 3))
        # Next population
        total_population = new_population

    # Real
    x_range = range(-10, 10)
    x1 = np.array(x_range)
    x2 = np.array(x_range)*2
    formula = '2*x1+3*x2'
    y1 = eval(formula)
    # Predicted by Genetic algorithm
    a = 1.943
    b = 3.029
    formula = 'a*x1+b*x2'
    y2 = eval(formula)
    print("\nMSE between Real and predicted Formulas: ", (np.square(y1-y2)).mean(axis=None))
asked Sep 2, 2019 at 10:59 by Eka
  • What is it that you expect to get from the community? General feedback on the code? Style? Clarity? Performance? Commented Sep 2, 2019 at 11:25
  • Code review. Even though I took the ideas from the mentioned article, I wrote the code myself except for cal_pop_fitness. Commented Sep 2, 2019 at 11:28
  • My question was more about which aspects of the code should be the focus of the review. If you don't care which aspects get attention, it's okay to say so. Commented Sep 2, 2019 at 11:34
  • Actually the implementation of the GA itself, especially the creation of offspring using crossover and mutation: whether my implementation is correct or not. The second reason for posting the code here is to share my implementation with newbies in GA. Commented Sep 2, 2019 at 11:37

1 Answer


From my understanding of the subject, your implementation seems correct, but there are a number of things you could fix or improve.

Specific remarks

  1. Use the actually predicted values for printing, i.e.:

    # Predicted by Genetic algorithm
    a, b = sub_population[0]

  2. Your initialisation of X is dangerous and prevents the algorithm from converging on some values, since the second column is always twice the first (X[:, 1] = 2*X[:, 0]) while the two are supposed to be independent variables. Use random values instead.

  3. Don't use X and y from the global context in select_best_parents; you could have ugly surprises if these names are used elsewhere in the program. Add them to the function parameters.

  4. Your mutation function uses magic numbers; you could make the [-4, 4] interval a parameter of your GA to make it more generic.

  5. You could extract the main for loop into a function of its own, to clearly separate what belongs to the algorithm from what is a parameter.

  6. You could extract your sorting key (MSE) as a separate function so it's easier to replace.
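Remarks 2 to 4 above could be sketched as follows. The function names mirror the original code, but the random inputs, the explicit X/y parameters, the gene_bounds argument and the numpy Generator API are suggested changes, not the original behaviour:

```python
import numpy as np

def generate_data(n_samples, rng):
    # Independent random inputs instead of x2 = 2*x1, so (a, b) is identifiable
    x = rng.uniform(-10.0, 10.0, size=(n_samples, 2))
    y = 2 * x[:, 0] + 3 * x[:, 1]
    return x, y

def select_best_parents(total_population, top_parents, x, y):
    # x and y are now explicit parameters instead of globals
    preds = total_population @ x.T               # shape: (population, samples)
    mse = np.mean((y - preds) ** 2, axis=1)
    order = mse.argsort()                        # ascending: lower MSE is better
    return total_population[order[:top_parents]], mse[order[:top_parents]]

def mutation(population, mutation_rate, gene_bounds, rng):
    # gene_bounds replaces the magic numbers -4.0 / 4.0
    low, high = gene_bounds
    for i in range(len(population)):
        if rng.random() < mutation_rate:
            population[i, rng.integers(population.shape[1])] = rng.uniform(low, high)
    return population
```

The vectorised select_best_parents also computes all MSEs in one matrix product instead of a Python loop, which is both shorter and faster.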

General remarks

  1. If you want to validate your approach you need more than a single test value. For example, after you make a and b (2 and 3) parameters, you can use a loop over a variety of examples to show that the algorithm converges towards these values.

  2. I find it odd, and damaging, that scientific programmers often pass on object-oriented design. Using classes could be handy here: core functions could share variables through the object's state rather than through ever-growing parameter lists, and it would also make it possible to offer a catalog of different implementations of GA mutation, for instance. Consider migrating to an object-oriented design if those are later goals.

  3. It could be beneficial to separate the usage example from the core GA functions, in two distinct modules and files. They are independent parts that can be improved and extended separately.

  4. I believe you could support more than two dimensions with a bit of tidying and generalising in generate_data, crossover and select_best_parents.

  5. Improve the style and formatting a bit by trying to follow PEP 8: keyword arguments don't need spaces around =, put two blank lines between function definitions, use docstrings rather than comments, etc.
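To make the object-oriented remark concrete, here is a minimal sketch of what such a class could look like. The class name, method names and the uniform-crossover variant are invented for illustration, not taken from the original code:

```python
import numpy as np

class GeneticRegression:
    """Shared state (x, y, rates, bounds) lives on the instance,
    so the core methods don't need ever-growing parameter lists."""

    def __init__(self, x, y, population_size=50, top_parents_frac=0.25,
                 mutation_rate=0.1, gene_bounds=(-4.0, 4.0), seed=1):
        self.x, self.y = x, y
        self.population_size = population_size
        self.top_parents = int(top_parents_frac * population_size)
        self.mutation_rate = mutation_rate
        self.gene_bounds = gene_bounds
        self.rng = np.random.default_rng(seed)

    def fitness(self, population):
        # MSE per chromosome, computed in one matrix product
        preds = population @ self.x.T
        return np.mean((self.y - preds) ** 2, axis=1)

    def run(self, generations=1000):
        genes = self.x.shape[1]
        low, high = self.gene_bounds
        population = self.rng.uniform(low, high, (self.population_size, genes))
        for _ in range(generations):
            mse = self.fitness(population)
            parents = population[mse.argsort()[:self.top_parents]]
            # Uniform crossover: each gene drawn from a randomly chosen parent
            idx = self.rng.integers(len(parents), size=(self.population_size, genes))
            population = parents[idx, np.arange(genes)]
            # Mutation: occasionally overwrite one gene with a random value
            mask = self.rng.random(self.population_size) < self.mutation_rate
            for i in np.where(mask)[0]:
                population[i, self.rng.integers(genes)] = self.rng.uniform(low, high)
        mse = self.fitness(population)
        best = mse.argmin()
        return population[best], mse[best]
```

With independently generated inputs, running this for enough generations typically drives the best MSE towards 0, i.e. parameters close to (2, 3).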

answered Sep 3, 2019 at 1:52 by Diane M
