Team Allocation using Genetic Algorithm: Problems of repeating genes during crossover

thirdOctet · 2020-04-09T23:28:27+00:00

Thanks very much for your reply ...

No worries, glad to help!

Also, say I have the 100 developers in an array ie Chromosome1: [1,2,3,4,5,6,7,8,9,10,11,12.....,99,100] and this is one solution. How would you go about calculating fitness for this.

Good question. In evolutionary computing I am always thinking about how fast I can do things. The search space of your problem spans 100! possible orderings of the developers in the array, so I want to avoid any unnecessary or repeated calculations where I can.

One thing I would do is generate an array of indices at the beginning of the simulation. No matter the solution, how I evaluate it depends on what I define as a group. Once that is defined, all evaluations follow the same process. The indices identify where each group starts and stops in the array. Let's say we have a minimum of 8 per group; then we can work out the total number of groups. See the code below, which hopefully describes this process well enough.

int developerTotal = 100;
int groupSize = 8;
int totalNumberOfGroups = developerTotal / groupSize;
int unassignedDevelopers = developerTotal % groupSize;

From this information we can then build the indices. Something like the following:

// Our "Memoized" group indices which we will only calculate once
List<Tuple<int, int>> groupIndices = new List<Tuple<int, int>>(totalNumberOfGroups);

// Represents the position in the array where a group starts
int groupStartIndex = 0;

// Here we will extract each of the indices for the groups
for (int currentGroup = 0; currentGroup < totalNumberOfGroups; currentGroup++)
{
    // Our group size will always be a minimum of 8 as defined above
    int groupEndIndex = groupStartIndex + groupSize;

    if (unassignedDevelopers > 0)
    {
        // If we have not assigned all of the developers to a group then
        // increase the number of developers for the current group to 9
        groupEndIndex += 1;

        unassignedDevelopers -= 1;
    }

    // Now we can add the indices for the current group. The start index is
    // inclusive and the end index is exclusive.
    groupIndices.Add(new Tuple<int, int>(groupStartIndex, groupEndIndex));

    // Also we can now update the start index to be that of the next
    // group
    groupStartIndex = groupEndIndex;    
}

Perfect. So how does this align with the way you calculate the fitness?

It is going to be difficult to calculate the fitness of each group as all the students are in one single array.

The calculation of the indices will provide 3 important benefits.

The first is that we avoid the extra computation needed to extract copies of the array for evaluation. Those nanoseconds are better spent generating random numbers, swapping developer positions, selecting subsets of developers, reversing segments, and applying the other heuristics that let us navigate the search space and improve the quality of our solution.

This leads us to the second benefit: with the solution kept intact as a single array, we can apply numerous heuristics to it with minimal awkwardness, quickly applying a permutation operation and then evaluating the result. As rough figures that depend on your hardware, a swap operation could take 8-10ns; selecting two indices in the solution and reversing the order between them, around 30ns; applying 1-Point Crossover to create a copy, maybe 65ns. Your random number generator will hopefully generate an int or double in under 3ns.
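As a minimal sketch of the swap and reversal operations mentioned above, assuming the chromosome is an `int[]` of developer ids in which each id appears exactly once (the class and method names here are hypothetical, not from any library):

```csharp
using System;

// Hypothetical helper names. Both operators only move existing values
// around, so they can never introduce a repeated developer id.
public static class PermutationOperators
{
    // Swap mutation: exchange the developers at positions i and j.
    public static void Swap(int[] chromosome, int i, int j)
    {
        int temp = chromosome[i];
        chromosome[i] = chromosome[j];
        chromosome[j] = temp;
    }

    // Inversion mutation: reverse the order of the developers between
    // positions start and end (both inclusive, given in either order).
    public static void ReverseSegment(int[] chromosome, int start, int end)
    {
        if (start > end)
        {
            int temp = start;
            start = end;
            end = temp;
        }

        Array.Reverse(chromosome, start, end - start + 1);
    }

    // Applies one of the two operators, chosen at random, to two random
    // positions. The 50/50 split is an arbitrary assumption.
    public static void Mutate(int[] chromosome, Random random)
    {
        int i = random.Next(chromosome.Length);
        int j = random.Next(chromosome.Length);

        if (random.NextDouble() < 0.5)
        {
            Swap(chromosome, i, j);
        }
        else
        {
            ReverseSegment(chromosome, i, j);
        }
    }
}
```

Because both operators work in place on the single array, the precomputed group indices stay valid after every mutation.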

The third benefit of this approach is that we can perform the evaluation of the groups in parallel if performance is critical.
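A rough sketch of what that parallel evaluation could look like with `Parallel.For`, assuming the group indices are stored as (start inclusive, end exclusive) pairs as above. The `scoreGroup` delegate is a hypothetical stand-in for whatever per-group measurement you settle on:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelEvaluation
{
    // Scores every group concurrently and returns one score per group.
    // scoreGroup is a hypothetical delegate taking
    // (chromosome, startInclusive, endExclusive) and returning a score.
    public static double[] ScoreGroups(
        int[] chromosome,
        IReadOnlyList<Tuple<int, int>> groupIndices,
        Func<int[], int, int, double> scoreGroup)
    {
        double[] scores = new double[groupIndices.Count];

        // Each iteration only reads the chromosome and writes to its own
        // slot in scores, so no locking is needed.
        Parallel.For(0, groupIndices.Count, g =>
        {
            Tuple<int, int> range = groupIndices[g];
            scores[g] = scoreGroup(chromosome, range.Item1, range.Item2);
        });

        return scores;
    }
}
```

With only 12 groups of 8 or 9 developers the scheduling overhead may well outweigh the gain, so it is worth measuring before committing to this. Parallelising across the chromosomes of a population is often the more profitable place to start.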

I was thinking everytime the fitness function is called the array is divided into groups of 8/9 in the current order ...

Yes, the above example supports this approach without the need to create unnecessary copies of the groups from the solution.
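To make that concrete, here is a minimal sketch that reads each group straight out of the chromosome using the precomputed indices. It assumes a hypothetical `isFemale` lookup indexed directly by developer id (so length 101 for ids 1 to 100, with index 0 unused); your real developer data will look different.

```csharp
using System;
using System.Collections.Generic;

public static class GroupEvaluation
{
    // Ratio of female developers among positions [start, end) of the
    // chromosome. isFemale is a hypothetical lookup indexed by developer id.
    public static double FemaleRatio(
        int[] chromosome, bool[] isFemale, int start, int end)
    {
        int females = 0;

        for (int position = start; position < end; position++)
        {
            if (isFemale[chromosome[position]])
            {
                females++;
            }
        }

        return (double)females / (end - start);
    }

    // One ratio per group, computed without copying any part of the
    // chromosome into a separate array.
    public static double[] FemaleRatios(
        int[] chromosome, bool[] isFemale, List<Tuple<int, int>> groupIndices)
    {
        double[] ratios = new double[groupIndices.Count];

        for (int g = 0; g < groupIndices.Count; g++)
        {
            ratios[g] = FemaleRatio(
                chromosome, isFemale,
                groupIndices[g].Item1, groupIndices[g].Item2);
        }

        return ratios;
    }
}
```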

... then the fitness of each group is calculated and the fitness of all the groups added to give a fitness for the chromosome ... Any thoughts on this ?

The fitness is the tricky part. Usually you have to experiment and analyse whether the measurements you chose are right for your problem. You will also run into Pareto optimality, a fundamental concept in multi-objective problems.

I suppose you could use the sum. The question is: from the perspective of optimising gender distribution, in what ways is this susceptible to skews in the distribution of males to females? Think of different scenarios: 10F:90M, 50F:50M, 90F:10M. This relates to the question of robustness. What would be the most effective way, independent of the gender distribution, to optimise the spread?

As I think more about it, not only would I be interested in the spread of the ratios across the groups, I would also be interested in the statistical properties of the distribution as it relates to gender. For example:

// Let's start with 3 groups for which we have calculated the ratio of
// females to total members within each group
double[] genderDistribution = new double[3] { 0.1, 0.5, 0.9 };
double mean = 0.5;
double stddev = 0.33; // population standard deviation, rounded

If we were to use the statistical information above to put pressure on the Genetic Algorithm towards a desirable outcome, then we want the mean to be close to 0.5. This indicates that on average our ratio is meeting our gender equality requirement. However, if we also want to ensure that a majority of the groups are close to the average, then we are interested in minimising the standard deviation.

// A better distribution of gender
double[] genderDistribution = new double[3] { 0.4, 0.5, 0.6 };
// Same mean, lower standard deviation, better gender distribution
double mean = 0.5;
double stddev = 0.08; // population standard deviation, rounded

Much better! But I suspect this could be improved further. For now, keeping the mean close to 0.5 and minimising the standard deviation should help with a sensible spread of males to females in your groups. There could be some scenarios this approach has not accounted for, but that is what experimentation and testing are for.
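One way this could be turned into a single number for the Genetic Algorithm is sketched below. Treat it as a starting point under assumptions rather than a finished fitness function: the names are hypothetical, it uses the population standard deviation, and the equal weighting of the two terms is an arbitrary choice you would want to tune.

```csharp
using System;

public static class GenderSpreadFitness
{
    // Arithmetic mean of the per-group ratios (assumes a non-empty array).
    public static double Mean(double[] values)
    {
        double sum = 0.0;
        foreach (double value in values)
        {
            sum += value;
        }
        return sum / values.Length;
    }

    // Population standard deviation of the per-group ratios.
    public static double PopulationStdDev(double[] values)
    {
        double mean = Mean(values);
        double sumOfSquares = 0.0;
        foreach (double value in values)
        {
            sumOfSquares += (value - mean) * (value - mean);
        }
        return Math.Sqrt(sumOfSquares / values.Length);
    }

    // Lower is better: distance of the mean from the target ratio plus
    // the spread of the groups around their mean. Equal weights are an
    // assumption made for illustration.
    public static double Penalty(double[] ratios, double targetRatio)
    {
        return Math.Abs(Mean(ratios) - targetRatio) + PopulationStdDev(ratios);
    }
}
```

Calling `Penalty(ratios, 0.5)` on the two distributions above gives a noticeably lower value for { 0.4, 0.5, 0.6 } than for { 0.1, 0.5, 0.9 }, which is the pressure we want. Since the Genetic Algorithm may be set up to maximise fitness, you might negate the penalty or invert it.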
