125 | | == The Parallel Solution == |
| 125 | == The Algorithm Parallelization == |
| 126 | |
| 127 | '''1. Serial Solution Analysis ''' |
| 128 | |
| 129 | In order to see what are the most computing expensive parts of the designed algorithm we analyzed first the serial version using Sun Studio Performance Analyzer. |
| 130 | In the following figure we can see the time taken by different functions called in the algorithm, sorted in descending order. We highlighted (Red) the functions that consume the most cpu time and that can be implemented using openMP. |
| 131 | [[Image(grafic2.jpg)]] |
| 132 | |
| 133 | |
| 134 | As can be seen in the above figure, most of the functions of the proposed algorithm have significant execution time (10 to 50%) and can be easily parallelized because they are applied independently to each chromosome from the population. In the next paragraphs we will explain what each function does and how it is better to be parallelized. |
| 135 | |
| 136 | The presented analysis reveals that the most time consuming part of the algorithm is the evaluation (or the computation of fitness for each individual). If we analyze the complexity of this phase of the algorithm we will obtain a O(N*M), where N is the number of nodes, M is the maximum number of predecessor that a node can have. More, this function is applied on each of the C chromosomes leading to a total complexity equals to O(C*N*P). If we also consider that this function is applied on each generation, the total complexity is O(G*C*N*M). |
| 137 | |
| 138 | Starting from the complexity of the algorithm the easiest and most efficient way to parallelize this function is to use a parallel for with guided scheduling because some individuals survive to the next generation without been affected by mutation and thus there is no need to be evaluated again. |
| 139 | |