SAMPARS [1] for (Simulated Annealing for Maximum PARSimony) is a phylogenetic reconstruction software based on the maximum parsimony criterion under Fitch's parsimony. The core of SAMPARS relies on a simulated annealing algorithm.
The main features of SAMPARS is the use of fine tuned parameters and the modification of the GenerateNeighbor function that generates the next neighbor of the main inner loop of the SA algorithm.
The SAMPARS program is part of the BioSBL (Standard Bioinformatics Library) developed by Jean-Michel Richer (still under development) and that is publicly available for download on sourceforge.net. It has been designed for Linux 32 bits and 64 bits architectures and can be compiled with g++. It uses a multi-character optimization technique described in [2] to decrease computation times. The current implementation can benefit from SSE (SIMD Streaming Extension) or AVX (Advanced Vector Extensions) instructions of Intel and AMD processors. Note that AVX is only available for Intel processors.
problem | n | k | LVB | Hydra | SAMPARS | diff. |
tst01 | 45 | 61 | 549 | 545 | 545 | 0 |
tst02 | 47 | 151 | 1365 | 1354 | 1354 | 0 |
tst03 | 49 | 111 | 840 | 833 | 833 | 0 |
tst04 | 50 | 97 | 594 | 588 | 587 | -1 |
tst05 | 52 | 75 | 793 | 789 | 789 | 0 |
tst06 | 54 | 65 | 605 | 596 | 596 | 0 |
tst07 | 56 | 143 | 1280 | 1269 | 1269 | 0 |
tst08 | 57 | 119 | 867 | 852 | 852 | 0 |
tst09 | 59 | 93 | 1153 | 1144 | 1141 | -3 |
tst10 | 60 | 71 | 727 | 721 | 720 | -1 |
tst11 | 62 | 63 | 549 | 542 | 540 | -2 |
tst12 | 64 | 147 | 1240 | 1211 | 1208 | -3 |
tst13 | 65 | 113 | 1532 | 1515 | 1515 | 0 |
tst14 | 67 | 99 | 1175 | 1160 | 1160 | 0 |
tst15 | 69 | 77 | 770 | 752 | 752 | 0 |
tst16 | 70 | 69 | 547 | 529 | 527 | -2 |
tst17 | 71 | 159 | 2463 | 2453 | 2450 | -3 |
tst18 | 73 | 117 | 1552 | 1522 | 1521 | -1 |
tst19 | 74 | 95 | 1031 | 1013 | 1012 | -1 |
tst20 | 75 | 79 | 688 | 661 | 659 | -2 |
sum | n/a | n/a | 20320 | 20049 | 20030 | -19 |
SAMPARS was able to improve 10 of the best results of [3].
Example of newick tree of score 659 for tst20:
((N0066,((((N0061,N0031),N0067),((N0012,(N0027,((N0072,N0060),N0022))),((((N0036,N0053),N0005),(((((N0055,((N0063,(N0014,((N0047,N0034),(((((N0043,(N0009,N0065)),N0050),N0048),N0049),((N0044,N0010),(N0058,N0070)))))),N0039)),(((N0071,(N0035,(N0020,N0028))),N0037),((((((N0011,N0051),N0068),N0046),((N0003,N0074),N0002)),N0025),N0054))),(N0029,((N0013,N0032),N0069))),N0033),N0015)),N0018))),((N0045,N0059),(((((((((N0052,N0064),N0042),(N0040,N0024)),N0056),((((N0023,N0008),N0021),N0017),(N0019,N0057))),N0038),(N0062,N0026)),((N0073,(N0001,((N0075,N0004),(N0030,N0006)))),N0007)),N0041)))),N0016);
The computation time for problem tst11 with a medium effort and a SPR neighborhood is 805 seconds with an average of 1.4 billion moves (or trees visited) on an AMD Phenom II X6 1090T with Linux Ubuntu 64 bits.
Also note that 64 bits architectures are more efficient than 32 bits architectures (see last two lines).
CPU | Freq (Ghz) | Arch. (bits) | Neighborhood | Time (s) | Trees (109) |
AMD Phenom 1090 T | 3.2 | 64 | SPR | 805 | 1.4 |
AMD Phenom 1090 T | 3.2 | 64 | TBR | 811 | 2.0 |
Intel i7 860 | 2.8 | 32 | SPR | 1191 | 1.4 |
Intel i7 860 | 2.8 | 32 | TBR | 1173 | 1.5 |
Intel i7 2600k | 3.4 | 32 | SPR | 833 | 1.4 |
Intel i7 2600k | 3.4 | 64 | SPR | 577 | 1.4 |
LVB written by Daniel Barker is also based on a simulated annealing metaheuristics but is not as good as SAMPARS in terms of the quality of the solution. Experiments that we have carried out showed that the GenerateNeighbor function of LVB if probably too simple. Moreover, the execution time of LVB can greatly vary from a few seconds or minutes to a few hours:
problem | n | k | SAMPARS | LVB | |||
time (s) | score | time (s) | trees (*) | score | |||
tst03 | 49 | 111 | 858 | 835 | 16 | 15 | 855 |
880 | 835 | 370 | 18 | 851 | |||
tst11 | 62 | 63 | 1231 | 544 | 610 | 150 | 557 |
1248 | 544 | (6h >) 20949 | 12 | 552 | |||
tst16 | 70 | 69 | 1251 | 534 | (15h >) 51973 | 9330 | 555 |
1233 | 532 | (27h >) 97035 | 359 | 547 | |||
(7h >) 23257 | 1921 | 555 | |||||
(45h >) 159960 (!) | n/a | 566 | |||||
tst20 | 75 | 79 | 2574 | 668 | (7h >) 24014 | 142 | 688 |
2752 | 669 | (50h) 178500 (!) | n/a | 695 |
(*) number of trees of same cost as reported by LVB
(!) stopped
SAMPARS is included in BioSBL that is freely available for non commercial purpose and is distributed under GPL License.