SAMPARS

Introduction

SAMPARS [1] for (Simulated Annealing for Maximum PARSimony) is a phylogenetic reconstruction software based on the maximum parsimony criterion under Fitch's parsimony. The core of SAMPARS relies on a simulated annealing algorithm.

The main features of SAMPARS is the use of fine tuned parameters and the modification of the GenerateNeighbor function that generates the next neighbor of the main inner loop of the SA algorithm.

The SAMPARS program is part of the BioSBL (Standard Bioinformatics Library) developed by Jean-Michel Richer (still under development) and that is publicly available for download on sourceforge.net. It has been designed for Linux 32 bits and 64 bits architectures and can be compiled with g++. It uses a multi-character optimization technique described in [2] to decrease computation times. The current implementation can benefit from SSE (SIMD Streaming Extension) or AVX (Advanced Vector Extensions) instructions of Intel and AMD processors. Note that AVX is only available for Intel processors.

Results

problem   n     k   LVB Hydra SAMPARS diff.
tst01 45 61 549 545 545 0
tst02 47 151 1365 1354 1354 0
tst03 49 111 840 833 833 0
tst04 50 97 594 588 587 -1
tst05 52 75 793 789 789 0
tst06 54 65 605 596 596 0
tst07 56 143 1280 1269 1269 0
tst08 57 119 867 852 852 0
tst09 59 93 1153 1144 1141 -3
tst10 60 71 727 721 720 -1
tst11 62 63 549 542 540 -2
tst12 64 147 1240 1211 1208 -3
tst13 65 113 1532 1515 1515 0
tst14 67 99 1175 1160 1160 0
tst15 69 77 770 752 752 0
tst16 70 69 547 529 527 -2
tst17 71 159 2463 2453 2450 -3
tst18 73 117 1552 1522 1521 -1
tst19 74 95 1031 1013 1012 -1
tst20 75 79 688 661 659 -2
sum n/a n/a 20320 20049 20030 -19

SAMPARS was able to improve 10 of the best results of [3].

Example of newick tree of score 659 for tst20:


((N0066,((((N0061,N0031),N0067),((N0012,(N0027,((N0072,N0060),N0022))),((((N0036,N0053),N0005),(((((N0055,((N0063,(N0014,((N0047,N0034),(((((N0043,(N0009,N0065)),N0050),N0048),N0049),((N0044,N0010),(N0058,N0070)))))),N0039)),(((N0071,(N0035,(N0020,N0028))),N0037),((((((N0011,N0051),N0068),N0046),((N0003,N0074),N0002)),N0025),N0054))),(N0029,((N0013,N0032),N0069))),N0033),N0015)),N0018))),((N0045,N0059),(((((((((N0052,N0064),N0042),(N0040,N0024)),N0056),((((N0023,N0008),N0021),N0017),(N0019,N0057))),N0038),(N0062,N0026)),((N0073,(N0001,((N0075,N0004),(N0030,N0006)))),N0007)),N0041)))),N0016);

Compatibility and efficiency

The computation time for problem tst11 with a medium effort and a SPR neighborhood is 805 seconds with an average of 1.4 billion moves (or trees visited) on an AMD Phenom II X6 1090T with Linux Ubuntu 64 bits.

Also note that 64 bits architectures are more efficient than 32 bits architectures (see last two lines).

Computation times of SAMPARS and trees visited for problem tst11 on different computer architectures and different neighborhoods with default parameters.
CPU Freq (Ghz) Arch. (bits) Neighborhood Time (s) Trees (109)
AMD Phenom 1090 T 3.2 64 SPR 805 1.4
AMD Phenom 1090 T 3.2 64 TBR 811 2.0
Intel i7 860 2.8 32 SPR 1191 1.4
Intel i7 860 2.8 32 TBR 1173 1.5
Intel i7 2600k 3.4 32 SPR 833 1.4
Intel i7 2600k 3.4 64 SPR 577 1.4

Comparison with LVB

LVB written by Daniel Barker is also based on a simulated annealing metaheuristics but is not as good as SAMPARS in terms of the quality of the solution. Experiments that we have carried out showed that the GenerateNeighbor function of LVB if probably too simple. Moreover, the execution time of LVB can greatly vary from a few seconds or minutes to a few hours:

LVB execution time on AMD Phenom II X6 1090 T Linux 64 bits compared to SAMPARS with default parameters
problem   n     k   SAMPARS LVB
time (s) score time (s) trees (*) score
tst03 49 111 858 835 16 15 855
880 835 370 18 851
tst11 62 63 1231 544 610 150 557
1248 544 (6h >) 20949 12 552
tst16 70 69 1251 534 (15h >) 51973 9330 555
1233 532 (27h >) 97035 359 547
(7h >) 23257 1921 555
(45h >) 159960 (!) n/a 566
tst20 75 79 2574 668 (7h >) 24014 142 688
2752 669 (50h) 178500 (!) n/a 695

(*) number of trees of same cost as reported by LVB

(!) stopped

Download

SAMPARS is included in BioSBL that is freely available for non commercial purpose and is distributed under GPL License.

Bibliography