RT Journal Article
SR Electronic
T1 Data Distribution for Phylogenetic Inference with Site Repeats via Judicious Hypergraph Partitioning
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 579318
DO 10.1101/579318
A1 Baar, Ivo
A1 Hübner, Lukas
A1 Oettig, Peter
A1 Zapletal, Adrian
A1 Schlag, Sebastian
A1 Stamatakis, Alexandros
A1 Morel, Benoit
YR 2019
UL http://biorxiv.org/content/early/2019/03/18/579318.abstract
AB The so-called site repeats (SR) technique can be used to accelerate the widely-used phylogenetic likelihood function (PLF) by identifying identical patterns among multiple sequence alignment (MSA) sites, thereby omitting redundant calculations and saving memory. However, this complicates the optimal data distribution of MSA sites in parallel likelihood calculations, as the cost of computing the likelihood for individual sites strongly depends on the sites-to-cores assignment. We show that finding a ‘good’ sites-to-cores assignment can be modeled as a hypergraph partitioning problem, more specifically, a specific instance of the so-called judicious hypergraph partitioning problem. We initially develop, parallelize, and make available HyperPhylo, an efficient open-source implementation for this flavor of judicious partitioning where all vertices have the same degree. Using empirical MSA data, we then show that sites-to-core assignments computed via HyperPhylo are substantially better than those obtained via a previous naïve approach for phylogenetic data distribution under SRs.