Poster Abstract: Benchmarking subclonal reconstruction algorithms for scalable multi-sample cancer phylogeny inference

Helena Winata, Graduate Student Researcher, University of California, Los Angeles

Abstract

Subclonal reconstruction from bulk DNA sequencing is essential for understanding tumor evolution, therapeutic resistance, and clinical outcomes in cancer. Multi-sample data from longitudinal or multi-region sequencing of a patient’s tumor offer detailed insight into this evolution. Yet existing methods scale poorly, failing to reconstruct phylogenies for datasets with as few as ten subclones. Here we present EMulSI-Phy (Efficient Multi-Sample Inference of cancer Phylogeny), a computational framework designed to efficiently reconstruct subclonal architecture and evolutionary relationships across multiple tumor samples. EMulSI-Phy addresses key limitations of current approaches through optimized clustering algorithms and rule-based phylogenetic inference suitable for large whole-exome and whole-genome sequencing datasets.

Benchmarked against DPClust, PyClone, PyClone-VI, CONIPHER, Pairtree, and PhyClone on 504 simulated datasets and 400 patients from the TRACERx NSCLC cohort, EMulSI-Phy achieved the highest task completion rates (100% for clustering, 98.4% for phylogeny reconstruction) and the lowest compute cost while producing results consistent with ground truth or published reconstructions. A one-at-a-time parameter sensitivity analysis identified the minimum SNV threshold and the clustering distance metric as the most influential parameters. We further show that a consensus approach across input perturbations increases cluster silhouette width by ~30% and reduces the magnitude of sum rule violations by ~20% relative to a single full-data run. Collectively, these results indicate that EMulSI-Phy delivers competitive accuracy and improved output stability on datasets where other tools fail. EMulSI-Phy is distributed as a Dockerised R package with a Nextflow pipeline for reproducible multi-tool analysis.
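
For readers unfamiliar with the sum rule metric mentioned above, the following is a minimal illustrative sketch in Python, not EMulSI-Phy's implementation. It assumes the common formulation in which, within each sample, the cancer cell fractions (CCFs) of a clone's children should not add up to more than the parent's CCF, and it takes the violation magnitude to be the excess over the parent. The function name, input layout, and example values are hypothetical; the abstract does not specify how the magnitude is computed or aggregated.

```python
from collections import defaultdict


def sum_rule_violations(parent_of, ccf):
    """Per-parent, per-sample excess of the children's summed CCF over the parent's CCF.

    parent_of: dict mapping clone -> parent clone (the root maps to None)
    ccf:       dict mapping clone -> list of CCF values, one per sample
    (Both structures are assumptions made for this illustration.)
    """
    # Invert the child -> parent map into parent -> list of children.
    children = defaultdict(list)
    for clone, parent in parent_of.items():
        if parent is not None:
            children[parent].append(clone)

    violations = {}
    for parent, kids in children.items():
        excess = []
        for s in range(len(ccf[parent])):
            child_total = sum(ccf[k][s] for k in kids)
            # A violation occurs only when the children exceed the parent.
            excess.append(max(0.0, child_total - ccf[parent][s]))
        violations[parent] = excess
    return violations


# Hypothetical tree A -> {B, C}, B -> D, observed in three samples.
parent_of = {"A": None, "B": "A", "C": "A", "D": "B"}
ccf = {
    "A": [1.0, 1.0, 1.0],
    "B": [0.6, 0.5, 0.3],
    "C": [0.3, 0.7, 0.5],
    "D": [0.2, 0.1, 0.4],
}

violations = sum_rule_violations(parent_of, ccf)
total_magnitude = sum(sum(v) for v in violations.values())
print(violations)
print(round(total_magnitude, 3))
```

In this toy example, clone A is violated in the second sample (B and C together exceed A by about 0.2) and clone B in the third sample (D exceeds B by about 0.1). A consensus across perturbed inputs would be expected to shrink such excesses; how they are summarised across clones and samples is a design choice left open here.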