Poster Abstract: From Care to Discovery: A Cloud-Native Platform for Secondary Use of Cancer Clinico-Genomic Data

Tarishi Pathak, Bioinformatics Software Engineer, Dana-Farber Cancer Institute

Abstract

Introduction: Clinico-genomic data generated during routine cancer care represent a high-value resource for secondary research, enabling biomarker discovery, real-world evidence generation, patient stratification, and faster translation of molecular insights into clinical impact. However, realizing this value requires secure, scalable infrastructure that can integrate multimodal data, support reproducible analyses, and provide governed access for diverse research users.

Methods: We established a scalable, cloud-native analytics platform within the DNAnexus ecosystem to support translational research at Dana-Farber Cancer Institute. The platform unifies access to clinical and genomic data and enables integrated analysis across multiple modalities. Its workflow-driven architecture, built with WDL and Nextflow, supports reproducible, portable, and scalable genomic analysis. We developed modular, version-controlled pipelines for analyzing data from whole-genome sequencing (WGS), whole-exome sequencing (WES), whole-transcriptome sequencing (WTS), and several other omics assays. To support downstream analyses, we also provide curated RStudio and JupyterLab environments.

 

Conclusion: Researchers may request access to datasets derived from electronic health records, laboratory systems, and genomic platforms. Following review and approval, data are provisioned directly within the platform. Users can explore structured clinico-genomic data through interactive applications such as the Cohort Browser, which is integrated with an AI assistant, and can access primary sequencing data to run NGS pipelines in the same environment. With role-based access controls, centralized security and governance, and fully cloud-managed compute and storage, the platform provides an end-to-end solution for secure data access, analysis, and collaboration. By reducing operational burden and shortening the path from data generation to insight, this ecosystem enables efficient secondary use of clinico-genomic data at scale.