AI-ready Perturb-seq and Spatial Transcriptomic Data for Drug Discovery Acceleration

Poster Abstract: Sondra Kopyscinski, Amrita Bhattacharya, Dzmitry Fedarovich, Marissa Hirst, Ben Ernest, Corey Oravetz, Nicole Leyland, Dan Rozelle

Abstract

Identifying novel therapeutic targets remains one of the most formidable challenges in preclinical drug discovery. Breakthrough technologies such as Perturb-seq and spatial transcriptomics are reshaping this landscape, enabling unprecedented insights into cellular function and tissue organization. Yet integrating and using the datasets these approaches generate remains complex and fragmented because of inconsistent formats, inconsistent metadata, and a lack of shared standards.

Rancho BioSciences brings both expertise and proven experience in curating, harmonizing, and analyzing these cutting-edge datasets to accelerate discovery.

  • Perturb-seq combines single-cell RNA sequencing with CRISPR-based perturbations to enable large-scale functional screening. Since its introduction in 2016 by the Regev lab at the Broad Institute, the method has expanded to include CRISPRi, CRISPRa, and chemical perturbation formats. Rancho standardizes metadata across studies, aligns gene nomenclature, and harmonizes perturbation annotations to support cross-study comparison. Harmonized perturbation data is well suited to foundation model training because each record pairs a defined intervention with its measured transcriptional response.
  • Spatial transcriptomics preserves tissue architecture while mapping RNA expression, revealing not only which genes are active but also where they are expressed, often down to the single-cell level. This spatial dimension provides critical context for understanding disease mechanisms and therapeutic opportunities. Rancho’s Spatial Innovation Initiative has established robust pipelines to process, harmonize, and analyze spatial datasets, making them readily deployable in drug discovery programs.

By harmonizing Perturb-seq and spatial transcriptomic data, Rancho BioSciences removes barriers to analysis, produces AI- and ML-ready datasets, and accelerates the path to actionable insights in drug discovery.
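
As an illustration only, the Perturb-seq harmonization steps described above (standardizing metadata, aligning gene nomenclature, and harmonizing perturbation annotations) might be sketched as follows. This is a minimal sketch and not Rancho's pipeline: the field names (`target`, `modality`, `study_id`), the alias table, the control labels, and the function `harmonize_record` are all hypothetical, and the uppercase normalization assumes human gene symbols for genetic perturbations.

```python
from typing import Dict, List

# Hypothetical alias table (assumption). A real pipeline would load an
# authoritative gene nomenclature resource rather than hard-code entries.
GENE_ALIASES: Dict[str, str] = {"P53": "TP53", "HER2": "ERBB2"}

# Hypothetical spellings that different studies might use for control guides.
CONTROL_LABELS = {"ntc", "non-targeting", "nontargeting", "control", "ctrl"}

# Hypothetical mapping from lowercase modality labels to one canonical spelling.
MODALITY_NAMES: Dict[str, str] = {
    "crispri": "CRISPRi",
    "crispra": "CRISPRa",
    "chemical": "chemical",
}


def harmonize_record(record: Dict[str, str], study_id: str) -> Dict[str, object]:
    """Map one study-specific perturbation annotation onto a shared schema."""
    raw_target = record.get("target", "").strip()
    is_control = raw_target.lower() in CONTROL_LABELS
    if is_control:
        target = "control"
    else:
        # Assumes human gene symbols: uppercase, then resolve known aliases.
        symbol = raw_target.upper()
        target = GENE_ALIASES.get(symbol, symbol)

    raw_modality = record.get("modality", "").strip()
    # Unknown modality labels pass through unchanged so they can be reviewed.
    modality = MODALITY_NAMES.get(raw_modality.lower(), raw_modality)

    return {
        "study_id": study_id,
        "target": target,
        "modality": modality,
        "is_control": is_control,
    }


# Two made-up studies that annotate the same kinds of perturbation differently.
study_a: List[Dict[str, str]] = [
    {"target": "p53", "modality": "crispri"},
    {"target": "NTC", "modality": "CRISPRi"},
]
study_b: List[Dict[str, str]] = [{"target": "HER2 ", "modality": "CRISPRa"}]

harmonized = [harmonize_record(r, "study_A") for r in study_a]
harmonized += [harmonize_record(r, "study_B") for r in study_b]
```

After this step, records from both studies share one set of keys and one spelling per gene and modality, which is the precondition for the cross-study comparison described above.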