Background: Functional and disease pathway enrichment analyses are widely used to interpret transcriptomic gene sets and to identify biological pathways and disease mechanisms. However, conventional approaches rank genes by RNA expression alone and often produce results that are redundant, inconsistently ranked, and difficult to interpret. To address these limitations, we developed CellFun, a context-robust, agentic AI framework that introduces a bias-aware scoring system to prioritize non-redundant, biologically coherent pathways without arbitrary weighting.
Methods and Results: CellFun builds on our previously developed SCIG framework, which identifies functionally important genes to generate cell type–specific gene sets. By integrating transcriptional and regulatory features, CellFun prioritizes biologically relevant genes across large-scale datasets (>10 million single cells spanning ~500 cell types). Compared to traditional methods based on expression level, differential expression, or specificity, CellFun demonstrates improved performance in pathway enrichment analyses. The framework employs a multi-agent architecture that integrates pathway enrichment, redundancy-aware clustering, specificity assessment, and automated functional summarization. Semantic similarity modeling is used to group redundant pathways into interpretable functional themes. A key innovation is the Q-score, which quantitatively evaluates signal strength, non-redundancy, specificity, and functional diversity using rank-based calibration and data-driven normalization, enabling robust comparisons across datasets.
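As an illustration only, the redundancy grouping and rank-based scoring ideas described above can be sketched in a few lines. This is not CellFun's implementation: the function names (`group_redundant`, `rank_percentile`, `composite_score`) are hypothetical, gene-set Jaccard overlap stands in for the semantic similarity model, the 0.5 threshold is an arbitrary choice for the example, and the unweighted mean of rank percentiles is one assumed way to combine components without hand-tuned weights; the actual Q-score definition is not given here.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two gene sets (a stand-in for semantic similarity)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_redundant(pathways, threshold=0.5):
    """Group pathways into themes by single-linkage on gene-set overlap.

    pathways: dict mapping pathway name -> iterable of gene symbols.
    threshold: assumed cutoff; pairs at or above it are linked.
    """
    parent = {name: name for name in pathways}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(pathways, 2):
        if jaccard(pathways[p], pathways[q]) >= threshold:
            parent[find(p)] = find(q)

    themes = {}
    for name in pathways:
        themes.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in themes.values())


def rank_percentile(values):
    """Convert raw values to rank percentiles in (0, 1]; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / len(values) for r in ranks]


def composite_score(components):
    """Unweighted mean of rank percentiles across components (higher is better).

    components: dict mapping component name -> list of raw values,
    with all lists in the same pathway order.
    """
    percentiles = [rank_percentile(v) for v in components.values()]
    n = len(percentiles[0])
    return [sum(p[i] for p in percentiles) / len(percentiles) for i in range(n)]


if __name__ == "__main__":
    # Toy gene sets, invented for the example.
    toy = {
        "angiogenesis": {"VEGFA", "KDR", "FLT1"},
        "blood vessel development": {"VEGFA", "KDR", "FLT1", "TEK"},
        "lipid metabolism": {"APOB", "LDLR"},
    }
    print(group_redundant(toy))
    print(composite_score({"signal": [3.0, 1.0, 2.0], "specificity": [0.2, 0.9, 0.5]}))
```

Because each component is converted to ranks before averaging, components measured on very different scales contribute equally, which is one simple way to make scores comparable across datasets.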
Application: In cardiovascular disease datasets, CellFun identifies pathways associated with endothelial dysfunction and disease progression that are not consistently detected by conventional approaches. These results highlight the framework's ability to uncover biologically meaningful, previously underappreciated mechanisms. CellFun provides a scalable and interpretable framework for identifying disease pathways and therapeutic targets, with broad applicability across diverse biological and disease contexts.