Background: Discovering rare subpopulations in single-cell multiomics data remains challenging because they are sparsely represented and transient. Integrating multiple omic modalities often loses information when heterogeneous feature spaces are forced into a joint embedding. We present VenusGT, a trajectory-aware graph transformer pipeline that integrates multimodal single-cell data while preserving biological heterogeneity. VenusGT constructs a heterogeneous cell–gene–peak graph, learns embeddings through attention-based message passing, and incorporates lineage-aware pseudotime information to guide learning. To emphasise transitional dynamics, it applies rarity-weighted, trajectory-guided sampling and a weighted objective that amplifies gradients from rare populations. The model captures temporal continuity through smoothness regularisation and biases attention toward temporally adjacent neighbours. Applied to a matched lymphoma dataset, VenusGT improves identification of rare and transitional cell states over existing approaches, enabling interpretable discovery of rare, lineage-specific, and reprogramming cell types in complex single-cell systems.
Conclusions: VenusGT was benchmarked against current state-of-the-art algorithms and outperformed them in both computational efficiency and the detection of rare cellular subpopulations.
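
As an illustration of the rarity-weighted objective and pseudotime smoothness regularisation described above, the following is a minimal sketch and not the VenusGT implementation. The abstract does not give the exact formulas, so inverse cluster frequency as the rarity weight and a squared difference between pseudotime-adjacent embeddings as the smoothness term are assumptions, and the function names `rarity_weights`, `weighted_loss`, and `smoothness_penalty` are hypothetical.

```python
import numpy as np

def rarity_weights(labels):
    """Assumed rarity weight: inverse cluster frequency, rescaled to mean 1."""
    labels = np.asarray(labels)
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]          # cells in small clusters get large weights
    return w / w.mean()

def weighted_loss(per_cell_loss, weights):
    """Weighted mean of per-cell losses; rare cells contribute more gradient."""
    per_cell_loss = np.asarray(per_cell_loss, dtype=float)
    return float(np.sum(weights * per_cell_loss) / np.sum(weights))

def smoothness_penalty(embeddings, pseudotime):
    """Assumed smoothness term: mean squared distance between the embeddings
    of cells that are consecutive when sorted by pseudotime."""
    order = np.argsort(pseudotime)
    z = np.asarray(embeddings, dtype=float)[order]
    diffs = np.diff(z, axis=0)
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

# Toy data: nine cells of a common type and one cell of a rare type.
labels = ["common"] * 9 + ["rare"]
weights = rarity_weights(labels)
losses = np.array([1.0] * 9 + [10.0])
total = weighted_loss(losses, weights)
penalty = smoothness_penalty([[0.0], [2.0], [1.0]], [0.0, 2.0, 1.0])
```

In this toy case the single rare cell receives nine times the weight of each common cell, so its loss dominates the weighted objective. In a full model the two terms would be combined with a tunable coefficient.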