Purpose: Traditional GWAS has advanced our understanding of complex diseases but often misses nonlinear genetic interactions. Deep learning offers new opportunities to capture complex genomic patterns, yet existing methods mostly depend on feature selection strategies that either constrain analysis to known pathways or risk data leakage when applied across the full dataset. Further, covariates can inflate predictive performance without reflecting true genetic signals. We explore deep learning architectures for GWAS and demonstrate that careful architectural choices can outperform existing methods under strict no-leakage conditions.
Conclusions: Building on this, we extend our approach to a multi-label framework that jointly models five diseases, leveraging shared genetic architecture for improved efficiency and discovery. Applied to five million SNPs across 37,000 samples, our method achieves competitive predictive performance (AUC 0.68-0.96), offering a scalable, leakage-free, and biologically meaningful approach for multi-disease GWAS analysis.