The GWAS Catalog is the central repository of human genome-wide association studies (GWAS), hosting variant-trait associations across a broad range of phenotypes. It is freely accessible and underpins major resources, including Ensembl (genomic annotation), Open Targets (drug target prioritization), PhenoScanner, PheGenI, the PGS Catalog, OpenGWAS, and GWAS Central. Clinicians and data scientists also use it to investigate genetic contributors to human traits and to support epidemiology, method development, and precision-medicine applications. Its data volume and diversity have grown exponentially: it now covers >7,000 publications, >180,000 studies (a 1,000% increase since 2020), 67% of which have full genome-wide summary statistics, >20,000 mapped traits, and >1,000,000 lead associations. Static downloads and legacy interfaces are increasingly insufficient for high-throughput, integrative analyses, creating a need for scalable programmatic access.
We present the GWAS Catalog REST API-V2, a redesigned, data-centric programmatic interface engineered for high-throughput, FAIR, and efficient retrieval of genome-wide association data. The API is structured around core biological entities (variants, traits, associations, and studies) and optimized to answer key scientific questions, such as which variants are associated with a given trait, how associations differ across ancestry groups, and how study metadata can be linked to analytical pipelines.
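As an illustration of the entity-centric design, the following sketch builds a request for the associations linked to a single trait. It is a minimal sketch under stated assumptions: the base URL, the `associations` path, and the `efo_trait`, `page`, and `size` query parameters are hypothetical names chosen for illustration and should be checked against the official API-V2 documentation before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL for API-V2; verify against the official documentation.
BASE_URL = "https://www.ebi.ac.uk/gwas/rest/api/v2"


def associations_for_trait_url(efo_trait: str, page: int = 0, size: int = 100) -> str:
    """Build a URL requesting associations for one trait.

    The "associations" path and the "efo_trait", "page" and "size"
    query parameters are hypothetical names used for illustration.
    """
    query = urlencode({"efo_trait": efo_trait, "page": page, "size": size})
    return f"{BASE_URL}/associations?{query}"


def fetch_json(url: str) -> dict:
    """Download one response and decode its JSON body (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)
```

For example, `associations_for_trait_url("type 2 diabetes mellitus")` encodes the spaces in the trait label as `+`, yielding `.../associations?efo_trait=type+2+diabetes+mellitus&page=0&size=100`.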
To scale these workflows, the system incorporates indexed data storage, containerized services, asynchronous jobs, and schema-driven responses to provide efficient access across millions of records. The API adheres to the FAIR principles through stable identifiers, standardized ontologies, machine-readable metadata, and user-friendly documentation and tutorials, enabling reproducible research, interoperability, and seamless integration into genomics workflows.
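Serving millions of records implies paged responses that a client must traverse. The following is a minimal sketch that assumes a HAL-style payload, as produced by Spring Data REST: records sit under `_embedded`, and paging metadata sits under `page` with `number` and `totalPages` fields. These field names are assumptions, not confirmed details of API-V2. The page fetcher is passed in as a function, so the traversal logic can be exercised without network access.

```python
from typing import Callable, Iterator


def iter_records(fetch_page: Callable[[int], dict], key: str) -> Iterator[dict]:
    """Yield records from every page of a paged response.

    Assumed (hypothetical) layout: records in payload["_embedded"][key],
    paging metadata in payload["page"] with "number" (0-based) and
    "totalPages".
    """
    page_number = 0
    while True:
        payload = fetch_page(page_number)
        # Emit this page's records; a missing section is treated as empty.
        yield from payload.get("_embedded", {}).get(key, [])
        page_info = payload.get("page", {})
        # Stop once the current page is the last one reported by the server.
        if page_info.get("number", page_number) + 1 >= page_info.get("totalPages", 0):
            break
        page_number += 1
```

Keeping the fetcher injectable separates traversal from transport: the same generator can wrap a live HTTP client in a pipeline or an in-memory stub in tests, and because it is lazy, records can be processed as they arrive rather than after all pages are downloaded.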
By combining data-centric design with scalable infrastructure and standardized ontologies, the GWAS Catalog API-V2 facilitates integrative, reproducible, and population-aware genomic analyses, supporting interactive exploration and high-throughput pipelines.