Background: Myotonic dystrophy type 1 (DM1) is a multisystemic disorder caused by unstable CTG repeat expansions in the DMPK gene. While its clinical manifestations are well documented, the global genetic landscape of DM1 remains poorly characterised. Most genomic studies to date have focused on individuals of European ancestry, leaving ancestry-specific patterns underexplored. This gap limits the generalisability of diagnostic criteria and may contribute to underdiagnosis in underrepresented populations.
Objectives: Leveraging large-scale whole-genome sequencing (WGS) data, we aimed to identify DMPK repeat expansions across nearly one million participants and examine ancestry-specific differences.
Methods: WGS data from Genomics England (n = 80,110), UK Biobank (n = 490,276), and All of Us (n = 414,744) were analysed using ExpansionHunter. Repeat lengths were classified as Normal (≤37), Intermediate (38–49), or Pathogenic (≥50). Genetic ancestry was inferred using platform-specific methods and grouped into five superpopulations: African (AFR), Admixed American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS). A combined dataset (n ≈ 985,000) enabled assessment of ancestry-specific differences.
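The classification thresholds stated above can be illustrated with a minimal Python sketch. It is not the study's pipeline: the function names, the record layout, and the choice to label an individual by the longer of their two alleles are assumptions made for illustration, and the example records are invented.

```python
from collections import Counter

# Thresholds as stated in the Methods (CTG repeat units).
NORMAL_MAX = 37
INTERMEDIATE_MAX = 49


def classify_repeat(length):
    """Map a single allele's CTG repeat count to a category."""
    if length < 0:
        raise ValueError("repeat length must be non-negative")
    if length <= NORMAL_MAX:
        return "Normal"
    if length <= INTERMEDIATE_MAX:
        return "Intermediate"
    return "Pathogenic"


def classify_genotype(allele_lengths):
    """Label an individual by their longer allele (an assumption, not from the source)."""
    return classify_repeat(max(allele_lengths))


def category_counts_by_ancestry(records):
    """Tally categories per superpopulation.

    `records` is a hypothetical iterable of (ancestry, (allele1, allele2)) pairs.
    """
    counts = {}
    for ancestry, alleles in records:
        counts.setdefault(ancestry, Counter())[classify_genotype(alleles)] += 1
    return counts


# Invented example records, for illustration only.
example = [
    ("EUR", (5, 12)),
    ("EUR", (13, 120)),
    ("AFR", (11, 40)),
    ("SAS", (5, 5)),
]
counts = category_counts_by_ancestry(example)
```

Dividing the per-ancestry "Pathogenic" count by the group's total would give a crude prevalence estimate; a real analysis would also need to account for genotyping quality and for the size limits of read-based repeat sizing.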
Results: We report the largest ancestry-informed dataset of DMPK repeat expansions to date. Significant variation in repeat length distributions and pathogenic expansion prevalence was observed across ancestries. European individuals showed the highest prevalence and larger expansions, while other groups exhibited lower frequencies, highlighting potential diagnostic bias and epidemiological modifiers.
Conclusion: By integrating WGS data from nearly one million individuals, this study reveals ancestry-specific variation in DMPK repeat expansions, addressing a critical gap in DM1 research and supporting development of inclusive diagnostic and management strategies.