Poster Abstract: Maria Kiourlappou, Senior Computational Biologist, University of Oxford

ChatMDV: Democratising Bioinformatics Analysis Using Large Language Models

Abstract

Background: Rapid advances in single-cell sequencing, spatial omics, imaging and genomic technologies have produced complex, high-dimensional biological datasets that demand accessible tools for analysis and interpretation. Existing visualisation platforms, such as the Multi-Dimensional Viewer (MDV), offer comprehensive interfaces for data exploration but often require advanced computational expertise and manual configuration, limiting their use among clinicians and experimental biologists.

Material and Methods: We developed ChatMDV, a natural-language interface integrated with MDV that lets users generate interactive visualisations and analyses from plain-text commands. ChatMDV combines a retrieval-augmented generation (RAG) architecture with large language models (LLMs) to translate user queries into executable, reproducible Python code and interactive outputs. This conversational layer lowers the technical barrier to data interrogation, allowing domain experts to perform sophisticated analysis and visualisation tasks without coding expertise.

Results: We illustrate how ChatMDV enables efficient and reproducible analysis of single-cell transcriptomic data across three datasets of increasing complexity: (1) the PBMC3K single-cell RNA-seq dataset, (2) the Human Cell Atlas lung cancer atlas, and (3) the longitudinal TAURUS scRNA-seq study. In all three cases, ChatMDV produced high-quality, reproducible visualisations from simple natural-language questions, with visualisation accuracy above 95%.

Conclusions: By bridging the gap between natural language processing and bioinformatics visualisation, ChatMDV reduces technical barriers, enhances reproducibility, and supports more inclusive scientific inquiry. Its modular design and adherence to FAIR (Findability, Accessibility, Interoperability, and Reuse) principles make it a scalable and adaptable framework for accelerating biological data analysis.
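
Illustrative sketch: the retrieval-augmented step described under Material and Methods can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not ChatMDV's implementation: the snippet texts, the function names (`retrieve`, `build_prompt`) and the bag-of-words cosine ranking are all hypothetical stand-ins, and the final call to an LLM is deliberately left out. It shows only how a user question might be matched to reference snippets and assembled into a prompt that asks for Python code.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical reference snippets for illustration; not taken from MDV documentation.
SNIPPETS = [
    "scatter plot: draw two numeric columns such as UMAP coordinates against each other",
    "violin plot: show the distribution of a gene expression value per cell type",
    "dot plot: summarise marker gene expression across clusters",
]


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, snippets, k=1):
    """Return the k snippets most similar to the query (simple stand-in for a vector store)."""
    q = Counter(tokenize(query))
    ranked = sorted(snippets, key=lambda s: cosine(q, Counter(tokenize(s))), reverse=True)
    return ranked[:k]


def build_prompt(query, snippets, k=1):
    """Assemble retrieved context and the user request into a single prompt string."""
    context = "\n".join(f"- {s}" for s in retrieve(query, snippets, k))
    return (
        f"Context:\n{context}\n\n"
        f"User request: {query}\n"
        "Respond with Python code only."
    )


question = "Show a violin plot of CD3E expression per cell type"
prompt = build_prompt(question, SNIPPETS)
print(prompt)  # In a real system this prompt would be sent to an LLM.
```

In a production setting the token-overlap ranking would typically be replaced by embedding-based search, and the generated code would be executed and checked before anything is shown to the user.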
