AI Tool DOLPHIN Detects Hidden Disease Markers Inside Human Cells — A Step Toward Early Diagnosis and Digital Cell Models

A team of researchers from McGill University has unveiled a new artificial intelligence tool called DOLPHIN, which can reveal disease-related signals inside individual cells that conventional analyses miss. Published in Nature Communications in July 2025, the research could mark a major step forward in early disease detection and personalized treatment, although spotting disease before symptoms appear is still a future goal rather than a demonstrated result.

Let’s break down what this discovery means, how DOLPHIN works, why it’s important, and what challenges still lie ahead.

The Problem With How We Currently Detect Diseases

For decades, scientists have tried to understand diseases by analyzing genes, the basic units of heredity that code for proteins and other crucial biological molecules. Many recent studies rely on a method called single-cell RNA sequencing (scRNA-seq), which measures how active each gene is inside individual cells by counting the RNA molecules it produces.

While this method has revolutionized biology, there’s a catch: traditional analysis only looks at the total expression level of each gene. In other words, it adds up all the RNA molecules produced by a gene and treats that as one number.

This approach hides an enormous amount of information, because genes are not always expressed in a single, uniform way. Through a process called alternative splicing, the segments of a gene that end up in the finished RNA, known as exons, can be included or skipped in different combinations, so one gene can give rise to several distinct RNA versions (isoforms). These subtle differences in splicing can change how the resulting protein functions and can even be early indicators of disease.
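To make the difference between gene-level counts and splicing concrete, here is a toy sketch in Python. All numbers are hypothetical: a made-up gene with three exons (E1, E2, E3), where E2 can be included or skipped. Both cells get the same gene-level total, yet their junction reads show they produce different RNA versions. The `percent_spliced_in` helper uses a common, simple estimate of exon inclusion and is only an illustration, not the calculation used by DOLPHIN.

```python
"""Toy example: gene-level totals can hide splicing differences.

All numbers are hypothetical. A made-up gene has three exons (E1, E2, E3);
E2 is a "cassette" exon that a cell may include or skip.
"""

cells = {
    # cell_A mostly includes E2: reads span the E1-E2 and E2-E3 junctions.
    "cell_A": {
        "exons": {"E1": 40, "E2": 40, "E3": 40},
        "junctions": {("E1", "E2"): 30, ("E2", "E3"): 30, ("E1", "E3"): 2},
    },
    # cell_B skips E2: reads jump straight from E1 to E3.
    "cell_B": {
        "exons": {"E1": 60, "E2": 0, "E3": 60},
        "junctions": {("E1", "E2"): 0, ("E2", "E3"): 0, ("E1", "E3"): 40},
    },
}


def gene_level_count(cell):
    """Conventional summary: one number per gene (sum of its exon reads)."""
    return sum(cell["exons"].values())


def percent_spliced_in(cell):
    """Estimated fraction of transcripts that include E2, from junction reads.

    Inclusion is supported by two junctions (E1-E2 and E2-E3), so their reads
    are averaged before comparing with the single skipping junction (E1-E3).
    """
    j = cell["junctions"]
    inclusion = (j[("E1", "E2")] + j[("E2", "E3")]) / 2
    skipping = j[("E1", "E3")]
    total = inclusion + skipping
    return inclusion / total if total else float("nan")


for name, cell in cells.items():
    print(name, gene_level_count(cell), round(percent_spliced_in(cell), 2))
# cell_A 120 0.94
# cell_B 120 0.0
```

Both cells report 120 reads for the gene, so a gene-level table treats them as identical, while the inclusion estimate separates them cleanly.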

Unfortunately, conventional tools tend to ignore this splicing information because it’s complicated to analyze. As a result, doctors and researchers might miss the early cellular hints of diseases like cancer or neurodegenerative disorders.

Enter DOLPHIN — A New Kind of AI Microscope

This is where DOLPHIN comes in. The system’s name stands for “Deep Learning for Phenotypic Inference”, and it’s designed to look deeper into single-cell data — past the gene level and straight into the fine-grained structure of RNA molecules.

Rather than counting whole genes, DOLPHIN works with exon reads and junction reads, the latter being sequencing reads that span the boundary where two exons are joined and so reveal how a gene’s pieces are assembled. It represents each gene as a graph, a network-like map in which nodes are exons and edges are the junctions between them, and then uses a variational graph autoencoder, a type of deep learning model, to compress these graphs into a compact numerical profile for each cell.
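The graph idea can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not DOLPHIN’s implementation: one hypothetical gene in one cell, exon read counts as node features, junction read counts as edge weights, a single graph-convolution step per output (a simplification of the two-layer encoder in the original variational graph autoencoder), untrained random weights, and averaging node embeddings into a cell vector as an assumed pooling step.

```python
"""Minimal sketch of a variational graph autoencoder (VGAE) on an exon graph.

Assumptions (not DOLPHIN's actual code): one gene in one cell, exon read
counts as node features, junction read counts as edge weights, one graph
convolution per output head, random untrained weights. Training is omitted.
"""
import numpy as np

rng = np.random.default_rng(0)

exons = ["E1", "E2", "E3"]
exon_reads = np.array([40.0, 40.0, 40.0])  # hypothetical counts
junction_reads = {("E1", "E2"): 30, ("E2", "E3"): 30, ("E1", "E3"): 2}

# 1. Build the graph: nodes = exons, edges = junctions weighted by reads.
n = len(exons)
idx = {e: i for i, e in enumerate(exons)}
A = np.zeros((n, n))
for (a, b), reads in junction_reads.items():
    A[idx[a], idx[b]] = A[idx[b], idx[a]] = reads  # undirected for simplicity

# Node features: log-scaled exon reads (one feature per node here).
X = np.log1p(exon_reads)[:, None]

# 2. Add self-loops and normalise symmetrically: D^-1/2 (A + I) D^-1/2.
A_hat = A + np.eye(n)
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# 3. Encoder: graph convolutions give a mean and log-variance per node.
latent_dim = 2
W_mu = rng.normal(size=(X.shape[1], latent_dim))
W_logvar = rng.normal(size=(X.shape[1], latent_dim))
mu = A_norm @ X @ W_mu
logvar = A_norm @ X @ W_logvar

# 4. Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, 1).
z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

# 5. Inner-product decoder: sigmoid(z z^T) = predicted edge probabilities.
#    sigmoid(x) is written as 0.5 * (1 + tanh(x / 2)) to avoid overflow.
A_recon = 0.5 * (1.0 + np.tanh(0.5 * (z @ z.T)))

# Assumed pooling step: average node means into one vector for the cell.
cell_embedding = mu.mean(axis=0)
print(A_recon.round(2))
print(cell_embedding)
```

In a trained model, the weights would be fitted so the decoded graph matches the observed junctions, with a penalty that keeps the latent distribution close to a standard normal; that training loop is left out here.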

By analyzing how these elements relate to one another, DOLPHIN can pick up subtle patterns that may signal that a cell is beginning to behave abnormally, potentially before the effects are visible in tissue or blood tests.

One of the key advantages of this AI system is that it creates richer digital profiles of each cell. This level of detail allows researchers to distinguish between different disease subtypes and understand how a disease evolves at the molecular level.
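Once each cell has a numerical profile, grouping similar cells is one common way to look for subtypes. The sketch below assumes synthetic two-dimensional embeddings standing in for model output and uses plain k-means (Lloyd’s algorithm) as a generic clustering choice; it is not a claim about which clustering method the McGill study used.

```python
"""Sketch: grouping per-cell embeddings into candidate subtypes with k-means.

Assumptions: the embeddings are synthetic stand-ins for per-cell vectors a
model like DOLPHIN might produce; k-means is a generic illustrative choice.
"""
import numpy as np

rng = np.random.default_rng(42)

# Synthetic 2-D embeddings: 50 cells around each of two separated centres.
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
group_b = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(50, 2))
embeddings = np.vstack([group_a, group_b])


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid updates."""
    r = np.random.default_rng(seed)
    centroids = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each cell to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned cells
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(embeddings, k=2)
print(np.bincount(labels))  # number of cells assigned to each cluster
```

Real single-cell pipelines often prefer graph-based clustering on many more dimensions; k-means is used here only because it is short and easy to follow.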

What the Study Found

The McGill team tested DOLPHIN on single-cell RNA data from pancreatic cancer patients — a particularly challenging disease because it often goes undetected until late stages.

When compared to traditional gene-level analysis, DOLPHIN found over 800 previously undetected disease markers. These hidden markers helped distinguish high-risk, aggressive tumors from less severe cases. That information could be crucial in tailoring treatment plans for patients, giving doctors a clearer view of which therapies might work best.

The researchers explained that this is the kind of insight doctors have been seeking for years — the ability to connect a cell’s microscopic behavior with real clinical outcomes. It’s like reading a cell’s diary instead of just glancing at its cover.

The team believes this method could eventually make it possible to detect diseases before symptoms appear, opening up new opportunities for preventive medicine and personalized therapies.

Building Toward “Virtual Cells”

Perhaps the most exciting part of this research is the idea of digital or virtual cells.

DOLPHIN’s high-resolution cellular profiles could serve as a foundation for models that simulate how a real human cell behaves: how it responds to different drugs, how it mutates, or how it interacts with neighboring cells.

This concept is sometimes called a “digital twin” of a biological cell. Just as engineers use digital twins to simulate bridges, engines, or aircraft before building them, biologists could use these virtual cells to test hypotheses and design experiments in silico — that is, on a computer.

Imagine running millions of virtual drug tests across digital cells representing different patients, diseases, and genetic backgrounds — all before a single pipette touches a sample in the lab. That could save years of research time and reduce the cost of developing new therapies.

The McGill team hopes to scale DOLPHIN from the datasets analyzed so far to millions of cells, which would make such simulations increasingly realistic.

Why This Is a Big Deal

Early detection is the holy grail of modern medicine. Many diseases, including cancer, Alzheimer’s, Parkinson’s, and even autoimmune disorders, are generally easier to treat or manage when caught early.

The problem has always been that symptoms appear late, often after irreversible damage has occurred. By uncovering molecular-level changes before symptoms arise, tools like DOLPHIN could shift healthcare toward true predictive and preventive medicine.

Moreover, DOLPHIN could help solve one of the biggest challenges in medical treatment — the trial-and-error process of choosing drugs. By revealing hidden disease signatures, doctors could better match patients to the treatments most likely to work for them.

That’s not just faster and cheaper — it’s also safer and more humane.

The People and Support Behind the Discovery

This project was led by Assistant Professor Jun Ding, a researcher at McGill University’s Department of Medicine and a scientist at the Research Institute of the McGill University Health Centre. The first author, Kailu Song, is a PhD student in McGill’s Quantitative Life Sciences program.

The work was supported by the Meakins-Christie Chair in Respiratory Research and by several major Canadian funding agencies: the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Fonds de recherche du Québec (FRQ).

Their combined efforts highlight Canada’s growing leadership in merging AI and biomedical research.

Challenges Ahead

While DOLPHIN is a major step forward, there are still hurdles to overcome.

1. Scalability:
The AI model has been tested on select datasets. Scaling it up to analyze millions of cells across many diseases and populations will require enormous computational power and high-quality data.

2. Data quality:
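To capture exon and junction details, sequencing data needs sufficient read depth (enough reads per cell) and reads spread along the length of each RNA molecule, so that enough of them actually span the junctions between exons. Not all single-cell experiments meet that requirement yet; many widely used protocols sequence only one end of each RNA molecule.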

3. Interpretability:
Deep learning models, especially those built on graph structures, are often “black boxes.” Researchers can see that a certain pattern predicts disease, but understanding why it does can be challenging. Bridging the gap between prediction and explanation will be essential for clinical adoption.

4. Validation across diseases:
So far, DOLPHIN’s disease findings center on pancreatic cancer, and its performance still needs to be tested across many other conditions, such as heart disease, lung disorders, or neurological illnesses.
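The read-depth point under Data quality can be made concrete with a rough calculation. Assuming reads covering one exon-exon junction in a cell arrive independently, their count follows a Poisson distribution whose mean is the cell’s read depth times the share of reads landing on that junction. The share used below (2e-5) and both depths are hypothetical, chosen only to show the trend.

```python
"""Back-of-the-envelope: how read depth limits junction detection in one cell.

Assumption: reads covering a given exon-exon junction in one cell arrive
independently, so their count is Poisson with mean
    lam = reads_per_cell * fraction_of_reads_on_that_junction.
All numbers are hypothetical.
"""
import math


def prob_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via 1 minus the lower-tail sum."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below


junction_fraction = 2e-5  # hypothetical share of a cell's reads on one junction
for reads_per_cell in (20_000, 200_000):
    lam = reads_per_cell * junction_fraction
    print(reads_per_cell, round(lam, 2),
          round(prob_at_least(1, lam), 3), round(prob_at_least(3, lam), 3))
```

With these made-up numbers, 20,000 reads per cell gives only about a one-in-three chance of seeing even one read on the junction, while 200,000 reads raises that to about 98%.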

Even with these challenges, the progress is undeniable. As computing power and biological datasets continue to grow, AI tools like DOLPHIN could become a core part of the diagnostic toolkit for researchers and physicians.

Understanding Single-Cell Transcriptomics

To appreciate DOLPHIN’s impact, it helps to understand what single-cell transcriptomics actually means.

Nearly every cell in your body contains the same DNA, but not all genes are active in every cell. What makes a skin cell different from a neuron or a liver cell is which genes are turned on or off, and how those genes are spliced and expressed.

Single-cell RNA sequencing lets scientists measure that activity at the level of individual cells, rather than in bulk tissue. This helps uncover cellular diversity, rare subpopulations, and developmental trajectories — all key to understanding how complex diseases start and progress.
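A small simulation shows why per-cell measurement matters for rare subpopulations. The numbers are hypothetical: 1,000 cells, 20 of which strongly express a marker gene. The pooled average (roughly what a bulk measurement reflects) looks unremarkable, while the per-cell values pick out the rare group.

```python
"""Toy example: a rare subpopulation that bulk averaging dilutes.

All values are hypothetical: 1,000 cells, of which 20 strongly express a
marker gene that the other cells barely express.
"""
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_rare = 1000, 20

# Per-cell counts for one marker gene; the first 20 cells are the rare group.
expr = rng.poisson(lam=0.5, size=n_cells).astype(float)
expr[:n_rare] = rng.poisson(lam=50, size=n_rare)

bulk_level = expr.mean()                 # pooled signal, as in bulk tissue
high_cells = np.flatnonzero(expr > 10)   # single-cell view: which cells express it

print(f"bulk average: {bulk_level:.2f}")
print(f"cells above threshold: {high_cells.size}")
```

The pooled average lands near 1.5 counts per cell, which hides the fact that 2% of the cells express the gene at roughly 100 times the background level.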

By moving beyond the gene level to examine exons and junctions, DOLPHIN opens an even deeper window into what’s really happening inside each cell. It’s like switching from a blurry photograph to a high-definition image of your body’s inner workings.

The Bigger Picture: AI Meets Molecular Biology

The use of artificial intelligence in biology isn’t new, but the field is now accelerating at an incredible pace. AI tools are helping researchers:

Predict protein structures (as in DeepMind’s AlphaFold project).
Analyze vast genomic datasets.
Model complex molecular interactions.
Classify cell types and tissue samples faster than ever before.

DOLPHIN fits perfectly into this new landscape — one where AI doesn’t just crunch data but discovers new biological insights that humans alone couldn’t easily find.

By marrying deep learning with single-cell data, the McGill team has demonstrated that AI can move beyond image or text recognition to tackle the most intricate biological patterns inside us.

A Glimpse Into the Future

As DOLPHIN and similar models evolve, medicine could move toward a new era of cellular-level diagnostics.

Doctors may one day use a simple blood or tissue sample, feed it into a DOLPHIN-like system, and receive a detailed digital report showing whether a person is at risk of developing a specific disease — even before the first symptom appears.

Such a tool could fundamentally change how we think about health: not waiting for illness to strike, but catching it in its earliest whispers.

Research Reference:
“DOLPHIN advances single-cell transcriptomics beyond gene level by leveraging exon and junction reads” – Nature Communications (2025)