AlphaGenome: AI workflow for DNA analysis

AlphaGenome provides the first unified, high-resolution map of the non-coding genome

Introduction: The Book Written in an Alien Cipher

The Human Genome Project decoded our DNA, but we only understood the 2% that codes for proteins. The other 98%—the genomic "dark matter"—remained a mystery.

The problem: Most variants that genome-wide studies link to common diseases fall in that unknown 98%. Without understanding it, personalized medicine was stuck.

The solution: AlphaGenome from Google DeepMind, an AI model that predicts the effects of mutations across the genome, including the 98% that does not code for proteins.

1 The 98% Unknown of DNA

Your genome has 3 billion letters. Only about 2% codes for proteins. The rest? It includes the regulatory sequences that control when, where, and how much those genes turn on.

The Challenge

2% of genome: Genes (well understood)

98% of genome: Regulation ("dark matter")

Most disease-associated variants: In the 98%

The problem: Thousands of mutations identified, but we don't know what they do. Do they affect gene activation? Alter RNA splicing? Disrupt 3D DNA structure?

AlphaGenome's solution: A prediction engine that simulates mutation effects across the entire genome.
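To make "simulating mutation effects" concrete: the usual recipe is to score the reference sequence and the mutated sequence with the same model and compare the two outputs. The sketch below is only an illustration of that recipe. `toy_model` is a hypothetical stand-in that counts a made-up motif; it is not AlphaGenome or its API, and the sequence and variant are invented.

```python
def apply_variant(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the single-letter change ref -> alt at 0-based pos."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref!r} at position {pos}, found {seq[pos]!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def toy_model(seq: str) -> float:
    """Hypothetical stand-in for a sequence-to-activity model.

    It only counts a made-up "activating" motif. A real model would
    return many predicted tracks rather than a single number.
    """
    motif = "TATAAA"
    return float(sum(seq[i:i + len(motif)] == motif for i in range(len(seq))))


def variant_effect(seq: str, pos: int, ref: str, alt: str) -> float:
    """Prediction on the mutated sequence minus prediction on the reference."""
    return toy_model(apply_variant(seq, pos, ref, alt)) - toy_model(seq)


# Invented example: a G -> A change that completes the motif.
reference = "GGCGTATAGACCGT"
effect = variant_effect(reference, pos=8, ref="G", alt="A")
```

A positive score means the mutated sequence is predicted to be more active than the reference; a negative score means less active.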

2 AlphaGenome: The First Model That Does It All

Before AlphaGenome, scientists had to choose between two bad options:

The Old Dilemma

Option A - Specialized models (SpliceAI):

✅ High precision on specific mutations

❌ Only see ~10,000 letters at a time (miss context)

Option B - Global models (Enformer):

✅ See 200,000 letters (broad context)

❌ Low resolution (average every 128 letters)

AlphaGenome breaks the dilemma: Processes 1 million DNA letters with 1-letter precision.
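Why does averaging every 128 letters matter? A minimal sketch, using an invented per-letter signal that is zero everywhere except at one position, shows how binning dilutes a single-letter event. The track length and spike position are assumptions chosen for illustration.

```python
def bin_average(track: list[float], bin_size: int) -> list[float]:
    """Average a per-letter track in non-overlapping bins.

    Assumes len(track) is a multiple of bin_size.
    """
    return [
        sum(track[start:start + bin_size]) / bin_size
        for start in range(0, len(track), bin_size)
    ]


# Invented signal: silent everywhere except one letter.
track = [0.0] * 1024
track[300] = 1.0

binned = bin_average(track, 128)
peak_bin = max(range(len(binned)), key=binned.__getitem__)
```

After binning, the spike survives only as a value of 1/128 in one bin, and its exact position inside that bin is lost. A model that predicts at 1-letter resolution keeps both the height and the position.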

How it works: During training, the 1-million-letter sequence is split across 8 interconnected processors (TPUs) that work in parallel. It is like having 8 ultra-powerful magnifying glasses looking at different parts of the sequence simultaneously.

3 An All-in-One Tool

AlphaGenome isn't a specialist—it's a generalist trained on thousands of different experiments.

What Can It Predict?

11 different types of biological data, including:

  • ✓ Gene expression (which genes are active)
  • ✓ RNA splicing (how genes are cut and pasted)
  • ✓ DNA accessibility (which regions are "open")
  • ✓ Histone modifications (epigenetic marks)
  • ✓ Protein binding (where they stick to DNA)
  • ✓ 3D genome structure

Results:

  • 25 of 26 tests: Equal or better than specialized models
  • Trained on human and mouse data: Learns general principles rather than just memorizing

The advantage: One model replaces dozens of specialized tools.

4 Detects Hidden Splicing Mutations

Splicing is the "cut and paste" process that removes unnecessary parts of RNA. Mutations here are hard to detect but cause many diseases.

What AlphaGenome Predicts

Where the exact cut occurs

How strong each cutting site is

Competition between normal and abnormal sites

Why it matters: "Hidden" mutations far from genes can create new cutting sites that destroy the final protein.

Real Example: Muscular Dystrophy

A mutation in the DMD gene (Duchenne muscular dystrophy) creates a false cutting site. AlphaGenome predicts this false site will compete with the normal one, resulting in a broken protein.

Result: AlphaGenome is the best model in 6 of 7 splicing prediction tests.

Predicting both the position and the strength of each site lets the model identify "deep intronic" mutations: pathogenic variants hidden far from exons that create "decoy" splice sites.

"AlphaGenome achieves SOTA splicing variant effect prediction on six of seven benchmarks, providing a more comprehensive view of altered splicing events and transcript structure."

By modeling the competition between these sites, AlphaGenome provides a holistic view of how a variant alters the final transcript structure. This is critical for genetic diseases where a single intronic mutation can cause exon skipping or intron retention.

Clinical Example: Activated Cryptic Site

A mutation in intron 7 of the DMD gene (Duchenne muscular dystrophy) creates a new splice acceptor site 50 bp upstream of the canonical site. AlphaGenome predicts that this cryptic site will now compete with the normal site, resulting in a truncated transcript and non-functional protein.
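The idea of competing splice sites can be illustrated with a deliberately simple model in which each site is used in proportion to its strength. This proportional rule and the strength values are assumptions made for illustration; they are not how AlphaGenome computes splice site usage.

```python
def site_usage(strengths: dict[str, float]) -> dict[str, float]:
    """Split usage among competing splice sites in proportion to strength."""
    total = sum(strengths.values())
    if total <= 0:
        raise ValueError("at least one site must have positive strength")
    return {name: s / total for name, s in strengths.items()}


# Hypothetical strengths, chosen only for illustration.
before = site_usage({"canonical": 0.9})
after = site_usage({"canonical": 0.9, "cryptic": 0.6})
```

In this toy setup the canonical site handles all splicing before the mutation and only about 60% afterwards, with the remaining 40% going to the cryptic site. The point is that a new site does not need to be stronger than the normal one to disrupt a meaningful share of transcripts.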

5 Fast and Efficient

AlphaGenome is powerful, but also needs to be fast and practical for clinical use.

The Solution: "Distillation"

Step 1: Train 64 large models ("teachers")

Step 2: One small model ("student") learns from them

Result: Performance comparable to the full set of 64 teachers from a single model, so each prediction needs one model run instead of 64
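The two steps above can be sketched with a toy example. Here each "teacher" is a hypothetical one-parameter model `y = w * x`, and the "student" is fitted by least squares to the teachers' averaged predictions. All names and numbers are assumptions for illustration; real distillation trains a neural network on the teachers' outputs.

```python
import random

random.seed(0)

NUM_TEACHERS = 64
TRUE_WEIGHT = 2.0

# Each hypothetical teacher is y = w * x with a slightly different weight.
teacher_weights = [TRUE_WEIGHT + random.gauss(0.0, 0.1) for _ in range(NUM_TEACHERS)]


def ensemble_predict(x: float) -> float:
    """Average the predictions of all teachers (64 model evaluations)."""
    return sum(w * x for w in teacher_weights) / NUM_TEACHERS


# Distillation: fit one student weight to the ensemble's outputs by
# least squares, w = sum(x * y) / sum(x * x).
inputs = [0.5, 1.0, 1.5, 2.0, 2.5]
targets = [ensemble_predict(x) for x in inputs]
student_weight = sum(x * y for x, y in zip(inputs, targets)) / sum(x * x for x in inputs)


def student_predict(x: float) -> float:
    """One model evaluation that approximates the ensemble average."""
    return student_weight * x
```

In this linear toy the student reproduces the ensemble average almost exactly. With real neural networks the match is only approximate, but the cost saving is the same: one model run per prediction instead of one per teacher.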

Speed: Less than 1 second per prediction (NVIDIA H100 GPU)

Throughput: 1M+ variants analyzed in hours (before: months)

Impact: Researchers can now analyze millions of mutations quickly, making personalized medicine possible at scale.
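A quick back-of-the-envelope check shows what "millions of mutations in hours" implies. The sketch below takes 1 second per prediction as an upper bound and assumes perfect parallelism; the GPU counts are hypothetical, not figures from the AlphaGenome paper.

```python
def wall_clock_hours(num_variants: int, seconds_per_variant: float, num_gpus: int) -> float:
    """Estimate wall-clock hours, assuming perfect parallelism across GPUs."""
    return num_variants * seconds_per_variant / num_gpus / 3600


# Hypothetical setups: a single GPU versus 64 GPUs running in parallel.
one_gpu = wall_clock_hours(1_000_000, 1.0, num_gpus=1)
many_gpus = wall_clock_hours(1_000_000, 1.0, num_gpus=64)
```

Under these assumptions, one million variants take about 278 hours on a single GPU and a little over 4 hours on 64 GPUs. Finishing in hours therefore depends on running predictions in parallel or on each prediction taking well under a second.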

6 Real Case: Discovering Mutations in Leukemia

The ultimate test: can AlphaGenome make real discoveries? Yes.

The Virtual Experiment

Target: Analyze mutations near the TAL1 gene (an oncogene implicated in T-cell acute lymphoblastic leukemia)

Hypothesis: Mutations create new "switches" that activate the gene

AlphaGenome's Predictions:

  1. ✅ Mutations would create new activation "switches"
  2. ✅ Would increase chemical activation marks (H3K27ac)
  3. ✅ Would result in TAL1 gene overactivation

New Discovery

AlphaGenome identified a second, previously unknown mechanism by which these mutations activate the gene. This shows it can make real discoveries, not just confirm what's known.

Impact: AlphaGenome can predict how mutations cause cancer, accelerating the development of personalized treatments.

What This Means

AlphaGenome marks a fundamental shift: from "mapping" genes to "simulating" what mutations do.

Key Points

  1. The 98% of the genome is no longer a mystery: AlphaGenome can predict mutation effects across all DNA, not just genes.
  2. 1 million letters at once: 1-letter precision with 1-million-letter context, the best of both worlds.
  3. 11 data types in one: One model replaces dozens of specialized tools.
  4. Detects hidden mutations: Identifies splicing mutations other models miss.
  5. Fast for clinical use: Less than 1 second per prediction, so millions of analyses take hours.
  6. Makes real discoveries: Already identified new mechanisms in leukemia.

Future Applications

Rare diseases: Interpret unknown mutations

Personalized medicine: Predict treatment response

Drug design: Avoid genetic side effects

Oncology: Identify cancer-driving mutations

AlphaGenome gives us the first complete map of the "unknown" 98% of the genome—the dark matter that controls when and how genes turn on.

Scientific References

  1. Avsec, Ž., Latysheva, N., Cheng, J. et al. (2026). Advancing regulatory variant effect prediction with AlphaGenome. Nature, 649, 1206–1218. https://doi.org/10.1038/s41586-025-10014-0
  2. ENCODE Project Consortium (2020). Perspectives on ENCODE. Nature, 583, 693–698. https://doi.org/10.1038/s41586-020-2449-8
  3. Jaganathan, K. et al. (2019). Predicting Splicing from Primary Sequence with Deep Learning. Cell, 176, 535–548.e24. https://doi.org/10.1016/j.cell.2018.12.015
  4. Avsec, Ž. et al. (2021). Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods, 18, 1196–1203. https://doi.org/10.1038/s41592-021-01252-x
