AlphaGenome provides the first unified, high-resolution map of the non-coding genome
• Introduction: The Book Written in an Alien Cipher
The Human Genome Project sequenced our DNA, but we understood well only the roughly 2% that codes for proteins. The other 98%, the genomic "dark matter," remained largely a mystery.
The problem: Most disease-associated mutations lie in that poorly understood 98%. Without knowing what those regions do, personalized medicine could not interpret them.
The solution: AlphaGenome from DeepMind, an AI model that predicts the effects of mutations anywhere in the genome, including the 98% that does not code for proteins.
1 The Unknown 98% of DNA
Your genome has about 3 billion letters, and only about 2% of them code for proteins. Much of the rest helps control when, where, and how strongly those genes switch on.
The Challenge
2% of genome: Protein-coding genes (well understood)
98% of genome: Non-coding DNA, including regulatory regions ("dark matter")
Most disease-associated mutations: In the non-coding 98%
The problem: Thousands of disease-linked mutations have been identified, but for most we don't know what they do. Do they change whether a gene switches on? Alter RNA splicing? Disrupt the 3D folding of DNA?
AlphaGenome's solution: A prediction engine that estimates a mutation's effect by comparing its predictions for the original and the mutated DNA sequence.
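The compare-original-versus-mutated idea can be sketched in a few lines. This is a minimal illustration only: `toy_predict` is a hypothetical stand-in for a real model (it just flags positions covered by the made-up motif `TATAAA`), and the scoring rule (total absolute change) is an assumption, far simpler than AlphaGenome's actual outputs and scores.

```python
MOTIF = "TATAAA"  # hypothetical motif used only by this toy model


def toy_predict(seq: str) -> list[float]:
    """Toy stand-in for a sequence-to-function model (NOT AlphaGenome):
    returns 1.0 at every position covered by MOTIF, else 0.0."""
    signal = [0.0] * len(seq)
    for start in range(len(seq) - len(MOTIF) + 1):
        if seq[start:start + len(MOTIF)] == MOTIF:
            for i in range(start, start + len(MOTIF)):
                signal[i] = 1.0
    return signal


def apply_variant(seq: str, pos: int, ref: str, alt: str) -> str:
    """Swap one letter, checking that the expected reference letter is there."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]!r} != {ref!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(seq: str, pos: int, ref: str, alt: str) -> float:
    """Score = total absolute change between predictions for the
    original (reference) and mutated (alternate) sequences."""
    ref_track = toy_predict(seq)
    alt_track = toy_predict(apply_variant(seq, pos, ref, alt))
    return sum(abs(a - r) for a, r in zip(alt_track, ref_track))


seq = "GGCTATAAAGGC"
print(variant_effect(seq, 6, "A", "C"))  # 6.0: the mutation breaks the motif
print(variant_effect(seq, 0, "G", "A"))  # 0.0: the motif is left intact
```

The same pattern works with any predictor: run it twice, once per version of the sequence, and measure the difference.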
2 AlphaGenome: The First Model That Does It All
Before AlphaGenome, scientists had to choose between two imperfect options:
The Old Dilemma
Option A - Specialized models (e.g., SpliceAI):
✅ High precision on specific mutations
❌ Only see ~10,000 letters at a time (miss long-range context)
Option B - Global models (e.g., Enformer):
✅ See ~200,000 letters (broad context)
❌ Low resolution (average the signal over blocks of 128 letters)
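To see why averaging costs resolution, here is a minimal sketch with made-up numbers: a sharp single-letter peak (say, at a splice site) is averaged into 128-letter blocks, as described above for Enformer. After averaging, a peak at position 100 and a peak at position 5 look identical.

```python
def bin_average(track: list[float], bin_size: int) -> list[float]:
    """Average a per-letter signal over consecutive, non-overlapping bins."""
    return [
        sum(track[i:i + bin_size]) / len(track[i:i + bin_size])
        for i in range(0, len(track), bin_size)
    ]


# Made-up 256-letter signals, each with one sharp, single-letter peak.
peak_at_100 = [0.0] * 256
peak_at_100[100] = 1.0

peak_at_5 = [0.0] * 256
peak_at_5[5] = 1.0

print(bin_average(peak_at_100, 128))  # [0.0078125, 0.0]
print(bin_average(peak_at_5, 128))    # [0.0078125, 0.0]: the exact position is lost
```

With `bin_size=1` the signal comes back unchanged, which is the single-letter resolution the next line refers to.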
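AlphaGenome breaks the dilemma: It reads up to 1 million DNA letters at once and predicts many of its outputs at single-letter resolution.
How it works: The 1-million-letter input is split across 8 interconnected processors that work in parallel and exchange information, so every part of the prediction can still draw on the whole sequence. Picture 8 powerful magnifying glasses, each examining a different stretch of the genome at the same time while comparing notes.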
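The split-and-share idea resembles a technique often called sequence parallelism. The sketch below is a simplified illustration, not AlphaGenome's code: it assumes a random 2**20-letter sequence (chosen so it divides evenly by 8), lets each of 8 hypothetical "devices" summarize only its own slice, then combines those summaries so every device sees a value computed from the whole sequence (here, GC content).

```python
import random

N_DEVICES = 8
SEQ_LEN = 2 ** 20  # 1,048,576 letters; assumed so it splits evenly across 8 devices

random.seed(0)
sequence = "".join(random.choice("ACGT") for _ in range(SEQ_LEN))


def shard(seq: str, n: int) -> list[str]:
    """Cut the sequence into n equal, contiguous slices, one per device."""
    size = len(seq) // n
    return [seq[i * size:(i + 1) * size] for i in range(n)]


def local_gc_count(chunk: str) -> int:
    """Work each device can do on its own slice, without seeing the others."""
    return chunk.count("G") + chunk.count("C")


shards = shard(sequence, N_DEVICES)
local_counts = [local_gc_count(s) for s in shards]

# "Exchange" step: every device receives the sum of all local counts
# (an all-reduce), so each one now holds a genome-wide quantity.
global_gc = sum(local_counts) / SEQ_LEN
per_device_view = [global_gc] * N_DEVICES

print(len(shards), len(shards[0]), round(global_gc, 3))
```

In a real model the exchanged information is much richer (for example, the intermediate representations attention layers need); GC content just keeps the example small.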
3 An All-in-One Tool
AlphaGenome isn't a specialist—it's a generalist trained on thousands of different experiments.
What Can It Predict?
11 different types of biological data, including:
- ✓ Gene expression (which genes are active)
- ✓ RNA splicing (how genes are cut and pasted)
- ✓ DNA accessibility (which regions are "open")
- ✓ Histone modifications (epigenetic marks)
- ✓ Protein binding (where regulatory proteins attach to DNA)
- ✓ 3D genome structure (which distant DNA regions touch each other)
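A common way to build one model that predicts many data types is a shared "trunk" that reads the DNA once and feeds several lightweight output "heads," one per data type. The sketch below illustrates that general design under heavy simplification; the features, head names, and formulas are all hypothetical and are not AlphaGenome's.

```python
from typing import Callable

Features = dict[str, float]


def trunk(seq: str) -> list[Features]:
    """Shared step: compute per-letter features once (toy stand-in)."""
    feats = []
    for i, base in enumerate(seq):
        window = seq[max(0, i - 2):i + 3]  # up to 5 letters centered on i
        feats.append({
            "gc": sum(b in "GC" for b in window) / len(window),
            "is_g": 1.0 if base == "G" else 0.0,
        })
    return feats


# One small head per output type; all of them reuse the same trunk features.
HEADS: dict[str, Callable[[Features], float]] = {
    "accessibility": lambda f: f["gc"],         # hypothetical rule
    "splice_donor": lambda f: 0.5 * f["is_g"],  # hypothetical rule
    "histone_mark": lambda f: 1.0 - f["gc"],    # hypothetical rule
}


def predict_all(seq: str) -> dict[str, list[float]]:
    feats = trunk(seq)  # computed once, shared by every head
    return {name: [head(f) for f in feats] for name, head in HEADS.items()}


tracks = predict_all("ACGTGGTA")
for name, values in tracks.items():
    print(name, [round(v, 2) for v in values])
```

Because the expensive reading of the sequence happens once, adding another output type only costs one more small head.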
Results:
- ✅ 25 of 26 benchmark tests: Matched or outperformed the best specialized models
- ✅ Trained on both human and mouse data: Learns general principles rather than memorizing one genome
The advantage: One model can stand in for many specialized tools.
4 Detects Hidden Splicing Mutations
Splicing is the "cut and paste" process that removes non-coding segments (introns) from RNA and joins the remaining pieces (exons). Mutations that disrupt it are hard to spot but cause many diseases.
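The cut-and-paste step itself is simple once the intron boundaries are known. This minimal sketch assumes those coordinates are given (finding them is the hard part a model has to predict) and uses a made-up RNA whose intron starts with GU and ends with AG, the most common boundary letters.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns, given as half-open (start, end) intervals, and join the exons."""
    pieces, cursor = [], 0
    for start, end in sorted(introns):
        pieces.append(pre_mrna[cursor:start])  # exon before this intron
        cursor = end                           # skip over the intron
    pieces.append(pre_mrna[cursor:])           # final exon
    return "".join(pieces)


exon1, intron, exon2 = "AUGGCC", "GUAAGUUUAG", "UUCUAA"
pre_mrna = exon1 + intron + exon2
mature = splice(pre_mrna, [(6, 16)])
print(mature)  # AUGGCCUUCUAA
```

Shift either boundary by even one letter and the joined sequence changes, which is why single-letter precision matters for splicing.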
What AlphaGenome Predicts
✓ Where the exact cut occurs
✓ How strong each cutting site is
✓ Competition between normal and abnormal sites
Why it matters: "Hidden" mutations deep inside introns, far from the normal cutting sites, can create new cutting sites that disrupt the final protein.
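A crude first check for this kind of mutation is to ask whether it creates a new "GT", the two DNA letters that begin most introns. This is a toy heuristic with a made-up sequence, for illustration only: most GT pairs in the genome are never used as cutting sites, which is exactly why models that weigh the surrounding sequence, such as SpliceAI and AlphaGenome, are needed.

```python
def gt_positions(seq: str) -> set[int]:
    """Positions where a 'GT' dinucleotide starts."""
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"}


def new_donor_candidates(seq: str, pos: int, alt: str) -> list[int]:
    """GT positions present after a single-letter change but not before."""
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return sorted(gt_positions(mutated) - gt_positions(seq))


intron_dna = "CAAGAAAC"  # made-up stretch of intron
print(new_donor_candidates(intron_dna, 4, "T"))  # [3]
print(new_donor_candidates(intron_dna, 0, "G"))  # []
```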
Real Example: Muscular Dystrophy
A mutation in the DMD gene (the gene behind Duchenne muscular dystrophy) creates a false cutting site. AlphaGenome predicts that this false site will compete with the normal one, producing a broken protein.
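Competition between cutting sites can be pictured with a simple proportional model: each site's share of splicing events is its strength divided by the total strength of all competing sites. This is an assumption made for illustration, with made-up strengths; it is not how AlphaGenome computes site usage, and the numbers are not data from the DMD case.

```python
def site_usage(strengths: dict[str, float]) -> dict[str, float]:
    """Toy competition model: share of events = strength / total strength."""
    total = sum(strengths.values())
    return {site: s / total for site, s in strengths.items()}


before = site_usage({"normal": 3.0})
after = site_usage({"normal": 3.0, "cryptic": 1.0})  # mutation adds a new site
print(before)  # {'normal': 1.0}
print(after)   # {'normal': 0.75, 'cryptic': 0.25}
```

In this toy model even a weaker false site captures a quarter of the events, so a quarter of the RNA copies would be cut in the wrong place.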
Result: AlphaGenome was the best model on 6 of 7 splicing prediction benchmarks.