Human Gene Maps Are Biased Toward European Ancestries and Scientists Are Only Now Seeing the Full Impact
Human gene maps are one of the most widely used tools in modern biology and medicine. They help researchers identify where genes are located, how they function, and how genetic changes might influence disease. But a major new study has revealed something unsettling: these maps are far from complete. In fact, they are strongly biased toward people of European ancestry, leaving out large parts of human genetic diversity.
The research, published in Nature Communications, shows that thousands of gene transcripts found in people from Africa, Asia, and the Americas are missing from official human gene catalogs. Some of these transcripts may even come from entirely unknown genes, meaning scientists have been working with an incomplete picture of how human biology really works.
Why Human Gene Maps Matter So Much
A gene map, also known as a gene annotation, is more than just a DNA sequence. While the human genome contains roughly three billion DNA letters, that raw sequence does not explain where genes start and end, how many genes exist, or how one gene can produce multiple proteins.
That complexity comes from RNA transcripts, which are molecules created when genes are activated. Through a process called splicing, a single gene can generate multiple transcripts, and each transcript can lead to a different protein. These variations are critical because proteins carry out most biological functions in the body.
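To make the idea of one gene yielding several transcripts concrete, here is a minimal sketch in Python. It assumes a toy gene with three made-up exons and models only one kind of splicing, exon skipping, in which the first and last exons are always kept. The exon names, sequences, and the function `exon_skipping_isoforms` are hypothetical illustrations and do not come from the study.

```python
from itertools import combinations

# Hypothetical toy gene: exon names mapped to made-up sequences (genomic order).
EXONS = {"E1": "ATGGCC", "E2": "GATTAC", "E3": "CCGTAA"}

def exon_skipping_isoforms(exons, first, last):
    """Return every transcript that keeps the first and last exon and
    includes any subset of the internal exons, preserving genomic order."""
    internal = [name for name in exons if name not in (first, last)]
    isoforms = {}
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            names = [first, *kept, last]
            isoforms["-".join(names)] = "".join(exons[n] for n in names)
    return isoforms

isoforms = exon_skipping_isoforms(EXONS, "E1", "E3")
for name, sequence in isoforms.items():
    print(name, sequence)
```

Even this tiny gene gives two transcripts, one with the middle exon and one without. Real genes have many more exons and use other splicing patterns as well, so the number of possible transcripts grows quickly.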
Gene maps like GENCODE were created to organize all of this information. They have become foundational tools used daily in genetics, disease research, and drug development. But the new study shows that these maps were built on a narrow genetic base, largely drawn from individuals of European descent.
The Historical Roots of Eurocentric Genetics
The first draft of the human genome was published in 2001, marking a major scientific milestone. However, it had clear limitations. It did not fully explain gene locations, transcript diversity, or population-specific genetic differences.
Over time, researchers layered gene annotations on top of this reference genome. But because most of the underlying DNA data came from European individuals, the resulting gene maps inherited the same bias.
Although humans are about 99.9% genetically identical, the remaining fraction reflects tens of thousands of years of evolution. Different populations accumulated distinct genetic variants shaped by geography, environment, and chance. These differences are real, biologically meaningful, and until now, poorly represented in gene maps.
How Scientists Found What Was Missing
To uncover these blind spots, researchers focused on RNA transcripts, which show how genes are actually used inside cells. They used long-read RNA sequencing, a technology capable of reading entire RNA molecules from start to finish.
This is important because older sequencing methods only captured short fragments of RNA, making it extremely difficult to reconstruct full transcripts accurately. Long-read sequencing finally made it possible to address this problem properly.
The team analyzed blood cells from 43 individuals belonging to eight populations: Yoruba from Nigeria, Luhya from Kenya, Mbuti from the Congo, Han Chinese, Indian Telugu, Peruvians from Lima, Ashkenazi Jewish individuals, and Utah Europeans. These participants were also part of the 1000 Genomes Project, allowing researchers to directly compare RNA data with well-characterized DNA sequences.
Tens of Thousands of Missing Transcripts
The results were striking. The researchers identified 41,000 potential transcripts that were completely absent from official gene maps.
Among transcripts that came from already known protein-coding genes, 41% were predicted to produce different versions of existing proteins. This means thousands of protein variants had never been cataloged before.
Even more surprising, 773 transcripts appeared to originate from previously unrecognized gene regions, suggesting that entire genes may have been missed.
One specific example involved the gene SUB1, which plays a role in essential processes like DNA repair. Individuals of Peruvian ancestry were found to produce a distinct transcript of SUB1 that changes the resulting protein. This transcript was missing from all existing gene annotations.
Clear Ancestry-Based Patterns
When the researchers grouped the data by ancestry, a clear pattern emerged. Non-European populations contained far more previously unseen transcripts than European groups.
In total, the study identified 2,267 population-specific transcripts, meaning RNA molecules that appeared in only one population and nowhere else. For European populations, most of these transcripts were already known. For non-European populations, the majority were completely new to science.
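The definition of a population-specific transcript can be sketched in a few lines of Python. This is an illustration only, not the study's pipeline: the transcript IDs, the observations, and the set of annotated transcripts are invented, and the population labels are used simply as tags.

```python
# Hypothetical data: transcript ID -> populations in which it was detected.
observed = {
    "tx_A": {"YRI", "LWK", "CEU"},
    "tx_B": {"PEL"},
    "tx_C": {"CEU"},
    "tx_D": {"YRI"},
}
# Hypothetical set of transcripts already present in the gene map.
annotated = {"tx_A", "tx_C"}

def population_specific(observed, annotated):
    """Group transcripts seen in exactly one population by that population,
    split into those already annotated ("known") and those not ("novel")."""
    result = {}
    for tx, populations in observed.items():
        if len(populations) != 1:
            continue  # detected in several populations, so not specific
        (population,) = populations
        status = "known" if tx in annotated else "novel"
        result.setdefault(population, {"known": [], "novel": []})[status].append(tx)
    return result

specific = population_specific(observed, annotated)
for population, groups in specific.items():
    print(population, groups)
```

In this made-up example the only European-specific transcript is already annotated, while the transcripts specific to the other two populations are novel, which mirrors the pattern the article describes.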
This finding indicates that current gene maps describe the transcripts of people with European ancestry far better than those of the rest of the world's populations.
Why Using a Single Reference Genome Is a Problem
The researchers also tested what would happen if they used each person's own DNA sequence as the reference instead of the standard human genome.
Doing so uncovered hundreds of additional transcripts per individual, with the largest gains seen in people of African ancestry. This shows that relying on a single, universal reference genome actively hides meaningful biological variation.
While the human reference genome is extremely useful, this study demonstrates that it also acts as a filter that removes population-specific features from view.
How Missing Transcripts Affect Disease Research
The study didn't stop at discovery. Researchers examined how missing transcripts affect the detection of allele-specific transcript usage, a phenomenon in which the two copies of a gene, one inherited from each parent, produce their transcripts in different proportions.
These effects can only be observed if all relevant transcripts are present in gene maps. When transcripts are missing, genetic signals disappear.
After adding the newly discovered transcripts to existing maps, the researchers detected many more genetic effects, especially in individuals of non-European ancestry.
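A small Python sketch shows why a missing transcript can hide this kind of signal. It assumes hypothetical read counts for one gene, and it assumes that reads from a transcript absent from the gene map are simply dropped; real tools may handle such reads differently, and a real analysis would add a statistical test. The names `reads` and `usage_by_allele` are illustrative, not taken from the study.

```python
# Hypothetical long reads from one gene, as (parental copy, transcript) pairs.
reads = (
    [("maternal", "tx_known")] * 40 + [("maternal", "tx_novel")] * 10
    + [("paternal", "tx_known")] * 15 + [("paternal", "tx_novel")] * 35
)

def usage_by_allele(reads, annotation):
    """Fraction of each copy's assigned reads that goes to each transcript.
    Reads from transcripts missing in `annotation` are ignored (assumption)."""
    counts = {}
    for allele, tx in reads:
        if tx not in annotation:
            continue  # the gene map has no entry to assign this read to
        per_tx = counts.setdefault(allele, {})
        per_tx[tx] = per_tx.get(tx, 0) + 1
    usage = {}
    for allele, per_tx in counts.items():
        total = sum(per_tx.values())
        usage[allele] = {tx: n / total for tx, n in per_tx.items()}
    return usage

incomplete = usage_by_allele(reads, {"tx_known"})
complete = usage_by_allele(reads, {"tx_known", "tx_novel"})
print("incomplete map:", incomplete)
print("complete map:  ", complete)
```

With the incomplete map, both copies appear to use the known transcript for all of their reads, so no difference is visible. Once the novel transcript is included, the maternal copy favors the known transcript and the paternal copy favors the novel one.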
Importantly, many of these ancestry-biased transcripts occur in genes already linked to diseases that differ across populations, including lupus, rheumatoid arthritis, asthma, and cholesterol-related traits.
This does not mean the transcripts directly cause these diseases. Instead, they reveal genetic signals that were previously invisible, helping scientists better understand why diseases may occur more frequently or behave differently in certain populations.
The Case for a Human Pantranscriptome
The researchers emphasize that their work is only a first step. The study examined just one cell type, from one tissue, in only 43 people. Many regions of the world were not represented at all, and none of the body's most complex organs were studied.
Yet even within this limited scope, tens of thousands of missing transcripts were discovered. This strongly suggests that much more biological diversity remains undocumented.
To address this, scientists are calling for the creation of a human pantranscriptome. While initiatives like the Human Pangenome Project aim to capture DNA diversity, RNA tells a different story. DNA is the instruction manual, but RNA shows which instructions are actually used in each cell.
A pantranscriptome would catalog all RNA molecules across all tissues, life stages, and populations, creating a truly inclusive map of human biology.
A Massive but Necessary Effort
Building such a resource will not be easy. This single study generated more than 10 terabytes of data and 800 million full-length RNA sequences, requiring advanced machine-learning tools and the computing power of the MareNostrum 5 supercomputer.
Scaling this effort to hundreds of tissues and thousands of individuals will require unprecedented global collaboration and computational capacity. But researchers argue that the payoff is worth it.
A complete and inclusive gene map is essential for fair, accurate, and effective genomic medicine, ensuring that future discoveries benefit all populations, not just a narrow subset of humanity.
Research reference:
https://www.nature.com/articles/s41467-025-66096-x