New Scientific Tools Are Making It Easier to Understand the Hidden World of Microbes

A digital tree illustrates how scientists trace microbial ancestry using DNA-like connections. Tools such as TMarSel and scikit-bio help clarify microbial family trees. (Credit: The Biodesign Institute at Arizona State University)

The microscopic organisms that live inside our bodies and surround us in soil, water, and air play a crucial role in human health, climate systems, and ecosystems. From regulating digestion in the gut to driving nutrient cycles in the oceans, microbes influence life on Earth in ways scientists are only beginning to fully understand. Despite major advances in DNA sequencing, identifying these organisms and figuring out how they are related to one another remains a major scientific challenge.

Now, researchers at Arizona State University (ASU) have developed two powerful tools that promise to make microbial research more accurate, scalable, and accessible. These tools address some of the biggest problems in modern microbiology: building reliable microbial family trees and making sense of enormous biological datasets. Together, they strengthen the foundation of research in microbiome science, disease tracking, environmental monitoring, and precision medicine.


Why Studying Microbes Is So Difficult

Microbes are incredibly diverse and abundant. Scientists estimate that trillions of microorganisms live in the human body alone, and many more populate Earth's soils, oceans, and atmosphere. Modern DNA sequencing has made it possible to read genetic material extracted directly from environmental samples, an approach known as metagenomics.

Metagenomics allows researchers to sequence all the DNA in a sample at once, revealing entire microbial communities that were previously invisible. However, this approach has a major drawback: the genomes reconstructed from these mixed samples are often incomplete, fragmented, or uneven in quality. That makes it difficult to determine which organisms are present and how they evolved.

Two new tools, TMarSel and scikit-bio, were created to tackle these exact problems.


Improving Microbial Family Trees With Smarter Marker Genes

One of the most important tasks in microbiology is building evolutionary trees, also known as phylogenetic trees. These trees show how microbes are related to one another and help scientists track how organisms evolve, spread, and adapt.
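To make the idea of a family tree concrete, the sketch below turns a table of pairwise distances into a rooted tree using UPGMA, a simple textbook clustering method. It is an illustration only: it is not how TMarSel works, real phylogenomic studies usually rely on more sophisticated methods (scikit-bio, for instance, offers neighbor joining as `skbio.tree.nj`), and the function name `upgma`, the taxa A to D, and their distances are made up for this example.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a rooted tree with UPGMA and return it as a Newick string.

    labels: taxon names
    dist:   dict mapping frozenset({x, y}) to the distance between taxa x and y
    """
    # Active clusters: name -> (Newick subtree, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in labels}
    d = dict(dist)  # distances between active clusters
    while len(clusters) > 1:
        # Join the closest pair of clusters (ties broken by sorted name order).
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        height = d[frozenset((a, b))] / 2  # new node sits halfway between them
        subtree = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        merged = f"{a}+{b}"
        # Distance to the merged cluster = leaf-count-weighted average.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = (subtree, size_a + size_b, height)
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


# Made-up distances between four taxa.
D = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
print(upgma(["A", "B", "C", "D"], D))  # (((A:1,B:1):2,C:3):2,D:5);
```

Because A and B are closest (distance 2), they are joined first; C joins that pair next, and D, the most distant taxon, attaches at the root.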

Traditionally, researchers have relied on a small, fixed set of marker genes: specific DNA sequences that act as evolutionary signposts. While this approach worked reasonably well in the past, it struggles in the era of metagenomics, where millions of genomes of varying quality are analyzed at once.

To solve this, ASU researcher Qiyun Zhu and his collaborators helped develop TMarSel, short for Tree-based Marker Selection. Instead of relying on predefined genes, TMarSel automatically searches through thousands of possible gene families to find the best combination for building a reliable evolutionary tree.

What makes TMarSel especially powerful is its data-driven approach. The tool evaluates genes based on how common they are across genomes, how informative they are for evolutionary analysis, and how much they contribute to a stable and meaningful tree. This flexibility allows TMarSel to work well even when genomes are incomplete or highly diverse.
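As a rough illustration of just one of these criteria, how common a gene family is across genomes, the toy sketch below greedily picks gene families that fill gaps in the genomes with the fewest markers so far. The function `select_markers`, the family IDs, and the genome IDs are hypothetical; TMarSel's actual procedure, which also weighs evolutionary informativeness and tree stability, is described in the Nature Communications paper cited at the end of this article.

```python
def select_markers(presence, k, min_per_genome=1):
    """Toy greedy marker selection (hypothetical; not TMarSel's algorithm).

    presence: dict mapping gene-family ID -> set of genome IDs that carry it
    k: maximum number of families to select
    min_per_genome: desired number of selected markers in every genome
    Returns (selected family IDs, number of selected markers per genome).
    """
    genomes = set().union(*presence.values())
    counts = {g: 0 for g in genomes}
    remaining = dict(presence)
    selected = []

    def score(family):
        hits = remaining[family]
        # First prefer families present in genomes still below the target,
        needy = sum(1 for g in hits if counts[g] < min_per_genome)
        # then more widely shared families; the ID breaks any remaining tie.
        return (-needy, -len(hits), family)

    while remaining and len(selected) < k:
        best = min(remaining, key=score)
        selected.append(best)
        for g in remaining.pop(best):
            counts[g] += 1
    return selected, counts


# Hypothetical presence/absence of four gene families in five genomes.
presence = {
    "K1": {"g1", "g2", "g3", "g4"},
    "K2": {"g1", "g2"},
    "K3": {"g3", "g4", "g5"},
    "K4": {"g5"},
}
chosen, per_genome = select_markers(presence, k=2)
print(chosen)                    # ['K1', 'K3']
print(min(per_genome.values()))  # 1
```

With k=2, the sketch picks K1 (present in four genomes) and then K3, which, like K4, gives genome g5 a marker but is more widely shared. By contrast, a fixed set of K2 and K4 would leave genomes g3 and g4 with no marker at all, a small-scale version of the problem fixed marker sets face with incomplete metagenomic genomes.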

The result is more accurate microbial family trees, which are essential for tracking disease-causing organisms, monitoring environmental changes, and understanding how microbial communities respond to pollution or climate stress.


Scikit-bio and the Challenge of Big Biological Data

While TMarSel focuses on improving evolutionary analysis, the second major advance addresses a broader problem: how to analyze massive biological datasets efficiently and reliably.

Biological data differs from most other kinds of data. Datasets are huge, sparse (most microbes are absent from most samples), and highly interconnected, often involving thousands of features that influence one another. General-purpose data analysis tools are not designed to handle this level of complexity.

This is where scikit-bio comes in.

Scikit-bio is a large, open-source Python software library designed specifically for biological and "omic" data analysis. It provides researchers with more than 500 specialized functions that cover a wide range of tasks, including:

  • Comparing microbial communities
  • Calculating biological diversity
  • Transforming compositional data
  • Analyzing DNA, RNA, and protein sequences
  • Building and editing phylogenetic trees
  • Preparing datasets for machine learning applications
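Several of these tasks come down to short formulas. The plain-Python sketch below shows what three of them compute: Shannon alpha diversity within one sample, Bray-Curtis dissimilarity between two communities, and a centred log-ratio (CLR) transform of compositional counts. scikit-bio provides tested, optimized versions of these (for example `skbio.diversity.alpha.shannon`, `skbio.diversity.beta_diversity` with the `"braycurtis"` metric, and `skbio.stats.composition.clr`); the function names here, the sample counts, and the pseudocount used to handle zeros are assumptions made for illustration, not scikit-bio's API.

```python
import math


def shannon(counts, base=2):
    """Shannon diversity H = -sum(p_i * log(p_i)), skipping absent taxa."""
    total = sum(counts)
    return sum(-(c / total) * math.log(c / total, base) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no shared taxa."""
    diff = sum(abs(a - b) for a, b in zip(u, v))
    return diff / sum(a + b for a, b in zip(u, v))


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform: each log count minus the mean log count.

    A pseudocount is added first (an assumption of this sketch) because the
    logarithm of zero is undefined and microbiome tables contain many zeros.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical counts of four taxa in two samples.
sample_a = [10, 10, 10, 10]  # evenly spread community
sample_b = [40, 0, 0, 0]     # one taxon dominates

print(shannon(sample_a))                # about 2.0 bits (four equal taxa)
print(shannon(sample_b))                # 0.0 (only one taxon present)
print(bray_curtis(sample_a, sample_b))  # 0.75
print(clr(sample_b))
```

The even sample scores higher Shannon diversity than the dominated one, and a Bray-Curtis value of 0.75 indicates that the two communities are largely dissimilar.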

Because it builds on core tools of the Python scientific computing ecosystem, such as NumPy, SciPy, and pandas, scikit-bio lets researchers move from raw data to meaningful insights more efficiently.


A Community-Driven Scientific Resource

One of the defining features of scikit-bio is its open-source and community-driven nature. The project is supported by more than 80 contributors from around the world and is maintained with rigorous testing, documentation, and peer review.

This collaborative approach has paid off. Scikit-bio has already been cited in tens of thousands of scientific papers, spanning fields such as medicine, ecology, climate science, and cancer research. It has become a core tool for microbiome research, helping scientists analyze data in a way that is transparent, reproducible, and scalable.

The latest major update to scikit-bio was described in a study published in Nature Methods, highlighting its growing importance in modern biological research.


Why These Tools Matter for Health and the Environment

Together, TMarSel and scikit-bio represent a significant step forward for microbiology. Better evolutionary trees improve disease surveillance, allowing scientists to track how harmful microbes change over time. In environmental research, they help reveal how microbial communities respond to pollution, warming oceans, and shifting ecosystems.

In human health, clearer microbial identification strengthens studies of the gut microbiome, which has been linked to digestion, immunity, mental health, and chronic disease. As sequencing technologies continue to become faster and cheaper, the volume of microbial data will only grow. Without tools like these, much of that data would remain underused or misunderstood.


The Growing Role of Computational Biology

These developments also highlight the increasing importance of computational biology, a field that sits at the intersection of biology, mathematics, and software engineering. ASU's work in this area shows how combining evolutionary theory with advanced computing can produce tools used by scientists worldwide.

As researchers continue to uncover new microbial species and genetic relationships, tools like TMarSel and scikit-bio help ensure that discoveries are built on solid, reproducible scientific foundations.


Research References

Augmenting microbial phylogenomic signal with tailored marker gene sets (Nature Communications, 2025):
https://www.nature.com/articles/s41467-025-64881-2

Scikit-bio: a fundamental Python library for biological omic data analysis (Nature Methods, 2025):
https://www.nature.com/articles/s41592-025-02981-z
