Despite rapid advances in microbiome research, a large proportion of proteins encoded by the human gut microbiome remain functionally uncharacterized. Sequence-based methods often fail to annotate these evolutionarily divergent proteins, leaving major gaps in our understanding of how microbial activities influence host metabolism, immunity, and disease. This uncharted layer of "functional dark matter" reflects proteins whose roles cannot be inferred from sequence similarity alone, highlighting the need for frameworks that can map protein function beyond sequence signals.
A research team led by Dr. DAI Lei at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, together with collaborators, has introduced a structure- and AI-based strategy to illuminate this functional landscape. Their study, recently published in Cell Host & Microbe, presents the first comprehensive structural proteome database of the human gut microbiome and demonstrates how structure-guided approaches and AI can decode both microbial protein functions and microbe–host metabolic interactions.
Using structure prediction tools, the researchers built the Human Gut Microbial Protein Structure Database (GMPS, https://www.gmpsdb.cn/), containing about 2.7 million predicted protein structures across 968 gut bacterial species and 1,255 phage genomes. By analyzing proteins in three-dimensional space rather than relying solely on sequence similarity, the team uncovered functional relationships that traditional approaches often miss.
One major achievement is the improved annotation of phage proteins, of which up to 75% typically lack functional labels. Structural analogy more than doubled the annotation rate and revealed extensive structural diversification of phage endolysins—antibacterial enzymes with high target-species specificity. Several newly predicted endolysins were experimentally validated and shown to eliminate gut pathobionts, demonstrating how structural proteomics can accelerate discovery of precision antimicrobials.
The study also highlights the value of structure-guided discovery of microbial–host isozymes: bacterial enzymes that perform host-like functions but are too sequence-divergent to detect using conventional tools. Through structural comparisons, the team identified previously unrecognized bacterial enzymes involved in melatonin biosynthesis. Biochemical assays validated their activities, and animal experiments showed that these microbial enzymes can modulate host melatonin levels and directly affect host physiology.
To address cases where even structural similarity is insufficient, the researchers developed Dense Enzyme Retrieval (DEER), an alignment-free method powered by structure-aware language models. DEER enables ultrafast and sensitive detection of remote homologs, achieving state-of-the-art performance and extending functional annotation into regions previously inaccessible to both sequence-alignment and structure-alignment tools.
Together, these advances establish a new paradigm that integrates large-scale structural genomics with AI-driven inference to resolve the deep functional architecture of gut microbial communities. By illuminating substantial fractions of the microbiome’s functional dark matter, this work opens new avenues for therapeutic discovery, precision microbiome engineering, and mechanistic understanding of microbe–host interactions.

Exploring Functional Insights into the Human Gut Microbiome via the Structural Proteome. (Image by SIAT)