Date: 2025-02-28

The Gene Ontology Consortium has completed a comprehensive encyclopedia of the known functions of all protein-coding human genes and released it on a new website. For the first time, researchers from the Keck School of Medicine of USC, the Swiss Institute of Bioinformatics and other institutions used large-scale evolutionary modeling to integrate data on human genes with genetic data collected from other organisms. The result is a searchable public resource that lists the known functions of more than 20,000 genes using the most accurate and complete evidence available.





A paper describing the resource was just published in the journal Nature.





The Gene Ontology, a National Institutes of Health-funded knowledge base that has been continually expanded and improved for more than 25 years, has become a mainstay of the biomedical research process. Already, it is used in more than 30,000 publications each year to aid with data analysis and interpretation.

Biomedical researchers who conduct “omics” experiments—large-scale studies of DNA, RNA, proteins and other biological molecules—generate data that can identify hundreds of genes of interest. For example, a researcher might learn which genes are turned “on” or “off” in cancerous cells compared to healthy ones. Reviewing thousands of published papers on the known functions of each gene is not feasible, so many scientists turn instead to the Gene Ontology.
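To make the step from a gene list to biological functions concrete, here is a minimal sketch of one common way such annotations are used: an over-representation test that asks whether a function appears among the genes of interest more often than chance would predict. The gene names, the annotation table and the function names `hypergeom_tail` and `enrichment` are all hypothetical, invented for illustration; this is not the consortium's software or data.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes when drawing
    n genes from a background of N, of which K carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical toy background: gene -> set of annotated functions.
annotations = {
    "GENE_A": {"cell division"},
    "GENE_B": {"cell division", "cell signaling"},
    "GENE_C": {"cell division"},
    "GENE_D": {"immune response"},
    "GENE_E": {"molecular transport"},
    "GENE_F": {"cell signaling"},
    "GENE_G": {"immune response"},
    "GENE_H": {"molecular transport"},
}

def enrichment(gene_list, annotations):
    """Return {function: p-value}, sorted from most to least enriched."""
    N = len(annotations)          # background size
    n = len(gene_list)            # size of the list of interest
    terms = set().union(*annotations.values())
    results = {}
    for term in terms:
        K = sum(term in funcs for funcs in annotations.values())
        k = sum(term in annotations[g] for g in gene_list)
        results[term] = hypergeom_tail(k, n, K, N)
    return dict(sorted(results.items(), key=lambda kv: kv[1]))

# Hypothetical genes found to be switched "on" in an experiment.
genes_of_interest = ["GENE_A", "GENE_B", "GENE_C"]
results = enrichment(genes_of_interest, annotations)
for term, p in results.items():
    print(f"{term}: p = {p:.4f}")
```

In this toy example, "cell division" ranks first because all three genes of interest carry it. A real analysis would use the full annotation set and correct for testing many functions at once.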





“Our knowledge base allows scientists to go from just a list of genes to an understanding of their biological functions, sometimes even pointing toward potential treatments,” said Paul D. Thomas, PhD, a principal investigator of the Gene Ontology Consortium, director of the division of bioinformatics and professor of population and public health sciences at the Keck School of Medicine and professor of quantitative and computational biology at the USC Dornsife College of Letters, Arts and Sciences.





Now, this latest milestone provides a new resource within the knowledge base that uses evolutionary modeling to make the tool even more powerful. The approach allows the researchers to combine experimental data collected from human genes with that obtained from related genes in model organisms, such as mice and zebrafish. It provides a more complete picture of human gene function, including filling in gaps in scientific knowledge where direct evidence from human studies is not available.





“We’d previously amassed a huge knowledge base that has become an authoritative reference on human gene functions,” said Thomas, who is also lead author of the new publication. “And now, by adding information about when each function arose in evolution, we’re now providing an even more complete, accurate, and concise description of the functions encoded by human genes.”

An evolutionary view

The new resource was compiled by a team of more than 150 biologists around the world, including at the Keck School of Medicine of USC. Since 1998, the group has meticulously reviewed over 175,000 scientific publications on gene function, searching for data on gene functions in well-studied organisms and on every gene in the human genome, primarily the more than 20,000 protein-coding genes that control key biological processes.





After reviewing the literature, they categorized each gene according to the biological functions it performs, either on its own or in combination with other genes. To do so, they drew on a catalog they had developed of more than 40,000 functions, spanning cell division, cell signaling, immune response, molecular transport and many more. Understanding the precise functions performed by groups of genes can help researchers understand what goes wrong in cancer and other diseases and design targeted approaches to treatment.





The new resource of gene function descriptions, called the “PAN-GO functionome,” will be used by the scientific community in essentially the same way as the existing knowledge base (to analyze omics data, among other applications), but it will yield more accurate results, Thomas said. That’s because the recent work has brought together all the information in the knowledge base using large-scale evolutionary models, which track the evolutionary history of thousands of genes and related proteins, creating a more complete and accurate picture of gene function.





In many cases, experimental data from human genes is not available, but scientists have studied related genes in mice, rats, zebrafish, fruit flies, yeast or E. coli. By understanding when and how specific functions (such as energy processing or cell signaling) evolved, researchers can use data obtained from other organisms to gain an understanding of gene function in humans.
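The idea of carrying evidence across species can be illustrated with a deliberately simplified sketch. It assumes a toy gene family tree and a single rule: a function is traced back to the most recent common ancestor of the genes where it was observed experimentally, and a gene inherits it only if it descends from that ancestor. The tree, the gene names, the observations and the rule itself are assumptions made for illustration, not the consortium's actual procedure, which the article describes only at a high level.

```python
# Hypothetical gene family tree, stored as child -> parent (None marks the root).
parent = {
    "human_GENE1": "vertebrate_anc",
    "mouse_Gene1": "vertebrate_anc",
    "zebrafish_gene1": "vertebrate_anc",
    "vertebrate_anc": "animal_anc",
    "fly_gene1": "animal_anc",
    "animal_anc": None,
}

# Hypothetical experimental observations in model organisms.
experimental = {
    "mouse_Gene1": {"cell signaling"},
    "fly_gene1": {"cell signaling"},
    "zebrafish_gene1": {"molecular transport"},
}

def ancestors(node):
    """Path from a node up to the root, including the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def most_recent_common_ancestor(nodes):
    """Deepest tree node shared by the root paths of all given nodes."""
    paths = [ancestors(n) for n in nodes]
    common = set(paths[0]).intersection(*paths[1:])
    return next(n for n in paths[0] if n in common)

def infer(target):
    """Return {function: ancestor where it is assumed to have arisen}
    for every function the target gene would inherit under the toy rule."""
    inferred = {}
    functions = set().union(*experimental.values())
    for function in functions:
        observed_in = [g for g, fs in experimental.items() if function in fs]
        origin = most_recent_common_ancestor(observed_in)
        if origin in ancestors(target):
            inferred[function] = origin
    return inferred

print(infer("human_GENE1"))
```

Here the human gene has no experimental data of its own, yet it is assigned "cell signaling" because that function was seen in both mouse and fly, which places its origin at their shared ancestor. "Molecular transport", seen only in zebrafish, is not carried over under this conservative rule.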





“This helps us infer the functional characteristics of human genes, even when there is no direct evidence from an experiment on the human gene itself,” Thomas said.

Further improving the knowledge base

Going forward, the Gene Ontology Consortium is requesting that researchers use the PAN-GO functionome in their analyses. The information is structured in a machine-readable format that allows scientists to use computational tools, such as artificial intelligence, to quickly search and use the data.





The consortium is also issuing a call to action: Researchers can now submit suggestions for updating the knowledge base on specific genes through the project’s website. Crowd-sourcing knowledge of gene functions and categorizing them in a structured way ensures that the shared resource continues to improve over time and that its insights are easy to apply.





Though it is the most comprehensive resource available on gene functions, the PAN-GO functionome is not yet complete. It contains data on 82% of protein-coding genes, but no experimental data exist for the other 18%, roughly 3,600 genes whose biological functions remain unknown.





“We now have a real picture of where we are missing information, and that’s where future research in this area may want to focus,” Thomas said.





Reference: Feuermann M, Mi H, Gaudet P, et al. A compendium of human gene functions derived from evolutionary modelling. Nature. 2025. doi: 10.1038/s41586-025-08592-0



