A researcher at the Johns Hopkins Institute of Genetic Medicine has led the effort to compile the largest free resource to date of experimental information about human proteins. Reporting in the February issue of Nature Biotechnology, the research team describes how researchers around the world can access this data and speed their own research.
“Advances in technology have made data generation much easier, but processing it and interpreting observations are now the major hurdles in science today,” says Akhilesh Pandey, M.D., Ph.D., associate professor of biological chemistry, pathology and oncology and member of the McKusick-Nathans Institute of Genetic Medicine at Hopkins.
“We’ve created a repository that incorporates easy-to-use Web forms so that all researchers can contribute and share data,” says Pandey, who coordinated this effort with scientists and software developers at the Institute of Bioinformatics, a nonprofit institute he founded in Bangalore, India, in 2002.
Like the online encyclopedia Wikipedia, Human Proteinpedia allows any researcher to contribute and edit their data as their research progresses. “Researchers will be able to quickly review what has been discovered by others about their protein of interest, speeding their own work,” says Pandey.
Human Proteinpedia contains information on when and where specific proteins are expressed or not, including in cells and tissues from diseases such as cancers; how the proteins are modified; and which other proteins they interact with. The repository includes only experimental data and doesn’t include computer-generated predictions, which may not turn out to be real.
The current version of Human Proteinpedia compiles data provided by more than 71 laboratories from all over the world and contains entries for more than 15,230 human proteins.
“With the amount of proteomic data pouring in each day, however, cataloging all of human protein data by hand is a Herculean task,” says Pandey.
“So we’re hoping that the scientific community will come together to contribute data generated in individual laboratories. This will not only improve the quality of the data but also increase the pace at which data is collected in a common repository. We’re excited about the enthusiasm and involvement of the entire global proteomics community and hope that we can work with companies like Google and Microsoft that are interested in enabling such data sharing and dissemination for biological data,” he says.
The research was funded by the National Institutes of Health Roadmap Initiative, the National Heart, Lung, and Blood Institute, and internal funds from the Institute of Bioinformatics in Bangalore, India.