The Human Protein Atlas Launches Version 24
Professor Mathias Uhlén discussed the new features of The Human Protein Atlas version 24 at HUPO 2024.
The Human Protein Atlas (HPA) is an impressive open-access resource that offers scientists deep biological insights beyond the human genome. The HPA was launched in 2003 with the aim of mapping all the human proteins in cells, tissues and organs.
To achieve this goal, the HPA is built from research that adopts a wide variety of omics technologies, including antibody-based imaging, mass spectrometry (MS), transcriptomics, spatial analysis and – more recently – artificial intelligence-based tools. There are several papers published monthly that feature contributions from the HPA, which is used by researchers in over 150 countries to study protein science.
The very first version of the HPA website launched in 2005, comprising protein expression data on ~700 antibodies. In the years since, the atlas has been continuously updated through new releases to feature the most up-to-date research on human proteins.
At this year’s Human Proteome Organization (HUPO) annual meeting, the consortium announced the launch of the 24th version of the HPA. Technology Networks had the pleasure of speaking with Mathias Uhlén, a professor of microbiology at the Royal Institute of Technology (KTH) and program director of the HPA, to learn more about its features.
Uhlén stressed the value of studying proteins in the post-genomic era: “The future is in protein science,” he said. “Proteins are the building blocks of life, and they are the targets of pharmaceuticals. We experienced a genetic revolution that inspired many people to enter genomics research, but now I think it’s time to set the stage for functional biology, which is the proteins.”
Structural changes to the HPA enhance accessibility
The new version of the HPA comprises 5 million web pages and over 10 million manually annotated, high-resolution bioimages. It contains a large amount of information, including 16 knowledge summary pages on topics of high biological or medical interest:
1. The Disease Blood Atlas
2. 3D-structures of proteins
3. The cell and tissue specific proteome
4. The human secretome
5. The human membrane proteome
6. The house-keeping proteome
7. The human protein classes
8. Evidence of the human protein-coding genes
9. The right cell line for your experiment
10. The druggable proteome
11. The cancer proteome
12. Transcription factor landscape
13. Multiplex tissue profiling
14. Spatial transcriptomics of the brain
15. Cilia and basal bodies
16. Sperm and flagella
“It’s very important, since there is a lot of data, to structure it [the HPA] in such a way that it is easy for the audience to utilize,” Uhlén explained. To that end, version 24 has undergone a slight restructure. “We used to have 12 sections, which we have compressed into 8 major resources, which is more logical and accessible,” Uhlén continued. These eight resources include the blood resource, the brain resource, the subcellular resource, the single cell resource, the structure and interaction resource, the tissue resource, the cancer resource and the cell line resource.
A novel Disease Blood Atlas
Beyond restructuring, version 24 of the HPA incorporates exciting new data that supports a deeper understanding of human disease pathophysiology and a path towards improved diagnostic approaches using protein biomarkers.
“We now have a Disease Blood Atlas that contains next-generation blood profiling data from 59 diseases,” Uhlén said. To build the novel atlas, blood samples from patients experiencing different types of cancer, autoimmune, infectious, neurobiology and cardiovascular diseases were profiled.
“The underlying data is from 10,000 patients, from whom blood samples were taken when they received a diagnosis and before they started treatment. It is a fantastic resource for any scientist that wants to look at disease biomarkers,” said Uhlén.
Biomarkers are biological molecules that can be obtained from blood, bodily fluids or tissues that signal disruption of healthy function and the presence of disease. Biomarkers can be incredibly helpful for clinicians who are seeking to diagnose a patient, but their discovery, identification, characterization and validation aren’t always easy. In fact, biomarker discovery is facing one of the biggest reproducibility crises in the history of science, Uhlén expressed: “What we have found using the Disease Blood Atlas is that the expression of certain proteins, considered biomarkers for a specific disease, can be elevated or reduced in more than one disease.”
He provided an example to showcase why this is so problematic. Say a case-control study identifies a biomarker – a protein – that is elevated in pancreatic cancer. But the data from the Disease Blood Atlas reveals that this protein is also elevated when someone experiences an infection with Escherichia coli (E. coli). “That’s very important information for people to know because when they screen for pancreatic cancer, they don’t want to screen for E. coli instead,” Uhlén said.
What is a case-control study?
A case-control study is one that compares two groups of people, one group that has received a diagnosis of a condition (case group) and a similar group of people that have not received such a diagnosis (control group).
For biomarker discovery, it’s incredibly important to look at all diseases at the same time. This enables the identification of proteins that are altered in more than one disease, which is what Uhlén and the HPA are working towards with the Disease Blood Atlas. It’s not easy – or cheap – but it’s worthwhile, Uhlén said. Validation studies of biomarkers can then be conducted by individual researchers using the open-access resource, perhaps using case-control studies at that stage.
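The logic of comparing across diseases can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the HPA's actual pipeline: the protein names, fold-change values and thresholds below are invented, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (protein, disease, fold change vs. healthy controls).
# All names and numbers are invented for illustration; they are not HPA data.
measurements = [
    ("PROT_A", "pancreatic cancer", 2.4),
    ("PROT_A", "E. coli infection", 2.1),
    ("PROT_B", "pancreatic cancer", 1.9),
    ("PROT_B", "E. coli infection", 1.0),
    ("PROT_C", "rheumatoid arthritis", 0.4),
]


def altered_diseases(rows, up=1.5, down=0.67):
    """Map each protein to the diseases in which it is elevated or reduced.

    The `up` and `down` cut-offs are arbitrary assumptions for this sketch.
    """
    hits = defaultdict(list)
    for protein, disease, fold_change in rows:
        if fold_change >= up or fold_change <= down:
            hits[protein].append(disease)
    return dict(hits)


def split_by_specificity(rows, up=1.5, down=0.67):
    """Separate proteins altered in one disease from those altered in several."""
    hits = altered_diseases(rows, up, down)
    specific = {p: d[0] for p, d in hits.items() if len(d) == 1}
    shared = {p: d for p, d in hits.items() if len(d) > 1}
    return specific, shared


specific, shared = split_by_specificity(measurements)
print("Altered in one disease only:", specific)
print("Altered in several diseases:", shared)
```

In this toy example, a case-control study of pancreatic cancer alone would flag both PROT_A and PROT_B as candidates. Only the cross-disease view shows that PROT_A is also altered in the infection group, which would make it a poor disease-specific marker.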
“I think this is the first time anyone has created such a resource. We will have much more data coming in Q1 and Q2 of 2025, so by the summer, I’m hopeful that we will have a very comprehensive Disease Blood Atlas.”
Version 24 also includes data on multiplex profiling of human tissues, and a new section sharing spatial transcriptomics data from the cerebral cortex of the human brain.
“Spatial transcriptomics is a fantastic approach whereby you take a tissue and then you actually conduct RNA analysis of single cells in situ. The outcome is that you get a transcript from all of the genes within that cell with spatial context. It's an incredible amount of data and this technology allows us to have resolution on single cells,” Uhlén said.
The HPA version 24 also benefits from new structural data obtained using the 2024 Nobel Prize-winning technology, AlphaFold. “AlphaFold is an incredible technology – what an excellent choice for the Nobel Prize in Chemistry 2024,” Uhlén said.
“In version 23 [of the HPA], we already had AlphaFold structures for human proteins. In this new version, we’ve also done an analysis of all the splice variants, of which there are ~80,000,” he continued. “It takes an enormous amount of computer time, but now we can actually show all of the structures of the proteins and the isoforms. We have this sort of ‘dark matter’ in the proteome – the splice variants – and we don’t really know how useful or functional they are. But now, at least we know how they look structurally. So that’s exciting.”
An “extraordinary” future in proteomics
The field of proteomics has evolved rapidly over the last 20 years, and so has the HPA. While the speed and sensitivity of MS have been important for progressing the study of proteins, Uhlén emphasized that it is just one tool in the box for the HPA.
“Usually when I talk about the protein atlas, I say it has undergone three stages. The first stage was about 10 years, and it was very much about making an antibody to every human protein to then analyze the proteins. Then we went into an era that was very much multiomics-based, including RNA technologies, because there was a lot of RNA-based methods that came along. It came at the perfect time for us, because obviously RNA is a proxy for the proteins, and we’ve been working on RNA for over 10 years,” Uhlén said.
Looking to the future of proteomics and the HPA, Uhlén was visibly excited about the prospect of what could come next.
“Now we are in a third phase, which is very much about AI. It’s very much about finding human proteins and, hopefully, in 10 years we can model human cells. The idea here is that you can add something to perturb the cell and model it. I don’t know whether that will take 10 or 100 years – with the speed at which things are happening – who knows? But it would be very extraordinary,” he concluded.
Professor Mathias Uhlén was speaking to Molly Coddington, Senior Science Writer and News Team Lead for Technology Networks, at HUPO 2024.
About the interviewee
Professor Mathias Uhlén received his PhD at the Royal Institute of Technology (KTH), Stockholm, Sweden in 1984. After a post-doc period at the EMBL in Heidelberg, Germany, he became a professor in microbiology at KTH in 1988.
His research is focused on protein science, antibody engineering and precision medicine, and ranges from basic research in human and microbial biology to more applied research, including clinical applications in cancer, infectious diseases, cardiovascular diseases, autoimmune diseases and neurobiology. His research has resulted in more than 750 publications. Professor Uhlén is also program director of The Human Protein Atlas.