Proteogenomics is a relatively recent ‘omics field: the term was first used in a 2004 paper to describe a mapping technique that improves genome annotation using proteomics data. Integrating its parent fields of proteomics and genomics with transcriptomics, proteogenomics is the latest in a series of ‘omics technologies to reach the science bench. The goal is to discover new peptides by comparing mass spectrometry data from the protein sample of interest with a database of proteins created from genomic and transcriptomic data.
Genomics was the first ‘omics technology and was initially expected to be a panacea for health care, providing information on a patient’s genome to enable predictive, personalized medicine. Researchers soon realized that the genome alone wasn’t always enough to give useful clinical information, as a single gene can give rise to a myriad of protein products depending on the environment the organism experiences, among other factors.
Transcriptomics went some way to solving this problem by analyzing the RNA in a cell to determine which genes are being expressed. But this, too, can produce misleading results as RNA content doesn’t always represent protein abundance. Proteomics analyzes proteins, the functional effectors of the cell, to identify biomarkers of disease and potential drug targets, bypassing the limitations of transcriptomics.
This isn’t the end of the story though, as proteomics must meet the challenge of identifying individual proteins from a mass spectrometry or NMR readout – difficult without reference information from the genome or transcriptome to give a baseline idea of which proteins may be present. The first uses of proteogenomics aimed to solve this problem using genomic and transcriptomic data to aid in protein identification.
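The idea of using transcript data to help identify proteins can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not a real search pipeline: actual workflows match raw spectra with dedicated search engines, whereas here the observed peptides are assumed to be already available as sequences. All sequences, identifiers and function names are hypothetical.

```python
# Toy proteogenomic lookup: translate a transcript, digest proteins in silico,
# and flag observed peptides that are explained only by the transcript.
# Assumption: peptides are already identified as sequences (no spectra here).

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in T, C, A, G order.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate(dna):
    """Translate reading frame 0 of a DNA/RNA string up to the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def tryptic_digest(protein, min_len=6):
    """Cleave after K or R unless the next residue is P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_len]


def build_peptide_index(proteins, min_len=6):
    """Map each in-silico peptide to the set of protein IDs that contain it."""
    index = {}
    for pid, seq in proteins.items():
        for pep in tryptic_digest(seq, min_len):
            index.setdefault(pep, set()).add(pid)
    return index


def classify_peptides(observed, reference_index, custom_index):
    """Sort observed peptides into known, novel (custom database only), unmatched."""
    result = {"known": [], "novel": [], "unmatched": []}
    for pep in observed:
        if pep in reference_index:
            result["known"].append(pep)
        elif pep in custom_index:
            result["novel"].append(pep)
        else:
            result["unmatched"].append(pep)
    return result


# Hypothetical inputs: one reference protein and one sample-specific transcript
# that encodes the same protein with a single amino acid change (I to V).
reference = {"REF1": "MAKPEPTIDERGLYK"}
transcripts = {"TX1": "ATGGCCAAACCTGAACCGACCGTGGATGAACGTGGCCTGTATAAATAA"}
custom = {tid: translate(seq) for tid, seq in transcripts.items()}

observed = ["GLYK", "MAKPEPTVDER", "AAAAAA"]
calls = classify_peptides(
    observed,
    build_peptide_index(reference, min_len=4),
    build_peptide_index(custom, min_len=4),
)
```

In this sketch a peptide counts as novel only when the reference database cannot explain it but the transcript-derived database can, which mirrors the article's point that genomic and transcriptomic data narrow down which proteins may be present.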
Associate Professor György Marko-Varga uses proteogenomics at Lund University in Sweden. His research looks at variations in proteogenomic biomarkers in disease including cancer. “We are at an early stage in proteogenomics research,” he explains, “and we are still learning as we move forwards. It is largely a challenge of technology – we can sequence the whole genome, but we cannot, yet, do an analysis of whole proteome expression.”
Proteogenomics gives a systems perspective, viewing the genome sequence, RNA expression, protein synthesis and post-translational modifications all at once. Its potential lies in the prevention, diagnosis and treatment of disease and in precision medicine, creating treatment groups based on patients’ proteogenomic profiles or even designing drugs to act on a specific biomarker. Professor Marko-Varga explains, “The most valuable thing today is to understand disease mechanisms using proteogenome data and doing computationally powerful calculations to look for correlations. In this way, we can look for drug effects – like whether a person is a responder or non-responder. We can use this to discover and validate novel prognostic and diagnostic biomarkers.” Before ‘omics approaches were combined, the search for biomarkers was limited to single markers of disease. Now researchers can determine the fingerprint of disease across a wide spectrum of markers in non-invasive tests using bodily fluids.
Proteogenomics Techniques: Mass Spec and Data Integration
Mass spectrometry is the key tool used in proteogenomic analysis, and advancements in this technology have been integral to the increasing utility of proteogenomics in research. Next-generation sequencing data now allow researchers to pursue more advanced goals, for example detecting abnormal protein variants across a range of cancer tissue samples.
Informatics is also essential in the proteogenomic arena for integration of data from individual genomics and proteomics experiments. Professor Richard Kumaran Kandasamy studies inflammation in the Center for Molecular Inflammation Research at the Norwegian University of Science and Technology. “We use proteogenomics in many aspects of our research in order to understand the molecular mechanisms behind activation and regulation of immune signaling,” he explains. “Proteogenomics enables us to integrate data from several ‘omics platforms. We have successfully identified protein-coding evidence for erstwhile categorized long non-coding RNAs, small open reading frames and for pseudogenes. We have also obtained evidence for refinement of annotated genes including identification of alternative splice variants and variant peptides in several cancers that provide novel insights into their expression across various cells and tissues. These can also be used as potential targets for evaluation of disease progression and/or monitoring therapeutic response.”
Fighting Cancer and Powering Precision Medicine
Proteogenomics is still used in its initial capacity to find genes and connect them with their functions, otherwise known as gene annotation. However, the uses of proteogenomics have expanded hugely due to technological advancements.
The real breakthrough use of proteogenomics has been better treatment and diagnosis of diseases such as cancer. Proteogenomic analysis allows understanding of the molecular changes, such as translocation or methylation, that lead to cancer. This ability to determine biomarkers of disease is revolutionizing diagnosis. Professor Marko-Varga says, “We use proteogenomics in clinical studies when we are investigating gene expression and its correlation with protein expression. One major objective of our research team is to verify that you get a DNA-RNA-protein synthesis event when cancer is present.”
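The kind of gene-to-protein correlation check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and abundance values are hypothetical placeholders, and a plain Pearson correlation stands in for whatever statistics a real clinical study would use.

```python
# Per-gene Pearson correlation between mRNA and protein abundance across samples.
# All gene names and values below are hypothetical placeholders, not real data.
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def mrna_protein_concordance(mrna, protein):
    """Return {gene: r} for every gene measured in both datasets."""
    shared = sorted(set(mrna) & set(protein))
    return {gene: pearson(mrna[gene], protein[gene]) for gene in shared}


# Each list holds one abundance value per sample, in the same sample order.
mrna_levels = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [1.0, 2.0, 3.0, 4.0],
}
protein_levels = {
    "GENE_A": [2.1, 3.9, 6.2, 8.0],  # protein tracks the transcript
    "GENE_B": [4.0, 3.0, 2.0, 1.0],  # protein moves against the transcript
}

concordance = mrna_protein_concordance(mrna_levels, protein_levels)
```

A gene such as the hypothetical GENE_B, where transcript and protein move in opposite directions, is the case in which RNA data alone would be misleading.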
Determining in advance of treatment whether a drug will work for a certain person is a long-term goal of proteogenomics. Between 1 in 4 and 1 in 25 people prescribed the ten most-used drugs in the US are non-responders, which wastes time and resources and leaves disease untreated. Cetuximab can be used to treat colorectal cancer, but it only improves the survival of patients whose tumor cells carry a mutated EGFR gene, not of those whose tumor cells carry a mutated KRAS gene. Breast cancer is another example, with trastuzumab being used to directly target HER2. Professor Marko-Varga explains, “Driver (regulator) gene mutations are central in most cancers, but drugs are developed towards proteins not genes. So, we need to answer the question: is the key regulator protein mutated? If the answer is no, but the protein is mutated, the drug we are designing may have no treatment effect but could have safety issues.”
Another advance has been the development of individualized treatments for cancer in the form of immunotherapy. This treatment uses proteogenomics to determine the specific antigen of tumor cells in individual cases, meaning an antibody can be designed to target these cells while leaving healthy cells alive. For example, proteogenomics can identify tumor antigens that serve as specific biomarkers of ductal carcinomas.
Proteogenomics has huge potential, but further development is needed to make the most of this. “The biggest limitation is the lack of an end-to-end (plug and play type) pipeline and the lack of a graphical user interface (GUI) as most available software is based on command-line scripts,” explains Professor Kandasamy. He adds, “De novo analysis in the absence of genomic or transcriptomic datasets is cumbersome as it is computationally intensive, and analysis of variant peptides requires genomic/transcriptomic and proteomic datasets from the same source.”
Despite current limitations, proteogenomics represents a huge step forwards in our ability to view human biology as a whole, functioning system. As technology improves, the future could see precision medicine in which treatments are tailored to individuals. How our diet and lifestyles affect our proteogenome may predict our future disease and shape how we plan and carry out clinical trials. Professor Marko-Varga concludes, “Disease is very complex in terms of proteogenome data. The link between this data and function is key, and at present we poorly understand what the data means. Improvements will come because the technology is developing quickly – computational power is expanding and with it more power to analyze the proteogenome.”