After combing through thousands of genes in a large number of patients, they had come up with a list of likely genetic suspects tied to the disease. Most of these genes made sense – some had previously been implicated in cancer, others clearly played an important biological role. But the data also pointed to a group of genes encoding olfactory receptors – the proteins that allow us to smell. Why were so many of these genes cropping up? Could these possibly be culprit genes? In the end, researchers found that they were simply red herrings – distractions along the way to pinpointing the mutations driving cancer.
As cancer genomics scales up, more and more mutations can be detected. But in order for critical patterns and potential drug targets to emerge, researchers need to be able to eliminate the red herrings from their results and identify the genetic changes driving different cancer types. To do so, Broad researchers have surveyed the genetic landscape of cancer to better understand the spectrum of mutations within and across cancer types, and have used this information to develop a more sophisticated analytical methodology to detect key mutations. Known as MutSigCV, the new methodology is featured prominently in a paper appearing in Nature this week.
“Back in the days when there was very little data, researchers would get excited about seeing one mutation in a gene,” said co-first author Michael Lawrence, a Broad computational biologist. “But in this era of big data, thousands of samples are being sequenced and every gene has many mutations. We have to be able to support the idea that the mutation rate of a particular gene is above what we’d expect to see.”
Lawrence and his colleagues wanted to understand the source of the red herring problem. Most analytical approaches used for finding cancer genes take into account the overall genome-wide mutation frequency in a given type of cancer as well as a handful of other parameters. But these measures are often not enough to weed out unhelpful results. Mutations are not uniform across the genome, and some genes are ‘highly mutable,’ meaning that for a variety of reasons, they are more prone to accumulating mutations.
“Not taking these highly mutable genetic regions into account leads to declaring that genes in these regions have more mutations than expected and therefore likely were positively selected for during the cancer’s evolution,” said senior author Gad Getz, director of Cancer Genome Computational Analysis at the Broad. “Now we know that this could have been just by chance so there is really no evidence that these genes are actually involved in cancer.”
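The effect Getz describes can be illustrated with a toy significance test. The sketch below is not MutSigCV itself: it is a plain binomial tail test, and every number in it (gene size, cohort size, mutation count, and both background rates) is a hypothetical value chosen for illustration. It asks how surprising a gene's mutation count is, first against a single genome-wide background rate and then against a higher local rate of the kind a highly mutable region might have.

```python
from math import exp, lgamma, log

def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms in log space."""
    if k <= 0:
        return 1.0
    log_p, log_q = log(p), log(1.0 - p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += exp(log_term)
    return min(total, 1.0)

# Hypothetical inputs, not taken from the study.
covered_bases = 1_000          # sequenced bases in the gene
tumors = 200                   # samples in the cohort
trials = covered_bases * tumors
observed = 6                   # mutations seen in the gene across the cohort

genome_wide_rate = 5e-6        # assumed average background rate per base
local_rate = 2e-5              # assumed rate for a highly mutable region

p_uniform = binomial_tail(observed, trials, genome_wide_rate)
p_local = binomial_tail(observed, trials, local_rate)

print(f"p-value with genome-wide background: {p_uniform:.2g}")
print(f"p-value with local background:       {p_local:.2g}")
```

Under these assumed numbers, the same six mutations look highly unusual against the genome-wide average but unremarkable once the gene's own elevated background rate is used, which is the red herring pattern described above.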
To factor in these highly mutable genes and other sources of problematic data, Getz’s team developed an algorithm that takes context into account. Any given nucleotide in the genome is influenced by three major kinds of context: its immediate DNA neighbors (having an A, C, T, or G on either side); its location on a chromosome; and the type of cell it is in.
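The first kind of context, a base's immediate neighbors, is simple to make concrete. The sketch below is only an illustration and not the team's code: the function names, the sequence, and the mutated positions are all made up. It reads off the three-base window around each mutated position and tallies how often each window occurs.

```python
from collections import Counter

def neighbor_context(sequence, position):
    """Return the base at `position` (0-indexed) with its left and right neighbors."""
    if position < 1 or position > len(sequence) - 2:
        raise ValueError("position needs a neighbor on both sides")
    return sequence[position - 1:position + 2]

def mutations_by_context(sequence, mutated_positions):
    """Count mutated positions grouped by their three-base context."""
    return Counter(neighbor_context(sequence, p) for p in mutated_positions)

# Hypothetical reference sequence and mutated positions, for illustration only.
sequence = "TCGATCGA"
mutated_positions = [1, 3, 5]

counts = mutations_by_context(sequence, mutated_positions)
for context, count in sorted(counts.items()):
    print(context, count)
```

A tally like this is one way to see whether mutations pile up in particular neighbor contexts rather than being spread evenly across all of them.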
In the case of the olfactory receptor genes that kept turning up unexpectedly in the lung cancer data, a key clue emerged when a visiting scientist named Paz Polak gave a talk to Getz’s lab on the phenomenon of DNA replication timing. In order for a cell to divide and each daughter cell to receive a complete copy of the genome, all of the DNA must be duplicated. But this copying process does not happen all at once – some genes get copied early, and others are copied later. Genes that get copied later tend to be more prone to mutation. And olfactory receptor genes tend to get copied quite late.
“That was the real key to solving this mystery,” said Lawrence. “We went back to our data and looked at the olfactory receptors, and sure enough, they are uniformly late replicating.”
Using this criterion and many others, the researchers developed MutSigCV to account for context when ranking the most promising genetic findings. By looking at data from over 3,000 tumor samples representing 27 cancer types, the team was able to see the extraordinary range of mutation frequency and spectrum for tumors.
The resulting mutation spectra show patterns that are unique to certain cancer types. For instance, skin cancer has a specific pattern or “signature” that reflects mutations induced by ultraviolet light; lung cancer’s signature is influenced by tobacco smoke; and other cancers – head and neck cancer, cervical cancer, and bladder cancer – all share a common signature likely tied to a response to infection.
Overall, the frequency of mutations varied by more than 1,000-fold between the cancers with the lowest mutation rates (pediatric cancers and leukemias) and those with the highest (melanoma and lung cancer).
With the ability to eliminate many suspicious genes, the researchers say it is now possible to start analyzing large cancer collections, including combined datasets from many cancer types. “We believe that this tool should be used in all studies – small or large – since the resulting gene list more accurately represents genes that have undergone positive selection in cancer,” said Getz. “Our goal is to use MutSigCV across cancer datasets to get the most comprehensive list of cancer genes.”
By combining MutSigCV with other MutSig tools, Getz’s team has already begun undertaking this challenge, looking across cancer types in what his group refers to as “pan-cancer” studies. Through this work, they hope to pinpoint genes that transcend cancer type and offer promising targets for drug treatment for many patients.