Human papillomaviruses are associated with invasive cervical cancer as well as more benign disorders such as skin warts. Although more than 180 HPV genomes have been sequenced, there has been little research on the diversity of HPV genomes within the same patient, primarily because the virus is thought to have a low mutation rate.
Of the 13 HPV genotypes thought to be carcinogenic, HPV16 is responsible for about half of all invasive cervical cancer cases worldwide. In the study, the researchers sequenced HPV16 genomes from 10 patients with cervical cancer and one with non-malignant genital warts.
To date, most genomic studies of papillomaviruses have used Sanger sequencing to look at the "most prevalent, consensus sequence" during chronic infection, but Sanger sequencing may "not be appropriate to capture the dynamics of slowly evolving viruses, such as PVs," the authors wrote.
So they turned to next-generation sequencing. The authors extracted DNA from 10 clinical samples of invasive cervical cancer and one case of genital warts caused by HPV16. They used long PCR to generate 8-kb amplicons, the size of the HPV genome, and sequenced them on Thermo Fisher's Ion Torrent PGM.
The authors both generated a consensus genome and assembled each sample de novo using CLC software.
Comparing the clinical samples to the reference sequence, the researchers observed 190 changes, with the E2 gene containing the largest number of changes. Two samples had duplication events in the L1 gene and L2 gene, respectively.
The team also performed a phylogenetic analysis using consensus sequences from the PGM data as well as 20 HPV16 genomes from GenBank. Among the 11 clinical samples, the researchers identified three HPV16 variants: HPV16_A1, HPV16_A2, and HPV16_D. These variants correlated with specific tumor types, with squamous cell carcinomas associated with the A variants and adenocarcinomas with the D variant.
To analyze intra-host variation, the researchers performed de novo assembly. They were able to generate one contiguous sequence for four samples, with the remaining seven samples in three to eight contigs.
The researchers identified between three and 125 polymorphic sites per genome. In the most diverse sample, the variant at 31 of the 125 polymorphic sites accounted for more than 10 percent of the reads at that position. In the least diverse sample, the variant exceeded 5 percent of the reads at only one polymorphic site.
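To make the thresholds concrete, here is a minimal Python sketch of how polymorphic sites above a given read fraction could be tallied from per-position base counts. It is not the authors' pipeline: the function names, the toy counts, and the choice to treat every non-majority base as the variant are all assumptions made for illustration.

```python
def minor_fraction(base_counts):
    """Fraction of reads at one position that do not carry the majority base.

    Assumption for this sketch: all non-majority bases are pooled together.
    """
    total = sum(base_counts.values())
    if total == 0:
        return 0.0
    return (total - max(base_counts.values())) / total


def count_polymorphic_sites(pileup, threshold):
    """Count positions whose non-majority read fraction exceeds the threshold."""
    return sum(1 for counts in pileup if minor_fraction(counts) > threshold)


# Hypothetical read counts per base at four genome positions (not study data).
toy_pileup = [
    {"A": 94, "G": 6},
    {"C": 80, "T": 20},
    {"G": 100},
    {"T": 88, "A": 12},
]

print(count_polymorphic_sites(toy_pileup, 0.10))  # prints 2
```

With these toy counts, two positions pass the 10 percent cutoff and three pass a 5 percent cutoff, which shows how strongly the choice of threshold affects the tally.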
Next, the team calculated a "diversity index" for each sample, defined as the "probability of a randomly chosen genome to be identical to the consensus genome." The median value for the samples was just 40 percent.
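The article does not spell out how the index was computed. One simple way to arrive at such a probability, shown in the Python sketch below, is to multiply the consensus-base frequencies across polymorphic sites. This assumes the sites vary independently of one another, which is an assumption of the sketch and may not match the authors' method; the function name and the frequencies are hypothetical.

```python
from math import prod


def consensus_identity_probability(consensus_freqs):
    """Probability that a random genome matches the consensus at every site.

    Assumption for this sketch: polymorphic sites are independent, so the
    per-site consensus frequencies can simply be multiplied.
    """
    for freq in consensus_freqs:
        if not 0.0 <= freq <= 1.0:
            raise ValueError("frequencies must lie between 0 and 1")
    return prod(consensus_freqs)


# Hypothetical consensus-base frequencies at three polymorphic sites.
toy_freqs = [0.9, 0.8, 0.95]
print(consensus_identity_probability(toy_freqs))
```

Under this reading, even modest variation at many sites drives the probability down quickly, which is consistent with a low median value across samples.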
The authors suggest a number of factors could contribute to the diversity observed, including both innate and adaptive immune responses. For instance, the APOBEC3G family of proteins has been shown to target papillomavirus DNA, "which may partially account for the broad diversity of human PVs." In addition, "polymorphisms observed in the E6 gene could be a result of an immune selective pressure," the authors wrote.
In the future, more research will need to be done on HPV infection to monitor viral diversity in asymptomatic, productive, benign, premalignant and malignant infections. "The possible role of oncovirus intralesion diversity generated during chronic infections should be explored as a differential factor for increased oncogenic potential," they wrote.