The human genome is contained within a vast jumble of DNA. Its 20,000 or so genes are concealed within strings of As, Ts, Gs, and Cs, and each gene must be turned on at the right time and in the right cells.
For the first time, scientists have glimpsed the cellular machinery that accomplishes that feat, as it assembles directly on the DNA and readies it for transcription into RNA, the first step in protein production.
“We’ve described the assembly of the machinery that allows the human genome to be read one gene at a time. This molecular step is critical in transforming DNA ultimately into the protein repertoire that carries out all the functions of a cell,” says Eva Nogales, the Howard Hughes Medical Institute investigator who led the research. Nogales and her team published their work online in the journal Nature on February 27, 2013.
The enzyme that carries out transcription, RNA polymerase II, is very efficient at copying the information encoded in DNA into an RNA molecule. “But the polymerase is completely incapable of detecting the beginning of a gene,” says Nogales, whose lab is at the University of California, Berkeley.
Likewise, the polymerase cannot do its work until the two strands of the DNA double helix have been separated to reveal their sequence. To accomplish these tasks, a bulky complex of proteins called the pre-initiation complex is assembled each time transcription begins.
Together, the proteins in the pre-initiation complex find a gene’s start site, prepare the DNA, and set the polymerase on its way.
“This machinery has to find the beginning of the gene in the huge ocean of DNA that makes up the genome,” explains Nogales. Once there, it opens the double helix and positions the polymerase in exactly the right place, ensuring that transcription begins with the correct letter in the DNA code.
Researchers knew that initiating transcription of human genes takes a core complex of at least six different factors: TATA-binding protein (TBP), TFIIA, TFIIB, TFIIE, TFIIF, and TFIIH.
Based on biochemical experiments, they knew the order in which these transcription factors arrived at the start site and joined the complex, and they had an idea of each one’s contributions to transcription initiation. But no one had actually visualized the molecules in action, so the molecular details of how they functioned remained unknown.
The main obstacle to visualizing the pre-initiation complex, Nogales says, was obtaining enough of each of its components for structural studies. Several of the components in the complex are not amenable to the techniques scientists often rely on to produce large quantities of the protein in the lab.
Instead, Nogales and her colleagues isolated those proteins directly from human cells. Because the transcription factors are not abundant inside cells, the quantities that can be purified are small.
“Things like x-ray crystallography and NMR,” common techniques for structural studies that require large amounts of protein, “were completely out of the question,” Nogales says. “That’s why very few structures of any of these intermediates have been obtained.”
Instead, they turned to a technique called cryo-electron microscopy (cryo-EM), in which samples are flash frozen and then viewed through an electron microscope. Because the proteins do not have to be crystallized, researchers see them in their native state. Further, Nogales explains, “we need very little volume at very low concentrations to do cryo-EM.”
Nogales and her team wanted to watch as the pre-initiation complex assembled at a gene’s start site, so they created a series of cryo-EM snapshots. In a test tube, they allowed a minimal form of the complex to self-assemble on a strand of DNA: a cluster including TBP, TFIIA, TFIIB, and RNA polymerase II.
They froze the complex in this state and captured images to generate a 3D structure. All the pieces in this simple form of the complex were known, so Nogales says this allowed them to confirm that their technique was consistent with previous crystallographic images. They then added TFIIF, TFIIE, and TFIIH one by one, capturing three more snapshots.
In cells, the pre-initiation complex remains on the DNA until the polymerase begins transcription and physically moves away from the start site. Before this can happen, the pre-initiation complex must use energy to open the double helix and push the DNA into the pocket of the polymerase where transcription occurs.
Nogales and her team mimicked this state in the test tube by altering the sequence of DNA on which the transcription factors and polymerase assembled, adding a short segment of RNA. “It is the equivalent of the polymerase having engaged the open DNA and started to add a few nucleotides of RNA,” Nogales explains. Again, they froze the complex and captured images with cryo-EM.
The stop-motion movie they have created from their snapshots reveals several key features. It shows, for example, how TFIIF stabilizes the complex by engaging both the polymerase and the DNA.
TFIIF was known to be important for lining the polymerase up at a gene’s start site, and the new images show how it allows the polymerase to bind to the DNA double helix, then nudges the polymerase to just the right position on the DNA.
“This initial engagement sets things in the right position, so that when TFIIH pushes on the DNA, it will be moved into the right place,” Nogales explains. The images also reveal how TFIIF ensures that the double helix will unwind and separate at the right place: part of the factor physically inserts itself between the strands to prevent them from coming back together.
“We have seen the molecular details of how all the proteins come together and interact with each other and with the DNA, thus gaining insight into how the cell determines at which point of the DNA to start transcribing into RNA, and actually how the DNA is opened and inserted into the active site in the polymerase,” Nogales says. “But this complex is really only the beginning of the story.”
“If we want to get at what is different between one gene and another, we have to start building up even larger complexes,” she says. So far, her team has recreated the formation of the most fundamental portion of the pre-initiation complex, the part that assembles every time any human gene is transcribed. But in living cells, these proteins never act alone. The next step, Nogales says, will be to begin to add in the factors that allow the transcription machinery to recognize genes in a regulated fashion. “As genomes get bigger, their regulatory systems get more complex,” Nogales says.
That complexity, she says, enables cells to fine-tune gene expression, and understanding it is essential for understanding the complexity arising from the human genome.