A Long Day's Night: Working on the Frontline of Structural Biology in COVID-19 Times
Garry Buchko, PhD is a physical biochemist at Pacific Northwest National Laboratory (PNNL) where he began his career in the laboratory of Michael Kennedy, PhD studying DNA repair proteins using NMR spectroscopy.
Over the past two decades, he has used NMR spectroscopy to solve solution structures and characterize the function of a wide suite of proteins. Despite long days and nights spent in the laboratory, his curiosity for the field clearly hasn't wavered. As he puts it: "Who gets paid to work on their hobby?"
Since 2007, a large portion of Buchko's research efforts have been dedicated to the Seattle Structural Genomics Center for Infectious Diseases (SSGCID), one of two Structural Genomics centers established by the National Institute of Allergy and Infectious Diseases.
His work with SSGCID involves determining the three-dimensional structures of proteins from pathogens that cause infectious diseases in humans.
After the initial outbreak of COVID-19 and the sequencing of the virus' genome, Buchko and colleagues quickly assembled to prioritize structural genomics efforts to focus on deciphering important protein structures from SARS-CoV-2.
In this interview, Buchko eloquently discusses his area of research, what it's like to be a scientist in this field in the current global pandemic, and how his team will continue to push the frontiers of scientific research to keep one step ahead of the many other infectious diseases that may be out there.
Molly Campbell (MC): Please can you tell us about your research background and work at the Seattle Structural Genomics Center for Infectious Disease?
Garry Buchko (GB): Over the past two decades at PNNL I’ve used NMR spectroscopy to solve solution structures and characterize the function of a wide suite of proteins. Since 2007 this included a significant dedication to the efforts of the Seattle Structural Genomics Center for Infectious Diseases (SSGCID), one of two Structural Genomics centers established by the National Institute of Allergy and Infectious Diseases (NIAID) to determine the three-dimensional structures of proteins from pathogens causing infectious diseases in humans.
Both Centers actively engage with infectious disease researchers to select Community Request targets for entry in their structure determination pipelines and to collaboratively interpret and publish results from successful structure determinations. SSGCID target selection focuses on essential enzymes, virulence factors, drug targets, and vaccine candidates from numerous bacterial, eukaryotic, and viral pathogens. In general, target genes are PCR amplified, cloned, and screened for soluble expression in Escherichia coli. Proteins are then purified in milligram amounts, screened for crystallization, and analyzed by X-ray diffraction using an in-house source or off-site synchrotron beamline. To address solving structures for proteins that fail to crystallize, proteins <25 kDa in molecular weight are queued for structure determination by NMR, and recently, selected targets > ~100 kDa are queued for cryo-EM.
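The triage rule described above can be sketched in a few lines. This is only an illustration: the function name `suggest_method` and the constants are hypothetical, the cutoffs are the approximate values quoted in the interview, and the real pipeline involves human judgment (cryo-EM, for instance, is applied only to selected targets).

```python
from typing import Optional

# Approximate cutoffs quoted in the interview (assumed to be simple thresholds here).
NMR_MAX_KDA = 25.0       # proteins below ~25 kDa can be queued for NMR
CRYO_EM_MIN_KDA = 100.0  # selected targets above ~100 kDa can be queued for cryo-EM


def suggest_method(mass_kda: float, crystallized: bool) -> Optional[str]:
    """Suggest a structure determination route for one target.

    Crystallography is tried first; NMR and cryo-EM are fallbacks for
    targets that fail to crystallize. Returns None when neither fallback
    applies (between ~25 and ~100 kDa in this simplified rule).
    """
    if mass_kda <= 0:
        raise ValueError("molecular weight must be positive")
    if crystallized:
        return "X-ray"
    if mass_kda < NMR_MAX_KDA:
        return "NMR"
    if mass_kda > CRYO_EM_MIN_KDA:
        return "cryo-EM"
    return None


if __name__ == "__main__":
    for mass, ok in [(12.0, True), (12.0, False), (60.0, False), (150.0, False)]:
        print(mass, ok, suggest_method(mass, ok))
```

The gap between the two cutoffs is deliberate in this sketch: a mid-sized protein that will not crystallize has no automatic fallback under the rule as stated.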
Since project inception in late 2007, over 6,000 targets have entered the SSGCID structure determination pipeline, of which >4,000 are Community Requests, resulting in the deposition of nearly 1,300 protein structures in the Protein Data Bank (PDB). Of the deposited structures, 95% were solved by X-ray, 4% by NMR, and 1% by cryo-EM. With few exceptions, the coordinates for all structures solved by both NIAID centers are immediately made available to the public where they serve as “blueprints” for structure-based discovery of new drugs, assist our understanding of the molecular biology of pathogens, and fill “structure-space”. In addition to atomic coordinates, more than 8,000 expression clones and 4,000 purified proteins are publicly available free of charge.
The members of SSGCID are located in the Pacific Northwest and led by Dr. Peter Myler at the Seattle Children’s Research Institute. Much of the cloning and protein production is conducted by a group directed by Dr. Wesley Van Voorhis at the University of Washington. The X-ray crystallography is overseen by Dr. Tom Edwards and his team at UCB (Bainbridge Island). Protein targets that fail to crystallize, or that crystallize but diffract poorly, are directed to one of two NMR sites: Dr. Gabriele Varani at the University of Washington or myself at PNNL. At the NMR sites, recombinant methods are used to generate protein enriched with “NMR visible” isotopes, carbon-13 and nitrogen-15. If the fingerprint 1H-15N HSQC spectrum is favorable (good chemical shift dispersion in both dimensions), a suite of NMR experiments is collected towards the objective of solving the protein’s structure in solution by a process described in more detail below. Relative to solving a structure using X-ray methods, structure solution by NMR still requires considerably more human involvement. Hence, to be cost-effective, our group solves about 19 times more structures by X-ray than by NMR.
Needless to say, after the news of the first reported COVID-19 case in Wuhan, China on December 31, 2019, and the sequencing of the genome, our team rapidly prioritized structural genomics efforts to focus on elucidation of important protein structures from SARS-CoV-2. Before the end of February these efforts resulted in the cryo-EM structure of the SARS-CoV-2 spike protein (S-protein) by SSGCID co-PI Dr. David Veesler at the University of Washington. Multiple other structural genomic efforts are in progress and are moving along as rapidly as possible under the current policies of social distancing in Seattle, Bainbridge Island, and Richland, WA.
MC: You are using snippets of genetic code to study one protein from SARS-CoV-2 at a time, growing the individual proteins in bacteria. Can you tell us more about this process?
GB: The central dogma of molecular biology is that genetic information can flow from DNA to RNA to protein with some reverse transfer of information from RNA to DNA but no reverse transfer of information from protein to RNA. Coronavirus is a positive-sense, single-stranded RNA virus. Once inside human host cells, SARS-CoV-2 +RNA is released into the cell’s cytoplasm and translated directly into protein by the host cell’s ribosomes. There are 27 proteins encoded in the viral +RNA and among them a protein called RNA-dependent RNA polymerase (RDRP) which, with help from a few other smaller viral proteins, synthesizes a -RNA strand that is replicated into +RNA (the starting point). Sequencing the genome of an organism is a routine procedure and soon after SARS-CoV-2 was isolated its genome was sequenced.
Due to the central dogma of molecular biology along with universal “start”, “stop” and other sequence signals in the +RNA, scientists immediately knew the corresponding double-stranded DNA sequence that would make the same proteins encoded by the +RNA (the first step in the dogma scheme). These DNA sequences were then chemically synthesized (commercially), inserted into engineered plasmids (small circular double-stranded DNA that replicate independent of the chromosome) that are used to transform competent Escherichia coli cells (bacteria). The plasmids are engineered such that the transcription and translation of the inserted DNA into protein can be turned on (expressed) with a chemical (typically a sugar). Hence, each viral protein can be made using bacteria (recombinant DNA technology). After the induction of protein expression by the sugar for a period of time, the cells are harvested, lysed (“blown up”), and the protein isolated using established chromatography techniques. It is with this recombinant DNA technology with E. coli that the vast majority of proteins are made by SSGCID. It should be noted that post-translational modifications are important with some proteins, such as the SARS-CoV-2 S-protein. In such cases, cell-free systems or eukaryotic cell lines (ideally human) need to be used to express these proteins.
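The step from viral +RNA to an expressible DNA insert, and from there to protein, can be sketched as follows. This is a toy illustration using the standard genetic code; the 12-nucleotide sequence is made up for the example (it is not from SARS-CoV-2), and the function names are hypothetical. Real constructs also involve steps not shown here, such as codon optimization and affinity tags.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_to_coding_dna(rna: str) -> str:
    """DNA coding strand with the same sequence as the +RNA (U becomes T)."""
    return rna.upper().replace("U", "T")


def template_strand(coding_dna: str) -> str:
    """Reverse complement: the second strand of the double-stranded insert."""
    return coding_dna.translate(COMPLEMENT)[::-1]


def translate(coding_dna: str) -> str:
    """Translate codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        residue = CODON_TABLE[coding_dna[i:i + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    plus_rna = "AUGGCUUCUUAA"            # made-up example: start, Ala, Ser, stop
    coding = rna_to_coding_dna(plus_rna)
    print(coding)                        # ATGGCTTCTTAA
    print(template_strand(coding))       # TTAAGAAGCCAT
    print(translate(coding))             # MAS
```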
MC: There are 27 proteins packed in SARS-CoV-2 that are being studied across the globe. Of these 27 proteins, which have been studied most extensively? Which proteins are particularly exciting to study in terms of drug discovery?
GB: Laboratories throughout the world are likely focusing on the major protein on the virus’s outer coat, the S-protein, and the protein complex responsible for replicating the viral +RNA, the RNA-dependent RNA polymerase (RDRP) complex.
As the name suggests, the S-protein is responsible for the spike-like protrusions visible on electron micrographs. The organization of these spikes imparts a “crown” or “corona-like” appearance to the virus and is responsible for the name assigned to these viruses, coronaviruses. It is this glycoprotein that binds to the ACE2 receptors on the surface of certain cells in our respiratory system. This is also the major antigen used by our immune system to recognize this intruder and the primary basis of vaccine development and antibody-based therapies. Viral binding to ACE2 receptors by itself will not allow the virus to gain entry inside the cell; the spike protein must first be cleaved by a human protein, furin. A chemical that prevents this protease from activating the S-protein may be an effective treatment strategy against COVID-19 and this is the postulated mechanism used by the controversial drug hydroxychloroquine.
The SARS-CoV-2 genome encodes only 27 proteins and among these are the enzymes required to replicate its +RNA, the RDRP complex. This is a feature exclusive to RNA viruses - a similar complex does not exist in eukaryotic host cells. This distinction makes the RDRP complex an excellent target for an antiviral drug because if you stop the virus from replicating you will stop its propagation. One promising drug is remdesivir, a phosphoramidate prodrug with activity against the related virus MERS-CoV. Recent studies suggest the metabolic product of remdesivir is incorporated into the growing RNA more readily than its natural substrate, ATP. Once incorporated, RNA chain extension stops three nucleotides later. This appears to be fortunate because the genome of SARS-CoV-2 contains an RNA proofreading enzyme, the exonuclease Nsp14. The “bad” nucleotide appears too distant from the end of the nascent RNA chain to be repaired by Nsp14. In addition to the major component of the RDRP complex, the protein Nsp12, viral proteins Nsp7 and Nsp8 may also be essential for the complex to function properly. Hence, small molecules that prevent these co-factor proteins from binding to Nsp12 may also serve as an antiviral drug. Other viral proteins are believed to assist viral replication by trimming away excess RNA (Nsp15), unwinding the RNA to assist replication (Nsp13), protecting the viral RNA from host cell RNases (Nsp10 and Nsp16), and creating a mini-environment (bubble) for the assembly of baby viruses (Nsp2, Nsp4, and Nsp6). Drugs that interfere with the function of any of these assistant proteins could also play a role in slowing down the virus.
Beyond these two primary strategies to stagger, if not completely knock out, the virus, there is a set of “accessory proteins” whose functions are less well understood. These proteins are called Orfs, open reading frames, and there are five of them (Orf3a, Orf6, Orf7a, Orf8, and Orf10). The primary amino acid sequence of Orf8 is very different from the corresponding proteins observed in other coronaviruses and there is no corresponding protein for Orf10.
MC: How are you managing the large data sets that are generated through the research?
GB: The quick answer is that it is not an issue from a computational perspective. However, because there is still a large human component in solving protein structures using NMR spectroscopy, one has to adopt “accounting-like” concepts to keep organized. This is because at the heart of solving a protein structure using NMR spectroscopy, one needs to assign the chemical shifts for the overwhelming majority of the proton, carbon, and nitrogen atoms. For example, the empirical formula for the 107-residue human high mobility group protein HMGA is C487H833N163O167S1. This means potentially 487 carbon, 833 hydrogen, and 163 nitrogen chemical shifts need to be assigned to their corresponding position in the chemical structure. This is a substantial number of atoms and this is not an especially large protein. The assignments can be accomplished with the interpretation of a suite of two-, three- and sometimes four-dimensional experimental NMR data sets in a process which essentially involves “connecting the dots (cross peaks)” among the various spectra. With these assignments in hand, computer programs can then make sense out of the most important NMR data set in terms of structure calculations, the NOESY data set.
This is an experiment that identifies pairs of protons that “talk” to each other through space via something called the Nuclear Overhauser Effect (NOE). The size of this NOE signal is proportional to 1/r^6, where r is the distance between the two protons, and generally is undetectable at distances greater than approximately 6 Å. Hence, NOEs identify protons that are within ~6 Å of each other, and the strength of each NOE reports on the distance between the pair of protons: the stronger the signal, the closer the pair. Structure calculation programs take that information (along with other bits of data such as the chemical shifts and predicted torsion angle ranges) blended in with some molecular dynamics to spew out an ensemble of structures that satisfy the experimental input. While the overall process requires the accumulation of a significant amount of raw NMR data over the course of two to three weeks on an NMR spectrometer (in the absence of non-uniform sampling), storage space is cheap. All this raw data can be converted into an interpretable form easily with modest computer systems and the output stored and analyzed on today’s powerful personal laptop computers. In the end, at any given time I will have the processed NMR data for 10-20 proteins on my laptop that I can work on anywhere. If I am not in a hurry, I can even perform the structure calculations themselves on a laptop (however, I typically log in to larger computers at PNNL where this is done much more rapidly).
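The inverse sixth-power dependence of the NOE on distance can be made concrete with a short calculation. The sketch below assumes the simplest textbook treatment (an isolated pair of protons, calibrated against one reference pair of known separation); the function names and the 1.8 Å reference distance are illustrative assumptions, not values from the interview. Structure calculation programs typically turn NOEs into looser distance ranges rather than exact distances.

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Estimate a proton-proton distance from an NOE intensity.

    Uses I proportional to 1/r^6, so r = r_ref * (I_ref / I) ** (1/6).
    Distances are in the same unit as ref_distance (angstroms here).
    """
    if intensity <= 0 or ref_intensity <= 0 or ref_distance <= 0:
        raise ValueError("intensities and reference distance must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def relative_intensity(distance: float, ref_distance: float) -> float:
    """NOE intensity at `distance`, relative to the intensity at `ref_distance`."""
    if distance <= 0 or ref_distance <= 0:
        raise ValueError("distances must be positive")
    return (ref_distance / distance) ** 6


if __name__ == "__main__":
    # A cross peak 64 times weaker than the reference corresponds to twice the
    # reference distance, because 64 ** (1/6) == 2.
    print(noe_distance(intensity=1.0, ref_intensity=64.0, ref_distance=1.8))

    # Doubling the separation from 3 to 6 angstroms cuts the signal 64-fold,
    # which is why NOEs fade out so quickly with distance.
    print(relative_intensity(distance=6.0, ref_distance=3.0))
```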
MC: You are working mostly at night, when laboratory staffing is minimal and social distancing is easier. How has the COVID-19 outbreak impacted your typical research routine? What major challenges have you encountered?
GB: To be honest, in normal times scientists put in more than your typical 40-hour work week. This is partially because during the day it is often hard to get solid blocks of real work completed, especially in the wet-lab, due to interruptions by meetings and social events. Personally, my normal routine was that in the evenings, if I was not working on writing a paper at home, I would often return to the lab to get a few things done undisturbed (it is only a five-minute drive from home). The major difference is that in the pre-COVID-19 days I often did this before or after an evening beer-league hockey game (Dr. Who fans might like to know that I manage the Washington TimeLords).
Now, I obviously have even more time available to work in the wet-lab in the evenings. Consequently, it wasn’t much of a change working more evening shifts after the COVID-19 social distancing mandates. I usually don’t complain about the long weeks, especially with regard to the laboratory part, because how many people get paid to work on their hobby? In summary, putting in more evening shifts is easy and because of the extensive teleworking of many people during the day, I’m now left largely undisturbed during the day too. The biggest challenge was getting used to wearing gloves while in the lab every day (another COVID-19 mandate). The next biggest challenge will likely be getting used to wearing face masks when more people return to the lab.
MC: How does it feel to be a scientist contributing to the global efforts to understand and fight against SARS-CoV-2?
GB: NIAID established two structural genomics centers dedicated to organisms responsible for infectious diseases with the goal of building a library of three-dimensional protein structures of potential drug targets that may be used as “blueprints” for structure-guided design of new drugs. There is a great urgency for new therapies for many reasons. Foremost, the number of therapies for many infectious diseases is shrinking due to the emergence of antimicrobial resistance in bacterial and eukaryotic pathogens (see the 2020 special issue of Protein Science). Second, a potential consequence of rapid climate change is the emergence of epidemic infectious zoonotic diseases as the habitats of disease carriers change to expose human populations to new infectious agents. Third, for some infectious diseases no well-established therapies exist. The rapid spread of COVID-19 that has shut the entire world down in a matter of months highlights all too well the fragile balance in the continuous war between man and the microbial world.
My colleagues and I are proud to see the rapid prioritization of our established structural biology resources to respond quickly to COVID-19 and it is exciting working on something that is affecting the lives of everyone on this planet at this moment. We certainly don’t merit praise; this all goes to our heroic first responders (our laboratories right now are probably one of the safest places in town). However, in the post-COVID-19 world it will be important that we don’t let our guard down and we continue to push the frontiers of scientific research to keep one step ahead of the many other infectious diseases lurking out there waiting for a weak spot in our armor.
MC: Based on your expertise in this field, what do you envision being the next breakthrough in the SARS-CoV-2 research space, with particular reference to structural biology?
GB: The structure of the M-protein. This is a 222-residue protein that is an integral part of the virus’s surface-exposed coat. Residues 1-103 are predicted to contain three membrane-spanning regions and residues 104-222 are exposed on the outer surface of the virus. It is not known if this protein plays other functional roles aside from helping to hold the viral coat together. Solving the structure of membrane proteins is challenging. This membrane protein may be especially challenging because it is too small to be determined using current cryo-EM methods. Consequently, it may be necessary to use more exotic methods to obtain the structure for the complete protein, such as placing 13C-, 15N-labeled protein into nanodiscs – synthetic model membrane systems – and solving its structure using NMR-based methods. Such an approach might require a significant amount of time-consuming trial and error to optimize NMR data collection conditions.
Garry Buchko was speaking with Molly Campbell, Science Writer, Technology Networks.