Nanopore sequencing is now proving to be a fruitful field for elucidating proteins, as it has previously for DNA. With emphasis moving towards personalized and precision medicine, a tool that could analyse an individual’s proteome and provide insightful data would be very timely indeed. However, it has yet to reach the point that DNA technologies have, and there are still challenges to overcome and mysteries to solve. Here we talk to Professor Giovanni Maglia (GM), group leader in chemical biology at the University of Groningen, about his group’s recent work in the area, the challenges they are working towards and the future of the field.
KS: Thinking about challenges that are still to be addressed, in your paper you mention the potential need for pre-purification steps when using biological samples, and that proteins pass through the pore too quickly to identify each amino acid, as protein sequencing would require. Are you working on correcting this so that it could be used for protein sequencing?
GM: Let’s say you want to identify a specific biomarker, and that specific biomarker will give a specific signal. We know what it is, but other proteins might give the same signal. So the point now is to separate the two with a pre-purification step. It could be quite a crude pre-purification though, because subtle differences can be identified by the nanopore. However, it’s more that a pre-purification step would bring out all the target proteins plus all the impurities that are normally a problem. I think we can identify our targets in the background of some impurities; that should be a tractable problem. However, it would be very difficult to identify targets in a background of 25,000 potential targets.
Although, a recent PLOS One study suggests that using some kind of machine learning algorithm, you can also study the protein blockade signature of a sample as a whole, and then you should be able to identify a specific biomarker. So, it’s not 100% certain that it’s a fundamental limitation of this approach.
With regards to the speed at which proteins pass through the pore, for protein sequencing, one challenge – there are many more – is that you need to first find a way to transport the polypeptide across the nanopore at a constant speed and under a constant applied potential. Polypeptides are charged, positively and negatively, so you cannot just use the electric field to drive the transport across the nanopore. We tackled the problem of transporting a polypeptide at a fixed potential across the nanopore by identifying a set of conditions in the pore and in the solution that would allow the creation of a strong water flow across the nanopore at a fixed potential that could overcome the repulsive electric field potential.
A strong water flow across the nanopore was obtained using nanopores lined by many negative charges. Under an external potential, the positive counter-ions move out of the nanopore, generating a strong unidirectional water flow. Then the negative charge of the peptide, which opposes entry into the negatively charged nanopore, was attenuated by simply changing the pH of the solution. In other words, we found the right balance between having a nanopore that is charged enough to promote the entry of the analyte and a protein whose negative charge is moderated enough to enter the nanopore.
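As a rough illustration of why changing the pH moderates a peptide's negative charge, the sketch below estimates net charge with the Henderson-Hasselbalch relation. It is not the group's method: the pKa values are approximate textbook figures (real values shift with the local environment), and the peptide `DEDEDE` and the function name `net_charge` are hypothetical, chosen only for the example.

```python
# Approximate textbook pKa values (an assumption; real values vary with context).
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}  # negative when deprotonated
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}            # positive when protonated
PKA_N_TERM = 9.0  # free amino terminus
PKA_C_TERM = 2.0  # free carboxyl terminus


def net_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of a peptide (one-letter codes) at a given pH."""
    # Fraction of each basic group that is protonated, hence positively charged.
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    # Fraction of each acidic group that is deprotonated, hence negatively charged.
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue in sequence:
        if residue in PKA_BASIC:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[residue]))
        elif residue in PKA_ACIDIC:
            negative += 1.0 / (1.0 + 10 ** (PKA_ACIDIC[residue] - ph))
    return positive - negative


if __name__ == "__main__":
    peptide = "DEDEDE"  # hypothetical, strongly acidic peptide
    for ph in (3.0, 4.5, 7.5):
        print(f"pH {ph}: estimated net charge {net_charge(peptide, ph):+.2f}")
```

Under these assumptions, the acidic peptide carries a large negative charge near neutral pH and a much smaller one in acidic solution, which is the general kind of attenuation described above. The conditions actually used in the study are not stated here.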
The second challenge we needed to overcome was to see if different polypeptides gave different signals as they're translocating across the nanopore. Different DNA bases have different signals, but it was not really obvious if it was the same for proteins. People don’t really know why different molecules give different signals in the nanopore, because the molecular basis of recognition by nanopore currents is not well understood.
To test the system, we selected a polypeptide that gave us a decent signal, and then chose another polypeptide which was pretty much the same but with just one amino acid different, and found we could obtain two different signals. This was quite important, because it told us that two molecules that differed by only one amino acid could be differentiated.
With protein sequencing, it might be more challenging to control transport across the nanopore, but there are also certain things that are easier than in DNA sequencing. For example, we know the sequence of proteins from genomic analysis, so you don’t really need to sequence all the different amino acids. You could sequence five or six amino acids and then, by comparing with the sequences of proteins that you know exist in a proteome, you can still recognise the protein that you have in a sample.
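The matching idea described above can be sketched in a few lines. This is a minimal illustration, not the group's pipeline: the three-entry proteome, its made-up sequences, and the function name `proteins_matching` are all hypothetical, and a real search would need to tolerate read errors and work against a full proteome database.

```python
def proteins_matching(fragment: str, proteome: dict[str, str]) -> list[str]:
    """Return the names of all proteins whose sequence contains the fragment."""
    return [name for name, sequence in proteome.items() if fragment in sequence]


# Toy proteome with invented sequences (one-letter amino acid codes).
TOY_PROTEOME = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "MGSSHHHHHHSSGLVPRGSH",
    "protC": "MKTAYLLKQRAISFVKAHFS",
}

if __name__ == "__main__":
    # A six-residue read that occurs in only one toy protein identifies it.
    print(proteins_matching("YIAKQR", TOY_PROTEOME))
    # A read shared by two toy proteins is ambiguous and needs more residues.
    print(proteins_matching("MKTAY", TOY_PROTEOME))
```

In this toy case a short read is enough when it is unique to one entry, while a read shared by several entries leaves the identification ambiguous until more residues are read.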
The third part, which is how to control the transport across the nanopore, is not something that we tackled in this work. But of course, we're now looking at different kinds of molecular machines that can allow this to happen. For proteins it's difficult because of the need to unfold and unroll the polypeptide chain, so it remains to be seen how well you can control the transport across the nanopore. However, as I said before, because you don't really need to sequence every single amino acid to recognise a protein, it could well mean that just a rough read of an unfolded protein would be enough to allow you to trace back the sequence of the protein.
KS: The published study used proteins of 1-25 kDa but 25 kDa is relatively small compared to many proteins found in nature and the average protein in a human. Would it be a problem to scale up for larger proteins?
GM: Well, in theory you just need to have a larger nanopore. Here, the limitation would probably be how many large biological nanopores you have. The good thing is that there is quite a lot of work that's been done by my colleagues with solid-state nanopores. Instead of using a biological membrane and using a protein to punch a hole in it, the solid-state nanopores use an artificial membrane based on silicon. It can be quite thin, a few nanometres, and they drill a hole the size they want with different methods.
With the current techniques it works pretty well on the nanometre scale, but not as well on the sub-nanometre scale. So, if you want to have something 5-10 nanometres to study larger proteins, you can have quite a reproducible solid-state nanopore.
The challenge there would be to create a shape that would allow the trapping of the protein, because the protein enters the pore and has to stay inside for enough time to be sampled. The biological pores that we use have a conical shape, with a large entry and a narrow exit, so the protein can enter and remain trapped at the narrow exit. Because the protein sees another protein, it doesn't unfold and will stay folded inside the nanopore. It's quite a soft interaction between the analyte and the pore itself.
I would say for the solid-state pores, it's still to be seen if you can recreate the same sort of shapes and environment that allow control of the permanence of the protein inside the pore, and if the protein inside the pore will maintain a shape that allows it to be recognised reliably.
KS: Thinking about the future, how close do you think we are to being able to sequence the proteome of an individual in the fast and efficient way in which DNA now is?
GM: It's difficult to say. Past experience with DNA sequencing showed that you need just a few breakthroughs. However, you do not know when such a breakthrough will happen. It can take one, two or tens of years. Protein analysis is also different from DNA analysis. With DNA you need to sequence it, that's it. You could map DNA a little bit, that could be the intermediate step, but with proteins we have so many more intermediate possibilities. You can recognise a protein, you can sense a specific protein in the background of other proteins, you can fingerprint a protein. Because they can be quite reactive, you can react certain residues and then recognise the protein. Or you can actually just sequence the protein itself, amino acid by amino acid.
Diagnostics is one thing: you just recognise a specific protein in [low abundance], for example one that you have in the blood, as a biomarker connected to disease. Or you just want to re-sequence a protein: not sequence it and identify the full sequence, but just recognise a few amino acids in a sequence and then match them with genomic data. Or de novo sequencing, where, with no prior information, you literally want to know all the different amino acids.
If you talk about the last one, I would say that there is still one crucial element that somebody needs to prove before we can put a date on protein sequencing. This is, can we actually control the transport of a protein across a nanopore in a unidirectional way, not even amino acid by amino acid? The polypeptide chain needs to go through, and not go back. I think that's the crucial step, so it needs to be unidirectional. If you can do that, then I would say that what we know with DNA sequencing would be a few years away, or even less depending on the resources and how many people work on that.
But before somebody can actually prove that you have a unidirectional transport across the nanopore of this unfolded polypeptide, it's not possible to put a date on how long it's going to take, I don't think.
Professor Giovanni Maglia was speaking to Dr Karen Steward, Science Writer for Technology Networks.