Proteomics
Down The Rabbit Hole: what does a peptide look like in MS?
Armel Nicolas

Interpreting Individual Peaks – part 1

If you have ever looked at a mass-spectrometer output file, you may have wondered how it is possible to make sense of all of these peaks. I mean, there really are a lot of them! Here, we will talk a bit about the basic rules which govern the behaviour of peptides in MS1 and MS2 spectra, and which are used to identify the boundaries of groups of peaks belonging to the same peptide.

First, though, we may need to define the notion of “peptide”. “Oh, that’s easy”, I hear you say. A peptide is a bit of protein – usually a digestion product, but it can also be an unusually short protein, right? The limit is usually set, somewhat arbitrarily, at 50 amino acids, I believe. Yes, true[1], but in proteomics we need to add a few nuances: when proteomics people mention a “peptide”, they usually mean all of the copies of the same peptide currently present in the sample. Usually, especially when speaking of behaviour inside a mass-spectrometer, a “peptide” is understood as a collection of molecules sharing not just a single primary sequence, but also the same post-translational modification state and charge. I sometimes refer to these as “entities” or “species”.

This does not mean that we never fall back on using “peptide” to refer to a single copy of a peptide. That would be too simple. In fact, in the lines below I do on a few occasions use the word peptide to refer to a single copy. Context, as always, is everything.

Now, let us consider a single peptide and its behaviour in the different scans of a standard Data-Dependent Acquisition experiment:

[1] Although, with my slightly OCD mind, I have never really liked the idea of distinguishing between proteins and peptides: it is just too blurry, where one ends and the other starts is never really clear.

MS1 level

How many peaks would we expect from a single peptide as defined above? Well, it is one sequence in a single charge and modification state, so maybe one peak? Wrong. It turns out I have left out another source of variation that will affect the number of peaks, namely isotopic composition. Indeed, all isotopically distinct versions of a peptide as defined above are considered variants of the same peptide.

Now, as you surely know, each element exists in nature as one main, light isotope and one or more heavier isotopes, distinguished by having one or more additional neutrons compared to the standard atom. Each additional neutron results in a mass shift of approximately +1 Da (= 1 “amu”, atomic mass unit)[1]. The probability – which we will note PL – of a given atom in a peptide being one of its heavier isotopes is low, usually in the 0.01 range or below (it is roughly 0.011 for carbon, for instance). However, in even a short peptide there are easily a hundred or more atoms. If we use the very rough approximation of PL = 0.01 for every atom, then the chart below represents the probabilities of peptides of 100, 200, 300, 400 or 500 atoms having between +0 and +10 neutrons:

[1] The precise value depends on the atomic nucleus context, that is, on how much the additional neutron changes the nuclear binding energy (and hence the mass defect). Amazingly, Orbitraps are able to resolve the minuscule difference between +1 neutron (hydrogen -> deuterium), +1 neutron (12C -> 13C) or +1 neutron (14N -> 15N). This remarkable technological feat makes NeuCode SILAC or TMT-10plex labelling possible.

[Chart: probability (y axis) against additional neutrons relative to monoisotopic peak (x axis)]

What the hell?!?! I am doing graphs in Excel? I feel like I just cheated on ggplot2!
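For those who would rather not trust my Excel skills, the numbers behind the chart can be recomputed in a few lines. This is only a minimal sketch under the same rough assumptions as above: every atom independently has probability PL = 0.01 of carrying exactly one extra neutron (so heavier isotopes such as 18O or 34S, which add two, are ignored), which makes the number of extra neutrons binomially distributed. The names `P_HEAVY` and `envelope` are mine, purely for illustration.

```python
from math import comb

# Rough per-atom probability of a heavier isotope (the PL = 0.01 approximation).
P_HEAVY = 0.01


def envelope(n_atoms, max_extra=10, p=P_HEAVY):
    """Binomial probability of k extra neutrons, for k = 0..max_extra.

    Assumes each atom independently contributes either 0 or 1 extra neutron.
    """
    return [
        comb(n_atoms, k) * p**k * (1 - p) ** (n_atoms - k)
        for k in range(max_extra + 1)
    ]


if __name__ == "__main__":
    # One row per peptide size, one column per number of extra neutrons.
    for n in (100, 200, 300, 400, 500):
        row = " ".join(f"{prob:.3f}" for prob in envelope(n))
        print(f"{n:>3} atoms: {row}")
```

The first value of each row is the relative height of the monoisotopic peak, and it shrinks quickly as the number of atoms grows. A real isotopic envelope would use the actual elemental composition and the true natural abundances of each element rather than a single PL.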

As you can see, the curve shifts further to the right and becomes more bell-shaped as peptide size increases. At 500 atoms the monoisotopic peak is almost lost! What each of these curves represents is an isotopic envelope: the shape that the multiple peaks resulting from a single peptide sequence will take.

The mass of a peptide made of, for each atom, only the isotope most abundant in nature is called its monoisotopic mass. Because, for most elements including those of interest to us in biochemistry, the most frequent isotope is also the lightest, a peptide’s monoisotopic mass corresponds to the first peak on the left of its isotopic envelope. Identifying the monoisotopic mass is part of the process of finding out what a peptide is, as it massively narrows down the possible candidates. This can now be done with very high accuracy on instruments equipped with an Orbitrap mass analyser.

While we are on the subject of isotopic envelopes, I feel like we should speak a bit about SILAC labelling. SILAC labelling works by introducing amino acids with a known, set number of heavier isotopes. This is usually done on arginine (R) and lysine (K), the two amino acids after which trypsin cleaves, thus ensuring that for each protein all peptides (except the C-terminal one) will be labelled. The most common configuration for triple SILAC labelling is:

  • Light: R0 and K0
  • Medium: R6 and K4
  • Heavy: R10 and K8

(the number after each amino acid letter is the number of additional neutrons)

Since for each peptide all other amino acids will still be subject to the same statistical distribution as above, the labelling will very slightly modify the shape of each envelope (for the labelled amino acids, the isotopes at the labelled positions are known rather than random), but most importantly it will shift the envelope to the right. Because peaks are positioned by mass-to-charge ratio, the shift is the added mass divided by the charge: the monoisotopic peak for a peptide with charge +2 containing a single R6 or R10 will be shifted by 3 or 5 Th respectively to the right relative to the R0 peptide. Thus, a mixture of the same peptide labelled with R0, R6 and R10 will materialise as three distinguishable isotopic envelopes:

[Figure: peptide comparison Lysine/Arginine]
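The arithmetic behind those shifts is simple enough to sketch. The snippet below assumes the rough “+1 Da per additional neutron” rule used earlier (the exact value differs slightly by isotope) and simply divides the added mass by the charge. The names `SILAC_LABELS`, `DA_PER_NEUTRON` and `mz_shift` are hypothetical, chosen for this illustration.

```python
# Approximate mass added per extra neutron, in Da (the "+1 Da" rule of thumb).
DA_PER_NEUTRON = 1.0

# Extra neutrons carried by each labelled residue in triple SILAC.
SILAC_LABELS = {"R0": 0, "R6": 6, "R10": 10, "K0": 0, "K4": 4, "K8": 8}


def mz_shift(label, charge, count=1):
    """Approximate m/z shift (in Th) of the monoisotopic peak.

    `count` is the number of residues in the peptide carrying this label.
    """
    return count * SILAC_LABELS[label] * DA_PER_NEUTRON / charge


if __name__ == "__main__":
    # A peptide with a single arginine, observed at charge +2.
    for label in ("R0", "R6", "R10"):
        print(label, mz_shift(label, charge=2))
```

For a charge of +2 this gives shifts of 0, 3 and 5 Th for R0, R6 and R10. The same peptide at charge +3 would show smaller shifts, which is worth keeping in mind when looking for SILAC partners in a crowded spectrum.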

MS2 level

At the MS2 level, things become waaaay more complicated because of fragmentation. So in order for this blog entry to stay a reasonable size, I will delay this until after the Christmas break.
