Ideally, each MS2 spectrum would contain enough information to determine unambiguously the sequence of its parent precursor. Because this is rarely the case, a strategy called Database Searching is usually used instead.
MS2 fragmentation spectra are used to identify the peptide sequence behind their MS1 parent peak. However, MS2 spectra rarely allow unambiguous precursor identification: fragmentation may be incomplete or too extensive, or several precursors may have been co-isolated. Thus, while de novo sequencing is routine in Genomics and Transcriptomics, it remains an elusive Holy Grail in Proteomics.
Instead, a strategy called Database Searching is commonly used: each observed spectrum is compared to theoretical spectra predicted from expected peptide sequences, then assigned the best-matching sequence and a score. Only peptide sequences that exist in the database, carrying only the post-translational modifications (PTMs) included in the search parameters, can be identified.
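As a rough illustration of the comparison step, the sketch below predicts singly charged b- and y-ion m/z values for unmodified candidate peptides and scores each candidate by counting theoretical fragments found in the observed peak list. The function names, the 0.02 m/z tolerance and the shared-peak-count score are assumptions made for this example; real search engines also filter candidates by precursor mass, handle PTMs and multiple charge states, and use more elaborate scoring.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_mz(peptide):
    """Singly charged b- and y-ion m/z values of an unmodified peptide."""
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, n)]
    return sorted(b_ions + y_ions)


def shared_peak_count(observed_mz, peptide, tol=0.02):
    """Number of theoretical fragments with an observed peak within tol."""
    return sum(
        any(abs(obs - theo) <= tol for obs in observed_mz)
        for theo in fragment_mz(peptide)
    )


def best_match(observed_mz, candidates, tol=0.02):
    """Return (peptide, score) for the highest-scoring candidate."""
    scored = [(shared_peak_count(observed_mz, p, tol), p) for p in candidates]
    score, peptide = max(scored)
    return peptide, score
```

For instance, `best_match(observed_mz, ["PEPTIDE", "GASPVTK"])` returns whichever of the two candidates explains more of the observed peaks. A sequence that is absent from the candidate list can never be returned, which is the limitation described above.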
The False Discovery Rate (FDR) is controlled by also searching spectra against a database of “fake” (decoy) peptides, typically obtained by digesting in silico the reversed or scrambled sequences of the expected proteins.
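A minimal sketch of how such a fake-peptide database could be built is shown below. It assumes trypsin as the protease (cleavage after K or R, except before P), no missed cleavages, and simple sequence reversal; the function names and the `min_len` parameter are hypothetical choices for this example.

```python
def tryptic_digest(protein, min_len=6):
    """In silico trypsin digest: cut after K/R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # C-terminal peptide that does not end in K or R
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_len]


def reversed_decoy(protein):
    """Fake protein sequence obtained by reversing the real one."""
    return protein[::-1]


def build_search_space(proteins, min_len=6):
    """Return (target_peptides, decoy_peptides) as two disjoint sets."""
    targets = {
        pep for prot in proteins for pep in tryptic_digest(prot, min_len)
    }
    decoys = {
        pep
        for prot in proteins
        for pep in tryptic_digest(reversed_decoy(prot), min_len)
    }
    # A decoy identical to a real peptide cannot stand in for a false match.
    decoys -= targets
    return targets, decoys
```

Reversal keeps the amino acid composition and length of each protein, so the fake peptides have roughly the same mass distribution as the real ones.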
For a given Peptide-Spectrum Match (PSM), a Posterior Error Probability (PEP, a local form of FDR) is estimated from the proportion of fake peptides in the same score bin. A PSM's q-value is the lowest FDR at which that PSM would still be accepted; it is estimated from the proportion of decoy hits among PSMs with higher or equal scores (the FDR if the threshold were set at this PSM's score).
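Both quantities can be sketched from a list of scored PSMs flagged as target or decoy. The code below is a simplified illustration, not a reference implementation: it assumes higher scores are better, estimates FDR as the decoy-to-target ratio (one common convention among several), ignores score ties, and uses fixed-width score bins for the PEP where real tools typically fit smooth score distributions. The function names and the `bin_width` parameter are hypothetical.

```python
import math


def q_values(psms):
    """psms: list of (score, is_decoy). Returns q-values in input order."""
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were set at this PSM
        fdrs.append(min(1.0, decoys / max(targets, 1)))
    # q-value: lowest FDR among this threshold and all more permissive ones
    q = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q


def binned_pep(psms, bin_width=1.0):
    """Crude PEP estimate: decoy-to-target ratio within each score bin."""
    counts = {}
    for score, is_decoy in psms:
        b = math.floor(score / bin_width)
        t, d = counts.get(b, (0, 0))
        counts[b] = (t + (not is_decoy), d + is_decoy)
    peps = []
    for score, _ in psms:
        t, d = counts[math.floor(score / bin_width)]
        peps.append(min(1.0, d / max(t, 1)))
    return peps
```

Taking the running minimum makes the q-value non-increasing with score, so filtering PSMs at a q-value cutoff of, say, 0.01 corresponds to a single score threshold.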