Computational Biochemistry Research Group


MassSearch: searching SwissProt or EMBL by protein mass after digestion

In some cases, a protein can be identified by fragmenting it according to a known pattern and using the molecular weights of the fragments as a fingerprint. This method cannot determine the composition of an unknown protein, but it is effective at locating an unknown sample if its sequence is recorded in a protein database.

One way of breaking a protein into smaller pieces according to a known pattern is to digest it with an enzyme. For example, trypsin breaks a protein after every Arginine (R) or after every Lysine (K) not followed by a Proline (P). AspN breaks a protein before every Aspartic acid (D). A table of recognized enzymes and their cleavage rules is given below.
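The two cleavage rules above can be sketched in a few lines of Python. This is only an illustration, not the server's implementation: the function names and the enzyme labels "trypsin" and "aspn" are hypothetical, and the trypsin rule is read literally from the sentence above, with the Proline exception applied to Lysine only.

```python
def cut_positions(sequence, enzyme):
    """Return the indices at which the sequence is cut (a cut at index i
    separates sequence[:i] from sequence[i:])."""
    if enzyme not in ("trypsin", "aspn"):
        raise ValueError("unknown enzyme: " + enzyme)
    cuts = []
    for i, residue in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        if enzyme == "trypsin":
            # Assumption: cut after every R, and after K unless P follows.
            if residue == "R" or (residue == "K" and following != "P"):
                cuts.append(i + 1)
        else:
            # AspN: cut before every D (no cut before the first residue).
            if residue == "D" and i > 0:
                cuts.append(i)
    return cuts


def digest(sequence, enzyme):
    """Split a one-letter-code protein sequence into its fragments."""
    bounds = [0] + cut_positions(sequence, enzyme) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


print(digest("AKRPDKPGR", "trypsin"))  # ['AK', 'R', 'PDKPGR']
print(digest("ADKDG", "aspn"))         # ['A', 'DK', 'DG']
```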

The molecular weights of the fragments can be measured by mass spectrometry to a good level of accuracy. More importantly, these methods typically require very small samples, on the order of fractions of a picomole.
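For comparison with measured values, the theoretical weight of a fragment can be computed from its sequence. The sketch below assumes standard average residue masses (in daltons) plus one water molecule per fragment. Whether the server uses average or monoisotopic masses, or accounts for modifications, is not stated here, so the table and the function name are assumptions made for illustration.

```python
# Average residue masses in daltons (amino acid minus one water).
# Assumption: standard published averages, not necessarily the server's table.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once per fragment for the free termini


def fragment_mass(fragment):
    """Average molecular weight of an unmodified peptide fragment."""
    return sum(RESIDUE_MASS[residue] for residue in fragment) + WATER_MASS


for fragment in ["AK", "R", "PDKPGR"]:
    print(fragment, round(fragment_mass(fragment), 2))
```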

The problem of identifying a sampled protein can thus be reduced to digesting the protein with an enzyme, measuring the molecular weight of each piece, and comparing this set of weights with the set that would be obtained by digesting each protein in the database. The process can be repeated with several different enzymes to increase its selectivity.
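The comparison step can be sketched as follows, assuming the theoretical fragment weights of each database entry have already been computed. The scoring used here, a count of observed weights that fall within a fixed tolerance of a theoretical weight, is a simple stand-in chosen for illustration; the scoring actually used by the server is not described in this section. The names count_matches and rank_candidates, the tolerance value and the toy database are all hypothetical.

```python
def count_matches(observed, theoretical, tolerance):
    """Count observed weights that pair with a distinct theoretical weight
    lying within the given tolerance."""
    remaining = sorted(theoretical)
    matched = 0
    for weight in sorted(observed):
        for i, candidate in enumerate(remaining):
            if abs(candidate - weight) <= tolerance:
                del remaining[i]  # each theoretical weight is used once
                matched += 1
                break
    return matched


def rank_candidates(observed, database, tolerance=0.5, top=5):
    """Return up to `top` (name, score) pairs, best match first."""
    scored = [(name, count_matches(observed, weights, tolerance))
              for name, weights in database.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top]


# Toy data: theoretical fragment weights per database entry (made up).
database = {
    "P1": [500.3, 1200.6, 842.5],
    "P2": [500.3, 999.9],
    "P3": [100.0],
}
observed = [842.4, 500.2, 1200.9]
print(rank_candidates(observed, database))
```

A fixed absolute tolerance keeps the sketch short; a tolerance proportional to the weight would be an equally reasonable choice.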

The function MassSearch locates the best candidates in the SwissProt database, that is, the proteins whose digestion by the given enzyme would fit the given weights. The function DNAMassSearch locates the best candidates in the EMBL DNA database, that is, the sequences that encode a protein that would fit the given weights.

This type of searching has been found particularly useful in the following circumstances:

  • To identify proteins when the amount available is very small, for example the quantities that can be separated on 2D gels.
  • To determine whether an unknown protein is already known in the database before spending a significant effort in sequencing.
  • To identify more than one protein which cannot be separated by other means (this method has been successfully used to identify two proteins which were digested together).

The precision of the search increases with the number of digestions available. In general it is much better to perform two digestions with different enzymes (each with half of the material, and hence at slightly lower accuracy) than a single digestion with all the material.
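The benefit of a second digestion can be illustrated with a small sketch that combines per-digestion scores. It assumes each digestion has already been scored separately against the database, and it simply adds the scores. This additive rule, the function name combine_digests and the numbers are illustrative assumptions, not the method used by the server.

```python
def combine_digests(scores_by_enzyme):
    """Sum each candidate's scores over all digestions and rank the totals."""
    totals = {}
    for scores in scores_by_enzyme.values():
        for protein, score in scores.items():
            totals[protein] = totals.get(protein, 0) + score
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Made-up scores: with trypsin alone, P1 and P2 are indistinguishable.
scores_by_enzyme = {
    "trypsin": {"P1": 3, "P2": 3, "P3": 1},
    "aspn": {"P1": 4, "P2": 1},
}
print(combine_digests(scores_by_enzyme))
```

In this toy case the tie between P1 and P2 after the first digestion is resolved by the second.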




Comments, questions, problems? darwin.comments@inf.ethz.ch