Computational Biochemistry Research Group


All against All of SwissProt version 38 Computed Using BioOpera

Given a set of nucleotide or peptide sequences, the standard first step in any inquiry into the evolution, structure, and ultimately function of these biomolecules is to align each sequence in the set against every sequence in a large dataset. Here we used SwissProt version 38.

In total, SwissProt v. 38 contains 80,000 amino acid sequences. An all vs. all comparison requires approximately 3.2x10^9 individual pairwise alignments for this dataset. As an indication of what this implies, over the past 7 years, the CBRG has updated (and made public) the all vs. all comparison of SwissProt.
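The pair count quoted above follows from counting unordered pairs, n(n-1)/2. A minimal sketch of that arithmetic, assuming each pair is aligned once and self-alignments are excluded (the function name is illustrative, not part of any CBRG tool):

```python
def all_vs_all_pairs(n: int) -> int:
    """Number of unordered pairs among n sequences, excluding self-pairs."""
    return n * (n - 1) // 2


n_sequences = 80_000  # sequence count quoted in the text for SwissProt v. 38
pairs = all_vs_all_pairs(n_sequences)
print(f"{pairs:,} pairwise alignments (about {pairs:.1e})")
```

For 80,000 sequences this gives 3,199,960,000 pairs, which matches the figure of approximately 3.2x10^9.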

Large-scale computations are pervasive in bioinformatics due to the size of the datasets involved and the resource demands of the algorithms. These computations typically run for long periods of time and produce vast amounts of data. Currently, users are required to manage system resources, data, and the distribution of each process throughout the entire computation. Not surprisingly, this introduces a major bottleneck in the whole procedure. BioOpera can be used in heterogeneous computing environments and provides capabilities for specifying complex sequences of computations and tools for monitoring their execution. It can also persistently store all intermediate and final results produced.

To demonstrate the power of BioOpera, we performed an all vs. all alignment of SwissProt version 38. This computation, which requires every sequence to be aligned against every other sequence, involved several billion pairwise Smith-Waterman alignments and ran for approximately one month in the background of a heavily loaded cluster of machines. We needed to intervene in the BioOpera run only minimally, and the entire computation required significantly less time to complete than previous manually managed efforts.
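To make the unit of work concrete, here is a minimal sketch of a Smith-Waterman local alignment score and an all vs. all loop over a small set of sequences. It is an illustration only: the match, mismatch, and gap values are assumed placeholders with a linear gap penalty, not the scoring scheme used for the SwissProt computation, and the function names are hypothetical rather than part of BioOpera.

```python
from itertools import combinations


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (assumed linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def all_vs_all(seqs: list[str]) -> dict[tuple[int, int], int]:
    """Score every unordered pair of sequences exactly once."""
    return {
        (i, j): smith_waterman_score(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }


if __name__ == "__main__":
    toy = ["MKVLAAGI", "MKVLGAGI", "TTPQ"]  # toy peptides, not SwissProt entries
    for pair, score in all_vs_all(toy).items():
        print(pair, score)
```

Each pair is independent of every other pair, which is why a computation of this kind can be split into many tasks and distributed across a cluster.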

You can query the completed AllAll database on our server/services page.

Contact Information: darwin.comments@inf.ethz.ch

Friday, April 21, 2000

 