All against All of SwissProt version 38 Computed Using BioOpera
Given a set of nucleotide
or peptide sequences, the standard first step in any inquiry into
the evolution, structure, and ultimately function of these biomolecules
is the alignment of each sequence in this set against every sequence
in a large dataset. Here we used SwissProt version 38.
In total, SwissProt
v. 38 contains 80,000 amino acid sequences. An all vs. all comparison
requires approximately 3.2x10^9 individual pairwise alignments for
this dataset. As an indication of what this implies, over the past
7 years, the CBRG has updated
(and made public) the all vs. all comparison of SwissProt.
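The pair count quoted above can be checked with a short calculation. This is only an illustrative sketch: the function name is hypothetical, and it assumes each unordered pair of distinct sequences is aligned exactly once.

```python
def all_vs_all_pairs(n: int) -> int:
    """Number of unordered pairs of distinct sequences among n sequences.

    Assumption (not stated in the text): each pair is aligned once and
    self-alignments are skipped, giving n * (n - 1) / 2 alignments.
    """
    return n * (n - 1) // 2


n_sequences = 80_000  # SwissProt v. 38 size quoted in the text
pairs = all_vs_all_pairs(n_sequences)
print(f"{n_sequences} sequences -> {pairs} pairwise alignments")
```

For 80,000 sequences this gives roughly 3.2x10^9 alignments, which matches the figure above.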
Large-scale computations
are pervasive in bioinformatics due to the size of the datasets
involved and the resource demands of the algorithms. These computations
typically run for long periods of time and produce vast amounts
of data. Currently, users are required to manage system resources,
data, and the distribution of each process throughout the entire
computation. Not surprisingly, this introduces a major bottleneck
in the whole procedure. BioOpera
can be used in heterogeneous computing environments and provides
capabilities for specifying complex sequences of computations and
tools for monitoring their execution. It can also persistently store
all intermediate and final results produced. To demonstrate the
power of BioOpera, we performed an all vs. all alignment of SwissProt
version 38. This computation, which requires every sequence to be
aligned against every other sequence, involved several billion
pairwise Smith-Waterman alignments and ran for approximately one
month in the background of a heavily loaded cluster of machines.
Our need to intervene on behalf of BioOpera was minimal and the
entire computation required significantly less time to complete than
previously performed manual efforts.
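To give a sense of the unit of work repeated several billion times, here is a minimal sketch of a Smith-Waterman local alignment score. The match, mismatch, and gap values are illustrative assumptions, not the scoring scheme used in the computation described here, and the function name is hypothetical. A production run on peptides would normally use a substitution matrix and a more refined gap model.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty).

    The scoring parameters are placeholders chosen for illustration only.
    Only two rows of the dynamic programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row
    best = 0
    for ca in a:
        curr = [0]  # first column is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            cell = max(0, diag, up, left)  # never drop below 0
            curr.append(cell)
            if cell > best:
                best = cell
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

Each call costs time proportional to the product of the two sequence lengths, which is why running it over every pair in a dataset of this size takes weeks on a shared cluster.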
You can query the completed
AllAll database on our server/services
page.
Contact Information:
darwin.comments@inf.ethz.ch
Friday, April 21, 2000