# Darwin Sample session (August, 2000) # # Load a small peptide file DB := ReadDb('enolaseTEST'); # After this command, the enolase database is loaded and # most computations will refer to this database. The name # DB can be used to refer to the database. # # Compute the counts of the amino acids in this database Count := GetAaCount(DB); # Print the frequencies in a more readable format print(Count); # # Now compute a Dayhoff matrix based on these frequencies. DM := CreateOrigDayMatrix( Mutations1978, Count, 250 ); # # print it in a more readable format: print(DM); # Notice that the Dayhoff command takes two additional arguments: # a set of observed mutations (in this case the original # mutations matrix reported by Dayhoff et al.) and the PAM # distance, as it is usual, 250. # # Compute a subset of the all-against-all matching of the # enolases (first half against first half): AllAll := GetSelfAlign( DB, DM, 100, 1,1,2 ); # The matchings are described, at this stage, by two offsets # into the amino acid database. # # The matchings are printed as they appear and are also returned # as an array, in this case assigned to the variable AllAll. # These matchings reached the goal of 100 (third parameter) # according to the Dayhoff matrix passed as second argument. # # Running the dynamic programming algorithm (Needleman & Wunsch) # up to a point where a maximum is found is called "GlobalAlign" m := GlobalAlign(AllAll[1],DM); # Now print this refined match: print(m); # A matching is printed aligned, and the middle line contains # characters which make it easier to estimate the quality of the # similarity. A "|" indicates an exact match, an "!" # indicates a very likely mismatch, a ":" a likely mismatch, # a "." an unlikely mismatch, and a space a very unlikely # mismatch. # # Running the dynamic programming from left to right and then # right to left until it stabilizes usually gives a better # alignment. This is called " LocalAlign": print( LocalAlign(m,DM)); # The following statement creates 300 Dayhoff matrices, one # for every possible PAM distance between 1 and 300. DMS := CreateOrigDayMatrix( Mutations1978, Count, 1..300 ): # Compare the values printed for the first Dayhoff matrix with # the values for PAM=20. The smaller the PAM distance, the # higher the penalty for mismatches (negative off-diagonal # entries and deletion penalties) print(DMS[20]); # Now find the most likely PAM number for the previous match: FindBestPam(m,DMS); # The last argument of the Match, the PAM number, indicates # the time since divergence in PAM units. # # We can also search for an arbitrary string in the database: String( CaseSearchString( 'LPVPAFNVINGGSHA', DB[string] )); # or searching for all the occurrences of an amino acid sequence: String( SearchDb( 'SFRDA' )); # To see all the selectors for DB type "?database" # # Find the longest repetition within the database: FindLongestRep(DB); # Darwin also allows us to do arithmetic and general programming, # for example: a := 1; x := 0.01; for b from 10 by 5 to 20 do printf( '(a+x)^%d = %g\n', b, (a+x)^b ) od; A := [[1,0,0],[0,2,1],[1,0,3]]; print(A*A); GaussElim( A, [1,1,1] ); A * "; SqrtA := exp(ln(A)/2); print(SqrtA*SqrtA); # How much CPU time did all of this take? time(); #Now, to end the session you can type "done" yourself. #When you do, some data will be displayed. This describes #the internal workings of the alignment algorithms, and then #finally, after every darwin session, the total cpu time used. # #Happy Darwining!