fokidisc.blogg.se - Biological sequence analysis

#Biological sequence analysis series#

The 1980s and 1990s were a flourishing time not only for pop music but also for bioinformatics, where the emergence of sequence comparison algorithms revolutionized the computational and molecular biology fields. At that time, many computational biologists quickly became stars in the field by developing programs for sequence alignment, a method that positions the building blocks of biological sequences to identify regions of similarity that may have consequences for functional, structural, or evolutionary relationships. Many successful alignment-based tools were created, including sequence similarity search tools (e.g., BLAST, FASTA), multiple sequence aligners (e.g., ClustalW, Muscle, MAFFT), sequence profile search programs (e.g., PSI-BLAST, HMMER/Pfam), and whole-genome aligners (e.g., progressive Mauve, BLASTZ, TBA). These tools became game-changers for anyone who wanted to assess the functions of genes and proteins.

All alignment-based programs, regardless of the underlying algorithm, look for correspondence of individual bases or amino acids (or groups thereof) that are in the same order in two or more sequences. The procedure assumes that every sequence symbol can be categorized into at least one of two states, conserved/similar (match) or non-conserved (mismatch), although most alignment programs also model inserted/deleted states (gaps). However, as our understanding of complex evolutionary scenarios and our knowledge about the patterns and properties of biological sequences advanced, we gradually uncovered some downsides of sequence comparisons based solely on alignments.

First, alignment-producing programs assume that homologous sequences comprise a series of linearly arranged and more or less conserved sequence stretches. However, this assumption, which is termed collinearity, is very often violated in the real world. A good example is viral genomes, which exhibit great variation in the number and order of genetic elements due to their high mutation rates, frequent genetic recombination events, horizontal gene transfers, gene duplications, and gene gains/losses. These large-scale evolutionary processes occur essentially all the time in the genomes of other organisms as well. As a result, each genome becomes a mosaic of unique lineage-specific segments (i.e., regions shared with a subset of other genomes). Furthermore, the alignment approach may often overlook rearrangements on an even smaller scale; for instance, the linear and modular organization of proteins is not always preserved due to frequent domain swapping, or duplication or deletion of long peptide motifs.

Second, the accuracy of sequence alignments drops off rapidly in cases where the sequence identity falls below a certain critical point. For protein sequences, there are 20 possible amino acid residues, and any two unrelated sequences can match at up to 5% of the residues. If gaps are allowed, then the percentage can increase to 25%.
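As a rough illustration of the match, mismatch, and gap states, and of the roughly 5% chance identity between unrelated protein sequences discussed in this post, here is a minimal Python sketch. It is not taken from any of the tools named above. The function names (`classify_columns`, `percent_identity`, `random_identity`) are hypothetical. Dividing matches by the number of ungapped columns is only one of several conventions for percent identity, and the simulation assumes all 20 residues are equally likely, which real proteins are not.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def classify_columns(aligned_a, aligned_b, gap="-"):
    """Label each column of a pairwise alignment as match, mismatch, or gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    states = []
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            states.append("gap")
        elif x == y:
            states.append("match")
        else:
            states.append("mismatch")
    return states


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Matches divided by ungapped columns, as a percentage (assumed convention)."""
    states = classify_columns(aligned_a, aligned_b, gap)
    compared = sum(1 for s in states if s != "gap")
    if compared == 0:
        return 0.0
    return 100.0 * states.count("match") / compared


def random_identity(length=200, trials=500, seed=0):
    """Mean ungapped identity between pairs of random, unrelated sequences.

    Assumes uniform residue frequencies, so the expected value is 1/20 = 5%.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        b = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        total += percent_identity(a, b)
    return total / trials


if __name__ == "__main__":
    print(classify_columns("ACG-T", "ACCAT"))
    # ['match', 'match', 'mismatch', 'gap', 'match']
    print(percent_identity("ACG-T", "ACCAT"))  # 75.0
    print(random_identity())  # close to 5 under the uniform assumption
```

Without gaps, two random sequences agree at a position with probability 1/20, which is where the 5% baseline comes from. Allowing gaps lets an aligner shift residues to collect extra chance matches, which is why the identity of unrelated sequences can rise well above that baseline.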