Aligning protein sequences with predicted secondary structure

John Kececioglu, Eagu Kim, Travis Wheeler

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Accurately aligning distant protein sequences is notoriously difficult. Since the amino acid sequence alone often does not provide enough information to obtain accurate alignments under the standard alignment scoring functions, a recent approach to improving alignment accuracy is to use additional information such as secondary structure. We make several advances in alignment of protein sequences annotated with predicted secondary structure: (1) more accurate models for scoring alignments, (2) efficient algorithms for optimal alignment under these models, and (3) improved learning criteria for setting model parameters through inverse alignment, as well as (4) in-depth experiments evaluating model variants on benchmark alignments. More specifically, the new models use secondary structure predictions and their confidences to modify the scoring of both substitutions and gaps. All models have efficient algorithms for optimal pairwise alignment that run in near-quadratic time. These models have many parameters, which are rigorously learned using inverse alignment under a new criterion that carefully balances score error and recovery error. We then evaluate these models by studying how accurately an optimal alignment under the model recovers benchmark reference alignments that are based on the known three-dimensional structures of the proteins. The experiments show that these new models provide a significant boost in accuracy over the standard model for distant sequences. The improvement for pairwise alignment is as much as 15% for sequences with less than 25% identity, while for multiple alignment the improvement is more than 20% for difficult benchmarks whose accuracy under standard tools is at most 40%.
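The abstract describes models that use secondary structure predictions and their confidences to modify the scoring of both substitutions and gaps, with optimal alignment computed efficiently. The Python sketch below illustrates that general idea under stated assumptions. It is not the authors' model. It is a standard three-state affine-gap (Gotoh) global alignment running in O(mn) time. The toy match/mismatch scores stand in for a real substitution matrix. The secondary-structure agreement bonus `w_ss`, the extra gap-open penalty `g_ss` inside predicted helices (H) and strands (E), and all parameter values are hypothetical; they are not learned by inverse alignment as in the paper. The helper `recovery` computes one common accuracy measure, the fraction of reference residue pairs that the computed alignment reproduces, which may differ from the exact measure used in the paper.

```python
from math import inf


def sub_score(a, b, ssa, ssb, ca, cb, w_ss=2.0):
    # Hypothetical toy scores standing in for a matrix such as BLOSUM62.
    base = 2.0 if a == b else -1.0
    # Hypothetical term: reward agreeing predicted states, penalize
    # disagreement, scaled by the confidence of both predictions.
    agree = 1.0 if ssa == ssb else -1.0
    return base + w_ss * ca * cb * agree


def gap_open(ss, conf, go=4.0, g_ss=3.0):
    # Hypothetical: opening a gap opposite a confidently predicted helix (H)
    # or strand (E) residue costs extra; coil (C) pays only the base penalty.
    return go + (g_ss * conf if ss in "HE" else 0.0)


def align(A, B, ext=1.0):
    """Global affine-gap alignment of (residues, ss_states, confidences) triples.

    Returns (score, pairs), where pairs lists aligned (i, j) residue indices.
    """
    sa, ta, ca = A
    sb, tb, cb = B
    m, n = len(sa), len(sb)
    M = [[-inf] * (n + 1) for _ in range(m + 1)]  # A[i-1] aligned to B[j-1]
    X = [[-inf] * (n + 1) for _ in range(m + 1)]  # A[i-1] against a gap
    Y = [[-inf] * (n + 1) for _ in range(m + 1)]  # B[j-1] against a gap
    back = {}  # (state, i, j) -> predecessor state, for traceback
    M[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0 and j > 0:
                prev = {"M": M[i-1][j-1], "X": X[i-1][j-1], "Y": Y[i-1][j-1]}
                k = max(prev, key=prev.get)
                M[i][j] = prev[k] + sub_score(sa[i-1], sb[j-1], ta[i-1],
                                              tb[j-1], ca[i-1], cb[j-1])
                back[("M", i, j)] = k
            if i > 0:
                o = gap_open(ta[i-1], ca[i-1]) + ext  # open + first extension
                prev = {"M": M[i-1][j] - o, "X": X[i-1][j] - ext, "Y": Y[i-1][j] - o}
                k = max(prev, key=prev.get)
                X[i][j] = prev[k]
                back[("X", i, j)] = k
            if j > 0:
                o = gap_open(tb[j-1], cb[j-1]) + ext
                prev = {"M": M[i][j-1] - o, "X": X[i][j-1] - o, "Y": Y[i][j-1] - ext}
                k = max(prev, key=prev.get)
                Y[i][j] = prev[k]
                back[("Y", i, j)] = k
    # Trace back from the best final state to recover the aligned pairs.
    tables = {"M": M, "X": X, "Y": Y}
    state = max(tables, key=lambda s: tables[s][m][n])
    score = tables[state][m][n]
    pairs, i, j = [], m, n
    while i > 0 or j > 0:
        prev_state = back[(state, i, j)]
        if state == "M":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif state == "X":
            i -= 1
        else:
            j -= 1
        state = prev_state
    return score, pairs[::-1]


def recovery(test_pairs, ref_pairs):
    """Fraction of reference aligned pairs also present in the test alignment."""
    ref = set(ref_pairs)
    return len(ref & set(test_pairs)) / len(ref) if ref else 1.0
```

For example, `align(("GGG", "HHC", [1.0, 1.0, 1.0]), ("GG", "HH", [1.0, 1.0]))` returns `(3.0, [(0, 0), (1, 1)])`: because the three residues are identical, sequence alone cannot say where the deletion belongs, and the structure-aware gap penalty places it opposite the coil residue rather than inside the predicted helix.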

Original language: English (US)
Pages (from-to): 561-580
Number of pages: 20
Journal: Journal of Computational Biology
Volume: 17
Issue number: 3
State: Published, Mar 1 2010

Keywords

  • Affine gap penalties
  • Inverse parametric alignment
  • Protein secondary structure
  • Sequence alignment
  • Substitution score matrices

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics

