2nd PROSITE Workshop,

27-29 June 2001

IGS-Marseille




Workshop Project

A new strategy for the design of functional signatures in protein sequences

Purpose

Regular expression-based motifs, which are at the origin of the concept of functional signatures for proteins, are progressively being supplanted by more sophisticated probabilistic methods such as position weight matrices, Hidden Markov Models and Neural Networks. We will present a new strategy for turning multiple alignments into specific regular expressions using an interactive Java-based software system called REAL (Regular Expression Analysis and Location). Our approach is based on the computation and parallel display of various profiles, computed from both the average value and the variability of numerous amino acid properties. Relevant positions can also be selected on the basis of their information content relative to a given property. The information content provides a quantitative measure of the contrast between residue variability and property invariance. In the automatic mode, a simple probabilistic framework is used to extract the most informative positions. Once positions have been selected on the basis of one (or several) of the many available criteria, a regular expression is automatically generated from the multiple alignment using a simple quantitative rule. Tentative signatures are then tested with the companion database scanning program LOOKFOR. Possible applications of the REAL software will be presented through two examples.
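
The general idea can be illustrated with a minimal Python sketch. It is not the REAL implementation, whose exact formulas are not given here. The toy alignment, the coarse property classes, the definition of the contrast as residue entropy minus property-class entropy, and the frequency threshold used as the "quantitative rule" are all assumptions made for illustration. The sketch also assumes a gapless alignment and emits Python regular expression syntax rather than PROSITE pattern syntax.

```python
import math
import re
from collections import Counter

# Hypothetical toy alignment (not taken from the workshop material).
ALIGNMENT = ["GKSTLA", "GKTSIA", "GRSTVA", "GKSTLG"]

# Assumed coarse property classes, for illustration only.
PROPERTY_CLASS = {
    **dict.fromkeys("AVLIMFWC", "hydrophobic"),
    **dict.fromkeys("STNQGPYH", "polar"),
    **dict.fromkeys("KR", "positive"),
    **dict.fromkeys("DE", "negative"),
}


def entropy(symbols):
    """Shannon entropy (in bits) of a list of symbols."""
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(symbols).values())


def columns(alignment):
    """Return the alignment columns as lists of residues."""
    return [list(col) for col in zip(*alignment)]


def property_contrast(column):
    """Assumed contrast measure: residue entropy minus property-class entropy.

    High when residues vary but stay within one property class.
    """
    classes = [PROPERTY_CLASS.get(r, "other") for r in column]
    return entropy(column) - entropy(classes)


def column_to_regex(column, min_fraction=0.25):
    """Assumed rule: keep residues seen in at least min_fraction of sequences."""
    counts = Counter(column)
    kept = sorted(r for r, n in counts.items()
                  if n / len(column) >= min_fraction)
    if not kept:
        return "."          # no residue is frequent enough: wildcard
    if len(kept) == 1:
        return kept[0]      # invariant (or dominant) residue
    return "[" + "".join(kept) + "]"


def alignment_to_regex(alignment, min_fraction=0.25):
    """Concatenate the per-column expressions into one pattern."""
    return "".join(column_to_regex(c, min_fraction)
                   for c in columns(alignment))


pattern = alignment_to_regex(ALIGNMENT)
contrasts = [property_contrast(c) for c in columns(ALIGNMENT)]
```

A tentative pattern built this way can then be checked against candidate sequences with `re.search(pattern, sequence)`, which plays a role loosely analogous to a database scan. Positions with a high contrast value (residues vary, property class does not) are natural candidates for a character class such as `[ILV]`.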

Contact

Chantal.Abergel