Pro-Coffee
A Multiple Sequence Alignment Tool for Promoter Regions


HOME SERVERS DOWNLOAD DOCUMENTATION OUTPUT CONTACT



What is Pro-Coffee?

Pro-Coffee is a multiple sequence alignment method specifically designed for promoter regions. It is part of the T-Coffee distribution. Pro-Coffee takes nearest-neighbour nucleotide correlations into account when aligning DNA sequences. To do so, it first translates the sequences into a di-nucleotide alphabet and then aligns them using a specifically designed di-nucleotide substitution matrix. This matrix was constructed from binding-site alignments taken from the TRANSFAC database. A benchmark on multi-species ChIP-seq data shows that validated binding sites are better aligned with Pro-Coffee than with off-the-shelf methods.
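
The translation step described above can be illustrated with a minimal Python sketch. It is not Pro-Coffee's actual implementation: the use of overlapping pairs, the 16-letter symbol set (A to P), the "X" placeholder for ambiguous bases, and the function name are all assumptions made for illustration only.

```python
from itertools import product

# Hypothetical encoding: one symbol per ordered di-nucleotide (AA, AC, ..., TT).
# The real alphabet used internally by Pro-Coffee is not documented here.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]
SYMBOLS = "ABCDEFGHIJKLMNOP"
DINUC_TO_SYMBOL = dict(zip(DINUCLEOTIDES, SYMBOLS))


def to_dinucleotide_alphabet(seq: str) -> str:
    """Recode a DNA sequence as a string of overlapping di-nucleotide symbols.

    Each position i is encoded from the pair seq[i:i+2], so the result is one
    symbol shorter than the input. Pairs containing anything other than
    A, C, G or T are encoded as 'X' (an assumption of this sketch).
    """
    seq = seq.upper()
    return "".join(
        DINUC_TO_SYMBOL.get(seq[i:i + 2], "X") for i in range(len(seq) - 1)
    )


if __name__ == "__main__":
    print(to_dinucleotide_alphabet("ACGT"))
```

Because each symbol carries a nucleotide together with its right-hand neighbour, a substitution matrix defined over these 16 symbols can score neighbour-dependent changes that a single-nucleotide matrix cannot.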

An article about the method and its evaluation is in preparation.


SERVERS



DOWNLOAD

  • Pro-Coffee is a special mode of T-Coffee. Download the latest T-Coffee version for Pro-Coffee here.
  • Download benchmark data sets used in the paper "Use of ChIP-Seq data for the design of a multiple promoter alignment method" here.


INSTALLATION

Follow the standard T-Coffee installation procedure.

You should also take a look at the T-Coffee Technical Documentation.




OUTPUT

Given a sequence file regions.fa, Pro-Coffee outputs three different kinds of files:

regions.dnd contains the guide tree used to assemble the progressive alignment,

regions.aln contains the final alignment in ClustalW format, and

regions.html contains the final alignment colored according to a consistency index ranging from red (very consistent) to blue (poorly consistent).
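
As a small convenience, the naming scheme above can be captured in a Python helper that predicts the output file names for a given input. This is a sketch only: the function name and the dictionary keys are hypothetical, and it assumes that the output names are derived from the input file name by replacing its extension, as in the regions.fa example.

```python
from pathlib import Path


def expected_outputs(fasta: str) -> dict:
    """Return the output file names expected for a given input sequence file.

    Only names are returned, not directories; where the files are written
    depends on how t_coffee is invoked.
    """
    source = Path(fasta)
    return {
        "guide_tree": source.with_suffix(".dnd").name,  # guide tree
        "alignment": source.with_suffix(".aln").name,   # ClustalW alignment
        "html": source.with_suffix(".html").name,       # colored alignment
    }


if __name__ == "__main__":
    for kind, name in expected_outputs("regions.fa").items():
        print(kind, name)
```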

The example below shows part of an alignment of the 2000 bp upstream region of the human gene c18orf19 with various orthologous regions. ChIP-seq regions of the CEBPA transcription factor are highlighted in yellow. Predicted binding sites falling within these regions are shown in green; predicted sites outside the regions are shown in red. Pro-Coffee manages to align the experimentally validated binding sites, whereas default T-Coffee fails to align them.

(ChIP-seq raw data taken from: Dominic Schmidt, et al. Science 328, 1036 (2010))


DOCUMENTATION

The full documentation is on the T-Coffee homepage, but the following shortcuts may be useful.

To run Pro-Coffee, type:
t_coffee regions.fa -mode=procoffee
To modify the gap costs (defaults: gap opening -60, gap extension -1), type:
t_coffee regions.fa -method promo_pair@EP@GOP@-60@GEP@-1
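
For scripted use, the two command lines above can be assembled programmatically. The following Python sketch only builds the argument list; it does not run anything. The function name and its parameters are hypothetical, and the method string is simply the one shown above with the two gap values substituted.

```python
from typing import List, Optional


def build_procoffee_command(
    fasta: str,
    gap_open: Optional[int] = None,
    gap_extend: Optional[int] = None,
) -> List[str]:
    """Build a t_coffee argument list for Pro-Coffee.

    With no gap costs given, the default mode is used. If either cost is
    given, the promo_pair method string is used instead, and the missing
    value falls back to the documented default (-60 / -1).
    """
    if gap_open is None and gap_extend is None:
        return ["t_coffee", fasta, "-mode=procoffee"]
    gop = -60 if gap_open is None else gap_open
    gep = -1 if gap_extend is None else gap_extend
    return ["t_coffee", fasta, "-method", f"promo_pair@EP@GOP@{gop}@GEP@{gep}"]


if __name__ == "__main__":
    print(" ".join(build_procoffee_command("regions.fa")))
    print(" ".join(build_procoffee_command("regions.fa", gap_open=-40)))
```

The resulting list can be passed to subprocess.run on a machine where t_coffee is installed and on the PATH.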



CONTACT

Our projects rely on your feedback. Please send me an e-mail if you wish to make a request, leave a comment, or report a bug!

*******************************************
Dr. Cedric Notredame, PhD.
Group Leader
Comparative Bioinformatics Group
Bioinformatics and Genomics Programme
Center for Genomic Regulation (CRG)
Dr Aiguader, 88
08003 Barcelona
Spain
Email: cedric.notredame@gmail.com
HOME : http://www.tcoffee.org/homepage.html
GROUP: CRG
Phone: +34 933 160 271
*******************************************


