April, 2005

 

 

 

 

 

 

T-COFFEE (version 2.xx)

                                               

§       Installation Notes

§       Technical Notes

§       Documentation

§       Tutorial

§       F.A.Q

 

 

Cedric Notredame (cedric.notredame@europe.com)



Table of Contents

T-COFFEE (version 2.00)
§    Installation Notes
§    Technical Notes
§    Documentation
§    Tutorial
§    F.A.Q
License and Terms of Use
Addresses and Contacts
Citations
T-Coffee
Mocca
DESCRIPTION
What is T-Coffee?
What does it do?
What can it align?
How can I use it?
Is T-Coffee different from ClustalW?
What Can T-Coffee do for you (and what it cannot do)
(NOT) Fetching Sequences
Aligning Sequences
Combining Alignments
Evaluating Alignments
Combining Sequences and Structures
Identifying Occurrences of a Motif: Mocca
How Does T-Coffee Work
INSTALLATION
Standard Installation
Extended Installation and other Packages
QUICK START
T-COFFEE
MOCCA
RECENT MODIFICATIONS
T-COFFEE REFERENCE MANUAL
Well Behaved Parameters
Separation
Posix
Entering the right parameters
Parameters Syntax
Default Usage and Configuration Files
No Flag
-parameters
-t_coffee_defaults
-special_mode
-do_score
-convert [cw]
-do_align [cw]
Special Parameters
-version
-check_configuration
-cache
-update
-full_log
-other_pg
Input
Sequence Input
-infile [cw]
-in
-get_type
-type [cw]
-seq
-seq_source
-seq_to_align
Structure Input
-pdb
Tree Input
-usetree
Methods and Library Input
-in
Profile Input
-profile
-profile1 [cw]
-profile2 [cw]
Alignment Computation
Library Computation: Methods
Library Computation: Extension
-do_normalise
-extend
-extend_mode
-max_n_pair
-seq_name_for_quadruplet
-compact
-clean
-maximise
-do_self
-seq_name_for_quadruplet
-weight
Tree Computation
-tree_mode
-quicktree [CW]
Pairwise Alignment Computation
-dp_mode
-ktuple
-ndiag
-diag_mode
-diag_threshold
-sim_matrix
-matrix [CW]
-nomatch
-gapopen
-gapext
-fgapopen
-fgapext
-cosmetic_penalty
-tg_mode
Weighting Schemes
-seq_weight
Multiple Alignment Computation
-msa_mode
-profile_mode
-profile_comparison
Alignment Post-Processing
-clean_aln
-clean_threshold
-clean_iteration
-clean_evaluation_mode
-iterate
CPU Control
Multithreading
-multi_thread [NOT IMPLEMENTED]
Limits
-mem_mode
-ulimit
-maxlen
Aligning more than 100 sequences with DPA
-maxnseq
-dpa_master_aln
-dpa_maxnseq
-dpa_min_score1
-dpa_min_score2
-dap_tree [NOT IMPLEMENTED]
Using Structures
Generic
-special_mode
-check_pdb_status
3D Coffee: Using SAP
Using PDB templates for the Sequences
-template_file
-struc_to_use
Multiple Local Alignments
-domain
-start
-len
-scale
-domain_interactive [Examples]
Output Control
Generic
Alignments
-outfile
-output
-outseqweight
-case
-cpu
-outorder [cw]
Libraries
-lib_only
Trees
-newtree
Reliability Estimation
CORE Computation
-evaluate_mode
Generic Output
-quiet
USING NEW AND EXISTING METHODS
Methods With An Established T-Coffee Interface
Methods Without An Established T-Coffee Interface
Generate Your Own Libraries
Making a New Method File
FAQ
Abnormal Terminations and Wrong Results
Input/Output Control
Alignment Computation
Alignment Evaluation
BUILDING A SERVER
Common Problems
FORMATS
Parameter files
Sequence Name Handling
Automatic Format Recognition
Structures
Sequences
Alignments
Libraries
Substitution matrices
ClustalW Style
BLAST Format [Recommended]
Sequences Weights
KNOWN PROBLEMS
TECHNICAL NOTES
Contributions

 


License and Terms of Use

 

Please make sure you have agreed with the terms of the license attached to the package before using the T-Coffee package or its documentation. T-Coffee is free, open-source software distributed under a GPL license. This means that there is no restriction on its use, in either an academic or a non-academic environment.

 


Addresses and Contacts

 

We are always very eager to get some user feedback. Please do not hesitate to drop us a line on:

cedric.notredame@europe.com

 

The latest updates of T-Coffee are always available from:

           

igs-server.cnrs-mrs.fr/~cnotred

 

At this address you will also find links to some of the online T-Coffee servers, including Tcoffee@igs:

           

                                    igs-server.cnrs-mrs.fr/Tcoffee/

 

 

T-Coffee can automatically check whether an updated version is available:

 

t_coffee -update

 


Citations

It is important that you cite T-Coffee when you use it. Citing us is (almost) like giving us money: it helps us convince our institutions that what we do is useful and that they should keep paying our salaries and delivering donuts to our offices from time to time (not that they ever did, but it would be nice anyway).

 

Cite the server if you used it, otherwise, cite the original paper from 2000 (No, it was never named "T-Coffee 2000").

1:
Notredame C, Higgins DG, Heringa J.
T-Coffee: A novel method for fast and accurate multiple sequence alignment.
J Mol Biol. 2000 Sep 8;302(1):205-17.
PMID: 10964570

 

Other useful publications include:

T-Coffee

1:
Claude JB, Suhre K, Notredame C, Claverie JM, Abergel C.
CaspR: a web server for automated molecular replacement using homology modelling.
Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W606-9.
PMID: 15215460

2:
Poirot O, Suhre K, Abergel C, O'Toole E, Notredame C.
3DCoffee@igs: a web server for combining sequences and structures into a multiple sequence alignment.
Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W37-40.
PMID: 15215345

3:
O'Sullivan O, Suhre K, Abergel C, Higgins DG, Notredame C.
3DCoffee: combining protein sequences and structures within multiple sequence alignments.
J Mol Biol. 2004 Jul 2;340(2):385-95.
PMID: 15201059

4:
Poirot O, O'Toole E, Notredame C.
Tcoffee@igs: A web server for computing, evaluating and combining multiple sequence alignments.
Nucleic Acids Res. 2003 Jul 1;31(13):3503-6.
PMID: 12824354

5:
Notredame C.
Mocca: semi-automatic method for domain hunting.
Bioinformatics. 2001 Apr;17(4):373-4.
PMID: 11301309

6:
Notredame C, Higgins DG, Heringa J.
T-Coffee: A novel method for fast and accurate multiple sequence alignment.
J Mol Biol. 2000 Sep 8;302(1):205-17.
PMID: 10964570

7:
Notredame C, Holm L, Higgins DG.
COFFEE: an objective function for multiple sequence alignments.
Bioinformatics. 1998 Jun;14(5):407-22.
PMID: 9682054

Mocca

8:
Notredame C.
Mocca: semi-automatic method for domain hunting.
Bioinformatics. 2001 Apr;17(4):373-4.
PMID: 11301309

CORE

http://igs-server.cnrs-mrs.fr/~cnotred/Publications/Pdf/core.pp.pdf


DESCRIPTION

 

What is T-Coffee?

 

Before going deep into the core of the matter, here are a few words to quickly explain some of the things T-Coffee will do for you.

 

What does it do?

T-Coffee is a multiple sequence alignment program: given a set of sequences previously gathered using database search programs like BLAST, FASTA or Smith and Waterman, T-Coffee will produce a multiple sequence alignment. To use T-Coffee you must already have your sequences.

 

What can it align?

T-Coffee will align DNA and protein sequences alike. It will be able to use structural information for protein sequences with a known structure. We recently introduced a new mode that makes T-Coffee able to accurately align large datasets.

 

How can I use it?

T-Coffee is not an interactive program. It runs from your UNIX or Linux command line and you must provide it with the correct parameters. If you do not like typing commands, here is the simplest available mode where T-Coffee only needs the name of the sequence file:

 

          t_coffee sample_seq1.fasta [**]

 

If installing and running T-Coffee locally is beyond your computer skills, we suggest you use one of the available online servers.

 

Is T-Coffee different from ClustalW?

According to several benchmarks, T-Coffee appears to be more accurate than ClustalW. Yet, this increased accuracy comes at a price: T-Coffee is slower than Clustal (about N times).

 

If you are familiar with ClustalW, or if you run a ClustalW server, you will find that we made some efforts to ensure as much compatibility as possible between ClustalW and T-COFFEE. Whenever it was relevant, we have kept the flag name and the flag syntax of ClustalW. Yet, you will find that T-Coffee also has many extra possibilities…

 

If you want to align closely related sequences, T-Coffee can also be used in a fast mode that is much faster than ClustalW and about as accurate (T-Coffee -very_fast). This mode is especially useful for aligning long sequences.

           

 

What Can T-Coffee do for you (and what it cannot do)

 

 

(NOT) Fetching Sequences

T-Coffee will NOT fetch sequences for you: you must select the sequences you want to align beforehand. We suggest you use any BLAST server and format your sequences in FASTA so that T-Coffee can use them easily.

 

Aligning Sequences

Making accurate multiple alignments of DNA, RNA or Protein sequences.

 

Combining Alignments

T-Coffee allows you to combine results obtained with several alignment methods. For instance, if you have an alignment coming from ClustalW, another alignment coming from Dialign, and a structural alignment of some of your sequences, T-Coffee will combine all that information and produce a new multiple sequence alignment having the best agreement with all these methods (see the FAQ for more details).

 

t_coffee -in=Asample_aln1.aln,Asample_aln2.aln,Asample_aln3.aln -outfile=combined_aln.aln [**]

 

Evaluating Alignments

You can use T-Coffee to measure the reliability of your Multiple Sequence alignment. If you want to find out about that, read the FAQ or the documentation for the -output flag.

 

t_coffee -infile=sample_aln1.aln -special_mode=evaluate [**]

 

Combining Sequences and Structures

One of the latest improvements of T-Coffee is to let you combine sequences and structures, so that your alignments are of higher quality. You need to have the sap package installed to benefit fully from this facility.

t_coffee 3d.fasta -special_mode=3dcoffee [*#]

 

Using this mode will cause T-Coffee to automatically identify the target corresponding to your sequence as indicated by an NCBI BLAST. T-Coffee then obtains the required PDB sequences from RCSB. However, if you are also using -template_file, the program will use the template you specified and the corresponding files on your disk.

 

All these network based operations are carried out using wget. If wget is not installed on your system, you can get it for free from (www.wget.org). To make sure wget is installed on your system, type

which wget [**]

 

Identifying Occurrences of a Motif: Mocca

Mocca is a special mode of T-Coffee that allows you to extract a series of repeats from a single sequence or a set of sequences. In other words, if you know the coordinates of one copy of a repeat, you can extract all the other occurrences. If you want to use Mocca, simply type:

 

t_coffee -other_pg mocca sample_seq1.fasta [**]

 

The program needs some time to compute a library and it will then prompt you with an interactive menu. Follow the instructions.

How Does T-Coffee Work

 

If you only want to make standard multiple alignments, you may skip these explanations. But if you want to do more sophisticated things, these few indications may help before you start reading the doc and the papers.

 

When you run T-Coffee, the first thing it does is to compute a library. The library is a list of pairs of residues that could be aligned. It is like an Xmas list: you can ask for anything you fancy, but it is down to Santa to assemble a collection of toys that won't get him stuck at the airport while going through the metal detector.

 

Given a standard library, it is not possible to have all the residues aligned at the same time because all the lines of the library may not agree. For instance, line 1 may say

 

Residue 1 of seq A with Residue 5 of seq B,

 

and line 100 may say

 

Residue 1 of seq A with Residue 29 of seq B,

 

Each of these constraints comes with a weight and, in the end, the T-Coffee algorithm tries to generate the multiple alignment that contains the constraints whose sum of weights yields the highest score. In other words, it tries to make as many constraints as possible happy (replace the word constraint with friends, family members, collaborators… and you will know exactly what we mean).
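The idea of weighted, possibly conflicting constraints can be sketched in a few lines of Python. This is only an illustration: the names `Constraint` and `score_alignment` and the weights 80 and 20 are made up for the example and are not taken from the T-Coffee source, whose actual scoring is more elaborate.

```python
from typing import NamedTuple

class Constraint(NamedTuple):
    """One library line: residue res_a of seq_a paired with residue res_b of seq_b."""
    seq_a: str
    res_a: int
    seq_b: str
    res_b: int
    weight: int

def score_alignment(aligned_pairs, library):
    """Sum the weights of the library constraints that the alignment satisfies."""
    return sum(c.weight for c in library
               if (c.seq_a, c.res_a, c.seq_b, c.res_b) in aligned_pairs)

# Hypothetical library: the two lines disagree about residue 1 of seq A.
library = [
    Constraint("A", 1, "B", 5, weight=80),
    Constraint("A", 1, "B", 29, weight=20),
]

# Two candidate alignments, each described by the residue pairs it aligns.
candidate_1 = {("A", 1, "B", 5)}
candidate_2 = {("A", 1, "B", 29)}

# The preferred candidate is the one satisfying the heaviest set of constraints.
best = max([candidate_1, candidate_2],
           key=lambda aln: score_alignment(aln, library))
print(score_alignment(candidate_1, library), score_alignment(candidate_2, library))
```

Both constraints cannot be satisfied at once, so the candidate that honours the heavier one is preferred.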

 

You can generate this list of constraints however you like. You may even provide it yourself, forcing important residues to be aligned by giving them high weights (see the FAQ). For your convenience, T-Coffee can generate (this is the default) its own list by making all the possible global pairwise alignments, and the 10 best local alignments associated with each pair of sequences. Each pair of residues observed aligned in these pairwise alignments becomes a line in the library.
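As a rough sketch of how a pairwise alignment can be turned into library lines, the Python function below walks through two gapped sequences and records every column in which two residues face each other. The function name, the 1-based numbering and the toy sequences are assumptions made for this example; the weighting of each pair is left out.

```python
def pairs_from_pairwise(name_a, aligned_a, name_b, aligned_b, gap="-"):
    """Return (name_a, pos_a, name_b, pos_b) for each column pairing two residues.

    Positions are 1-based indices into the ungapped sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = []
    pos_a = pos_b = 0
    for col_a, col_b in zip(aligned_a, aligned_b):
        if col_a != gap:
            pos_a += 1
        if col_b != gap:
            pos_b += 1
        if col_a != gap and col_b != gap:
            pairs.append((name_a, pos_a, name_b, pos_b))
    return pairs

# Toy pairwise alignment (made-up sequences).
for line in pairs_from_pairwise("A", "GA-RFIELD", "B", "GARF-IELD"):
    print(line)
```

Columns containing a gap contribute nothing, so the list holds only residue-to-residue pairs.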

 

Yet be aware that nothing forces you to use this library and that you could build it using other methods (see the FAQ). In protein language, T-COFFEE is synonymous with freedom, the freedom of being aligned however you fancy (it is with that sort of statement that I got elected Chief Tryptophan Officer in some previous life).


INSTALLATION

 

Standard Installation

1-decompress distribution.tar.gz

gunzip distribution.tar.gz

2-untar distribution.tar

tar -xvf distribution.tar

 

3-This will create the distribution directory with the following structure:

distribution/bin

distribution/doc/t_coffee_doc.pdf,t_coffee_doc.html 

distribution/t_coffee_source

distribution/example

distribution/html

 

4-go into the main directory and type:

 

./install

 

You will know that the installation completed properly if the message:

 

Installation of t_coffee Successful

 

appears on your screen.

 

5-add the bin folder to your path:

set path = ($path . <address of the t_coffee bin folder>)

 

Note: The latest t_coffee distribution (2.15 and higher) is self-contained and only requires one executable. You may still require external modules (sap, blast, ClustalW) if you wish to use a mode other than the default.

 

Note: When updating, make sure to remove the old distribution and any associated program from your path.

6-If you have PDB installed:

 

Assuming you have a standard PDB installation in your file system

setenv PDB_DIR pdb_dir/structures/all/

 

Note: This must be added to your login file.

Extended Installation and other Packages

 

By default, T-Coffee does not require any other package than those included in the distribution. However, depending on your needs, you may want to install some of the following:

 

 

Package       Function

===========================================================

ClustalW      can interact with t_coffee

-----------------------------------------------------------

wget          3DCoffee

              Automatic Downloading of Structures

               Remote use of the Fugue server

-----------------------------------------------------------

sap           structure/structure comparisons (obtain it from W. Taylor, NIMR-MRC).

-----------------------------------------------------------       

 

Once the package is installed, make sure that the executable is on your path, so that t_coffee can find it automatically.
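One way to check which of these programs are visible on your path is a short Python script built on the standard `shutil.which` function. The executable names below are simply the package names used in this manual; the names on your system may differ, and this script is only a convenience sketch, not part of the distribution.

```python
import shutil

def find_executables(names):
    """Map each program name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}

# Program names as written in this manual (assumed to match the executables).
for name, path in find_executables(["t_coffee", "clustalw", "wget", "sap"]).items():
    print(f"{name:10s} {path or 'NOT FOUND on PATH'}")
```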

 

 

 


QUICK START

 

T-COFFEE

 

Write your sequences in the same file (Swiss-prot, Fasta or Pir) and type:

t_coffee sample_seq1.fasta [**]

 

This will output two files:

 

sample_seq1.aln

A multiple alignment in a format similar to ClustalW, that can be read by most programs.

 

sample_seq1.dnd

A dendrogram in newick format

 

Note: Using the dendrogram in place of a phylogenetic tree is something very bad that will land you straight in hell, or in jail, or any place you do not like. If you need a tree, compute one using your MSA and an appropriate phylogenetic package such as phylip.

MOCCA

 

Write your sequences in the same file (Swiss-prot, Fasta or Pir) and type:

t_coffee -other_pg mocca sample_seq1.fasta [**]

 

This command outputs one file (<your sequences>.mocca_lib) and starts an interactive menu.

 

Note: The computation of the mocca_lib file can take some time. Whenever you re-run mocca on the same sequences, it will look for the mocca_lib file and read it, or generate it if it does not exist.
RECENT MODIFICATIONS

 

A detailed log of the modifications (with versions), can be found in the to_do.txt file in the doc/ folder.

 

The most notable modifications have to do with the structure of the input. From version 2.20, all files must be tagged to indicate their nature (A: alignment, S: Sequence, L: Library…). We are becoming stricter, but that’s for your own good…


 

MANUAL

 

This manual is only at a very preliminary stage of drafting. It will only show you how to do the very basics with T-Coffee. To solve a more specific problem, or answer a query, we suggest you first go through the FAQ to see if your problem has been addressed, and then carefully read the documentation associated with the corresponding flags. Of course, we also welcome queries and do our best to provide answers and clues in a timely manner.

 

Using T-Coffee

Standard Alignments

T-Coffee can align sequences, structures and profiles. The default mode when using t_coffee is:

     t_coffee sample_seq1.fasta

 

It is also possible to combine sequences from various sources:

t_coffee sample_seq1.fasta sample_seq2.fasta

 

Or even sequences coming from both sequence and alignment files:

t_coffee sample_seq1.fasta,Ssample_aln2.aln

 

Note the ‘S’ identifier, that tells the program to use the alignment as a collection of unaligned sequences.

 

Alignment Combination

It is possible to combine several alignments into one final alignment:

t_coffee -in Asample_aln1_1.aln,Asample_aln1_2.aln -outfile=combined_aln.aln

 

Note the ‘A’ identifier that tells the program to keep the sequences aligned.

 

Aligning Sequences and Structures

Assuming some structures are associated with your sequences, it is possible to align these sequences while using associated structural information. The easiest way to do this is to use 3dcoffee:

t_coffee 3d.fasta -special_mode=3dcoffee

 

Aligning Sequences and Profiles

T-Coffee can make multiple profile alignments. In this context, the alignments are treated as single sequences and aligned to one another in a progressive fashion. Currently, we only support profiles in the form of standard multiple sequence alignments. Profiles can be entered via the -profile flag:

t_coffee -profile sample_aln1.aln,sample_aln2.aln -outfile=combined_profiles.aln

 

It is also possible to read profiles via the -in flag, as long as they are preceded by the ‘R’ identifier:

t_coffee Ssample_seq1.fasta,Rsample_aln2.aln -outfile seqprofile_aln

 

All the internal methods should support profiles. External methods do not support profiles (unless specified otherwise).

 

Using Structures (Or templates) Within Profiles

If your profiles contain structures, you can make sure these will be used during the computation by specifying the 3dcoffee special mode:

t_coffee Rsample_profile1.aln,Rsample_profile2.aln -special_mode=3dcoffee -outfile=aligned_prf.aln

 

Note that when providing a collection of templates, the program will use the -template_file flag to look for templates within the sequences AND within the profiles associated with some sequences.

 

 

Using New and Existing Methods

Although it does not necessarily do so explicitly, T-Coffee always ends up combining libraries. Libraries are collections of pairs of residues. Given a set of libraries, T-Coffee attempts to assemble the alignment with the highest level of consistency. You can think of the alignment as a timetable. Each library pair would be a request from students or teachers, and the job of T-Coffee would be to assemble the timetable that makes as many people as possible happy…

 

In T-Coffee, methods replace the students/professors as constraint generators. These methods can be any standard or non-standard alignment methods that can be used to generate alignments (pairwise, most of the time). These alignments can be viewed as collections of constraints that must fit within the final alignment. Of course, the constraints do not have to agree with one another…

 

This section shows you which methods are available in T-Coffee and how you can add your own methods, either through direct parameterization or via a perl script.

 

 

 

Using Methods Integrated in T-Coffee

 

Some packages already have an interface with t_coffee. These include:

     

      align_pdb:    ALIGN_PDB_4_TCOFFEE

      sap:          SAP_4_TCOFFEE

      lalign2list:  LALIGN_4_TCOFFEE

      clustalw:     CLUSTALW_4_TCOFFEE

 

If these programs are installed on your system and you want t_coffee to use a specific version:

setenv CLUSTALW_4_TCOFFEE <path to your version>

 

Built-in methods can be requested using the following names:

 

fast_pair

Makes a global fasta style pairwise alignment. For proteins, matrix=blosum62mt, gep=-1, gop=-10, ktup=2. For DNA, matrix=idmat (id=10), gep=-1, gop=-20, ktup=5. Each pair of residues is given a score that is a function of the weighting mode defined by -weight.

slow_pair

Identical to fast_pair, but performs full dynamic programming using the Myers and Miller algorithm. This method is recommended if your sequences are distantly related.

ifast_pair [unsupported]

Makes a global fasta alignment using the previously computed pairs as a library. `i` stands for iterative. Each pair of residues is given a score that is a function of the weighting mode defined by -weight.

 

align_pdb_pair [unsupported]

Uses the align_pdb routine to align two structures. The pairwise scores are those returned by the align_pdb program. If a structure is missing, fast_pair is used instead. Each pair of residues is given a score defined by align_pdb.

sap_pair

Uses sap to align two structures. Each pair of residues is given a score defined by sap. You must have sap installed on your system to use this method.

 

clustalw_pair

Uses clustalw (default parameters) to align two sequences. Each pair of residues is given a score that is a function of the weighting mode defined by -weight.

 

clustalw_aln

Makes a multiple alignment using ClustalW and adds it to the library. Each pair of residues is given a score that is a function of the weighting mode defined by -weight.

 

lalign_id_pair

Same as lalign_rs_pair, but using the level of identity as a weight.

 

lalign_id_m_pair

Same as above, but does the alignment both ways (m stands for mirror).

 

lalign_s_pair

Same as above, but also does the self comparison (s stands for self). This is needed when extracting repeats. The weights used that way are based on identity.

 

lalign_rs_s_pair

Same as above, but also does the self comparison (s stands for self). This is needed when extracting repeats.

 

matrix

Any matrix can be requested. Simply indicate as a method the name of the matrix preceded with an X (e.g. Xpam250mt). If you indicate such a matrix, all the other methods will simply be ignored, and a standard fast progressive alignment will be computed. If you want to change the substitution matrix used by the methods, use the -matrix flag.

 

fast_cdna_pair [unsupported]

This method computes the pairwise alignment of two cDNA sequences. It is a fast_pair alignment that only takes into account the amino-acid similarity and uses different penalties for amino-acid insertions and frameshifts.

 

This alignment is turned into a library where matched nucleotides receive a score equal to the average level of identity at the amino-acid level.

 

This mode is intended to clean cDNA obtained from ESTs, or to align pseudo-genes.

 

 

To request a method, see the -in flag. For instance, if you wish to request the use of fast_pair and lalign_id_pair (the current default):

 

t_coffee -in Ssample_seq1.fasta,Mfast_pair,Mlalign_id_pair [**]

 

The order in which methods are fed is irrelevant.
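To make the tagging convention concrete, here is a small Python sketch that splits a comma-separated -in value into (type, name) pairs. It only covers the identifiers mentioned in this manual (S, A, L, M, R, X); the function name and the decision to reject untagged entries are assumptions of this example, and T-Coffee's own parser may behave differently.

```python
# One-letter identifiers as described in this manual.
TAGS = {
    "S": "sequence",
    "A": "alignment",
    "L": "library",
    "M": "method",
    "R": "profile",
    "X": "matrix",
}

def parse_in_list(value):
    """Split a comma-separated -in value into (type, name) pairs."""
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        tag, name = item[0], item[1:]
        if tag not in TAGS or not name:
            raise ValueError(f"untagged or unknown entry: {item!r}")
        entries.append((TAGS[tag], name))
    return entries

print(parse_in_list("Ssample_seq1.fasta,Mfast_pair,Mlalign_id_pair"))
```

Because each entry carries its own tag, shuffling the entries changes only the order of the resulting list, not its content.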

 

Integrating External Methods

Direct access to external methods

A special method exists in T-Coffee that can be used to invoke any existing program:

 

t_coffee sample_seq1.fasta -in=Mem0clustalw0pairwise [**]

 

In this context, ClustalW is a method that can be run with the following command line:

     method -infile=<infile> -outfile=<outfile>

 

Clustalw can be replaced with any method using a similar syntax. If the program you want to use cannot be run this way, you can either write a perl wrapper that fits the bill or write a tc_method file adapted to your program (cf next section).

 

This special method (em, external method) uses the following syntax:

     em0<method>0<aln_mode: pairwise|s_pairwise|multiple>

 

Note: The 0 is used as a separator. This symbol must not be part of the method name.

 

Customizing an external method (with parameters) for T-Coffee

 

T-Coffee can run external methods using a tc_method file, which can be used in place of an established method. Two such files are incorporated in T-Coffee. You can dump them and customize them according to your needs:

t_coffee -other_pg unpack_clustalw_method.tc_method [**]

t_coffee -other_pg unpack_generic_method.tc_method [**]

The second file (generic_method.tc_method) contains many hints on how to customize your new method. The first file is a very straightforward example of how to have t_coffee run ClustalW with a set of parameters you may be interested in. For instance, if you have ClustalW installed, you can use the following file to run the method on your dataset:

 

 

*TC_METHOD_FORMAT_01

***************clustalw_method.tc_method*********

EXECUTABLE    clustalw

ALN_MODE       pairwise

IN_FLAG       -INFILE=

OUT_FLAG       -OUTFILE=

OUT_MODE      aln

PARAM         -gapopen=-10

SEQ_TYPE      S

*************************************************

 

This configuration file will cause T-Coffee to emit the following system call:

clustalw -INFILE=tmpfile1 -OUTFILE=tmpfile2 -gapopen=-10
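The mapping from a tc_method file to a command line can be sketched in Python as follows. This is a simplified reading of the format: it treats lines starting with `*` as comments, takes the rest of each line as the value, and joins the pieces with single spaces. The function names are invented for the example, and the special values `no_name` and `&nbsp` are not handled.

```python
CLUSTALW_METHOD = """\
*TC_METHOD_FORMAT_01
***************clustalw_method.tc_method*********
EXECUTABLE    clustalw
ALN_MODE       pairwise
IN_FLAG       -INFILE=
OUT_FLAG       -OUTFILE=
OUT_MODE      aln
PARAM         -gapopen=-10
SEQ_TYPE      S
*************************************************
"""

def parse_tc_method(text):
    """Collect KEY/value lines; lines starting with '*' are comments."""
    fields = {"PARAM": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "PARAM":
            fields["PARAM"].append(value)  # several PARAM lines are concatenated
        else:
            fields[key] = value
    return fields

def build_command(fields, infile, outfile):
    """Assemble executable, input flag, output flag and parameters."""
    parts = [
        fields["EXECUTABLE"],
        fields["IN_FLAG"] + infile,
        fields["OUT_FLAG"] + outfile,
        *fields["PARAM"],
    ]
    return " ".join(parts)

print(build_command(parse_tc_method(CLUSTALW_METHOD), "tmpfile1", "tmpfile2"))
```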

 

Note that ALN_MODE instructs t_coffee to run clustalw on every pair of sequences (cf generic_method.tc_method for more details).

The tc_method files are treated like any standard established method in T-Coffee. For instance, if the file clustalw_method.tc_method is in your current directory, run:

t_coffee sample_seq1.fasta -in=Mclustalw_method.tc_method [**]

Managing a collection of tc_method files

It may be convenient to store all the method files in a single location on your system. By default, t_coffee will look in the directory ~/.t_coffee/methods/. You can change this by either:

 

-Modifying METHODS_4_TCOFFEE in define_headers.h and recompiling

OR

     setenv METHODS_4_TCOFFEE <another location>
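The lookup order described here (environment variable first, default directory otherwise) could be mirrored in Python as below. The function name is hypothetical and the sketch only reflects what this manual states; it does not account for a default changed at compile time.

```python
import os
from pathlib import Path

def methods_dir(environ=None):
    """Return the directory searched for tc_method files.

    METHODS_4_TCOFFEE takes precedence when set; otherwise the default
    ~/.t_coffee/methods/ is used.
    """
    environ = os.environ if environ is None else environ
    custom = environ.get("METHODS_4_TCOFFEE")
    if custom:
        return Path(custom)
    return Path.home() / ".t_coffee" / "methods"

print(methods_dir())
```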

 

Advanced Method Integration

It may sometimes be difficult to customize the program you want to use through a tc_method file. In that case, you may prefer to use an external perl script to run your external application. This can easily be achieved using the generic_method.tc_method file.

 

*TC_METHOD_FORMAT_01

***************generic_method.tc_method*********

EXECUTABLE    tc_generic_method.pl

ALN_MODE       pairwise

IN_FLAG       -infile=

OUT_FLAG       -outfile=

OUT_MODE       aln

PARAM         -method clustalw

PARAM         -gapopen=-10

SEQ_TYPE      S

*************************************************

* Note: &nbsp can be used for white spaces

 

When you run this method:

t_coffee sample_seq1.fasta -in=Mgeneric_method.tc_method [**]

 

 

T-Coffee runs the script tc_generic_method.pl on your data. It also provides the script with parameters. In this case -method clustalw indicates that the script should run clustalw on your data. The script tc_generic_method.pl is a perl file incorporated in t_coffee and automatically generated by it. Over time, this script will be the place where novel methods are integrated, making it possible to run all available methods.

 

You can dump the script using the following command:

t_coffee -other_pg=unpack_tc_generic_method.pl [**]

 

Note: If there is a copy of that script in your local directory, that copy will be used in place of the internal copy of T-Coffee.

 

Reference of tc_method file

 

*TC_METHOD_FORMAT_01

******************generic_method.tc_method*************

*

*       Incorporating new methods in T-Coffee

*       Cedric Notredame 17/04/05

*

*******************************************************

*This file is a method file

*Copy it and adapt it to your need so that the method

*you want to use can be incorporated within T-Coffee

*******************************************************

*                  USAGE                              *

*******************************************************

*This file is passed to t_coffee via -in:

*

*    t_coffee -in Mgeneric_method.method

*

*    The method is passed to the shell using the following

*call:

*<EXECUTABLE><IN_FLAG><seq_file><OUT_FLAG><outname><PARAM>

*

*Conventions:

*<FLAG_NAME> <TYPE>        <VALUE>

*<VALUE>: no_name   <=> Replaced with a space

*<VALUE>: &nbsp     <=> Replaced with a space

*

*******************************************************

*                  EXECUTABLE                         *

*******************************************************

*name of the executable

*passed to the shell: executable

*   

EXECUTABLE    tc_generic_method.pl

*

*******************************************************

*                  ALN_MODE                           *

*******************************************************

*pairwise   ->all Vs all (no self): [n^2-n]/2 aln

*m_pairwise ->all Vs all (no self): [n^2-n]

*s_pairwise ->all Vs all (self): [n^2-n]/2 + n

*multiple   ->All the sequences in one go

*

ALN_MODE      pairwise

*

*******************************************************

*                  OUT_MODE                           *

*******************************************************

* mode for the output:

*External methods:

* aln -> alignmnent File (Fasta or ClustalW Format)

* list-> List file (TC_LIB_FORMAT_01)

*Internal Methods:

* fL -> Internal Function returning a List (Library)

* fA -> Internal Function returning an Alignment

*

OUT_MODE      aln

*

*******************************************************

*                  IN_FLAG                             *

*******************************************************

*IN_FLAG

*flag indicating the name of the incoming sequences

*IN_FLAG S no_name ->no flag

*IN_FLAG S &nbsp-in&nbsp -> " -in "

*

IN_FLAG       -infile=

*

*******************************************************

*                  OUT_FLAG                           *

*******************************************************

*OUT_FLAG

*flag indicating the name of the out-coming data

*same conventions as IN_FLAG

*OUT_FLAG S no_name ->no flag

*

OUT_FLAG      -outfile=

*

*******************************************************

*                  SEQ_TYPE                           *

*******************************************************

*G: Genomic, S: Sequence, P: PDB, R: Profile

*Examples:

*SEQ_TYPE  S    sequences against sequences (default)

*SEQ_TYPE  S_P  sequence against structure

*SEQ_TYPE  P_P  structure against structure

*SEQ_TYPE  PS   mix of sequences and structure

*

SEQ_TYPE  S

*

*******************************************************

*                  PARAM                              *

*******************************************************

*Parameters sent to the EXECUTABLE

*If there is more than 1 PARAM line, the lines are

*concatenated

*

PARAM     -method clustalw

PARAM   -OUTORDER=INPUT -NEWTREE=core -align -gapopen=-15

*

*******************************************************

*                  END                                *

*******************************************************
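The call template given in the USAGE block above can be sketched in Python. This is only an illustration and not the code used inside T-Coffee: the function names parse_tc_method and build_call are hypothetical, and the sketch assumes that the pieces of the call are separated by single spaces and that both no_name and &nbsp stand for a space, as stated in the conventions of the method file.

```python
# Minimal sketch (assumption: not T-Coffee's own implementation).
SAMPLE_METHOD = """\
*TC_METHOD_FORMAT_01
EXECUTABLE    tc_generic_method.pl
ALN_MODE      pairwise
OUT_MODE      aln
IN_FLAG       -infile=
OUT_FLAG      -outfile=
SEQ_TYPE  S
PARAM     -method clustalw
PARAM   -OUTORDER=INPUT -NEWTREE=core -align -gapopen=-15
"""


def parse_tc_method(text):
    """Collect KEY VALUE lines; lines starting with '*' are comments."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "PARAM" and "PARAM" in fields:
            # Several PARAM lines are concatenated.
            fields["PARAM"] += " " + value
        else:
            fields[key] = value
    return fields


def expand_flag(value):
    """Apply the conventions: no_name and &nbsp are replaced with a space."""
    if value == "no_name":
        return " "
    return value.replace("&nbsp", " ")


def build_call(fields, seq_file, outname):
    """Assemble <EXECUTABLE><IN_FLAG><seq_file><OUT_FLAG><outname><PARAM>."""
    pieces = [
        fields["EXECUTABLE"],
        expand_flag(fields.get("IN_FLAG", "no_name")) + seq_file,
        expand_flag(fields.get("OUT_FLAG", "no_name")) + outname,
        fields.get("PARAM", ""),
    ]
    # Assumption: pieces are separated by one space; extra blanks are collapsed.
    return " ".join(" ".join(pieces).split())


if __name__ == "__main__":
    print(build_call(parse_tc_method(SAMPLE_METHOD), "seqs.fasta", "out.aln"))
```

The file names seqs.fasta and out.aln are placeholders. Printing the command rather than running it makes it easy to check a new method file before passing it to t_coffee.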

 

 

 


Creating Your Own T-Coffee Libraries

If the method you want to use is not integrated, or impossible to integrate, you can generate your own libraries, either directly or by turning existing alignments into libraries.

 

Using Pre-Computed Alignments

If the method you wish to use is not supported, or if you already have the alignments, the simplest solution is to generate the pairwise or multiple alignments yourself, in FASTA, ClustalW, MSF or PIR format, and feed them into t_coffee using the -in flag:

 

t_coffee -in=Asample_aln1_1.aln,Asample_aln1_2.aln -outfile=combined_aln.aln [**]

 

Customizing the Weighting Scheme

 

The previous integration method forces you to use the same weighting scheme for each alignment and the rest of the libraries generated on the fly. This weighting scheme is based on global pairwise sequence identity. If you want to use a more specific weighting scheme with a given method, you should either

 

      -generate your own library (cf. next section)

      -convert your alignments into a library, using the -weight flag:

t_coffee -in Asample_aln1.aln -out_lib=test_lib.tc_lib -lib_only -weight=sim_pam250mt [**]

t_coffee -in=Asample_aln1.aln,Ltest_lib.tc_lib -outfile=outaln [**]

 

Note: Default methods are reset when you explicitly use -in. If you wish to keep using fast_pair and lalign_id_pair, you need to indicate these methods explicitly:

 

t_coffee -in=Asample_aln1_1.aln,Asample_aln1_2.aln,Mfast_pair,Mlalign_id_pair -outfile=out_aln [**]

 

Generating Your Own Libraries

 

This is suitable if you have local alignments, or very detailed information about your potential residue pairs, or if you want to use a very specific weighting scheme. You will need to generate your own libraries, using the format described in the last section.

 

Note: You can have up to 200 libraries. They do not need to contain the same sequences.

The Notion of Templates in T-Coffee [TO DO]


FAQ

 

Abnormal Terminations and Wrong Results

In order to maintain t_coffee and fix bugs and problems, we need to get as much feedback as possible.

Q: The program keeps crashing when I give my sequences

A: This may be a format problem. Try to reformat your sequences using any utility (readseq...). We recommend the Fasta format. If the problem persists, contact us.

 

A: Your sequences may not be recognized for what they really are. Normally T-Coffee recognizes the type of your sequences automatically, but if it fails, use:

t_coffee sample_seq1.fasta -type=PROTEIN [**]

 

Q: The default alignment is not good enough

A: see next question

 

Q: The alignment contains obvious mistakes

A: This happens with most multiple alignment procedures. However, wrong alignments are sometimes caused by bugs or implementation mistakes. Please report the most unexpected results to the authors.

Q: The program is crashing

A: If you get the message:

 

FAILED TO ALLOCATE REQUIRED MEMORY

 

See the next question.

If the program crashes for some other reason, please check whether you are using the right syntax and if the problem persists get in touch with the authors.

Q: I am running out of memory

A: You can use a more accurate, slower and less memory hungry dynamic programming mode called myers_miller_pair_wise. Simply indicate the flag:

t_coffee sample_seq1.fasta -special_mode low_memory [**]

           

Note that this mode will be much slower than the default, although it may be slightly more accurate. In practice, the parameterization associated with this special mode turns off every memory-expensive heuristic within T-Coffee. For version 2.11 this amounts to:

t_coffee  sample_seq1.fasta -in=Mslow_pair,Mlalign_id_pair -tree_mode=slow -dp_mode=myers_miller_pair_wise [**]

 

If you keep running out of memory, you may also want to lower -maxnseq, to ensure that t_coffee_dpa will be used.

Input/Output Control

Q: How many Sequences can t_coffee handle

A: T-Coffee is limited to a maximum of 50 sequences. Above this number, the program automatically switches to a heuristic mode, named DPA, where DPA stands for Double Progressive Alignment.

 

DPA is still in development and the version currently shipped with T-Coffee is only a beta version.

 

Q: How many ways to pass parameters to t_coffee?

A: See the section Well Behaved Parameters.

 

Q: How can I change the default output format?

A: See the -output option, common output formats are:

 

t_coffee sample_seq1.fasta -output=msf,fasta_aln [**]

Q: My sequences are slightly different between all the alignments.

A: It does not matter. T-Coffee will reconstruct a set of sequences that incorporates all the residues potentially missing in some of the sequences (see flag -in).

 

Q: Is it possible to pipe stuff OUT of t_coffee?

A: Specify stderr or stdout as output filename, the output will be redirected accordingly. For instance

 

t_coffee sample_seq1.fasta -outfile=stdout -out_lib=stdout [**]

 

This instruction will output the alignment (-outfile) and the library (-out_lib) to stdout.

 

Q: Is it possible to pipe stuff INTO t_coffee?

A: If you specify stdin as a file name, the content of this file will be expected through a pipe:

 

cat sample_seq1.fasta | t_coffee -infile=stdin [**]

 

will be equivalent to

 

t_coffee sample_seq1.fasta [**]

 

 

If you do not want to give the arguments on the command line, they can also come from a pipe, by naming the parameter file stdin:

 

cat sample_param_file.param  | t_coffee -parameters=stdin [**]

 

For instance:

echo -in=Ssample_seq1.fasta,Mclustalw_pair | t_coffee -parameters=stdin [**]

 

Q: Can I read my parameters from a file?

A: See the well behaved parameters section.

 

Q: I want to  decide myself on the name of the output files!!!

A: Use the -run_name flag.

 

t_coffee sample_seq1.fasta -run_name=guacamole [**]

Q: I want to use the sequences in an alignment file

A: Simply feed your alignment, any way you like, but do not forget to prepend the prefix S for sequence:

 

t_coffee Ssample_aln1.aln [**]

t_coffee -infile=Ssample_aln1.aln [**]

t_coffee -in=Ssample_aln1.aln,Mslow_pair,Mlalign_id_pair -outfile=outaln

 

This means that the gaps will be reset and that the alignment you provide will not be considered as an alignment, but as a set of sequences.

 

Q: I only want to produce a library

A: Use the -lib_only flag:

 

t_coffee sample_seq1.fasta -out_lib=sample_lib1.tc_lib -lib_only [**]

 

Please note that the previous usage supersedes the use of the -convert flag. Its main advantage is to restrict computation time to the actual library computation.

Q: I want to turn an alignment into a library

A: Use the -lib_only flag:

 

t_coffee -in=Asample_aln1.aln -out_lib=sample_lib1.tc_lib -lib_only [**]

 

It is also possible to control the weight associated with this alignment (see the -weight section).

t_coffee -in=Asample_aln1.aln -out_lib=sample_lib1.tc_lib -lib_only -weight=1000 [**]

 

 

Q: I want to concatenate two libraries

A: You cannot simply concatenate the library files. You will have to use t_coffee. Assume you want to combine sample_lib1.tc_lib and sample_lib2.tc_lib:

 

t_coffee -in Lsample_lib1.tc_lib Lsample_lib2.tc_lib -lib_only -out_lib=sample_lib3.tc_lib [**]

 

 

 

Q: What happens to the gaps when an alignment is fed to T-Coffee

A: An alignment is ALWAYS considered as a library AND a set of sequences. If you want your alignment to be considered as a set of sequences only, use the S identifier.

     

t_coffee Ssample_aln1.aln -outfile=outaln [**]

 

It will be seen as a sequence file, even if it has an alignment format (gaps will be removed).

 

Q: I cannot print the html graphic display!!!

A: This is a problem that has to do with your browser. Instead of requesting the score_html output, request the score_ps output that can be read using ghostview:

     

t_coffee sample_seq1.fasta -output=score_ps [**]

or  

t_coffee sample_seq2.fasta -output=score_pdf [**]

 

Note: the score_pdf output requires the converter ps2pdf to be installed on your system (standard under Linux and Cygwin).

 

Note: the latest versions of Internet Explorer and Netscape now allow the user to print the HTML display. Do not forget to request Background printing.

 

Q: I want to output an html file and a regular file

A: see the next question

Q: I would like to output more than one alignment format at the same time

A: The flag -output accepts more than one parameter. For instance,

     

t_coffee sample_seq1.fasta -output=clustalw,score_html,score_ps,msf [**]

 

This will output four alignment files in the corresponding formats. Alignments' names will have the format name as an extension.

 

Alignment Computation

Q: I do not want to compute the alignment.

A: use the -convert flag

 

t_coffee sample_aln1.aln -convert -output=gcg [**]

     

This command will read the .aln file and turn it into an .msf alignment.

 

Q: I would like to force some residues to be aligned.

 

A: If you want to brutally force some residues to be aligned, you may use the force_aln function of seq_reformat as a post-processing step:

 

t_coffee -other_pg seq_reformat -in sample_aln4.aln -action +force_aln seq1 10 seq2 15 [**]

 

t_coffee -other_pg seq_reformat -in sample_aln4.aln -action +force_aln sample_lib4.tc_lib02 [**]

 

sample_lib4.tc_lib02 is a T-Coffee library using the tc_lib02 format:

 

*TC_LIB_FORMAT_02

SeqX resY ResY_index    SeqZ ResW ResW_index

Warning: the TC_LIB_FORMAT_02 is still experimental and unsupported. It can only be used in the context of the force_aln function described here.

 

Given more than one constraint, these will be applied one after the other, in the order they are provided. This greedy procedure means that the Nth constraint may disrupt the (N-1)th previously imposed constraint, hence the importance of forcing the constraints in the right order, with the most important coming last.

 

We do not recommend imposing hard constraints on an alignment, and it is much more advisable to use the soft constraints provided by standard t_coffee libraries (cf. building your own libraries section)

 

Q: I would like to use structural alignments.

A: See the section Using Structures in Multiple Sequence Alignments, or see the question "I want to build my own libraries".

 

Q: I want to build my own libraries.

A: Turn your alignment (for instance a structure-based alignment) into a library, forcing the aligned residues to have a very high weight:

     

t_coffee -in Asample_seq1.aln -weight=1000 -out_lib=sample_seq1.tc_lib -lib_only [**]

 

The value 1000 is simply a high value that should make it more likely for the substitutions found in your alignment to reoccur in the final alignment. This will produce the library sample_seq1.tc_lib that you can later use when aligning all the sequences:

 

t_coffee -in Ssample_seq1.fasta Lsample_seq1.tc_lib -outfile sample_seq1.aln [**]

 

 

If you only want some of these residues to be aligned, or want to give them individual weights, you will have to edit the library file yourself or use the force_aln option (cf. FAQ: I would like to force some residues to be aligned). A value of N*N*1000 (N being the number of sequences) usually ensures that a constraint is respected.
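As a small worked example of the N*N*1000 rule of thumb, the following Python sketch counts the sequences of a FASTA file and derives the corresponding weight. The helper names are hypothetical and the three-sequence sample is made up for the illustration; only the formula comes from the text above.

```python
def count_fasta_sequences(text):
    """Count the '>' header lines of a FASTA-formatted string."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def constraint_weight(n_sequences, base=1000):
    """Rule of thumb from the text: N * N * 1000."""
    return n_sequences * n_sequences * base


if __name__ == "__main__":
    sample = ">seqA\nMKV\n>seqB\nMRV\n>seqC\nMKI\n"
    n = count_fasta_sequences(sample)
    print(n, constraint_weight(n))
```

The resulting value can then be written by hand into the library entries that must act as constraints.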

 

Q: I want to use my own tree!!!!

A: Use the -usetree=<your own tree> flag.

t_coffee sample_seq1.fasta -usetree=sample_tree.dnd [**]

 

Q: I want to align coding DNA

A: Use the cdna_fast_pair method, which compares two cDNAs using the best reading frame and taking frameshifts into account.

 

Note: This method has not yet been fully tested and is only provided “as-is” with no warranty.

 

t_coffee sample_seq4.fasta -in Mcdna_fast_pair [**]

 

Notice that in the resulting alignment, the length of every gap is a multiple of 3, except for one small gap in the first line of sequence hmgl_trybr. This is a frameshift, made on purpose. You can realign the same sequences while ignoring their coding potential and treating them like standard DNA:

t_coffee sample_seq4.fasta [**]

 

 

Q: I do not want to use all the possible pairs when computing the library

Q: I only want to use specific pairs to compute the library

A: Simply write in a file the list of sequence groups you want to use:

t_coffee sample_seq1.fasta -in=Mclustalw_pair,Mclustalw_aln -lib_list=sample_list1.lib_list -outfile=test

 

 

***************sample_list1.lib_list************

2 hmgl_trybr hmgt_mouse

2 hmgl_trybr hmgb_chite

2 hmgl_trybr hmgl_wheat

3 hmgl_trybr hmgl_wheat hmgl_mouse

***************sample_list1.lib_list************

 

 

 

 

Note: Pairwise methods (slow_pair…) will only be applied to list of pairs of sequences, while multiple methods (clustalw_aln) will be applied to any dataset having more than two sequences.
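A list file such as sample_list1.lib_list can be written by hand or generated. The Python sketch below is one possible way to do it; the helper names are hypothetical, and it assumes that the starred banner lines shown above only delimit the example and are not part of the file itself.

```python
def pairs_with(reference, others):
    """Build the pairs (reference, other) for every other sequence."""
    return [(reference, other) for other in others]


def lib_list_lines(groups):
    """One line per group: the group size followed by the sequence names."""
    return ["%d %s" % (len(group), " ".join(group)) for group in groups]


if __name__ == "__main__":
    groups = pairs_with("hmgl_trybr", ["hmgt_mouse", "hmgb_chite", "hmgl_wheat"])
    groups.append(("hmgl_trybr", "hmgl_wheat", "hmgl_mouse"))
    # Write the list file that is later passed with -lib_list.
    with open("sample_list1.lib_list", "w") as handle:
        handle.write("\n".join(lib_list_lines(groups)) + "\n")
```

With these groups, the pairwise methods would be applied to the three pairs and the multiple methods to the group of three sequences.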

Q: There are duplicates or quasi-duplicates in my set

A: If you can remove them, this will make the program run faster, otherwise, the t_coffee scoring scheme should be able to avoid over-weighting of over-represented sequences.

                 

Using Structures and Profiles

Q: Can I align sequences to a profile with T-Coffee ?

A: Yes, you simply need to indicate that your alignment is a profile with the R tag.

t_coffee sample_seq1.fasta Rsample_aln2.aln -outfile tacos

Q: Can I align two or more profiles?

A: Yes, simply tag your profiles with the letter R and the program will treat them like standard sequences.

t_coffee Rsample_aln1.fasta Rsample_aln2.aln -outfile tacos

Q: Can I align two profiles according to the structures they contain?

A: Yes, as long as the structure sequences are named according to their PDB identifier:

t_coffee Rsample_profile1.aln,Rsample_profile2.aln -special_mode=3dcoffee -outfile=aligne_prf.aln

 

 

Alignment Evaluation

Q: How good is my alignment?

A: See the next question, "What is that color index?".

Q: What is that color index?

 

A: T-Coffee can provide you with a measure of consistency among all the methods used. You can produce such an output using:

 

t_coffee sample_seq1.fasta -output=score_html [**]

 

This will compute a score_html file that you can view using a web browser such as Netscape. An alternative is to use score_ps or score_pdf, which can be viewed using ghostview or acroread. The score_ascii output will give you an alignment that can be parsed as a text file.

 

A book chapter describing the CORE index is available on:

http://igs-server.cnrs-mrs.fr/~cnotred/Publications/Pdf/core.pp.pdf

 

 

 

Q: Can I evaluate alignments NOT produced with T-Coffee?

A: Yes. You may have an alignment produced from any source you like. To evaluate it do:

t_coffee -infile=sample_aln1.aln -in=Lsample_aln1.tc_lib -special_mode=evaluate [**]

 

If you have no library available, the library will be computed on the fly using the following command. This can take some time, depending on your sample size. To monitor the progress in a situation where the default library is being built, use:

 

t_coffee -infile=sample_aln1.aln -special_mode evaluate [**]

Q: Can I Compare Two Alignments?

A: Yes. You can treat one of your alignments as a library and compare it with the second alignment:

 

t_coffee -infile=sample_aln1_1.aln -in=Asample_aln1_2.aln -special_mode=evaluate [**]

 


 

Q: I am aligning sequences with long regions of very good overlap

A: Increase the ktuple size (up to 4 or 5 for DNA, and up to 3 for proteins).

 

t_coffee sample_seq1.fasta -ktuple=3

 

This will speed up the program. It can be very useful, especially when aligning ESTs.

Q: Why is T-Coffee changing the names of my sequences!!!!

A: If there is no duplicated name in your sequence set, T-Coffee's handling of names is consistent with Clustalw, see Sequence Name Handling in the Format section.

 

If your dataset contains sequences with identical names, these will automatically be renamed to:

************************

>seq1

>seq1

************************

>seq1

>seq1_1

************************

 

The situation where this renaming creates two sequences with the same name is not currently supported.
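The renaming shown above can be sketched in Python as follows. This is an illustration and not T-Coffee's own code: the example only documents the second copy becoming seq1_1, so the suffix used for a third copy (seq1_2) is an assumption, and the function names are hypothetical.

```python
def rename_duplicates(names):
    """Append _1, _2, ... to repeated names, keeping the first copy as-is."""
    seen = {}
    renamed = []
    for name in names:
        count = seen.get(name, 0)
        renamed.append(name if count == 0 else "%s_%d" % (name, count))
        seen[name] = count + 1
    return renamed


def has_name_clash(names):
    """True when renaming produces a name that is already in use."""
    renamed = rename_duplicates(names)
    return len(set(renamed)) != len(renamed)


if __name__ == "__main__":
    print(rename_duplicates(["seq1", "seq1"]))
    print(has_name_clash(["seq1", "seq1", "seq1_1"]))
```

The second call illustrates the unsupported situation: a set that already contains seq1_1 next to two copies of seq1 ends up with two sequences named seq1_1. Running such a check before the alignment lets you rename the sequences yourself.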
FLAG DESCRIPTION: REFERENCE

 

This reference manual gives a list of all the flags that can be used to modify the behavior of T-Coffee. For your convenience, we have grouped them according to their nature. To display a list of all the flags used in the version of T-Coffee you are using (along with their default value), type:

t_coffee [**]

Or

t_coffee -help [**]

Or

t_coffee -help -in

 

Or any other parameter

 

Well Behaved Parameters

Separation

You can use any of the following separators: comma, semicolon, space or "=". The syntax used in this document is meant to be consistent with that of ClustalW. However, in order to take advantage of the automatic filename completion provided by many shells, you can replace "=" and "," with a space.
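The equivalence of the separators can be pictured with a short Python sketch. It only illustrates the idea and is not the parser used by t_coffee; the function name split_arguments is hypothetical, and the sketch ignores special cases such as values that themselves start with a dash.

```python
import re


def split_arguments(argument_string):
    """Split on any run of ',', ';', '=' or whitespace."""
    return [tok for tok in re.split(r"[,;=\s]+", argument_string) if tok]


if __name__ == "__main__":
    clustalw_style = "-in=Ssample_seq1.fasta,Mfast_pair"
    shell_friendly = "-in Ssample_seq1.fasta Mfast_pair"
    # Both spellings yield the same flag followed by the same two values.
    print(split_arguments(clustalw_style) == split_arguments(shell_friendly))
```

The second spelling is the one that benefits from filename completion in the shell.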

 

Posix

T-Coffee is not POSIX compliant.

 

Entering the right parameters

There are many ways to enter parameters in T-Coffee; see the -parameters flag in the Parameters Syntax section.

Parameters Syntax

Default Usage and Configuration Files

In the following documentation:

            sample_seq.seq is provided in the distribution, in the tutorial directory, along with all the other sample files mentioned in this documentation. The file sample_seq.pep is assumed to be a file containing sequences in any of the formats recognized by T-Coffee.

Parameters Priority

In general you will not need to use these complicated parameters. Yet, if you find yourself typing long command lines on a regular basis, it may be worth reading this section.

One may easily feel confused by the various ways in which parameters can be passed to t_coffee. The reason for these many mechanisms is that they allow several levels of intervention. For instance, you may install t_coffee for all the users and decide that the defaults we provide are not the proper ones. In this case, you will need to make your own t_coffee_defaults file.

Later on, a user may find that he/she needs to keep re-using a specific set of parameters, different from those in t_coffee_defaults, hence the possibility to write an extra parameter file, passed with -parameters. In summary:

-parameters > prompt parameters > -t_coffee_defaults > -special_mode

This means that -parameters supersedes all the others, while parameters provided via -special_mode are the weakest.
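The priority order can be pictured as successive overrides, from the weakest source to the strongest. The Python sketch below is only an illustration: the names PRIORITY and resolve are hypothetical, the values are examples, and the sketch covers single-value parameters only (not list parameters, which are completed rather than replaced).

```python
# Weakest source first, strongest last.
PRIORITY = ["special_mode", "t_coffee_defaults", "prompt", "parameters"]


def resolve(sources):
    """Merge the parameter sets so that stronger sources override weaker ones."""
    merged = {}
    for level in PRIORITY:
        merged.update(sources.get(level, {}))
    return merged


if __name__ == "__main__":
    sources = {
        "special_mode": {"output": "clustalw", "tree_mode": "slow"},
        "prompt": {"output": "msf"},
        "parameters": {"output": "fasta_aln"},
    }
    print(resolve(sources))
```

In this example the value of output given through -parameters wins, while tree_mode, which is set only by the special mode, is kept.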

No Flag

If no flag is used <your sequence> must be the first argument. See format for further information.

t_coffee sample_seq1.fasta [**]

Which is equivalent to

t_coffee Ssample_seq1.fasta [**]

 

When you do so, sample_seq1 is used as a name prefix for every file the program outputs.

 

Note: This is one of the exceptions (with -infile) where the identifier tag (S,A,L,M…) can be omitted. Any dataset provided this way will be assumed to be a sequence (S). These exceptions have been designed to keep the program compatible with ClustalW.

-parameters

Usage: -parameters=parameters_file

Default: no parameters file

                      

Indicates a file containing extra parameters. Parameters read this way behave as if they had been added at the right end of the command line: they either supersede the values given there (single-value parameters) or complete them (lists of values). For instance, the following file (sample_param_file.param) could be used:

 

*******sample_param_file.param*********** 

     -in=Ssample_seq1.fasta,Mfast_pair

     -output=msf_aln

*****************************************

 

Note: This parameter file can ONLY contain valid parameters. Comments are not allowed. Parameters passed this way will be checked like normal parameters.

 

Used with:

t_coffee -parameters=sample_param_file.param [**]

 

This will cause t_coffee to apply the fast_pair method to the sequences contained in sample_seq1.fasta. If you wish, you can also pipe these arguments into t_coffee, by naming the parameter file "stdin" (as a rule, any file named stdin is expected to receive its content via stdin):

 

cat sample_param_file.param  | t_coffee -parameters=stdin [**]

 

-t_coffee_defaults

Usage: -t_coffee_defaults=<file_name>

Default: not used.

 

This flag tells the program to use some default parameter file for t_coffee. The format of that file is the same as the one used with -parameters. The file used is either:

      1. <file name> if a name has been specified

      2.  ~/.t_coffee_defaults if no file was specified

      3. The file indicated by the environment variable TCOFFEE_DEFAULTS

 

-special_mode

Usage: -special_mode=<hard coded mode>

Default: not used.

 

It indicates that t_coffee will use some hard coded parameters. These include:

      quickaln: very fast approximate alignment

      dali: a mode used to combine dali pairwise alignments

      evaluate: defaults for evaluating an alignment

      3dcoffee: runs t_coffee with the 3dcoffee parameterisation

 

Other modes exist that are not yet supported      

-score [Deprecated]

Usage: -score

Default: not used

Toggles on the evaluate mode and causes t_coffee to evaluate a precomputed alignment provided via -infile=<alignment>. The flag -output must be set to an appropriate format (i.e. -output=score_ascii, score_html or score_pdf). A better default parameterization is obtained when using the flag -special_mode=evaluate.

-evaluate

Usage: -evaluate

Default: not used

Replaces -score. This flag toggles on the evaluate mode and causes t_coffee to evaluate a pre-computed alignment provided via -infile=<alignment>. The flag -output must be set to an appropriate format (i.e. -output=score_ascii, score_html or score_pdf).

 

The main purpose of -evaluate is to let you control every aspect of the evaluation. Yet it is advisable to use the pre-defined parameterization: -special_mode=evaluate.

 

t_coffee -infile=sample_aln1.aln -special_mode=evaluate [**]

 

t_coffee -infile=sample_seq1.aln -in Lsample_lib1.tc_lib -special_mode=evaluate [**]

 

 

-convert [cw]

Usage: -convert

Default: turned off

Toggles on the conversion mode and causes T-Coffee to convert the sequences, alignments, libraries or structures provided via the -infile and -in flags. The output format must be set via the -output flag. This flag can also be used if you simply want to compute a library (i.e. you have an alignment and you want to turn it into a library).

This flag is ClustalW compliant.

 

-do_align [cw]

Usage:  -do_align

Default: turned on

For compatibility with ClustalW

Special Parameters

-version

Usage: -version

Default: not used

Returns the current version number

-check_configuration

Usage: -check_configuration

Default: not used

Checks your system to determine whether all the programs T-Coffee can interact with are installed.

 

-cache

Usage: -cache=<use, update, ignore, <filename>>

Default: -cache=use

By default, t_coffee stores the results of computationally expensive (structural alignment) or network intensive (BLAST search) operations in a cache directory.

 

-update

Usage: -update

Default: turned off

Causes a wget access that checks whether the t_coffee version you are using needs updating.

 

-full_log

Usage: -full_log=<filename>

Default: turned off

Causes t_coffee to output a full log file that contains all the input/output files.

 

-other_pg

Usage: -other_pg=<filename>

Default: turned off

Some rumours claim that Tetris is embedded within T-Coffee and could be run using some special set of commands. We wish to deny these rumours, although we may admit that several interesting reformatting programs are now embedded in t_coffee and can be run through the -other_pg flag.

seq_reformat: is a versatile reformat utility. You can get an idea of what it does by typing:

t_coffee -other_pg=seq_reformat [**]

extract_perl_scripts: will cause t_coffee to dump all its embedded perl scripts in corresponding files.

t_coffee -other_pg=unpack_all [**]

t_coffee -other_pg=unpack_extract_from_pdb [**]

    

extract_from_pdb is a useful perl utility we maintain, meant to parse PDB files. You can also run any of the scripts embedded in t_coffee by typing:

t_coffee -other_pg=extract_from_pdb -help [**]

Input

Sequence Input

-infile [cw]

To remain compatible with ClustalW, it is possible to indicate the sequences with this flag

 

t_coffee -infile=sample_seq1.fasta [**]

 

Note: Common multiple sequence alignment formats constitute a valid input format.

Note: T-Coffee automatically removes the gaps before doing the alignment. This behaviour is different from that of ClustalW where the gaps are kept.

 

-in

Cf. -in in the Methods and Library Input section

-get_type

Usage: -get_type

Default: turned off

 

Forces t_coffee to identify the sequence type (PROTEIN, DNA).

-type [cw]

Usage: -type=DNA | PROTEIN | DNA_PROTEIN

Default: -type=<automatically set>

 

This flag sets the type of the sequences. If omitted, the type is guessed automatically. This flag is compatible with ClustalW.

 

Note:  In case of low complexity or short sequences, it is recommended to set the type manually.