T-Coffee Tutorial


Centre National de la Recherche Scientifique
Centro de Regulacio Genomica, Barcelona

Cédric Notredame
www.tcoffee.org

T-Coffee:
Cheat Sheet

Tutorial and FAQ

 


T-Coffee Tutorial
(Version 6.18, August 2008)
T-Coffee, PSI-Coffee
3D-Coffee/Expresso
M-Coffee
R-Coffee
APDB and iRMSD

© Cédric Notredame, Centro de Regulacio Genomica and Centre National de la Recherche Scientifique, France


Cheat Sheet: T-Coffee

Proteins

DNA

RNA

Memory Problems

Before You Start…

Foreword

Pre-Requisite

Getting the Example Files of the Tutorial

What Is T-COFFEE?

What is T-Coffee?

What does it do?

What can it align?

How can I use it?

Is There an Online Server?

Is T-Coffee different from ClustalW?

Is T-Coffee very accurate?

What T-Coffee Can and Cannot do for you…

(NOT) Fetching Sequences

Aligning Sequences

Combining Alignments

Evaluating Alignments

Combining Sequences and Structures

Identifying Occurrences of a Motif: Mocca

How Does T-Coffee Work?

Preparing Your Data: Reformatting and Trimming With seq_reformat

Seq_reformat

Accessing the T-Coffee Reformatting Utility

An overview of seq_reformat

Reformatting your data

Changing MSA formats

Dealing with Non-automatically recognized formats

Automated Sequence Edition

Removing the gaps from an alignment

Changing the case of your sequences

Changing the case of specific residues

Changing the case depending on the score

Protecting Important Sequence Names

Colouring/Editing Residues in an Alignment

Coloring specific types of residues

Coloring a specific residue of a specific sequence

Coloring according to the conservation

Colouring/Editing residues in an Alignment Using a Cache

Overview

Preparing a Sequence or Alignment Cache

Preparing a Library Cache

Coloring an Alignment using a cache

Changing the default colors

Evaluating an alignment and producing a cache

Evaluating an alignment with T-Coffee

Evaluating the level of conservation with a substitution matrix

Selective Reformatting

Removing gapped columns

Selectively turn some residues to lower case

Selectively modifying residues

Keeping only the best portion of an alignment

Extracting Portions of Dataset

Extracting The High Scoring Blocks

Extracting Sequences According to a Pattern

Extracting Sequences by Names

Removing Sequences by Names

Extracting Blocks Within Alignment

Concatenating Alignments

Analyzing your Multiple Sequence Alignment

Estimating the diversity in your alignment

Reducing and improving your dataset

Extracting the N most informative sequences

Extracting all the sequences less than X% identical

Speeding up the process

Forcing Specific Sequences to be kept

Identifying and Removing Outliers

Chaining Important Sequences

Manipulating DNA sequences

Translating DNA sequences into Proteins

Back-Translation With the Bona-Fide DNA sequences

Finding the Bona-Fide Sequences for the Back-Translation

Guessing Your Back Translation

Fetching a Structure

Fetching a PDB structure

Fetching The Sequence of a PDB structure

Adapting extract_from_pdb to your own environment

Manipulating RNA sequences with seq_reformat

Producing a Stockholm output: adding predicted secondary structures

Producing a consensus structure

Adding a consensus structure to an alignment

Analyzing an alifold secondary structure prediction

Analyzing matching columns

Visualizing compensatory mutations

Handling gapped columns

Comparing alternative folds

Manipulating Phylogenetic Trees with seq_reformat

Producing phylogenetic trees

Comparing two phylogenetic trees

Scanning Phylogenetic Trees

Pruning Phylogenetic Trees

Building Multiple Sequence Alignments

How to generate The Alignment You Need?

What is a Good Alignment?

The Main Methods and their Scope

Choosing The Right Package

Computing Multiple Sequence Alignments With T-Coffee

Computing Very accurate (but slow) alignments with PSI-Coffee

A Simple Multiple Sequence Alignment

Controlling the Output Format

Computing a Phylogenetic tree

Using Several Datasets

How Good is Your Alignment?

Doing it over the WWW

Aligning Many Sequences

Aligning Very Large Datasets with Muscle

Aligning Very Large Alignments with Mafft

Aligning Very Large Alignments with T-Coffee

Shrinking Large Alignments With T-Coffee

Modifying the default parameters of T-Coffee

Changing the Substitution Matrix

Comparing Two Alternative Alignments

Changing Gap Penalties

Can You Guess The Optimal Parameters?

Using Many Methods at once

Using All the Methods at the Same Time: M-Coffee

Using Selected Methods to Compute your MSA

Combining pre-Computed Alignments

Aligning Profiles

Using Profiles as templates

Aligning One sequence to a Profile

Aligning Many Sequences to a Profile

Aligning Other Types of Sequences

Splicing variants

Aligning DNA sequences

Aligning RNA sequences

Noisy Coding DNA Sequences…

Using Secondary Structure Predictions

Single Sequence prediction

Multiple Sequence Predictions

Incorporation of the prediction in the alignment

Using other secondary structure predictions

Output of the prediction

Combining Sequences and 3D-Structures

If you are in a Hurry: Expresso

What is Expresso?

Using Expresso

Aligning Sequences and Structures

Mixing Sequences and Structures

Using Sequences only

Aligning Profile using Structural Information

How Good Is Your Alignment?

Evaluating Alignments with The CORE index

Computing the Local CORE Index

Computing the CORE index of any alignment

Filtering Bad Residues

Filtering Gap Columns

Evaluating an Alignment Using Structural Information: APDB and iRMSD

What is the iRMSD?

How to Efficiently Use Structural Information

Evaluating an Alignment With the iRMSD Package

Evaluating Alternative Alignments

Identifying the most distantly related sequences in your dataset

Evaluating an Alignment according to your own Criterion

Establishing Your Own Criterion

Integrating External Methods In T-Coffee

What Are The Methods Already Integrated in T-Coffee?

List of INTERNAL Methods

Plug-In: Using Methods Integrated in T-Coffee

Modifying the parameters of Internal and External Methods

Internal Methods

External Methods

Integrating External Methods

Direct access to external methods

Customizing an external method (with parameters) for T-Coffee

Managing a collection of method files

Advanced Method Integration

The Mother of All method files…

Weighting your Method

Plug-Out: Using T-Coffee as a Plug-In

Creating Your Own T-Coffee Libraries

Using Pre-Computed Alignments

Customizing the Weighting Scheme

Generating Your Own Libraries

Frequently Asked Questions

Abnormal Terminations and Wrong Results

Q: The program keeps crashing when I give my sequences

Q: The default alignment is not good enough

Q: The alignment contains obvious mistakes

Q: The program is crashing

Q: I am running out of memory

Input/Output Control

Q: How many Sequences can t_coffee handle?

Q: Can I prevent the Output of all the warnings?

Q: How many ways to pass parameters to t_coffee?

Q: How can I change the default output format?

Q: My sequences are slightly different between all the alignments

Q: Is it possible to pipe stuff OUT of t_coffee?

Q: Is it possible to pipe stuff INTO t_coffee?

Q: Can I read my parameters from a file?