Anonymous Sequences

Cedric Notredame

Purpose of the Exercises

Pick one of the sequences given below (note that all the sequences have a similar complexity). Use some of the approaches outlined in the table below to characterize the sequence using bioinformatics tools and databases. Write a report describing all the analyses you did, in enough detail that they can be reproduced later, and the conclusions you draw from them. The report (in PDF or Word format) must be submitted no later than November 30th 2015; the subject should include «Anonymous_sequence». We encourage you to think of this as a real research project: you have access to all the tools that any researcher would have.

Characterizing a sequence means finding out as much as you can about its function. Is it an enzyme, and if so, what does it do? Where is its active site (and is it active)? Does it have repeated elements, and if so, what are they? This is a complicated investigation: remember that no individual bioinformatics result is enough on its own, and each must be supported by independent results. Have fun!

Where to start?

There is no definite rule when it comes to studying a sequence, and the purpose of this exercise is to let you invent and discover your own way of looking at sequences. Still, in case you need them, here are a few guidelines.

Remember that there may be more than one story to uncover in your sequence (two or three domains, for instance). If this is more than you can handle, do things one at a time: start with a study of the domain you find the most exciting and then move on to the next one. You do not need to be exhaustive, and we will be happy with a nice study of at least one portion of your sequence!

Here is a list of simple things you could do, along with the chapter of Bioinformatics For Dummies you could use:

Bioinformatics For Dummies
Type of Investigation

The easiest thing you could do with your sequence is to find out about its various physico-chemical properties, make simple predictions, or find out whether your sequence contains known domains.
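As an illustration of what "physico-chemical properties" can mean in practice, here is a minimal Python sketch that computes the amino-acid composition and the average hydropathy (GRAVY) of a protein sequence with the Kyte-Doolittle scale. It is not the method used by ProtParam or ProtScale, only a rough equivalent of one of their outputs; the function names, the window size, and the example sequence are assumptions made for this illustration.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return the fraction of each residue present in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


def gravy(seq):
    """Average hydropathy; residues outside the 20 standard ones are skipped."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("no standard amino acids in sequence")
    return sum(values) / len(values)


def hydropathy_profile(seq, window=9):
    """Average hydropathy of every full window sliding along the sequence."""
    return [gravy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    example = "MKTLLVAGAVILLSGCASNE"  # hypothetical sequence, for illustration only
    print(composition(example))
    print(round(gravy(example), 2))
    print([round(v, 2) for v in hydropathy_profile(example)])
```

A sustained stretch of high values in the profile hints at a hydrophobic segment, which is the kind of clue you would then check against an independent prediction.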


The second easiest thing to do is to compare the sequence with a database, using the resources in Chapter 7 of Bioinformatics For Dummies. Of course, the secret here is to use the right database to ask the right question. If you are not too sure about the databases, indications for proteins can be found in Chapter 4. These databases will also come in very handy for finding out about obscure chemical information and complicated post-translational modifications.


If you have started gathering sequences, you may want to build a multiple sequence alignment. Before you do this, remember that simply comparing the sequence with itself can yield valuable clues, such as internal repeats. You can use Dotlet (and other tools) for this purpose, as explained in Chapter 8.
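To show the idea behind a self-comparison, here is a minimal dot-plot sketch in Python. It is a simplification: Dotlet scores windows with a substitution matrix, whereas this sketch only counts identical residues in each window. The function names, the window size, and the threshold are assumptions made for this illustration.

```python
def dot_plot(seq_a, seq_b, window=3, min_matches=3):
    """Return a matrix of booleans: True where the windows starting at
    position i of seq_a and j of seq_b share at least min_matches identities."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = []
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            row.append(matches >= min_matches)
        rows.append(row)
    return rows


def render(matrix):
    """Draw the matrix as text: '*' for a match, '.' otherwise."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    seq = "MKVLAGMKVLAG"  # hypothetical sequence containing a direct repeat
    print(render(dot_plot(seq, seq)))
```

When a sequence is compared with itself, the main diagonal is always filled; any additional diagonal parallel to it points to a repeated segment.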


Multiple alignments constitute the best way to present biological sequence information. If you think you have gathered the right sequences (or portions of sequences), you can try using some of the online tools presented in Chapter 9. Go to Chapter 10 if you want to make your alignment look flashy.
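Once an alignment has been built with one of these tools, a simple way to read it is to ask how conserved each column is. The Python sketch below assumes the aligned sequences are already available as equal-length strings with "-" for gaps; the function name and the scoring choice (fraction of sequences carrying the most common residue) are assumptions made for this illustration, not the scoring used by T-Coffee or ClustalO.

```python
from collections import Counter


def column_conservation(aligned):
    """For each alignment column, return the fraction of sequences that
    carry the most common residue. Gaps never count as a residue, so an
    all-gap column scores 0.0."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    alignment = ["MKV-LA", "MKVDLA", "MRV-LG"]
    for position, score in enumerate(column_conservation(alignment), start=1):
        print(position, round(score, 2))
```

Highly conserved columns are natural candidates for functionally important residues, which you can then confront with motif and structure predictions.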


Structures also help! Use Chapter 11 to find out whether there is a way to estimate the structure of your sequence. If your active site, or your phosphorylation site, appears to be right in the middle of the protein core, buried as deep as it can be, you are in trouble!


And here comes the complicated part: making sense of all the information! There is a lot of noise in the data gathered using bioinformatics methods. You must clean it using consistency rules, just as if you were conducting an inquiry and confronting the testimonies of several witnesses. Try to follow these three rules:

1. Use a previous result to propose a hypothesis. Example: according to Prosite, residue 25 could be a phosphorylation site.
2. Design an experiment to test your hypothesis. Example: predict the secondary or the tertiary structure.
3. Conclude. Example: residue 25 seems to be deeply buried and is therefore not a good candidate for being a phosphorylation site.
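The three rules above can be sketched in Python as follows. The sketch scans a sequence for the Prosite protein kinase C phosphorylation pattern [ST]-x-[RK], then keeps only the candidate sites that a separate prediction reports as exposed. The accessibility string (one character per residue, "e" for exposed and "b" for buried) is a hypothetical input format invented for this illustration, and so are the function names and the example data; a real predictor will report its results differently.

```python
import re

# Prosite-style pattern [ST]-x-[RK], written as a regular expression.
# The lookahead lets overlapping candidate sites all be reported.
PKC_SITE = re.compile(r"(?=([ST].[RK]))")


def candidate_sites(seq):
    """Step 1 (hypothesis): 1-based positions of S/T residues matching the motif."""
    return [match.start() + 1 for match in PKC_SITE.finditer(seq.upper())]


def exposed_sites(sites, accessibility):
    """Steps 2 and 3 (test, conclude): keep sites predicted as exposed ('e')."""
    return [pos for pos in sites if accessibility[pos - 1] == "e"]


if __name__ == "__main__":
    sequence = "MASGRKTLKA"        # hypothetical sequence
    accessibility = "eebeeeeeee"   # hypothetical per-residue prediction
    sites = candidate_sites(sequence)
    print("Motif matches at:", sites)
    print("Still plausible after the burial check:",
          exposed_sites(sites, accessibility))
```

Short motifs like this one match by chance very often, which is exactly why each hit needs an independent line of evidence before it goes into your report.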


List of Resources



Single Sequence Analysis
ProtParam@ExPASy: tools to estimate various physico-chemical properties
ProtScale: tools to estimate various physico-chemical properties

Pairwise Comparison and Database Search
Dotlet: Dotlet pairwise comparison
BLAST: BLAST and other database search resources
Lalign: Lalign local alignments

Multiple Sequence Alignments
T-Coffee: various T-Coffee MSA tools
ClustalO: ClustalO and other aligners at the EBI

Structural Analysis
NCBI/Structure: the NCBI section dedicated to structures
TM-HMM: secondary structure analysis
PredictProtein: secondary structure analysis

Genome Analysis
ENSEMBL: the ENSEMBL Genome Browser
UCSC: the UCSC Genome Browser
TIGR: the home of bacterial genomes

List of Sample Sequences