Anonymous Sequences

Cedric Notredame

Purpose of the Exercises

Pick one of the sequences given below (all the sequences are of similar complexity). Use some of the approaches outlined in the table below to characterize the sequence using bioinformatics tools and databases. Write a report describing all the analyses you did (in enough detail that they can be reproduced later) and the conclusions you draw from them. Submit the report (in PDF or Word format) to Christine.Stansberg@uib.no no later than November 30th 2015; the subject line should include «Anonymous_sequence». We encourage you to think of this as a real research project: you have access to all the tools that any researcher would have.

Characterizing a sequence means finding out as much as you can about its function. Is it an enzyme, and if so, what does it do? Where is its active site (and is it active)? Does it have repeated elements, and if so, what are they? This is a complicated investigation in which you must remember that no individual bioinformatics result is enough on its own: each must be supported by independent results. Have fun!

Where to start?

There is no definite rule for studying a sequence, and the purpose of this exercise is to let you invent and discover your own way of looking at sequences. In case you need them, here are a few guidelines.

Remember that there may be more than one story to uncover in your sequence (two or three domains, for instance). If this is more than you can handle, do things one at a time: start with a study of the domain you find the most exciting and then move to the next one. You do not need to be exhaustive, and we will be happy with a nice study of at least one portion of your sequence!

Here is a list of simple things you could do, along with the chapters of Bioinformatics For Dummies you could use:

Chapter of Bioinformatics For Dummies	Type of Investigation
6	The easiest thing you can do with your sequence is to find out about its various physico-chemical properties, make simple predictions, or find out whether your sequence contains known domains.
7, 4	The second easiest thing to do is to compare the sequence with a database, using the resources in Chapter 7 of Bioinformatics For Dummies. Of course, the secret here is to use the right database to ask the right question. If you are not too sure about the databases, indications for proteins can be found in Chapter 4. These databases will also come in very handy for finding obscure chemical information and complicated post-translational modifications.
8	If you have started gathering sequences, you may want to build a multiple sequence alignment. Before you do this, remember that simply comparing the sequence with itself can yield valuable clues. You can use dotlet (and other tools) for this purpose, as explained in Chapter 8.
9, 10	Multiple alignments constitute the best way to present biological sequence information. If you think you have gathered the right sequences (or portions of sequences), you can try using some of the online tools presented in Chapter 9. Go to Chapter 10 if you want to make your alignment look flashy.
11	Structures also help! Use Chapter 11 to find out if there is a way to estimate the structure of your sequence. If your active site or your phosphorylation site appears to be right in the middle of the protein core, buried as deep as it can be, you are in trouble!
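As an illustration of the simplest kind of analysis (Chapter 6), the sketch below computes the residue composition, the mean hydropathy (GRAVY) and a sliding-window hydropathy profile of a protein sequence in plain Python. It uses the published Kyte-Doolittle scale; the function names, the window size of 9 and the toy sequence are assumptions made for this example, and the online tools in the List of Resources remain the reference for real analyses.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return the fraction of each residue type present in seq."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over seq."""
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=9):
    """Return (1-based centre position, mean hydropathy) for each full window."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre residue")
    half = window // 2
    profile = []
    for centre in range(half, len(seq) - half):
        segment = seq[centre - half:centre + half + 1]
        profile.append((centre + 1, sum(KD[aa] for aa in segment) / window))
    return profile


if __name__ == "__main__":
    # Made-up toy sequence, NOT one of the exercise sequences.
    toy = "MKTLLLAVAVIALLGSAEDKRSTPQ"
    print("GRAVY:", round(gravy(toy), 2))
    for position, score in hydropathy_profile(toy):
        print(position, round(score, 2))
```

A long stretch of strongly positive window scores is only a hint of a hydrophobic segment; as stressed throughout this exercise, it needs support from an independent method before you draw a conclusion.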

And here comes the complicated part: making sense of all the information. There is a lot of noise in the data gathered using bioinformatics methods. You must clean it using consistency rules, just as if you were conducting an enquiry and confronting witnesses' testimonies. Try to follow these three steps:


Step	Question	Example
1	Use a previous result to propose a hypothesis	According to Prosite, residue 25 could be a phosphorylation site
2	Design an experiment to test your hypothesis	Predict the secondary or the tertiary structure
3	Conclude	Residue 25 seems to be deeply buried and is therefore not a good candidate for being a phosphorylation site
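The three steps above can be sketched in a few lines of Python. The example converts a PROSITE-style pattern into a regular expression, scans a sequence for candidate sites (step 1), and then discards the candidates that a structure prediction reports as buried (steps 2 and 3). The pattern `[ST]-x(2)-[DE]` is the one commonly cited for casein kinase II phosphorylation sites, but check the PROSITE entry yourself; the function names, the toy sequence and the set of buried positions are hypothetical, and the converter only handles the basic pattern syntax.

```python
import re


def prosite_to_regex(pattern):
    """Convert a basic PROSITE pattern, e.g. '[ST]-x(2)-[DE]', to a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {AB} means "any residue except A or B".
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repeat count.
        element = element.replace("(", "{").replace(")", "}")
        # x stands for any residue; < and > anchor the sequence ends.
        element = element.replace("x", ".")
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def scan(seq, pattern):
    """Return (1-based start, matched text) for every hit, overlapping ones included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]


def exposed_hits(hits, buried_positions):
    """Keep only the hits whose first residue is not in the buried set."""
    return [(pos, motif) for pos, motif in hits if pos not in buried_positions]


if __name__ == "__main__":
    # Made-up toy sequence and made-up burial prediction, for illustration only.
    toy = "MKASAAEGLLSVVDKT"
    hits = scan(toy, "[ST]-x(2)-[DE]")
    print("Hypothesis (pattern hits):", hits)
    buried = {11}  # hypothetical output of a structure prediction
    print("Still plausible after the test:", exposed_hits(hits, buried))
```

Short patterns like this one match by chance very often, which is exactly why the hypothesis needs the independent test in step 2.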

Indications

Formulate your questions clearly: What are you asking? How do you intend to find an answer? How will you interpret the results?
Do not hesitate to cut and chop your sequences to simplify some analyses
READ THE DOCUMENTATION OF THE PROGRAMS!!!!!!!!
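
Cutting a sequence into pieces is easy to do by hand, but it is also easy to lose track of the coordinates. A minimal sketch, assuming 1-based inclusive positions and a hypothetical `name/start-end` header convention, that extracts a region and returns it in FASTA format so the report can state exactly which portion was analysed:

```python
def extract_region(name, seq, start, end, width=60):
    """Return residues start..end (1-based, inclusive) of seq as a FASTA record."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("region must satisfy 1 <= start <= end <= len(seq)")
    region = seq[start - 1:end]
    # Wrap the region into lines of at most `width` residues.
    lines = [region[i:i + width] for i in range(0, len(region), width)]
    return ">" + name + "/" + str(start) + "-" + str(end) + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up toy sequence, for illustration only.
    print(extract_region("toy", "MKTLLLAVAVIALLGSAEDKRSTPQ", 5, 17), end="")
```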

List of Resources

Site	Content
Single Sequence Analysis
ProtParam@ExPASy	Tool to estimate various physico-chemical properties of a whole sequence
ProtScale	Tool to plot amino-acid scale profiles (e.g. hydrophobicity) along a sequence
Pairwise Comparison and Database Search
dotlet	Dot-plot pairwise comparison
BLAST	BLAST and other database search resources
Lalign	Lalign local alignments
Multiple Sequence Alignments
T-Coffee	Various T-Coffee MSA tools
ClustalO	ClustalO and other aligners at the EBI
Structural Analysis
NCBI/Structure	The NCBI Section dedicated to Structures
TMHMM	Transmembrane helix prediction
PredictProtein	Secondary structure analysis
Genome Analysis
ENSEMBL	The ENSEMBL Genome Browser
UCSC	The UCSC Genome Browser
TIGR	The home of bacterial genomes
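
To make clear what a dot-plot tool such as dotlet shows when a sequence is compared with itself, here is a minimal sketch of the idea in Python. It is a simplification: it counts identical residues in a sliding window, whereas dotlet scores windows with a substitution matrix. The function names, the window size and the threshold are assumptions chosen for this example.

```python
def dot_plot(seq_a, seq_b, window=5, min_identical=4):
    """Return (i, j) 0-based window starts where the two windows share
    at least min_identical identical residues."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            identical = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if identical >= min_identical:
                dots.append((i, j))
    return dots


def render(dots, n_rows, n_cols):
    """Draw the dots as a text grid, one row per window start in seq_a."""
    grid = [[" "] * n_cols for _ in range(n_rows)]
    for i, j in dots:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


if __name__ == "__main__":
    # Made-up toy sequence containing the same segment twice.
    toy = "MKTAYIAKQRGGSGGMKTAYIAKQR"
    size = len(toy) - 5 + 1
    print(render(dot_plot(toy, toy), size, size))
```

In a self-comparison the main diagonal is always filled; it is the additional diagonals running parallel to it that point to repeated elements.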

List of Sample Sequences

FILES
