An Introduction To Sequence Comparison and Database Search

Bergen University, May 2020

Cedric Notredame





OBJECTIVES

The first part is dedicated to molecular evolution. We will focus on the implications of molecular evolution on sequence variation. We will use these concepts to define homology. We will then see how specific mathematical models (the substitution matrices) have been derived in order to quantify the evolutionary relationship between sequences. In the second part we will cover the pairwise and single sequence analysis. We will introduce the Needleman and Wunsch algorithm (Dynamic programming), a very basic algorithm that makes it possible to derive pairwise alignments from the sequences while using the substitution matrices. Implementation aspects will be extensively covered, including Linear Space implementations and pair-HMM alternatives for pairwise sequence alignments. These dynamic programming algorithms will be compared for both scope and application to the Burrow Wheeler Transform algorithm an essential component of genomic data aligners. The course will finish with an introduction of the Zuker RNA fold algorithm including an overview of the CYK algorithm . The third part of the course will deal with Multiple Sequence Alignment algorithms of both RNA and Protein sequences. We will introduce the basic notion of this important class of algorithm and will then move to the most recent algorithms, able to take into account protein and RNA 3D structure while building the models. We will discuss some recent results on the relationship between MSA computation and Tree reconstruction. The course will finish with a detailed description of the most recent large scale algorithm including the recently published Regressive Algorithm.




REQUISITES

Practicals will be in python 3. They will involve adapting existing scripts rather than programming from scratch. For this reason, students with no practical knowledge of python but with a good grasp of programming languages like Perl, C or JAVA should manage reasonnably well. Students are expected to bring their own lap-top and advised to have a LINUX boot.



Send your Questions to: cedric.notredame@crg.eu



DateLocationSessionTitleLinks
Day 1NORBISLECTUREPairwise comparisons in an evolutionary contextL
Day 1NORBISPRACTICALSParsing Biological FilesP
Day 1NORBISPRACTICALSComputing Substitution MatricesP
.
Day 2NORBISLECTUREIntroduction to Dynamic ProgrammingL
Day 2NORBISPRACTICALSIntroduction to Dynamic Programming 1P
Day 2NORBISPRACTICALSIntroduction to Dynamic Programming 2P
.
Day 3NORBISLECTUREIntroduction to Multiple Sequence AlignmentL
Day 3NORBISLECTUREMultiple Sequence Alignment AlgorithmsL
Day 3NORBISPRACTICALSParsing and manipulating binary treesP
Day 3NORBISPRACTICALSImplementing your own multiple sequence alignerP
Day 3NORBISBIBLIOMSA papersPPPP
.
Day 4NORBISLECTURERNA Folding AlgorithmsL
Day 4NORBISPRACTICALSAdpating the Nussinv AlgorithmP
Day 4NORBISBIBLIORNA Secondary Structure PredictionPP
.
Day 5NORBISLECTURERNA Genomic Analysis using RNAseqL
Day 5NORBISBIBLIOShort Reads MappingPPP
.
Day 5NORBISPRACTICALSProject Report - Delivery date: 15/06/2020P
.



REFERENCES

1. Algorithms: Durbin et al., Biological Sequence Analysis, 1999, Oxford Press

2. Algorithms: Python for Biologists: A complete programming course for beginners, Martin Jones,2013 , Createspace Independent Publishing Platform

3. Evolution: Pathy, Protein Evolution, 2007, Blackwell



This Entire Course Was Automatically Generated Using BED, the Bioinformatics Exercise Database. BED is a freeware available on request Cedric Notredame