This page centralizes all the code information related to the PAVIE/Bioinformatics project, a collaboration between PAVIE (Jacques-Antoine Gauthier and Eric Widmer) and the Swiss Institute of Bioinformatics (Philipp Bucher, Cedric Notredame).
seq_reformat -in |your sequences| -action +pavie_seq2pavie_mat [_IDXX_TWEXX[THRid]_[CHANNELn]
Various modes of identity measure are implemented (Training WEighting)
>string1.channel1
abcdef
>string2.channel1
ab
>string3.channel1
abc
>string1.channel2
abzeff
>string2.channel2
ef
>string3.channel2
fxx
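The naming convention above (`name.channel` in each FASTA header) can be illustrated with a small Python sketch that groups the channels belonging to the same string. This is not part of `seq_reformat`; the function name `group_channels` is hypothetical, and the sketch assumes one record per header with the channel name following the last dot.

```python
from collections import defaultdict


def group_channels(fasta_text):
    """Return {string_name: {channel_name: symbols}} from multi-channel FASTA text.

    Assumption: every header looks like '>name.channel', and the channel
    name is whatever follows the last dot.
    """
    grouped = defaultdict(dict)
    current = None  # (name, channel) of the record being read
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()[0]
            name, _, channel = header.rpartition(".")
            current = (name, channel)
            grouped[name][channel] = ""
        elif current is not None:
            name, channel = current
            grouped[name][channel] += line
    return dict(grouped)


if __name__ == "__main__":
    example = ">string1.channel1\nabcdef\n>string1.channel2\nabzeff\n"
    for name, channels in group_channels(example).items():
        print(name, channels)
```

Grouping by string name makes it easy to check that all channels of a given string have the same length, as they do in the example above.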
It is possible to use the age as a channel. This simply requires generating two extra channels that will be used to encode the age, along with the associated substitution matrices
EXAMPLE: seq_reformat -in myseq.fasta -output pavie_age_channel -out xyz
>name _FIRSTYEARXX_ where XX is used as the offset of the first year
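As a rough illustration of the `_FIRSTYEARXX_` header tag, the Python sketch below extracts the offset and maps each sequence position to a year. It rests on two assumptions that the page does not confirm: that XX is a decimal integer, and that each symbol in the sequence covers exactly one year. The function names are hypothetical and do not correspond to anything in `seq_reformat`.

```python
import re

# Assumption: XX in _FIRSTYEARXX_ is written as decimal digits.
FIRSTYEAR_TAG = re.compile(r"_FIRSTYEAR(\d+)_")


def first_year_offset(header, default=0):
    """Return the first-year offset found in a FASTA header, or `default`."""
    match = FIRSTYEAR_TAG.search(header)
    return int(match.group(1)) if match else default


def position_years(header, sequence):
    """Map each position to a year, assuming one symbol per year."""
    start = first_year_offset(header)
    return [start + i for i in range(len(sequence))]


if __name__ == "__main__":
    print(position_years(">name _FIRSTYEAR15_", "abc"))
```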
Validation is made by replacing a symbol (a for instance) with two other arbitrarily chosen symbols (c and d) that are otherwise absent from the sequences. The substitution is made across the entire sequence set
The new dataset should then be used to train a matrix. If the training procedure is adequate, the matrix should have the following properties:
The random sequences can be generated as follows:
EXAMPLE: seq_reformat -in myseq.fasta -action +pavie_seq2random_seq axw > outseq
In this case, axw indicates that a will be replaced with x OR w.
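The substitution described above can be sketched in Python to make the `axw` convention concrete. This is only an illustration, not the `seq_reformat` implementation: it assumes that each occurrence of the target symbol is replaced independently, with the replacement drawn uniformly from the listed symbols. The function name `substitute_symbol` and the `seed` parameter are hypothetical.

```python
import random


def substitute_symbol(sequences, spec, seed=None):
    """Replace one symbol by randomly chosen alternatives across all sequences.

    `spec` follows the 'axw' convention: the first character is the symbol
    to replace, the remaining characters are the candidate replacements.
    Assumption: each occurrence is replaced independently and uniformly.
    """
    target, replacements = spec[0], spec[1:]
    rng = random.Random(seed)
    return [
        "".join(rng.choice(replacements) if symbol == target else symbol
                for symbol in seq)
        for seq in sequences
    ]


if __name__ == "__main__":
    print(substitute_symbol(["abcabc", "aab"], "axw", seed=1))
```

For the validation to be meaningful, the replacement symbols (x and w here) should not already occur in the input sequences.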
EXAMPLE: seq_reformat -in myseq.fasta -action +pavie_seq2pavie_aln pavie_matrix.cycle_1.mat_list _MATDIST_ID01_
EXAMPLE: seq_reformat -in myseq -action +pavie_seq2pavie_aln pavie_matrix_.cy_0.mat_list
EXAMPLE: seq_reformat -in myseq.fasta -action +pavie_seq2pavie_aln pavie_matrix.cycle_1.mat_list _ID02_
EXAMPLE: seq_reformat -in myseq.fasta -action +pavie_seq2pavie_aln pavie_matrix.cycle_1.mat_list _ID02_MCHSCORE1_
EXAMPLE: seq_reformat -in myseq.fasta -output transitions