Statistics for Sequence Alignment
Karlin-Altschul statistics
- Evaluate the statistical significance of hits in a BLAST search
- 1990, PNAS, S Karlin & S F Altschul, Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes
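- The expected number \(E\) of ungapped alignments with score at least \(S\) found between random sequences was demonstrated to be
\[E = K m n e^{-\lambda S}\]
- \(m\) is the length of sequence 1, \(n\) is the length of sequence 2, and \(K\) and \(\lambda\) are constants that depend on the scoring scheme and the background character frequencies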
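- Given the substitution matrix \(S_{i,j}\) and the background frequencies \(p_{i}\) and \(p_{j}\) of the characters \(a_{i}\) and \(a_{j}\), \(\lambda\) is the unique positive solution of
\[\sum_{i,j} p_{i} p_{j} e^{\lambda S_{i,j}} = 1\]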
- Alternatively, the alignment score \(S\) follows an extreme value distribution (EVD) with location \(u=\frac{\ln(Kmn)}{\lambda}\) and scale \(1/\lambda\):
\[P(S \ge x) = 1 - \exp\left(-e^{-\lambda (x-u)}\right)\]
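- \(E(S)\) can be interpreted as the mean of a Poisson distribution for the number of alignments with score at least \(S\), so the probability of finding at least one such alignment is
\[P = 1 - e^{-E(S)}\]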
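The quantities above can be tied together in a short numerical sketch. The function names (`solve_lambda`, `e_value`, `p_value`, `evd_tail`) are hypothetical, bisection is just one convenient way to find \(\lambda\), and the value of \(K\) in the example is an illustrative assumption rather than a parameter taken from BLAST. The toy scoring scheme (DNA, uniform frequencies, match +1, mismatch -1) is chosen because its \(\lambda\) has a closed form, \(\ln 3\).

```python
import math


def solve_lambda(freqs, score, tol=1e-12):
    """Positive root of sum_ij p_i p_j exp(lambda * s_ij) = 1, found by bisection.

    freqs: dict mapping each character to its background frequency.
    score: function (a, b) -> substitution score.
    """
    letters = list(freqs)
    pairs = [(freqs[a] * freqs[b], score(a, b)) for a in letters for b in letters]

    # The theory requires a negative expected score and at least one positive score.
    if sum(p * s for p, s in pairs) >= 0 or max(s for _, s in pairs) <= 0:
        raise ValueError("need negative expected score and a positive score")

    def f(lam):
        return sum(p * math.exp(lam * s) for p, s in pairs) - 1.0

    lo, hi = 1e-6, 1.0       # f(lo) < 0 because the expected score is negative
    while f(hi) <= 0:        # grow the bracket until f changes sign
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def e_value(s, m, n, K, lam):
    """E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * s)


def p_value(e):
    """P = 1 - exp(-E); expm1 keeps precision when E is tiny."""
    return -math.expm1(-e)


def evd_tail(x, m, n, K, lam):
    """P(S >= x) = 1 - exp(-exp(-lambda * (x - u))), with u = ln(K m n) / lambda."""
    u = math.log(K * m * n) / lam
    return -math.expm1(-math.exp(-lam * (x - u)))


# Toy example: uniform DNA frequencies, match = +1, mismatch = -1.
dna = {c: 0.25 for c in "ACGT"}
lam = solve_lambda(dna, lambda a, b: 1 if a == b else -1)

K = 0.1                      # assumed value, for illustration only
E = e_value(30, m=1000, n=1_000_000, K=K, lam=lam)
print(f"lambda={lam:.4f}  E={E:.3g}  P={p_value(E):.3g}")
```

Because \(u\) is defined as \(\ln(Kmn)/\lambda\), `evd_tail(x, ...)` and `p_value(e_value(x, ...))` are the same number computed two ways, which is a quick consistency check on the formulas.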
- Readings:
- http://www.bioinfo.org.cn/lectures/index-47.html
- https://personal.utdallas.edu/~prr105020/biol6385/2018/lecture/Stat_sig.pdf
- http://pedagogix-tagc.univ-mrs.fr/courses/bioinfo_intro_prev/articles/sequence_alignment/Korf_BLAST_essential_OReilly.pdf
- https://www.sciencedirect.com/science/article/pii/S0076687996660297?via%3Dihub