Notes for deep learning in biological sequence analysis
- A collection of studies that apply neural networks to learn representations of biological sequences
 
Related Resources
- 2017, Cell Systems, Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks
 - 2019, Pre-Training of Deep Bidirectional Protein Sequence Representations with Structural Information
 - 2019, Nature Methods, Unified rational protein engineering with sequence-based deep representation learning
    - mLSTM trained to minimize the cross-entropy loss of next-amino-acid prediction; its fixed-size hidden states serve as the feature representation
    - One hidden-state activity vector per amino acid
    - Average the activities across all amino acids in the full-length sequence to get a single representation of the protein
    - Alternatively, use the activity of the last hidden state
    - doc2vec embeddings: https://github.com/fhalab/embeddings_reproduction
    - UniRep code: https://github.com/churchlab/UniRep
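A minimal NumPy sketch of the training objective and the two pooling choices above. It assumes per-residue hidden states and output logits have already been produced by some recurrent model (the mLSTM itself is not implemented here); the function names are hypothetical and not part of the UniRep code base, and the random arrays are only stand-ins for real model outputs.

```python
import numpy as np


def next_token_cross_entropy(logits: np.ndarray, tokens: np.ndarray) -> float:
    """Mean cross-entropy of predicting residue t+1 from the output at position t.

    logits: (L, V) unnormalized scores over a V-letter amino-acid alphabet
    tokens: (L,) integer amino-acid indices of the sequence
    """
    pred = logits[:-1]   # the last position has no next residue to predict
    target = tokens[1:]  # position t is scored against residue t+1
    # numerically stable log-softmax over the alphabet
    shifted = pred - pred.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(target)), target].mean())


def pool_hidden_states(hidden: np.ndarray, mode: str = "mean") -> np.ndarray:
    """Collapse per-residue hidden states (L, H) into one protein vector (H,)."""
    if mode == "mean":
        return hidden.mean(axis=0)  # average over every residue
    if mode == "last":
        return hidden[-1]  # state after reading the final residue
    raise ValueError(f"unknown mode: {mode!r}")


# Random stand-ins for a model's outputs on a 120-residue protein
rng = np.random.default_rng(0)
hidden = rng.normal(size=(120, 64))
logits = rng.normal(size=(120, 20))
tokens = rng.integers(0, 20, size=120)
print(pool_hidden_states(hidden, "mean").shape)  # (64,)
print(next_token_cross_entropy(logits, tokens))
```

Either pooling mode returns a vector whose size equals the hidden size, independent of protein length, which is what lets proteins of different lengths share one downstream model. Mean pooling uses every residue, while the last state relies on the recurrence having carried information through the whole sequence.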
 
 - 2018, NeurIPS, Neural Edit Operations for Biological Sequences
    - Replace the argmax in sequence alignment with a softmax so that the alignment loss becomes differentiable
    - Related work:
        - Differentiable DTW loss for time series (soft-DTW): https://arxiv.org/pdf/1703.01541.pdf
        - Sequence alignment kernel: 2004, Bioinformatics, Protein homology detection using string alignment kernels
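The smoothing idea can be illustrated with a soft variant of Needleman-Wunsch global alignment, written in the spirit of soft-DTW. The hard max over the three moves is replaced by a log-sum-exp smooth maximum with temperature `gamma`; its gradient with respect to the three candidate scores is exactly a softmax over the moves, the smooth counterpart of the argmax. This is a pure-Python sketch under simple assumptions (unit match/mismatch/gap scores, linear gap penalty, no gradient code). It is not the formulation from the Neural Edit Operations paper, and the function names are hypothetical.

```python
import math


def smooth_max(values, gamma):
    """gamma * log(sum(exp(v / gamma))): a smooth upper bound on max(values).

    Its gradient with respect to `values` is softmax(values / gamma), and it
    tends to max(values) as gamma -> 0.
    """
    top = max(values)  # subtract the max before exponentiating to avoid overflow
    return top + gamma * math.log(sum(math.exp((v - top) / gamma) for v in values))


def alignment_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0, gamma=None):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty.

    gamma=None takes the hard max over moves (classic DP); gamma > 0 swaps in
    smooth_max, making the score a smooth function of the scoring parameters.
    """
    def combine(vals):
        return max(vals) if gamma is None else smooth_max(vals, gamma)

    n, m = len(a), len(b)
    # dp[i][j] = best (or smoothed) score for aligning a[:i] with b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = combine([
                dp[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                dp[i - 1][j] + gap,      # a[i-1] aligned to a gap
                dp[i][j - 1] + gap,      # b[j-1] aligned to a gap
            ])
    return dp[n][m]


print(alignment_score("ACGT", "AGT"))              # 2.0
print(alignment_score("ACGT", "AGT", gamma=0.01))  # slightly above 2.0
```

With `gamma=None` the function returns the classic alignment score; as `gamma` shrinks, the smoothed score approaches it from above. Running the same recursion in an autodiff framework would yield gradients with respect to the substitution and gap scores.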
 
 
 - 2021, ICLR, BERTology Meets Biology: Interpreting Attention in Protein Language Models
 - 2021, Bioinformatics, DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome
 - 2021, PNAS, Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences
 - 2021, Current Protocols, Learned Embeddings from Deep Learning to Visualize and Predict Protein Sets