Basic Concepts in Information Theory
Information content
- The rarer an event, the greater the surprise (information) it brings.
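- In formulas, the standard definition of the information content (self-information) of an outcome \(x\) with probability \(p(x)\) is
  \[
  I(x) = -\log p(x),
  \]
  so rare events (small \(p(x)\)) carry more information.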
 
Entropy
- Entropy is the expectation of the information content (see the formulas after this list).

- The discrete case: a sum over all outcomes.

- In the continuous case it is called differential entropy.

- A Gaussian variable has the largest differential entropy among all random variables with the same variance (a numerical sketch follows this list).

- Conditional entropy: the entropy of the conditional distribution.

- The conditional entropy \(H(Y \mid X)\) is the average additional information needed to specify \(Y\) once \(X\) is known.

- The Bayes rule for conditional entropy relates \(H(Y \mid X)\) and \(H(X \mid Y)\).
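- In formulas (standard definitions): discrete entropy, differential entropy, conditional entropy, and the Bayes-rule identity above are
  \[
  H(X) = -\sum_x p(x)\log p(x), \qquad
  h(X) = -\int p(x)\log p(x)\,dx,
  \]
  \[
  H(Y \mid X) = -\sum_{x,y} p(x,y)\log p(y \mid x), \qquad
  H(Y \mid X) = H(X \mid Y) + H(Y) - H(X).
  \]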
 
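As a quick numerical check of the maximum-entropy claim above, here is a minimal sketch (assuming NumPy; the common variance is an arbitrary choice for illustration) comparing the closed-form differential entropies of a Gaussian and a uniform distribution with equal variance:

```python
import numpy as np

sigma2 = 1.0  # common variance for both distributions (arbitrary choice)

# Differential entropy of a Gaussian with variance sigma^2 (closed form).
h_gaussian = 0.5 * np.log(2 * np.pi * np.e * sigma2)

# A uniform distribution on [a, b] has variance (b - a)^2 / 12,
# so matching the variance requires width b - a = sqrt(12 * sigma2).
width = np.sqrt(12 * sigma2)
h_uniform = np.log(width)  # closed-form differential entropy of Uniform[a, b]

print(f"Gaussian: {h_gaussian:.4f} nats")  # ~1.4189
print(f"Uniform : {h_uniform:.4f} nats")   # ~1.2425, smaller as expected
```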
Cross entropy
- Cross entropy: \(H(p, q) = -\sum_x p(x)\log q(x)\), the expected information content when samples from \(p\) are coded according to \(q\).
 
- Cross entropy is equivalent to the negative log-likelihood loss used in classification problems (a numerical sketch follows this list); see https://en.wikipedia.org/wiki/Cross_entropy

- Note that cross entropy is not the same thing as the entropy of the joint distribution! https://math.stackexchange.com/questions/2505015/relation-between-cross-entropy-and-joint-entropy
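A minimal numerical sketch of the equivalence (assuming NumPy; the class probabilities below are made-up values): for a one-hot target distribution, cross entropy reduces to the negative log-likelihood of the true class.

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log q(x), with p the target and q the prediction."""
    return -np.sum(p * np.log(q))

q = np.array([0.7, 0.2, 0.1])   # predicted class probabilities (hypothetical)
p = np.array([1.0, 0.0, 0.0])   # one-hot target: the true class is class 0

# With a one-hot target, only the true-class term survives the sum,
# so the cross entropy equals the negative log-likelihood -log q[true class].
print(cross_entropy(p, q))  # ~0.3567
print(-np.log(q[0]))        # same value
```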
 
KL divergence
- KL divergence, also called relative entropy.
 
- Note that KL divergence is not symmetric: in general \(D_{KL}(X\|Y) \neq D_{KL}(Y\|X)\).
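- In formulas (standard definition, written here for distributions \(p\) and \(q\) over the same discrete space):
  \[
  D_{KL}(p \,\|\, q) = \sum_x p(x)\log\frac{p(x)}{q(x)} = H(p, q) - H(p),
  \]
  i.e. the KL divergence is the cross entropy minus the entropy of \(p\).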
 
Mutual information
- Mutual information (also called transinformation).
 - If \(X\) and \(Y\) are independent, we have \(p_{X,Y}(x,y) = p_X(x)p_Y(y)\).
 - The mutual information of \(X\) and \(Y\) is defined as the KL divergence between the joint distribution \(p_{X,Y}(x,y)\) and the product distribution \(p_X(x)p_Y(y)\).
 
- Some relationships between mutual information and entropy (see the formulas below).
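- In formulas (standard definitions and identities): the mutual information and its relationships to entropy are
  \[
  I(X; Y) = D_{KL}\bigl(p_{X,Y} \,\|\, p_X p_Y\bigr)
          = \sum_{x,y} p_{X,Y}(x,y)\log\frac{p_{X,Y}(x,y)}{p_X(x)p_Y(y)},
  \]
  \[
  I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X, Y).
  \]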