Basic Concepts in Information Theory
Information content
- The rarer an event is, the greater the surprise it carries (the surprisal formula is sketched below)
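A sketch of the standard convention, assuming a discrete random variable with pmf \(p\): the information content (surprisal) of an outcome \(x\) is

\[
h(x) = -\log p(x),
\]

so a certain event carries no surprise, and the surprisal grows without bound as \(p(x) \to 0\).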
Entropy
- Entropy is the expectation of the information content (the standard formulas are collected after this list)
- The discrete case
- In the continuous case, the analogous quantity is called differential entropy
- A Gaussian variable has the largest differential entropy among all random variables of equal variance
- Entropy of the conditional distribution
- The conditional entropy \(H(Y \mid X)\) is the average additional information needed to specify \(Y\) once \(X\) is known
- The Bayes rule for conditional entropy (see the identities after this list)
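Collecting the standard formulas the bullets above refer to (a sketch in textbook notation, with \(p\) the pmf/pdf):

\[
H(X) = -\sum_{x} p(x)\log p(x),
\qquad
h(X) = -\int p(x)\log p(x)\,dx ,
\]

where the second quantity is the differential entropy. For a Gaussian with variance \(\sigma^2\), \(h(X) = \tfrac{1}{2}\log(2\pi e \sigma^{2})\), and this is the maximum over all densities with the same variance. The conditional entropy and the chain-rule identities are

\[
H(Y \mid X) = -\sum_{x,y} p(x,y)\log p(y \mid x),
\qquad
H(X,Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y),
\]

which rearranges to \(H(Y \mid X) = H(X \mid Y) + H(Y) - H(X)\), the entropy analogue of Bayes' rule.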
Cross entropy
- Cross entropy, defined after this list
- Cross entropy is equivalent to the negative log-likelihood loss in classification problems, see https://en.wikipedia.org/wiki/Cross_entropy
- Note that cross entropy is not the same thing as the entropy of the joint distribution! https://math.stackexchange.com/questions/2505015/relation-between-cross-entropy-and-joint-entropy
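In the usual notation, with \(p\) the target (data) distribution and \(q\) the model distribution over the same variable,

\[
H(p, q) = -\sum_{x} p(x)\log q(x),
\]

whereas the joint entropy \(H(X, Y) = -\sum_{x,y} p(x,y)\log p(x,y)\) is the entropy of a single distribution over two variables, so the two notions should not be confused.

A minimal numeric check of the cross-entropy / negative-log-likelihood equivalence (a sketch; the class count and array values are made up for illustration):

```python
import numpy as np

# Hypothetical 3-class example: model logits and a one-hot target.
logits = np.array([2.0, 0.5, -1.0])
q = np.exp(logits) / np.exp(logits).sum()   # softmax -> model distribution q
p = np.array([1.0, 0.0, 0.0])               # one-hot target distribution p (true class = index 0)

cross_entropy = -(p * np.log(q)).sum()      # H(p, q) = -sum_x p(x) log q(x)
nll = -np.log(q[0])                         # negative log-likelihood of the true class

assert np.isclose(cross_entropy, nll)       # the two coincide when p is one-hot
print(cross_entropy, nll)
```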
KL divergence
- KL divergence, also called relative entropy (definition after this list)
- Note that KL divergence is not symmetric: in general \(D_{KL}(p \| q) \neq D_{KL}(q \| p)\)
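A sketch of the standard definition, for distributions \(p\) and \(q\) over the same space:

\[
D_{KL}(p \| q) = \sum_{x} p(x)\log\frac{p(x)}{q(x)} \;\ge\; 0,
\]

with equality iff \(p = q\). As a quick asymmetry check with made-up numbers: for \(p = (0.5,\, 0.5)\) and \(q = (0.25,\, 0.75)\), \(D_{KL}(p \| q) \approx 0.144\) nats while \(D_{KL}(q \| p) \approx 0.131\) nats.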
Mutual information
- Mutual information \(I(X; Y)\)
- If \(X\) and \(Y\) are independent, we have \(p_{X,Y}(x,y) = p_{X}(x)p_{Y}(y)\)
- The mutual information between \(X\) and \(Y\) is defined as the KL divergence between the joint distribution \(p_{X,Y}(x,y)\) and the product of the marginals \(p_{X}(x)p_{Y}(y)\), i.e. the distribution \((X,Y)\) would follow if they were independent
- Some relationships between mutual information and entropy (collected after this list)
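Writing out the definition and the standard identities (a sketch in the same notation as above):

\[
I(X; Y) = D_{KL}\bigl(p_{X,Y} \,\big\|\, p_{X}\,p_{Y}\bigr)
        = \sum_{x,y} p_{X,Y}(x,y)\log\frac{p_{X,Y}(x,y)}{p_{X}(x)\,p_{Y}(y)},
\]

\[
I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X, Y),
\]

so \(I(X; Y) = 0\) exactly when \(X\) and \(Y\) are independent.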