Paper list

Implementations

Unsupervised hashing techniques

  • iterative quantization (ITQ)
  • spectral hashing
  • discrete graph hashing
  • spherical hashing
  • anchor graph hashing
  • stochastic generative hashing

Reading Notes

Review on deep hashing

  • 2020, A Survey on Deep Hashing Methods

  • Network Architecture
    • Shallower architectures (AlexNet, CNN-F) for simple datasets (MNIST, CIFAR-10)
    • Deeper architectures (VGGNet, ResNet50) for complex datasets such as NUS-WIDE and COCO
  • Similarity measurement
    • Input space: the ground-truth similarity (e.g., defined by semantic labels)
    • Embedding space and hash space
  • The loss function
    • Similarity loss: preserves the similarity ordering of the input space in the hash space
    • Classification loss: the semantic labels of the input samples may be utilized
    • Quantization loss: pushes each embedding dimension close to -1 or 1
      • pairwise quantization loss
      • Cauchy quantization loss
    • Other regularizing terms
  • Optimization
    • the sign function is non-differentiable, so it cannot propagate gradients through backpropagation
    • this is the so-called ill-posed gradient problem
    • Several solutions (a minimal sketch of the third one follows this list)
      • add a quantization loss as a penalty term
      • only use backpropagation for solving a subproblem
      • adopt a continuous relaxation by replacing the sign function with tanh or sigmoid
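A minimal numpy sketch of the relaxation-plus-penalty idea; the scale alpha, the penalty form, and the variable names are illustrative assumptions rather than the survey's exact formulation:

```python
import numpy as np

def relaxed_codes(z, alpha=1.0):
    """Continuous relaxation: tanh(alpha * z) stands in for sign(z)."""
    return np.tanh(alpha * z)

def quantization_penalty(u):
    """Penalty term pushing each relaxed bit toward -1 or +1."""
    return np.mean((np.abs(u) - 1.0) ** 2)

z = np.random.randn(4, 8)          # raw network outputs: 4 samples, 8 bits
u = relaxed_codes(z, alpha=2.0)    # differentiable surrogate used during training
b = np.sign(u)                     # hard binarization used at retrieval time
print(quantization_penalty(u))     # small once u sits near +/-1
```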

Decorrelate the embedding

  • We expect compact hash codes, but different bits of the learnt embedding can be highly correlated, leading to information redundancy
  • Some studies decorrelate the embedding through an additional loss term
  • Denoting the embedded binary code as \(B\), the decorrelation loss can be defined as
\[L_{DeCov} = \frac{1}{2}(\|C\|_{F}^{2}-\|diag(C)\|_{F}^{2})\]
  • \(C\) is the covariance matrix of the binary embedding, computed over the \(N\) training samples
\[C_{i,j}=\frac{1}{N}\sum_{n}(B_{i}^{n}-\mu_{i})(B_{j}^{n}-\mu_{j})\]
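A minimal sketch of this penalty, assuming the relaxed codes are stored row-wise as an N x K matrix (names are illustrative):

```python
import numpy as np

def decov_loss(B):
    """B: (N, K) relaxed binary codes, one row per sample.
    Penalizes off-diagonal entries of the bit covariance matrix."""
    centered = B - B.mean(axis=0, keepdims=True)   # subtract per-bit mean mu
    C = centered.T @ centered / B.shape[0]         # (K, K) covariance matrix
    fro_sq = np.sum(C ** 2)                        # ||C||_F^2
    diag_sq = np.sum(np.diag(C) ** 2)              # ||diag(C)||_F^2
    return 0.5 * (fro_sq - diag_sq)

B = np.tanh(np.random.randn(100, 16))              # 100 samples, 16-bit relaxed codes
print(decov_loss(B))
```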

Balancing the code usage

HashNet

  • Subject to the ill-posed gradient difficulty of optimizing with sign activations, existing deep learning-to-hash methods need to first learn continuous representations and then generate binary hash codes in a separate binarization step, which suffers from a substantial loss of retrieval quality
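HashNet instead trains with a scaled tanh activation, tanh(beta * z), and increases beta over the course of training (a continuation method) so that the smooth activation approaches the sign function. A rough numpy illustration, with a made-up schedule:

```python
import numpy as np

def hash_activation(z, beta):
    """Scaled tanh used during training; approaches sign(z) as beta grows."""
    return np.tanh(beta * z)

z = np.random.randn(5, 8)                    # raw outputs: 5 samples, 8 bits
for beta in [1.0, 5.0, 25.0]:                # illustrative continuation schedule
    u = hash_activation(z, beta)
    gap = np.mean(np.abs(np.abs(u) - 1.0))   # average distance of codes from +/-1
    print(f"beta={beta:5.1f}  mean gap to +/-1: {gap:.4f}")
```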

DPSH

Notes

  • For binary codes \(b_{i}\) and \(b_{j}\) of \(K\) bits, their Hamming distance is:
\[dist_{H} = \frac{1}{2} (K-b_{i}^{T}b_{j})\]
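A quick numeric check of this identity for ±1 codes (values are arbitrary):

```python
import numpy as np

K = 8
b_i = np.random.choice([-1, 1], size=K)
b_j = np.random.choice([-1, 1], size=K)

hamming = np.sum(b_i != b_j)          # direct count of differing bits
via_dot = 0.5 * (K - b_i @ b_j)       # identity from the note above
assert hamming == via_dot
print(hamming, via_dot)
```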
  • Suppose we have \(N\) samples whose labels come from \(C\) categories; the binary encoded label matrix is \(Y_{C,N}\)
  • From the label matrix we build a similarity matrix \(S_{N,N}\): \(S_{i,j}=1\) if sample \(i\) and sample \(j\) share the same label, otherwise \(S_{i,j}=0\)
  • Suppose the binary code matrix is \(B_{K,N}\). Given the similarity matrix \(S_{N,N}\), the posterior probability of the binary codes is:
\[p(B|S) \propto p(S|B)p(B) = \prod_{s_{i,j} \in S}{p(s_{i,j}|B)p(B)}\]
  • We can see that the larger the inner product \(b_{i}^{T}b_{j}\), the smaller the Hamming distance

  • Use a sigmoid function to model the relation between \(s_{i,j}\) and the learnt binary codes

\[p(s_{i,j}=1|B) = \frac{1}{1+e^{-\frac{1}{2}b_{i}^{T}b_{j}}}\]
  • We use the negative conditional log-likelihood of the similarity matrix as the pairwise similarity loss
\[\begin{align*} J &= -\ln p(S|B) = - \sum_{s_{i,j} \in S}{\ln p(s_{i,j}|B)}\\ &= -\sum_{s_{i,j}=1}{\ln p(s_{i,j}=1|B)}-\sum_{s_{i,j}=0}{\ln p(s_{i,j}=0|B)} \\ &= -\sum_{s_{i,j}=1}{\left(\frac{1}{2}b_{i}^{T}b_{j}-\ln(1+e^{\frac{1}{2}b_{i}^{T}b_{j}})\right)} -\sum_{s_{i,j}=0}{\left(-\ln(1+e^{\frac{1}{2}b_{i}^{T}b_{j}})\right)} \\ &= -\sum_{s_{i,j} \in S}{\left(\frac{1}{2}s_{i,j}b_{i}^{T}b_{j}-\ln(1+e^{\frac{1}{2}b_{i}^{T}b_{j}})\right)} \end{align*}\]
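A minimal sketch of this pairwise loss for relaxed codes, with B stored as a K x N matrix and S as the similarity matrix; np.logaddexp(0, theta) gives a numerically stable ln(1 + e^theta):

```python
import numpy as np

def dpsh_pairwise_loss(B, S):
    """B: (K, N) relaxed codes, S: (N, N) with S[i, j] = 1 for similar pairs.
    Returns -sum_{i,j} (0.5 * s_ij * b_i^T b_j - ln(1 + exp(0.5 * b_i^T b_j)))."""
    theta = 0.5 * (B.T @ B)              # theta[i, j] = 0.5 * b_i^T b_j
    log_term = np.logaddexp(0.0, theta)  # stable ln(1 + e^theta)
    return -np.sum(S * theta - log_term)

B = np.tanh(np.random.randn(12, 6))      # 12-bit relaxed codes for 6 samples
labels = np.random.randint(0, 3, size=6)
S = (labels[:, None] == labels[None, :]).astype(float)
print(dpsh_pairwise_loss(B, S))
```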
  • Ideal binary codes should also achieve good classification performance

  • Suppose \(W_{K,C}\) is the classifier weight matrix, so the predicted labels are \(Y=W^{T}B\)

  • The classification loss can be written as

\[Q = \sum_{i=1}^{N}L(y_{i},W^{T}b_{i}) + \lambda\|W\|^2_F\]
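A sketch of this objective, assuming a squared error for L (the choice of L is not pinned down in these notes); W is K x C, B is K x N, and Y is C x N as defined above:

```python
import numpy as np

def classification_loss(W, B, Y, lam=1e-3):
    """Q = sum_i L(y_i, W^T b_i) + lam * ||W||_F^2, with L taken as squared error."""
    pred = W.T @ B                         # (C, N) predicted label scores
    fit = np.sum((Y - pred) ** 2)          # sum_i ||y_i - W^T b_i||^2
    reg = lam * np.sum(W ** 2)             # Frobenius-norm regularizer on W
    return fit + reg

K, C, N = 16, 4, 10
B = np.sign(np.random.randn(K, N))                 # binary codes, one column per sample
Y = np.eye(C)[:, np.random.randint(0, C, size=N)]  # one-hot label matrix (C, N)
W = 0.1 * np.random.randn(K, C)
print(classification_loss(W, B, Y))
```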
  • It is essential to keep the discrete nature of the binary codes
  • Solution: use an auxiliary variable

Implementations