QUESTION: What is the relationship between the variance and entropy of independent one-hot vectors?
This document proves inequalities relating variance, collision entropy and
Shannon entropy of sequences of independent one-hot vectors.
Introduction
Both variance and entropy are measures of uncertainty.
Variance treats values as points in a space separated by distances.
In this document, the variance of a random vector refers to the expected
squared distance from its mean, which equals the sum of the variances of
its components.
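As an aside not in the original argument, here is a minimal numerical sketch (assuming NumPy) of this definition, checking that the expected squared distance from the mean equals the sum of the component variances:

import numpy as np

rng = np.random.default_rng(0)
# Illustrative random vectors: 100000 samples of a 3-component vector.
samples = rng.normal(size=(100_000, 3)) * np.array([1.0, 2.0, 0.5])

mean = samples.mean(axis=0)
expected_sq_distance = ((samples - mean) ** 2).sum(axis=1).mean()
sum_of_component_variances = samples.var(axis=0).sum()

print(expected_sq_distance, sum_of_component_variances)  # agree up to sampling error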
Random one-hot vectors are a
convenient spatial representation for categorical random variables.
A one-hot vector has
all components equal to 0 except one component that equals 1.
This representation has been used in genetics [1].
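For illustration only (the allele labels below are hypothetical), one-hot encoding can be sketched as:

import numpy as np

alleles = ["A", "C", "T"]                  # illustrative categorical values
def one_hot(value):
    # Build the one-hot vector for a single categorical value.
    vec = np.zeros(len(alleles))
    vec[alleles.index(value)] = 1.0
    return vec

print(one_hot("C"))                        # [0. 1. 0.]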
For genetic loci with only two alleles, one of the one-hot vector's two
components is redundant, so “half” of such one-hot vectors are typically
used in genetics (e.g. [2] p. 40, [3], [4]).
The variance of the “half one-hot vector” is exactly half the variance of its
full one-hot vector.
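A quick numerical check of this claim, a sketch with an arbitrary illustrative allele frequency p:

import numpy as np

p = 0.3                                    # illustrative allele frequency
probs = np.array([p, 1.0 - p])

# Variance of the full one-hot vector: sum of per-component Bernoulli variances.
var_full = np.sum(probs * (1.0 - probs))   # equals 2 * p * (1 - p)
# Variance of the "half" vector: keep only the first component.
var_half = p * (1.0 - p)

print(np.isclose(var_full, 2 * var_half))  # True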
Main Result
Given $N$ independent random one-hot vectors $X_1, X_2, \dots, X_N$, define
their Cartesian product
\[ X_* = X_1 \times X_2 \times \cdots \times X_N. \]
The variance of $X_*$ can be adjusted to form a lower bound on both the
collision entropy $H_2(X_*)$ and the Shannon entropy $H(X_*)$:
\[ -\log_2\!\left(1 - \frac{\operatorname{Var}(X_*)}{N}\right) \le \frac{H_2(X_*)}{N} \le \frac{H(X_*)}{N}. \]
If every $X_i$ takes only two equally likely values, then the lower bounds are attained with equality:
\[ -\log_2\!\left(1 - \frac{\operatorname{Var}(X_*)}{N}\right) = \frac{H_2(X_*)}{N} = \frac{H(X_*)}{N} = 1. \]
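The equality case can be checked numerically; the sketch below assumes each $X_i$ is a one-hot vector of length 2 with probabilities (1/2, 1/2):

import numpy as np

N = 5
probs = [np.array([0.5, 0.5]) for _ in range(N)]     # two equally likely values each

var_star = sum(1.0 - np.sum(p ** 2) for p in probs)           # Var(X*)
h2_star = sum(-np.log2(np.sum(p ** 2)) for p in probs)        # H2(X*), additive
h_star = sum(-np.sum(p * np.log2(p)) for p in probs)          # H(X*), additive

print(-np.log2(1.0 - var_star / N), h2_star / N, h_star / N)  # 1.0 1.0 1.0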
Proof
Let $M_i$ be the length of $X_i$ (the number of categorical values
represented by $X_i$), and let $p_{i,j}$ be the probability that $X_i$ takes
the $j$-th categorical value. For every $1 \le i \le N$,
\[ \sum_{j=1}^{M_i} p_{i,j} = 1. \]
The expectation and variance of the $i$-th one-hot vector $X_i$ are
\[ E(X_i) = (p_{i,1}, p_{i,2}, \dots, p_{i,M_i}) \]
and
\[
\begin{aligned}
\operatorname{Var}(X_i)
&= \sum_{j=1}^{M_i} p_{i,j} \left[ (1 - p_{i,j})^2 + \sum_{k \ne j} (0 - p_{i,k})^2 \right] \\
&= \sum_{j=1}^{M_i} p_{i,j} \left[ 1 - 2 p_{i,j} + \sum_{k=1}^{M_i} p_{i,k}^2 \right] \\
&= 1 - 2 \sum_{j=1}^{M_i} p_{i,j}^2 + \sum_{k=1}^{M_i} p_{i,k}^2 \\
&= 1 - \sum_{j=1}^{M_i} p_{i,j}^2.
\end{aligned}
\]
Thus the variance of $X_i$ equals the probability of two independent
samples from $X_i$ being distinct. This probability of distinction has been
called logical entropy [5].
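A Monte Carlo sketch of this identity, using an arbitrary illustrative distribution:

import numpy as np

rng = np.random.default_rng(1)
probs = np.array([0.5, 0.3, 0.2])          # illustrative categorical distribution

var_formula = 1.0 - np.sum(probs ** 2)     # Var(X_i) from the derivation above

# Probability that two independent samples take distinct values.
a = rng.choice(len(probs), size=500_000, p=probs)
b = rng.choice(len(probs), size=500_000, p=probs)
distinct_rate = np.mean(a != b)

print(var_formula, distinct_rate)          # agree up to sampling error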
The complement
\[ 1 - \operatorname{Var}(X_i) = \sum_{j=1}^{M_i} p_{i,j}^2 \]
is the probability of repetition, which is also the expected probability of
the sampled value. Taking the negative log gives the Rényi entropy of
order 2, also called collision entropy:
\[ -\log_2\left(1 - \operatorname{Var}(X_i)\right) = -\log_2\left( \sum_{j=1}^{M_i} p_{i,j}^2 \right) = H_2(X_i). \]
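As a sketch (continuing the same illustrative distribution), the general Rényi formula of order $\alpha$, $\frac{1}{1-\alpha}\log_2\sum_j p_j^\alpha$, reduces at $\alpha = 2$ to the same quantity:

import numpy as np

probs = np.array([0.5, 0.3, 0.2])                        # illustrative distribution
alpha = 2

var_i = 1.0 - np.sum(probs ** 2)                         # Var(X_i)
renyi_2 = np.log2(np.sum(probs ** alpha)) / (1 - alpha)  # Rényi entropy of order 2

print(-np.log2(1.0 - var_i), renyi_2)                    # both equal H2(X_i)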
Since the negative log is a convex function, the negative log of the
expected probability (collision entropy) is a lower bound on the expected
negative log of probability (Shannon entropy) by Jensen’s inequality:
\[ H_2(X_i) = -\log_2\left( \sum_{j=1}^{M_i} p_{i,j}^2 \right) \le \sum_{j=1}^{M_i} p_{i,j} \left( -\log_2 p_{i,j} \right) = H(X_i). \]
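A numerical check of this bound over random distributions, a sketch where the Dirichlet sampling is purely illustrative:

import numpy as np

rng = np.random.default_rng(2)
for _ in range(1000):
    probs = rng.dirichlet(np.ones(4))          # a random distribution on 4 values
    h2 = -np.log2(np.sum(probs ** 2))          # collision entropy
    h = -np.sum(probs * np.log2(probs))        # Shannon entropy
    assert h2 <= h + 1e-12                     # Jensen's inequality holds
print("H2 <= H held in all trials")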
Because the variance of $X_*$ is the sum of the variances of its components,
the total variance can be scaled so that its complement equals the average
probability of one-hot vector repetition (per one-hot vector):
\[ 1 - \frac{\operatorname{Var}(X_*)}{N} = 1 - \frac{1}{N} \sum_{i=1}^{N} \operatorname{Var}(X_i) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M_i} p_{i,j}^2. \]
Applying the negative log and Jensen’s inequality once more establishes
another lower bound:
\[ -\log_2\left( \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M_i} p_{i,j}^2 \right) \le \frac{1}{N} \sum_{i=1}^{N} \left( -\log_2 \sum_{j=1}^{M_i} p_{i,j}^2 \right) = \frac{1}{N} \sum_{i=1}^{N} H_2(X_i). \]
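This averaging step can also be checked numerically; the sketch below draws random illustrative distributions of varying lengths:

import numpy as np

rng = np.random.default_rng(3)
N = 8
probs = [rng.dirichlet(np.ones(rng.integers(2, 6))) for _ in range(N)]

repeat = np.array([np.sum(p ** 2) for p in probs])  # sum_j p_{i,j}^2 for each i
lhs = -np.log2(repeat.mean())                       # -log2 of the average repetition probability
rhs = np.mean(-np.log2(repeat))                     # average collision entropy

print(lhs <= rhs + 1e-12)                           # True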
Collision and Shannon entropy are additive for independent variables, so
$H_2(X_*) = \sum_{i=1}^{N} H_2(X_i)$ and $H(X_*) = \sum_{i=1}^{N} H(X_i)$.
Putting everything together, we get
\[ -\log_2\left( 1 - \frac{\operatorname{Var}(X_*)}{N} \right) \le \frac{H_2(X_*)}{N} \le \frac{H(X_*)}{N}. \]
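Finally, the full chain of inequalities can be verified numerically; this sketch uses random illustrative distributions and is not part of the proof:

import numpy as np

rng = np.random.default_rng(4)
N = 10
probs = [rng.dirichlet(np.ones(rng.integers(2, 7))) for _ in range(N)]

var_star = sum(1.0 - np.sum(p ** 2) for p in probs)      # Var(X*)
h2_star = sum(-np.log2(np.sum(p ** 2)) for p in probs)   # H2(X*)
h_star = sum(-np.sum(p * np.log2(p)) for p in probs)     # H(X*)

lower = -np.log2(1.0 - var_star / N)
print(lower <= h2_star / N + 1e-12, h2_star / N <= h_star / N + 1e-12)  # True True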
References
1. Menozzi P, Piazza A, Cavalli-Sforza L. Synthetic maps of human gene frequencies in Europeans. Science. 1978;201: 786–792. doi:10.1126/science.356262
2. Weir BS. Genetic data analysis II: Methods for discrete population genetic data. Sunderland, Mass: Sinauer Associates; 1996.
Patterson N, Price AL, Reich D. Population Structure and Eigenanalysis. PLoS Genetics. 2006;2: e190. doi:10.1371/journal.pgen.0020190
5. Ellerman D. Logical information theory: New logical foundations for information theory. Logic Journal of the IGPL. 2017;25: 806–835. doi:10.1093/jigpal/jzx022