Abstract
DOCUMENT TYPE: Open Study Answer
QUESTION: What is the relationship between the variance and entropy of independent one-hot vectors?
This document proves inequalities relating variance, collision entropy and Shannon entropy of sequences of independent one-hot vectors.
Introduction
Both variance and entropy are measures of uncertainty. Variance assumes values vary as points in a space separated by distances. In this document, the variance of a random vector refers to the expected squared distance from its mean, which equals the sum of the variances of its components.
Random one-hot vectors are a convenient spatial representation for categorical random variables. A one-hot vector has all components equal to $0$ except one component that equals $1$. This representation has been used in genetics [1]. For genetic loci with only two alleles, the two components of a one-hot vector are redundant, since each determines the other. “Half” of such one-hot vectors are typically used in genetics (e.g. [2] p.40, [3], [4]). The variance of the “half one-hot vector” is exactly half the variance of its full one-hot vector.
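The "exactly half" claim can be checked with a few lines of Python. This is only a sketch: the function names and the allele frequency `p = 0.3` are hypothetical, and it relies on each component of a one-hot vector being a 0/1 indicator with variance $p(1 - p)$.

```python
def one_hot_variance(probs):
    """Variance of a one-hot vector: the sum of its component variances."""
    # Each component is a 0/1 indicator with variance p * (1 - p).
    return sum(p * (1 - p) for p in probs)


def half_one_hot_variance(p):
    """Variance of one indicator component of a two-category one-hot vector."""
    return p * (1 - p)


p = 0.3  # hypothetical frequency of one of the two alleles
full = one_hot_variance([p, 1 - p])
half = half_one_hot_variance(p)

# The two components contribute equally, so the full variance is twice the half.
print(full, half)
```

Both components of the two-allele vector have the same variance $p(1 - p)$, which is why dropping one of them halves the total.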
Main Result
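Given $N$ independent random one-hot vectors $X_1$, $X_2$, …, $X_N$, define $X_*$ as their concatenation:
$$X_* = (X_1, X_2, \dots, X_N)$$
The variance of $X_*$ can be adjusted to form a lower bound to the collision entropy, $H_2(X_*)$, and Shannon entropy, $H(X_*)$:
$$-\log_2\left(1 - \frac{\operatorname{Var}(X_*)}{N}\right) \le \frac{H_2(X_*)}{N} \le \frac{H(X_*)}{N}$$
If every $X_i$ takes only two equally likely values, then the lower bounds reach equality:
$$-\log_2\left(1 - \frac{\operatorname{Var}(X_*)}{N}\right) = \frac{H_2(X_*)}{N} = \frac{H(X_*)}{N} = 1$$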
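As a worked illustration, assume the bound reads $-\log_2(1 - \operatorname{Var}/N) \le H_2/N \le H/N$ with entropies in bits, and take a hypothetical case of $N = 2$ independent one-hot vectors with category probabilities $(\tfrac12, \tfrac12)$ and $(\tfrac14, \tfrac34)$. The variance of a one-hot vector with category probabilities $p_j$ is $1 - \sum_j p_j^2$, so:

$$
\begin{aligned}
\operatorname{Var} &= \left(1 - \tfrac{1}{4} - \tfrac{1}{4}\right) + \left(1 - \tfrac{1}{16} - \tfrac{9}{16}\right) = \tfrac{1}{2} + \tfrac{3}{8} = \tfrac{7}{8} \\
-\log_2\left(1 - \tfrac{7}{16}\right) &= \log_2 \tfrac{16}{9} \approx 0.830 \\
\tfrac{1}{2} H_2 &= \tfrac{1}{2}\left(1 + \log_2 \tfrac{8}{5}\right) \approx 0.839 \\
\tfrac{1}{2} H &= \tfrac{1}{2}\left(1 + \tfrac{1}{4}\log_2 4 + \tfrac{3}{4}\log_2 \tfrac{4}{3}\right) \approx 0.906
\end{aligned}
$$

The three values are ordered $0.830 \le 0.839 \le 0.906$, as the bound requires. Replacing the second distribution with $(\tfrac12, \tfrac12)$ makes all three equal to $1$.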
Proof
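Let $M_i$ be the length of $X_i$ (the number of categorical values represented by $X_i$). Let $p_{i,j}$ represent the probability of $X_i$ taking the $j$-th categorical value.
For every $1 \le i \le N$,
$$\sum_{j=1}^{M_i} p_{i,j} = 1$$
The expectation and variance of the $i$-th one-hot vector are
$$\operatorname{E}(X_i) = (p_{i,1}, p_{i,2}, \dots, p_{i,M_i})$$
$$\operatorname{Var}(X_i) = \sum_{j=1}^{M_i} p_{i,j}(1 - p_{i,j}) = 1 - \sum_{j=1}^{M_i} p_{i,j}^2$$
Thus the variance of $X_i$ equals the probability of two independent samples from $X_i$ being distinct. This probability of distinction has been called logical entropy [5].
The complement
$$1 - \operatorname{Var}(X_i) = \sum_{j=1}^{M_i} p_{i,j}^2$$
is the probability of two independent samples from $X_i$ being equal (a repetition). Its negative log is the collision entropy $H_2(X_i)$. Since negative log is convex, Jensen's inequality makes collision entropy a lower bound to Shannon entropy:
$$H_2(X_i) = -\log_2 \sum_{j=1}^{M_i} p_{i,j}^2 \le -\sum_{j=1}^{M_i} p_{i,j} \log_2 p_{i,j} = H(X_i)$$
The total variance, $\operatorname{Var}(X_*) = \sum_{i=1}^{N} \operatorname{Var}(X_i)$, can be adjusted to equal the average probability of one-hot vector repetition (per one-hot vector):
$$1 - \frac{\operatorname{Var}(X_*)}{N} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M_i} p_{i,j}^2$$
Negative log with Jensen’s inequality can then establish yet another lower bound:
$$-\log_2\left(\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M_i} p_{i,j}^2\right) \le \frac{1}{N} \sum_{i=1}^{N} \left(-\log_2 \sum_{j=1}^{M_i} p_{i,j}^2\right) = \frac{1}{N} \sum_{i=1}^{N} H_2(X_i)$$
Collision and Shannon entropy are additive for independent variables, so $H_2(X_*) = \sum_i H_2(X_i)$ and $H(X_*) = \sum_i H(X_i)$. Putting everything together we get
$$-\log_2\left(1 - \frac{\operatorname{Var}(X_*)}{N}\right) \le \frac{H_2(X_*)}{N} \le \frac{H(X_*)}{N}$$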
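The chain of inequalities can be spot-checked numerically. The following is a minimal sketch, not part of the proof: the function names and the example distributions are hypothetical, entropies are measured in bits, and the variance of a one-hot vector is taken to be one minus the sum of squared category probabilities.

```python
import math


def variance(probs):
    """Variance of a one-hot vector: probability that two samples differ."""
    return 1.0 - sum(p * p for p in probs)


def collision_entropy(probs):
    """Collision entropy in bits: negative log of the repetition probability."""
    return -math.log2(sum(p * p for p in probs))


def shannon_entropy(probs):
    """Shannon entropy in bits, skipping zero-probability categories."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def bounds(dists):
    """Return the three per-vector quantities compared in the main result."""
    n = len(dists)
    total_var = sum(variance(d) for d in dists)
    lower = -math.log2(1.0 - total_var / n)
    # Entropies of independent variables add, so sum them per vector.
    h2 = sum(collision_entropy(d) for d in dists) / n
    h = sum(shannon_entropy(d) for d in dists) / n
    return lower, h2, h


# Hypothetical category probabilities for three independent one-hot vectors.
dists = [[0.5, 0.5], [0.25, 0.75], [0.1, 0.2, 0.7]]
lower, h2, h = bounds(dists)
print(lower, h2, h)

# Equality case: every vector takes two equally likely values.
print(bounds([[0.5, 0.5]] * 4))
```

A check like this cannot replace the proof, but it is a quick way to catch a sign or normalization slip when reusing the bound.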
References
[1] Menozzi, P., A. Piazza, and L. Cavalli-Sforza. “Synthetic maps of human gene frequencies in Europeans”. Science, vol. 201, no. 4358, Sep. 1978, pp. 786–792. Accessed 2 Jul. 2020. https://doi.org/10.1126/science.356262, https://www.sciencemag.org/lookup/doi/10.1126/science.356262.
[2] Weir, B. S. Genetic data analysis II: Methods for discrete population genetic data. Sunderland, Mass: Sinauer Associates, 1996.
[3] Weir, B. S., and W. G. Hill. “Estimating F-Statistics”. Annual Review of Genetics, vol. 36, no. 1, Dec. 2002, pp. 721–750. Accessed 5 Mar. 2020. https://doi.org/10.1146/annurev.genet.36.050802.093940, http://www.annualreviews.org/doi/10.1146/annurev.genet.36.050802.093940.
[4] Patterson, Nick, Alkes L. Price, and David Reich. “Population Structure and Eigenanalysis”. PLoS Genetics, vol. 2, no. 12, 2006, p. e190. Accessed 2 Jul. 2020. https://doi.org/10.1371/journal.pgen.0020190, https://dx.plos.org/10.1371/journal.pgen.0020190.
[5] Ellerman, David. “Logical information theory: New logical foundations for information theory”. Logic Journal of the IGPL, vol. 25, no. 5, Oct. 2017, pp. 806–835. Accessed 13 Jun. 2020. https://doi.org/10.1093/jigpal/jzx022, http://academic.oup.com/jigpal/article/25/5/806/4070969/Logical-information-theory-new-logical-foundations.