Both variance and entropy are measures of uncertainty and thus of information.
Scoped entropy is a generalization of both statistical variance and Shannon
entropy [1]. It generalizes the two jointly by means of a
construct called a scope of relevance.
Scoped entropy can be applied to both discrete and continuous variables.
A random object is a function whose domain is a probability space
[2]. When the function values are real numbers, it is a
random variable (also called a real random object in this document).
In this document, a finite random object means a random object that takes on
finitely many values; in other words, the range of a finite random object
is a finite set.
Both entropy and variance are functions of random objects.
In the case of Shannon entropy, the random object is finite (with values often
referred to as symbols).
In the case of variance, the random object is a (real) random variable.
The distances between values of a finite random variable affect variance, but
not Shannon entropy.
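As a quick numeric illustration (plain Python, not from the source): two variables with identical probabilities have identical Shannon entropy, while their variances differ with the spacing of their values.

```python
import math

p = [0.5, 0.5]                      # same probabilities for both variables
x_near = [0.0, 1.0]                 # values 1 apart
x_far = [0.0, 10.0]                 # values 10 apart

def entropy(p):                     # depends only on the probabilities
    return sum(-q * math.log2(q) for q in p if q > 0)

def variance(p, x):                 # depends on the values as well
    mean = sum(q * v for q, v in zip(p, x))
    return sum(q * (v - mean) ** 2 for q, v in zip(p, x))

print(entropy(p))                   # 1.0 bit for both variables
print(variance(p, x_near))          # 0.25
print(variance(p, x_far))          # 25.0
```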
Given a scope of relevance θ and a finite or real random object X,
scoped entropy is denoted by the function
V_θ(X)
There is a unirelevant decomposition function h which, given any
finite random object X, generates a scope h(X) such that
V_{h(X)}(X) = H(X)
where H(X) is Shannon entropy.
Similarly, there is a distribution decomposition function d
which, given any random variable X, generates a scope d(X) such that
V_{d(X)}(X) = Var(X)
Entropy Scope of Relevance
Let dom f denote the domain of a function f.
Given π a set of sets, let ∪π denote the union of all member
sets in π. When π is a partition, π covers ∪π.
In a scope of relevance, partitions represent questions, with each member
set (part) an answer. Each answer represents an event of a probability space.
Given a function θ over partitions of events of a probability space Ω,
define levels of its domain:
dom_0 θ := {π ∈ dom θ : ∪π = Ω}
dom_{i+1} θ := {π ∈ dom θ : ∪π ∈ ρ for some ρ ∈ dom_i θ, π ∉ dom_i θ}
A scope of relevance θ is a non-negative real-valued function
over partitions of events (subsets) of a probability space Ω
satisfying the following conditions:
1. At most one partition covers any event: ∪π = ∪ρ implies π = ρ
for any partitions π and ρ in dom θ.
2. dom θ = ⋃_{i=0}^∞ dom_i θ
The domain of a scope of relevance is the set of all relevant questions, and
the real value assigned to each question is its degree of relevance.
The second condition in the definition means the partitions (questions)
divide the probability space Ω in a nested, hierarchical manner:
a question can only be asked under the event of all outcomes (Ω) or under
the event of an answer to another question.
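For concreteness, here is one possible encoding, a minimal sketch not taken from the source: outcomes are hashable values, an event is a frozenset of outcomes, a partition is a frozenset of disjoint events, and a scope is a dict from partitions to relevances. The helper names (union, domain_levels, is_scope_of_relevance) are illustrative.

```python
def union(partition):
    """∪π: the event covered by the partition π."""
    return frozenset().union(*partition)

def domain_levels(scope, omega):
    """Split dom θ into levels dom_i θ; return None if condition 2 fails."""
    remaining = set(scope)
    level = {p for p in remaining if union(p) == omega}        # dom_0 θ
    levels = []
    while level:
        levels.append(level)
        remaining -= level
        answers = {a for p in level for a in p}                # answer events
        level = {p for p in remaining if union(p) in answers}  # dom_{i+1} θ
    return levels if not remaining else None   # leftover questions unreachable

def is_scope_of_relevance(scope, omega):
    covers = [union(p) for p in scope]
    unique_cover = len(covers) == len(set(covers))             # condition 1
    nonneg = all(v >= 0 for v in scope.values())
    return unique_cover and nonneg and domain_levels(scope, omega) is not None
```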
Scope product
π × ρ := {A ∩ B : A ∈ π, B ∈ ρ}
π × B := {A ∩ B : A ∈ π}
Consider any two scopes θ and ϕ sharing a probability space Ω.
Define
θ × ϕ := ⋃_{i=0}^∞ ⋃ { C(π, ρ) : π ∈ dom_i θ, ρ ∈ dom_i ϕ, (∪π) ∩ (∪ρ) ≠ ∅ }
where
C(π, ρ) :=
{(π × ρ) ↦ θ(π)} when θ(π) = ϕ(ρ)
{π ↦ θ(π)} ∪ ⋃_{A∈π} {(ρ × A) ↦ ϕ(ρ)} when θ(π) > ϕ(ρ)
{ρ ↦ ϕ(ρ)} ∪ ⋃_{A∈ρ} {(π × A) ↦ θ(π)} when θ(π) < ϕ(ρ)
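Continuing the same hypothetical encoding, a sketch of the case analysis C(π, ρ); dropping empty intersections from the refined partitions is an assumption, since the definition above leaves them in.

```python
def refine(pi, rho):
    """π × ρ: the common refinement of two partitions."""
    return frozenset(a & b for a in pi for b in rho) - {frozenset()}

def restrict(pi, b):
    """π × B: the partition π restricted to the event B."""
    return frozenset(a & b for a in pi) - {frozenset()}

def C(pi, rho, rel_pi, rel_rho):
    """Merge equally relevant questions; otherwise keep the more relevant
    question whole and re-ask the other under each of its answers."""
    if rel_pi == rel_rho:
        return {refine(pi, rho): rel_pi}
    if rel_pi > rel_rho:
        first, second, r1, r2 = pi, rho, rel_pi, rel_rho
    else:
        first, second, r1, r2 = rho, pi, rel_rho, rel_pi
    product = {first: r1}
    for a in first:
        product[restrict(second, a)] = r2
    return product
```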
Theorem (Scope product is a scope of relevance)
For any two scopes of relevance θ and ϕ sharing probability space
Ω, θ×ϕ is also a scope of relevance.
Proof
TBD
Distribution Decomposition
Given a set of random variables {Z_i}_{i∈I} with countable index set I
and probability space Ω,
for each i ∈ I
define functions B_i(s) and c_i(s) over strings s of the alphabet
{0,1} [3].
B_i(ε) := Ω
For any string s with P(B_i(s)) > 0,
c_i(s) := E(Z_i | B_i(s))
B_i(s0) := B_i(s) ∩ {Z_i < c_i(s)}
B_i(s1) := B_i(s) ∩ {Z_i > c_i(s)}
and otherwise B_i(sℓ) := ∅.
A new probability space Ω′ is defined by extending Ω
such that for each i ∈ I and string s ∈ {0,1}∗
with P(B_i(s)) − P({Z_i = c_i(s)}) > 0,
there are new disjoint events C_{i,0}(s) and C_{i,1}(s) such that
{Z_i = c_i(s)} = C_{i,0}(s) ∪ C_{i,1}(s)
and, for each ℓ ∈ {0,1} and all events S in Ω,
P(S ∩ C_{i,ℓ}(s)) = P(S ∩ {Z_i = c_i(s)}) · P(B_i(sℓ)) / (P(B_i(s)) − P({Z_i = c_i(s)}))
Define
B′_i(sℓ) := B_i(sℓ) ∪ C_{i,ℓ}(s)
v_i(s) := Σ_{ℓ∈{0,1}} P(B_i(sℓ) | B_i(s)) (c_i(sℓ) − c_i(s))²
h_i(s) := Σ_{ℓ∈{0,1}} h(B_i(sℓ) | B_i(s))
θ_i := { {B′_i(s0), B′_i(s1)} ↦ v_i(s)/h_i(s) : P(B_i(s)) > 0 }
where h(A | B) is as defined in the next section.
The scope product across all θ_i is the distribution decomposition
of {Z_i}_{i∈I}.
TODO: scope product across countable index set
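Below is a minimal sketch of θ_i for a single discrete variable, given as a dict mapping values to probabilities, under the simplifying assumption that no probability mass sits exactly at any conditional mean (so the correction events C_{i,ℓ}(s) are never needed). The name decomposition_scope and the encoding are illustrative, not from the source.

```python
import math

def decomposition_scope(support, max_depth=8):
    """Recursively split a finite support {value: prob} at the conditional
    mean, assigning each question {B(s0), B(s1)} the relevance v(s)/h(s)."""
    scope = {}

    def entropy_term(p):                       # h(A|B) with p = P(A|B)
        return -p * math.log2(p) if p > 0 else 0.0

    def split(node, depth):                    # node plays the role of B_i(s)
        mass = sum(node.values())
        if len(node) < 2 or depth == max_depth:
            return
        c = sum(v * p for v, p in node.items()) / mass          # c_i(s)
        children = ({v: p for v, p in node.items() if v < c},   # B_i(s0)
                    {v: p for v, p in node.items() if v > c})   # B_i(s1)
        # ASSUMPTION: no probability mass at v == c, so the two children
        # exhaust node and the C_{i,l}(s) correction events are empty.
        v_s = h_s = 0.0
        for child in children:
            p_child = sum(child.values()) / mass
            c_child = sum(v * p for v, p in child.items()) / sum(child.values())
            v_s += p_child * (c_child - c) ** 2                 # v_i(s) term
            h_s += entropy_term(p_child)                        # h_i(s) term
        question = tuple(frozenset(child) for child in children)
        scope[question] = v_s / h_s
        for child in children:
            split(child, depth + 1)

    split(dict(support), 0)
    return scope
```

For example, decomposition_scope({0.0: 0.5, 1.0: 0.5}) yields a single question with relevance 0.25/1 = 0.25, i.e. the variance of the fair coin divided by its one bit of entropy.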
Scoped Entropy
Given a scope of relevance θ, let:
A_θ(0) := {Ω}
A_θ(i+1) := {A ∩ B : A ∈ A_θ(i), B ∈ π, π ∈ dom_i θ}
h(A | B) := P(A | B) log₂(1 / P(A | B))
U_π(Q) := P(∪π | Q) Σ_{S∈π} h(S | ∪π ∩ Q)
U_θ(Q) := Σ_{i=0}^∞ Σ_{A∈A_θ(i)} P(A | Q) Σ_{π∈dom_i θ} θ(π) U_π(A ∩ Q)
U_θ(π) := inf { Σ_{Q∈ρ} P(Q) U_θ(Q) : ρ a finite partition coarser than (or equal to) π }
V_θ(Q) := U_θ(Ω) − U_θ(Q)
V_θ(π) := U_θ({Ω}) − U_θ(π)
ker X := { {ω ∈ Ω : X(ω) = v} : v in the range of X }
U_θ(X) := U_θ(ker X)
V_θ(X) := V_θ(ker X)
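These definitions transcribe directly for a finite Ω and a scope with finitely many levels. The sketch below, reusing domain_levels from the earlier sketch, evaluates U_θ only on single events Q; the infimum over coarser partitions behind U_θ(π) is not implemented.

```python
import math

def P(event, prob):
    return sum(prob[w] for w in event)

def P_cond(a, b, prob):                        # P(A|B)
    pb = P(b, prob)
    return P(a & b, prob) / pb if pb > 0 else 0.0

def h(a, b, prob):                             # h(A|B) = P(A|B) log2(1/P(A|B))
    p = P_cond(a, b, prob)
    return -p * math.log2(p) if p > 0 else 0.0

def U_pi(pi, q, prob):                         # U_π(Q)
    u = frozenset().union(*pi)                 # ∪π
    return P_cond(u, q, prob) * sum(h(s, u & q, prob) for s in pi)

def U_theta(scope, omega, q, prob):            # U_θ(Q)
    levels = domain_levels(scope, omega)
    A = {omega}                                # A_θ(0)
    total = 0.0
    for dom_i in levels:
        total += sum(P_cond(a, q, prob) * scope[pi] * U_pi(pi, a & q, prob)
                     for a in A for pi in dom_i)
        A = {a & b for a in A for pi in dom_i for b in pi}   # A_θ(i+1)
    return total

def V_theta(scope, omega, q, prob):            # V_θ(Q)
    return U_theta(scope, omega, omega, prob) - U_theta(scope, omega, q, prob)
```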
Shannon Entropy Equality
Given any finite random object X, a unirelevant decomposition is the
trivial mapping of X to a scope θ assigning 1 to the single partition
consisting of the events on which X takes each of its values:
dom θ = {ker X}
θ(ker X) = 1
Given any finite random objects X and Y, and a scope θ
equal to the unirelevant decomposition of X,
U_θ(Y) = H(X | Y)
It follows as a corollary that
V_θ(X) = H(X)
Proof
Consider any scope θ = {π ↦ 1} whose domain is a single partition π;
for the unirelevant decomposition of X, π = ker X.
Then dom_i θ is non-empty only at i = 0, where it equals {π}, and
A_θ(0) = {Ω}, so only the i = 0 term of U_θ(Q) contributes. Moreover,
since conditioning on a coarser partition can only increase conditional
entropy, the infimum defining U_θ(ker Y) is attained at ρ = ker Y itself.
Thus, by definition,
U_θ(Y) = Σ_{Q∈ker Y} P(Q) P(Ω | Q) θ(π) P(∪π | Ω ∩ Q) Σ_{B∈π} h(B | ∪π ∩ Ω ∩ Q)
= Σ_{Q∈ker Y} P(Q) Σ_{B∈π} h(B | Q)
= Σ_{Q∈ker Y} Σ_{B∈π} P(Q ∩ B) log₂(1 / P(B | Q))
= H(X | Y)
The corollary follows:
V_θ(X) = U_θ({Ω}) − U_θ(X) = H(X | Ω) − H(X | X) = H(X) − 0 = H(X)
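A numeric check of the proof using the sketches above (hypothetical helpers, not the author's code), with a three-valued X:

```python
omega = frozenset({'a', 'b', 'c'})
prob = {'a': 0.5, 'b': 0.25, 'c': 0.25}
ker_x = frozenset({frozenset({'a'}), frozenset({'b'}), frozenset({'c'})})
theta = {ker_x: 1.0}                         # unirelevant decomposition of X

H = sum(-p * math.log2(p) for p in prob.values())      # H(X) = 1.5 bits
assert abs(U_theta(theta, omega, omega, prob) - H) < 1e-12
# U_θ(Q) = H(X | Q) = 0 on each answer Q of X, so the infimum over
# partitions coarser than ker X is attained at ker X and V_θ(X) = H(X).
assert all(U_theta(theta, omega, q, prob) == 0.0 for q in ker_x)
```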
Benefit of scoped entropy over variance
Scoped entropy has a feature analogous to mutual information:
for any random variable X and any random object Y taking finitely many values,
V_{d(X)}(Y) = 0 if and only if X and Y are independent.
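A sketch of one direction at the first level of d(X), built on the earlier helpers: when X and Y are constructed independently on a product space, each answer of Y leaves the first question of d(X) exactly as uncertain as it was unconditionally, so it contributes nothing to V.

```python
xs, px = [0.0, 1.0], {0.0: 0.5, 1.0: 0.5}
ys, py = ['a', 'b'], {'a': 0.3, 'b': 0.7}
omega = frozenset((x, y) for x in xs for y in ys)
prob = {(x, y): px[x] * py[y] for x in xs for y in ys}   # independent X, Y

# First question of d(X): is X below or above its mean 0.5?
b0 = frozenset(w for w in omega if w[0] < 0.5)
b1 = frozenset(w for w in omega if w[0] > 0.5)
theta = {frozenset({b0, b1}): 1.0}           # first level of d(X) only

u_omega = U_theta(theta, omega, omega, prob)
for y in ys:                                 # the answers of Y, i.e. ker Y
    q = frozenset(w for w in omega if w[1] == y)
    assert abs(U_theta(theta, omega, q, prob) - u_omega) < 1e-12
```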
References
1.
Shannon CE, Weaver W. The mathematical theory of communication. Urbana: Univ. of Illinois Press; 1998.
2.
Ash RB, Doléans-Dade CA. Probability and measure theory. 2nd ed. San Diego: Harcourt/Academic Press; 2000.
3.
Hopcroft JE, Ullman JD. Introduction to automata theory, languages, and computation. Reading, Mass: Addison-Wesley; 1979.