WORKING DRAFT

Both variance and entropy are measures of uncertainty and thus of information. Scoped entropy is a generalization of both statistical variance and Shannon entropy [1]; it generalizes them jointly through a construct called a scope of relevance. Scoped entropy can be applied to both discrete and continuous variables.

A random object is a function whose domain is a probability space [2]. When the function's values are real numbers, it is a random variable (also called a real random object in this document). A finite random object is a random object that takes on finitely many values; in other words, its range is a finite set.

Both entropy and variance are functions of random objects. In the case of Shannon entropy, the random object is finite (its values are often referred to as symbols). In the case of variance, the random object is a (real) random variable. The distances between the values of a finite random variable affect its variance, but not its Shannon entropy.
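This distinction can be seen in a few lines of code. Below is a minimal sketch (the distributions are hypothetical): two variables with identical probabilities have identical Shannon entropy, while their variances differ with the spread of their values.

```python
from math import log2

def shannon_entropy(probs):
    # H = sum of p * log2(1/p), skipping zero-probability values
    return sum(p * log2(1 / p) for p in probs if p > 0)

def variance(values, probs):
    mean = sum(v * p for v, p in zip(values, probs))
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probs))

probs = [0.5, 0.5]
print(shannon_entropy(probs))     # 1.0 bit, regardless of the values
print(variance([0, 1], probs))    # 0.25
print(variance([0, 100], probs))  # 2500.0 -- same entropy, larger variance
```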

Given a scope of relevance $\theta$ and a finite or real random object $X$, scoped entropy is denoted $\operatorname{V}_\theta(X)$.

There is a unirelevant decomposition function $\mathfrak{h}$ which, given any finite random object $X$, generates a scope $\mathfrak{h}(X)$ such that $\operatorname{V}_{\mathfrak{h}(X)}(X) = \operatorname{H}(X)$, where $\operatorname{H}(X)$ is Shannon entropy.

Similarly, there is a distribution decomposition function $\mathfrak{d}$ which, given any random variable $X$, generates a scope $\mathfrak{d}(X)$ such that $\operatorname{V}_{\mathfrak{d}(X)}(X) = \operatorname{Var}(X)$.

Entropy Scope of Relevance

Let $\operatorname{dom} f$ denote the domain of a function $f$. Given a set of sets $\pi$, let $\cup\pi$ denote the union of all member sets of $\pi$. When $\pi$ is a partition, $\pi$ covers $\cup\pi$.

In a scope of relevance, partitions represent questions, with each member set (part) representing an answer. Each answer is an event of a probability space.

Given a function $\theta$ over partitions of events of a probability space $\Omega$, define levels of its domain: $$\begin{aligned} \operatorname{dom}_0\theta &:= \{ \pi \in \operatorname{dom}\theta : \cup\pi = \Omega \} \\ \operatorname{dom}_{i+1}\theta &:= \left\{ \pi \in \operatorname{dom}\theta : \cup\pi \in \rho,\ \rho \in \operatorname{dom}_i\theta,\ \pi \notin \operatorname{dom}_i\theta \right\} \end{aligned}$$

A scope of relevance $\theta$ is a non-negative real-valued function over partitions of events (subsets) of a probability space $\Omega$ satisfying the following conditions:

  1. At most one partition covers any given event. Formally, $\cup\pi = \cup\rho$ implies $\pi = \rho$ for any partitions $\pi$ and $\rho$ in $\operatorname{dom}\theta$.

  2. $\operatorname{dom}\theta = \bigcup_{i=0}^\infty \operatorname{dom}_i\theta$

The domain of a scope of relevance is the set of all relevant questions, and the real value assigned to each question is its degree of relevance. The second condition means the partitions (questions) divide the probability space $\Omega$ in a nested, hierarchical manner: every question is asked either under the event of all outcomes ($\Omega$) or under the event of an answer to another question.
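A minimal sketch of these definitions for finite spaces, under an assumed representation (events as frozensets of outcomes, partitions as frozensets of events, a scope as a dict from partition to degree of relevance; the names `levels` and `is_scope` are hypothetical): `levels` computes $\operatorname{dom}_i\theta$ and `is_scope` checks the two conditions.

```python
def union(partition):
    return frozenset().union(*partition)

def levels(theta, omega):
    # dom_0 = partitions covering Omega; dom_{i+1} = partitions covering an
    # answer of a level-i question, excluding partitions already placed
    doms, seen = [], set()
    current = {pi for pi in theta if union(pi) == omega}
    while current:
        doms.append(current)
        seen |= current
        answers = {a for pi in current for a in pi}
        current = {pi for pi in theta if union(pi) in answers and pi not in seen}
    return doms

def is_scope(theta, omega):
    covers = [union(pi) for pi in theta]
    at_most_one_cover = len(covers) == len(set(covers))               # condition 1
    nested = sum(len(d) for d in levels(theta, omega)) == len(theta)  # condition 2
    return at_most_one_cover and nested and all(v >= 0 for v in theta.values())

# Example: a fair die with a parity question and a nested which-odd question
omega = frozenset(range(1, 7))
parity = frozenset({frozenset({1, 3, 5}), frozenset({2, 4, 6})})
which_odd = frozenset({frozenset({1}), frozenset({3}), frozenset({5})})
theta = {parity: 1.0, which_odd: 0.5}
print(is_scope(theta, omega))  # True
```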

Scope product

For partitions $\pi$, $\rho$ and an event $B$, define: $$\begin{aligned} \pi \times \rho &:= \{ A \cap B : A \in \pi,\ B \in \rho \} \\ \pi \times B &:= \{ A \cap B : A \in \pi \} \end{aligned}$$

Consider any two scopes $\theta$ and $\phi$ sharing probability space $\Omega$. Define $$\theta \times \phi := \bigcup_{i=0}^\infty \bigcup_{\substack{\pi \in \operatorname{dom}_i\theta \\ \rho \in \operatorname{dom}_i\phi \\ (\cup\pi)\,\cap\,(\cup\rho) \neq \emptyset}} \mathcal{C}(\pi, \rho)$$ where $$\mathcal{C}(\pi, \rho) = \begin{cases} \{ (\pi \times \rho) \mapsto \theta(\pi) \} & \text{when } \theta(\pi) = \phi(\rho) \\ \{ \pi \mapsto \theta(\pi) \} \cup \bigcup_{A \in \pi} \{ \rho \times A \mapsto \phi(\rho) \} & \text{when } \theta(\pi) > \phi(\rho) \\ \{ \rho \mapsto \phi(\rho) \} \cup \bigcup_{A \in \rho} \{ \pi \times A \mapsto \theta(\pi) \} & \text{when } \theta(\pi) < \phi(\rho) \end{cases}$$
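A sketch of the scope product under the same assumed representation as the previous sketch (its `union` and `levels` helpers are repeated so the block runs on its own; empty intersections are dropped, assuming partitions contain only non-empty parts, and `scope_product` is a hypothetical name):

```python
def union(partition):
    return frozenset().union(*partition)

def levels(theta, omega):
    doms, seen = [], set()
    current = {pi for pi in theta if union(pi) == omega}
    while current:
        doms.append(current)
        seen |= current
        answers = {a for pi in current for a in pi}
        current = {pi for pi in theta if union(pi) in answers and pi not in seen}
    return doms

def cross(pi, rho):
    # pi x rho := { A & B : A in pi, B in rho }, dropping empty parts
    return frozenset(a & b for a in pi for b in rho if a & b)

def restrict(pi, b):
    # pi x B := { A & B : A in pi }, dropping empty parts
    return frozenset(a & b for a in pi if a & b)

def C(pi, rho, w_pi, w_rho):
    # the three cases of C(pi, rho), keyed on the relevance comparison
    if w_pi == w_rho:
        return {cross(pi, rho): w_pi}
    hi, lo = (pi, rho) if w_pi > w_rho else (rho, pi)
    w_hi, w_lo = max(w_pi, w_rho), min(w_pi, w_rho)
    out = {hi: w_hi}
    for a in hi:                      # ask the less relevant question under
        out[restrict(lo, a)] = w_lo   # each answer of the more relevant one
    return out

def scope_product(theta, phi, omega):
    result = {}
    for dom_theta, dom_phi in zip(levels(theta, omega), levels(phi, omega)):
        for pi in dom_theta:
            for rho in dom_phi:
                if union(pi) & union(rho):      # overlapping coverage only
                    result.update(C(pi, rho, theta[pi], phi[rho]))
    return result
```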

Theorem (Scope product is a scope of relevance)

For any two scopes of relevance $\theta$ and $\phi$ sharing probability space $\Omega$, $\theta \times \phi$ is also a scope of relevance.

Proof

TBD

Distribution Decomposition

Given a set of random variables $\{Z_i\}_{i\in I}$ with countable index set $I$ and probability space $\Omega$, for each $i \in I$ define functions $B_i(s)$ and $c_i(s)$ over strings over the alphabet $\{\mathtt{0},\mathtt{1}\}$ [3].

$$B_i(\epsilon) := \Omega$$ For any string $s$ with $\operatorname{P}(B_i(s)) > 0$, $$\begin{aligned} c_i(s) &:= \operatorname{E}(Z_i \mid B_i(s)) \\ B_i(s\mathtt{0}) &:= B_i(s) \cap \{Z_i < c_i(s)\} \\ B_i(s\mathtt{1}) &:= B_i(s) \cap \{Z_i > c_i(s)\} \end{aligned}$$ otherwise, for $\ell \in \{\mathtt{0},\mathtt{1}\}$, $B_i(s\ell) := \emptyset$.
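A sketch of this recursion for a single finite-valued $Z$ on a finite probability space (outcome probabilities and values as dicts; both hypothetical). It assumes no conditional mean $c(s)$ coincides with a value of $Z$, so the atom $\{Z = c(s)\}$ is always empty and the extension to $\Omega'$ below is not needed.

```python
def splits(Z, probs, event=None, s=""):
    """Yield (s, B(s), c(s)) for every string s with P(B(s)) > 0."""
    if event is None:
        event = frozenset(probs)                 # B(epsilon) = Omega
    mass = sum(probs[w] for w in event)
    if mass == 0:                                # otherwise B(s0), B(s1) empty
        return
    c = sum(probs[w] * Z[w] for w in event) / mass   # c(s) = E(Z | B(s))
    yield s, event, c
    yield from splits(Z, probs, frozenset(w for w in event if Z[w] < c), s + "0")
    yield from splits(Z, probs, frozenset(w for w in event if Z[w] > c), s + "1")

probs = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
Z = {"a": 0.0, "b": 1.0, "c": 4.0, "d": 7.0}
for s, B, c in splits(Z, probs):
    print(repr(s), sorted(B), c)    # the binary tree of mean-splits
```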

A new probability space $\Omega'$ is defined by extending $\Omega$ such that for each $i \in I$ and string $s \in \{\mathtt{0},\mathtt{1}\}^*$ with $\operatorname{P}(B_i(s)) - \operatorname{P}(\{Z_i = c_i(s)\}) > 0$, there are new disjoint events $C_{i,\mathtt{0}}(s)$ and $C_{i,\mathtt{1}}(s)$ such that $$\{Z_i = c_i(s)\} = C_{i,\mathtt{0}}(s) \cup C_{i,\mathtt{1}}(s)$$ and for each $\ell \in \{\mathtt{0},\mathtt{1}\}$ and all events $S$ in $\Omega$ $$\operatorname{P}(S \cap C_{i,\ell}(s)) = \operatorname{P}(S \cap \{Z_i = c_i(s)\}) \frac{\operatorname{P}(B_i(s\ell))}{\operatorname{P}(B_i(s)) - \operatorname{P}(\{Z_i = c_i(s)\})}$$ Define $$\begin{aligned} B'_i(s\ell) &:= B_i(s\ell) \cup C_{i,\ell}(s) \\ v_i(s) &:= \sum_{\ell\in\{\mathtt{0},\mathtt{1}\}} \operatorname{P}(B_i(s\ell) \mid B_i(s)) \left(c_i(s\ell) - c_i(s)\right)^2 \\ h_i(s) &:= \sum_{\ell\in\{\mathtt{0},\mathtt{1}\}} h(B_i(s\ell) \mid B_i(s)) \\ \theta_i &:= \left\{ \{B'_i(s\mathtt{0}), B'_i(s\mathtt{1})\} \mapsto v_i(s)/h_i(s) : \operatorname{P}(B_i(s)) > 0 \right\} \end{aligned}$$ where $h(A \mid B) := \operatorname{P}(A \mid B)\log_2\frac{1}{\operatorname{P}(A \mid B)}$, as also defined under Scoped Entropy below.

The scope product across all $\theta_i$ is the distribution decomposition of $\{Z_i\}_{i\in I}$.
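Continuing the previous sketch's assumptions (a single $Z$, no atoms at conditional means, hence $B'(s\ell) = B(s\ell)$), the following computes $v(s)$, $h(s)$, and the scope weight $v(s)/h(s)$ for each split, and checks that the per-split contributions $\operatorname{P}(B(s))\,v(s)$ sum to $\operatorname{Var}(Z)$ by the law of total variance:

```python
from math import log2

probs = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
Z = {"a": 0.0, "b": 1.0, "c": 4.0, "d": 7.0}

def mass(e): return sum(probs[w] for w in e)
def mean(e): return sum(probs[w] * Z[w] for w in e) / mass(e)
def h(p): return p * log2(1 / p) if p > 0 else 0.0

def weights(event):
    # yield (B(s), {B(s0), B(s1)}, v(s), h(s), v(s)/h(s)) for each usable split
    if mass(event) == 0:
        return
    c = mean(event)
    parts = [frozenset(w for w in event if Z[w] < c),
             frozenset(w for w in event if Z[w] > c)]
    if all(mass(part) > 0 for part in parts):
        ps = [mass(part) / mass(event) for part in parts]
        v = sum(p * (mean(part) - c) ** 2 for p, part in zip(ps, parts))
        hs = sum(h(p) for p in ps)
        yield event, parts, v, hs, v / hs
        for part in parts:
            yield from weights(part)

# Split contributions recover Var(Z)
contributions = sum(mass(e) * v for e, _, v, _, _ in weights(frozenset(probs)))
mu = mean(frozenset(probs))
print(contributions, sum(probs[w] * (Z[w] - mu) ** 2 for w in probs))  # 7.5 7.5
```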

TODO: scope product across countable index set

Scoped Entropy

Given a scope of relevance $\theta$, let:

$$\begin{aligned} \mathcal{A}_\theta(0) &:= \{\Omega\} \\ \mathcal{A}_\theta(i+1) &:= \left\{ A \cap B : A \in \mathcal{A}_\theta(i),\ B \in \pi,\ \pi \in \operatorname{dom}_i\theta \right\} \\ h(A \mid B) &:= \operatorname{P}(A \mid B) \log_2\frac{1}{\operatorname{P}(A \mid B)} \\ \operatorname{U}_\pi(Q) &:= \operatorname{P}(\cup\pi \mid Q) \sum_{S\in\pi} h(S \mid \cup\pi \cap Q) \\ \operatorname{U}_\theta(Q) &:= \sum_{i=0}^\infty \sum_{A\in\mathcal{A}_\theta(i)} \operatorname{P}(A \mid Q) \sum_{\pi\in\operatorname{dom}_i\theta} \theta(\pi) \operatorname{U}_\pi(A\cap Q) \\ \operatorname{U}_\theta(\pi) &:= \inf\left\{ \sum_{Q\in\rho} \operatorname{P}(Q)\operatorname{U}_\theta(Q) : \text{ finite partition } \rho \text{ coarser than (or equal to) } \pi \right\} \\ \operatorname{V}_\theta(Q) &:= \operatorname{U}_\theta(\Omega) - \operatorname{U}_\theta(Q) \\ \operatorname{V}_\theta(\pi) &:= \operatorname{U}_\theta(\{\Omega\}) - \operatorname{U}_\theta(\pi) \\ \ker X &:= \left\{ \{\omega\in\Omega : X(\omega) = v\} : v \text{ in the range of } X \right\} \\ \operatorname{U}_\theta(X) &:= \operatorname{U}_\theta(\ker X) \\ \operatorname{V}_\theta(X) &:= \operatorname{V}_\theta(\ker X) \end{aligned}$$

The infimum over finite coarsenings makes $\operatorname{U}_\theta$ well defined for partitions with infinitely many parts. Refining $\rho$ can only decrease $\sum_{Q\in\rho} \operatorname{P}(Q)\operatorname{U}_\theta(Q)$, so for a finite partition $\pi$ the infimum is attained at $\rho = \pi$ itself.
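A sketch implementing these definitions for finite spaces, in the same assumed representation as the earlier sketches (`union` and `levels` are repeated so the block runs on its own; `U_event`, `U_partition`, and `V` are hypothetical names for $\operatorname{U}_\theta(Q)$, $\operatorname{U}_\theta(\pi)$, and $\operatorname{V}_\theta(X)$). Because the partitions here are finite, the infimum is evaluated directly at $\pi$:

```python
from math import log2

def union(pi):
    return frozenset().union(*pi)

def levels(theta, omega):
    doms, seen = [], set()
    current = {pi for pi in theta if union(pi) == omega}
    while current:
        doms.append(current)
        seen |= current
        answers = {a for pi in current for a in pi}
        current = {pi for pi in theta if union(pi) in answers and pi not in seen}
    return doms

def prob(probs, a, given=None):
    # P(A) or P(A | given); 0 when the conditioning event is null
    if given is None:
        return sum(probs[w] for w in a)
    pg = sum(probs[w] for w in given)
    return sum(probs[w] for w in a & given) / pg if pg else 0.0

def h(probs, a, b):
    p = prob(probs, a, b)
    return p * log2(1 / p) if p > 0 else 0.0

def U_pi(probs, pi, q):
    u = union(pi)
    return prob(probs, u, q) * sum(h(probs, s, u & q) for s in pi)

def U_event(probs, theta, omega, q):
    # U_theta(Q): walk the levels, pairing dom_i with the areas A_theta(i)
    total, areas = 0.0, [omega]
    for dom_i in levels(theta, omega):
        total += sum(prob(probs, a, q) * theta[pi] * U_pi(probs, pi, a & q)
                     for a in areas for pi in dom_i)
        areas = [a & b for a in areas for pi in dom_i for b in pi]
    return total

def U_partition(probs, theta, omega, rho):
    return sum(prob(probs, q) * U_event(probs, theta, omega, q) for q in rho)

def kernel(probs, X):
    return frozenset(frozenset(w for w in probs if X[w] == v)
                     for v in set(X.values()))

def V(probs, theta, omega, X):
    trivial = frozenset({omega})
    return (U_partition(probs, theta, omega, trivial)
            - U_partition(probs, theta, omega, kernel(probs, X)))

# Example: the die scope from the earlier sketch; X reveals the die face
omega = frozenset(range(1, 7))
probs = {w: 1 / 6 for w in omega}
parity = frozenset({frozenset({1, 3, 5}), frozenset({2, 4, 6})})
which_odd = frozenset({frozenset({1}), frozenset({3}), frozenset({5})})
theta = {parity: 1.0, which_odd: 0.5}
print(V(probs, theta, omega, {w: w for w in omega}))
# 1 + 0.5 * 0.5 * log2(3): the parity entropy, plus the which-odd entropy
# weighted by its relevance 0.5 and by P(odd) = 0.5 since it is asked only
# under the odd answer
```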

Shannon Entropy Equality

Given any finite random object $X$, a unirelevant decomposition is the trivial mapping of $X$ to a scope $\theta$ assigning $1$ to the single partition consisting of the events for each value taken by $X$: $$\begin{aligned} \operatorname{dom}\theta &= \{\ker X\} \\ \theta(\ker X) &= 1 \end{aligned}$$
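Under the representation of the earlier sketches, this is a short construction (`unirelevant` is a hypothetical name); feeding its output to the `V` sketch above reproduces $\operatorname{H}(X)$, as the theorem below states.

```python
def unirelevant(probs, X):
    # the scope assigning relevance 1 to the single partition ker X
    ker = frozenset(frozenset(w for w in probs if X[w] == v)
                    for v in set(X.values()))
    return {ker: 1.0}

# e.g. V(probs, unirelevant(probs, X), omega, X) equals the Shannon entropy of X
```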

Theorem: Unirelevant Scoped Entropy equals Shannon Entropy

Given any finite random objects $X$ and $Y$, and scope $\theta$ equal to the unirelevant decomposition of $X$, $$\operatorname{U}_\theta(Y) = \operatorname{H}(X \mid Y)$$ It follows as a corollary that $$\operatorname{V}_\theta(X) = \operatorname{H}(X)$$

Proof

Consider any single-partition scope $\theta = \{\pi \mapsto 1\}$. Both $\mathcal{A}_\theta(i)$ and $\operatorname{dom}_i\theta$ are non-empty only at $i=0$, where they equal $\{\Omega\}$ and $\{\pi\}$ respectively. Since refining $\rho$ can only decrease $\sum_{Q\in\rho}\operatorname{P}(Q)\operatorname{U}_\theta(Q)$, the infimum in the definition of $\operatorname{U}_\theta(\ker Y)$ is attained at $\rho = \ker Y$ itself. Thus by definition $$\begin{aligned} \operatorname{U}_\theta(Y) &= \sum_{Q\in\ker Y} \operatorname{P}(Q)\,\operatorname{P}(\Omega \mid Q)\,\theta(\pi)\,\operatorname{P}(\cup\pi \mid \Omega, Q) \sum_{B\in\pi} h(B \mid \cup\pi, \Omega, Q) \\ &= \sum_{Q\in\ker Y} \operatorname{P}(Q) \sum_{B\in\pi} h(B \mid Q) \\ &= \sum_{Q\in\ker Y} \sum_{B\in\pi} \operatorname{P}(Q\cap B) \log_2\frac{1}{\operatorname{P}(B \mid Q)} \\ &= \operatorname{H}(X \mid Y) \end{aligned}$$ The corollary follows: $$\operatorname{V}_\theta(X) = \operatorname{U}_\theta(\{\Omega\}) - \operatorname{U}_\theta(X) = \operatorname{H}(X \mid \{\Omega\}) - \operatorname{H}(X \mid X) = \operatorname{H}(X) - 0 = \operatorname{H}(X)$$

Benefit of scoped entropy over variance

A feature analogous to mutual information, which variance and covariance lack (zero correlation does not imply independence), is the following:

For any random variable $X$ and random object $Y$ taking finitely many values, $$\operatorname{V}_{\mathfrak{d}(X)}(Y) = 0 \quad\text{ if and only if }\quad X \text{ and } Y \text{ are independent}$$
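A minimal contrast with covariance (hypothetical example): $Y$ below is a deterministic function of $X$, hence dependent on it, yet their covariance is exactly zero; by the property above, $\operatorname{V}_{\mathfrak{d}(X)}(Y) > 0$.

```python
xs = [1, 2, 4, 5]                  # X uniform on four values
ys = {1: 0, 2: 1, 4: 1, 5: 0}      # Y = f(X): dependent on X by construction
ex = sum(xs) / 4
ey = sum(ys[x] for x in xs) / 4
cov = sum((x - ex) * (ys[x] - ey) for x in xs) / 4
print(cov)  # 0.0 -- covariance misses the dependence; scoped entropy does not
```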

References

1. Shannon CE, Weaver W. The mathematical theory of communication. Urbana: University of Illinois Press; 1998.
2. Ash RB, Doléans-Dade C. Probability and measure theory. 2nd ed. San Diego: Harcourt/Academic Press; 2000.
3. Hopcroft JE, Ullman JD. Introduction to automata theory, languages, and computation. Reading, MA: Addison-Wesley; 1979.