Both variance and entropy are measures of uncertainty and thus of information.
Scoped entropy is a generalization of both statistical variance and Shannon
entropy [1]. It generalizes the two jointly by means of a
construct called a scope of relevance.
Scoped entropy can be applied to both discrete and continuous variables.
A random object is a function whose domain is a probability space
[2]. When the function values are real numbers, it is a
random variable (also called a real random object in this document).
In this document, a finite random object means a random object that takes on
finitely many values; in other words, the range of a finite random object
is a finite set.
Both entropy and variance are functions of random objects.
In the case of Shannon entropy, the random object is finite (with values often
referred to as symbols).
In the case of variance, the random object is a (real) random variable.
The distances between values of a finite random variable affect variance, but
not Shannon entropy.
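As a quick numeric illustration (plain Python, not from the source): two variables with identical probabilities have identical Shannon entropy, while their variances differ with the spacing of their values.

```python
import math

p = [0.5, 0.5]                      # same probabilities for both variables
x_near = [0.0, 1.0]                 # values 1 apart
x_far = [0.0, 10.0]                 # values 10 apart

def entropy(p):                     # depends only on the probabilities
    return sum(-q * math.log2(q) for q in p if q > 0)

def variance(p, x):                 # depends on the values as well
    mean = sum(q * v for q, v in zip(p, x))
    return sum(q * (v - mean) ** 2 for q, v in zip(p, x))

print(entropy(p))                   # 1.0 bit for both variables
print(variance(p, x_near))          # 0.25
print(variance(p, x_far))          # 25.0
```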
Given a scope of relevance θ and a finite or real random object X,
scoped entropy is denoted by the function
V_θ(X)
There is a unirelevant decomposition function h which, given any
finite random object X, generates a scope h(X) such that
V_{h(X)}(X) = H(X)
where H(X) is Shannon entropy.
Similarly, there is a distribution decomposition function d
which, given any random variable X, generates a scope d(X) such that
V_{d(X)}(X) = Var(X)
Entropy Scope of Relevance
Let dom f denote the domain of a function f.
Given π a set of sets, let ∪π denote the union of all member
sets in π. When π is a partition, π covers ∪π.
In a scope of relevance, partitions represent questions, with each member
set (part) an answer. Each answer represents an event of a probability space.
Given a function θ over partitions of events of a probability space Ω,
define levels of its domain:
dom_0 θ := {π ∈ dom θ : ∪π = Ω}
dom_{i+1} θ := {π ∈ dom θ : ∪π ∈ ρ for some ρ ∈ dom_i θ, π ∉ dom_i θ}
A scope of relevance θ is a non-negative real-valued function
over partitions of events (subsets) of a probability space Ω
satisfying the following conditions:
1. At most one partition covers any event: ∪π = ∪ρ implies π = ρ
for any partitions π and ρ in dom θ.
2. dom θ = ⋃_{i=0}^∞ dom_i θ
The domain of a scope of relevance is the set of all relevant questions, and
the real value assigned to each question is its degree of relevance.
The second condition in the definition means the partitions (questions)
divide the probability space Ω in a nested, hierarchical manner:
a question can only be asked under the event of all outcomes (Ω) or under
the event of an answer to another question.
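For concreteness, here is one possible encoding, a minimal sketch not taken from the source: outcomes are hashable values, an event is a frozenset of outcomes, a partition is a frozenset of disjoint events, and a scope is a dict from partitions to relevances. The helper names (union, domain_levels, is_scope_of_relevance) are illustrative.

```python
def union(partition):
    """∪π: the event covered by the partition π."""
    return frozenset().union(*partition)

def domain_levels(scope, omega):
    """Split dom θ into levels dom_i θ; return None if condition 2 fails."""
    remaining = set(scope)
    level = {p for p in remaining if union(p) == omega}        # dom_0 θ
    levels = []
    while level:
        levels.append(level)
        remaining -= level
        answers = {a for p in level for a in p}                # answer events
        level = {p for p in remaining if union(p) in answers}  # dom_{i+1} θ
    return levels if not remaining else None   # leftover questions unreachable

def is_scope_of_relevance(scope, omega):
    covers = [union(p) for p in scope]
    unique_cover = len(covers) == len(set(covers))             # condition 1
    nonneg = all(v >= 0 for v in scope.values())
    return unique_cover and nonneg and domain_levels(scope, omega) is not None
```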
Scope product
π × ρ := {A ∩ B : A ∈ π, B ∈ ρ}
π × B := {A ∩ B : A ∈ π}
Consider any two scopes θ and ϕ sharing a probability space Ω.
Define
θ × ϕ := ⋃_{i=0}^∞ ⋃ { C(π, ρ) : π ∈ dom_i θ, ρ ∈ dom_i ϕ, (∪π) ∩ (∪ρ) ≠ ∅ }
where
C(π, ρ) :=
{(π × ρ) ↦ θ(π)} when θ(π) = ϕ(ρ)
{π ↦ θ(π)} ∪ ⋃_{A∈π} {(ρ × A) ↦ ϕ(ρ)} when θ(π) > ϕ(ρ)
{ρ ↦ ϕ(ρ)} ∪ ⋃_{A∈ρ} {(π × A) ↦ θ(π)} when θ(π) < ϕ(ρ)
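Continuing the same hypothetical encoding, a sketch of the case analysis C(π, ρ); dropping empty intersections from the refined partitions is an assumption, since the definition above leaves them in.

```python
def refine(pi, rho):
    """π × ρ: the common refinement of two partitions."""
    return frozenset(a & b for a in pi for b in rho) - {frozenset()}

def restrict(pi, b):
    """π × B: the partition π restricted to the event B."""
    return frozenset(a & b for a in pi) - {frozenset()}

def C(pi, rho, rel_pi, rel_rho):
    """Merge equally relevant questions; otherwise keep the more relevant
    question whole and re-ask the other under each of its answers."""
    if rel_pi == rel_rho:
        return {refine(pi, rho): rel_pi}
    if rel_pi > rel_rho:
        first, second, r1, r2 = pi, rho, rel_pi, rel_rho
    else:
        first, second, r1, r2 = rho, pi, rel_rho, rel_pi
    product = {first: r1}
    for a in first:
        product[restrict(second, a)] = r2
    return product
```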
Theorem (Scope product is a scope of relevance)
For any two scopes of relevance θ and ϕ sharing probability space
Ω, θ×ϕ is also a scope of relevance.
Proof
TBD
Distribution Decomposition
Given a set of random variables {Z_i}_{i∈I} with countable index set I
and probability space Ω,
for each i ∈ I
define functions B_i(s) and c_i(s) over strings s of the alphabet
{0,1} [3].
B_i(ε) := Ω
For any string s with P(B_i(s)) > 0,
c_i(s) := E(Z_i | B_i(s))
B_i(s0) := B_i(s) ∩ {Z_i < c_i(s)}
B_i(s1) := B_i(s) ∩ {Z_i > c_i(s)}
and otherwise B_i(sℓ) := ∅.
A new probability space Ω′ is defined by extending Ω
such that for each i ∈ I and string s ∈ {0,1}∗
with P(B_i(s)) − P({Z_i = c_i(s)}) > 0,
there are new disjoint events C_{i,0}(s) and C_{i,1}(s) such that
{Z_i = c_i(s)} = C_{i,0}(s) ∪ C_{i,1}(s)
and, for each ℓ ∈ {0,1} and all events S in Ω,
P(S ∩ C_{i,ℓ}(s)) = P(S ∩ {Z_i = c_i(s)}) · P(B_i(sℓ)) / (P(B_i(s)) − P({Z_i = c_i(s)}))
Define
B′_i(sℓ) := B_i(sℓ) ∪ C_{i,ℓ}(s)
v_i(s) := Σ_{ℓ∈{0,1}} P(B_i(sℓ) | B_i(s)) (c_i(sℓ) − c_i(s))²
h_i(s) := Σ_{ℓ∈{0,1}} h(B_i(sℓ) | B_i(s))
θ_i := { {B′_i(s0), B′_i(s1)} ↦ v_i(s)/h_i(s) : P(B_i(s)) > 0 }
where h(A | B) is as defined in the next section.
The scope product across all θ_i is the distribution decomposition
of {Z_i}_{i∈I}.
TODO: scope product across countable index set
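Below is a minimal sketch of θ_i for a single discrete variable, given as a dict mapping values to probabilities, under the simplifying assumption that no probability mass sits exactly at any conditional mean (so the correction events C_{i,ℓ}(s) are never needed). The name decomposition_scope and the encoding are illustrative, not from the source.

```python
import math

def decomposition_scope(support, max_depth=8):
    """Recursively split a finite support {value: prob} at the conditional
    mean, assigning each question {B(s0), B(s1)} the relevance v(s)/h(s)."""
    scope = {}

    def entropy_term(p):                       # h(A|B) with p = P(A|B)
        return -p * math.log2(p) if p > 0 else 0.0

    def split(node, depth):                    # node plays the role of B_i(s)
        mass = sum(node.values())
        if len(node) < 2 or depth == max_depth:
            return
        c = sum(v * p for v, p in node.items()) / mass          # c_i(s)
        children = ({v: p for v, p in node.items() if v < c},   # B_i(s0)
                    {v: p for v, p in node.items() if v > c})   # B_i(s1)
        # ASSUMPTION: no probability mass at v == c, so the two children
        # exhaust node and the C_{i,l}(s) correction events are empty.
        v_s = h_s = 0.0
        for child in children:
            p_child = sum(child.values()) / mass
            c_child = sum(v * p for v, p in child.items()) / sum(child.values())
            v_s += p_child * (c_child - c) ** 2                 # v_i(s) term
            h_s += entropy_term(p_child)                        # h_i(s) term
        question = tuple(frozenset(child) for child in children)
        scope[question] = v_s / h_s
        for child in children:
            split(child, depth + 1)

    split(dict(support), 0)
    return scope
```

For example, decomposition_scope({0.0: 0.5, 1.0: 0.5}) yields a single question with relevance 0.25/1 = 0.25, i.e. the variance of the fair coin divided by its one bit of entropy.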
Scoped Entropy
Given a scope of relevance θ, let:
A_θ(0) := {Ω}
A_θ(i+1) := {A ∩ B : A ∈ A_θ(i), B ∈ π, π ∈ dom_i θ}
h(A | B) := P(A | B) log₂(1 / P(A | B))
U_π(Q) := P(∪π | Q) Σ_{S∈π} h(S | ∪π ∩ Q)
U_θ(Q) := Σ_{i=0}^∞ Σ_{A∈A_θ(i)} P(A | Q) Σ_{π∈dom_i θ} θ(π) U_π(A ∩ Q)
U_θ(π) := inf { Σ_{Q∈ρ} P(Q) U_θ(Q) : ρ a finite partition coarser than (or equal to) π }
V_θ(Q) := U_θ(Ω) − U_θ(Q)
V_θ(π) := U_θ({Ω}) − U_θ(π)
ker X := { {ω ∈ Ω : X(ω) = v} : v in the range of X }
U_θ(X) := U_θ(ker X)
V_θ(X) := V_θ(ker X)
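These definitions transcribe directly for a finite Ω and a scope with finitely many levels. The sketch below, reusing domain_levels from the earlier sketch, evaluates U_θ only on single events Q; the infimum over coarser partitions behind U_θ(π) is not implemented.

```python
import math

def P(event, prob):
    return sum(prob[w] for w in event)

def P_cond(a, b, prob):                        # P(A|B)
    pb = P(b, prob)
    return P(a & b, prob) / pb if pb > 0 else 0.0

def h(a, b, prob):                             # h(A|B) = P(A|B) log2(1/P(A|B))
    p = P_cond(a, b, prob)
    return -p * math.log2(p) if p > 0 else 0.0

def U_pi(pi, q, prob):                         # U_π(Q)
    u = frozenset().union(*pi)                 # ∪π
    return P_cond(u, q, prob) * sum(h(s, u & q, prob) for s in pi)

def U_theta(scope, omega, q, prob):            # U_θ(Q)
    levels = domain_levels(scope, omega)
    A = {omega}                                # A_θ(0)
    total = 0.0
    for dom_i in levels:
        total += sum(P_cond(a, q, prob) * scope[pi] * U_pi(pi, a & q, prob)
                     for a in A for pi in dom_i)
        A = {a & b for a in A for pi in dom_i for b in pi}   # A_θ(i+1)
    return total

def V_theta(scope, omega, q, prob):            # V_θ(Q)
    return U_theta(scope, omega, omega, prob) - U_theta(scope, omega, q, prob)
```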
Shannon Entropy Equality
Given any finite random object X, a unirelevant decomposition is the
trivial mapping of X to a scope θ assigning 1 to the single partition
consisting of the events on which X takes each of its values:
dom θ = {ker X}
θ(ker X) = 1
Given any finite random objects X and Y, and a scope θ
equal to the unirelevant decomposition of X,
U_θ(Y) = H(X | Y)
It follows as a corollary that
V_θ(X) = H(X)
Proof
Consider any scope θ = {π ↦ 1} whose domain is a single partition π;
for the unirelevant decomposition of X, π = ker X.
Then dom_i θ is non-empty only at i = 0, where it equals {π}, and
A_θ(0) = {Ω}, so only the i = 0 term of U_θ(Q) contributes. Moreover,
since conditioning on a coarser partition can only increase conditional
entropy, the infimum defining U_θ(ker Y) is attained at ρ = ker Y itself.
Thus, by definition,
U_θ(Y) = Σ_{Q∈ker Y} P(Q) P(Ω | Q) θ(π) P(∪π | Ω ∩ Q) Σ_{B∈π} h(B | ∪π ∩ Ω ∩ Q)
= Σ_{Q∈ker Y} P(Q) Σ_{B∈π} h(B | Q)
= Σ_{Q∈ker Y} Σ_{B∈π} P(Q ∩ B) log₂(1 / P(B | Q))
= H(X | Y)
The corollary follows:
V_θ(X) = U_θ({Ω}) − U_θ(X) = H(X | Ω) − H(X | X) = H(X) − 0 = H(X)
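A numeric check of the proof using the sketches above (hypothetical helpers, not the author's code), with a three-valued X:

```python
omega = frozenset({'a', 'b', 'c'})
prob = {'a': 0.5, 'b': 0.25, 'c': 0.25}
ker_x = frozenset({frozenset({'a'}), frozenset({'b'}), frozenset({'c'})})
theta = {ker_x: 1.0}                         # unirelevant decomposition of X

H = sum(-p * math.log2(p) for p in prob.values())      # H(X) = 1.5 bits
assert abs(U_theta(theta, omega, omega, prob) - H) < 1e-12
# U_θ(Q) = H(X | Q) = 0 on each answer Q of X, so the infimum over
# partitions coarser than ker X is attained at ker X and V_θ(X) = H(X).
assert all(U_theta(theta, omega, q, prob) == 0.0 for q in ker_x)
```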
Benefit of scoped entropy over variance
Scoped entropy has a feature analogous to mutual information:
for any random variable X and any random object Y taking finitely many values,
V_{d(X)}(Y) = 0 if and only if X and Y are independent.
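A sketch of one direction at the first level of d(X), built on the earlier helpers: when X and Y are constructed independently on a product space, each answer of Y leaves the first question of d(X) exactly as uncertain as it was unconditionally, so it contributes nothing to V.

```python
xs, px = [0.0, 1.0], {0.0: 0.5, 1.0: 0.5}
ys, py = ['a', 'b'], {'a': 0.3, 'b': 0.7}
omega = frozenset((x, y) for x in xs for y in ys)
prob = {(x, y): px[x] * py[y] for x in xs for y in ys}   # independent X, Y

# First question of d(X): is X below or above its mean 0.5?
b0 = frozenset(w for w in omega if w[0] < 0.5)
b1 = frozenset(w for w in omega if w[0] > 0.5)
theta = {frozenset({b0, b1}): 1.0}           # first level of d(X) only

u_omega = U_theta(theta, omega, omega, prob)
for y in ys:                                 # the answers of Y, i.e. ker Y
    q = frozenset(w for w in omega if w[1] == y)
    assert abs(U_theta(theta, omega, q, prob) - u_omega) < 1e-12
```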
References
1.
Shannon CE, Weaver W. The mathematical theory of communication. Urbana: Univ. of Illinois Press; 1998.
2.
Ash RB, Doléans-Dade CA. Probability and measure theory. 2nd ed. San Diego: Harcourt/Academic Press; 2000.
3.
Hopcroft JE, Ullman JD. Introduction to automata theory, languages, and computation. Reading, Mass: Addison-Wesley; 1979.