The gametic entropy of populations

WORK IN PROGRESS

Introduction

An early application of Shannon entropy [1] in population genetics is the famous article “The Apportionment of Human Diversity” by Lewontin in 1972 [2]. Shannon entropy is a measure of information; when computed with base-2 logarithms of probabilities, it is expressed in units of bits. The idea of genetic information being stored and quantified in bits appears in the context of evolutionary genetics as early as 1961 [3].

Eight years after Lewontin’s article, BDH Latter wrote:

“The Shannon information measure used by Lewontin (1972) … is extremely difficult to interpret genetically.” [4]

Conversely, CT Bergstrom & M Rosvall have written that genetic information lacks an obvious interpretation as communication measured by Shannon entropy:

“Geneticists are not so fortunate. For them, the analogy to communication theory is less obvious. Efforts to make this analogy explicit seem forced at best, …” [5]

This document explains an interpretation, which it names the gametic entropy of a population. The goal is to give population geneticists an easy interpretation, and an obvious example, of communication measured by Shannon entropy.

The Interpretation

The gametic entropy of a population is:

The amount of information transmitted by a gamete in the propagation of a population.

In the specific application of the 1972 Lewontin article [2], the interpretation is:

The amount of information transmitted by a gamete, at a random locus, in the propagation of a population.

Or equivalently: “a population’s gametic entropy per random locus”.
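To make “per random locus” concrete, here is a minimal sketch under stated assumptions: the locus is picked uniformly at random and the receiver knows which locus was picked, and at each locus a gamete carries an allele drawn from the population allele frequencies. Under those assumptions the quantity is the conditional entropy \(H(\text{allele} \mid \text{locus})\), i.e. the mean of the per-locus entropies. The function names and the allele frequencies below are hypothetical, chosen only for illustration.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution (zero-probability outcomes add nothing)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


def gametic_entropy_per_random_locus(allele_freqs_by_locus):
    """Mean over loci of the entropy of the allele a random gamete carries.

    Assumes the locus is chosen uniformly at random and is known to the receiver,
    so the result is the conditional entropy H(allele | locus).
    """
    per_locus = [entropy_bits(freqs) for freqs in allele_freqs_by_locus]
    return sum(per_locus) / len(per_locus)


# Hypothetical allele frequencies at three loci (illustrative, not real data).
loci = [
    [0.5, 0.5],         # two equally likely alleles: 1 bit
    [1.0],              # a fixed locus: 0 bits
    [0.25, 0.25, 0.5],  # three alleles: 1.5 bits
]
print(gametic_entropy_per_random_locus(loci))  # 2.5 bits / 3 loci, about 0.833
```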

Before getting into the biology, we review the core concepts upon which Shannon entropy is based. We then observe where these concepts appear in the propagation of populations. Only sexually reproducing populations are discussed initially. Clonal populations will be discussed later as a degenerate special case.

Messages, senders, and receivers

Shannon entropy is the key measurement of information theory [1]. Entropy measures the degree of uncertainty about which message a sender communicates to a receiver. It also sets a real, tangible minimum: no channel or storage used to communicate those messages can carry them, on average, in fewer bits per message than the entropy. This minimum applies to every channel and storage mechanism, whatever its physical implementation.
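As a minimal sketch (the code and its names are illustrative, not from the cited sources), the entropy of a discrete distribution of messages with probabilities \(p_i\) is \(H = -\sum_i p_i \log_2 p_i = \sum_i p_i \log_2 (1/p_i)\) bits, where outcomes with \(p_i = 0\) contribute nothing:

```python
import math
from typing import Iterable


def shannon_entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in bits: H = sum(p * log2(1/p)) over outcomes with p > 0."""
    probs = list(probs)
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must be non-negative and sum to 1")
    # Outcomes with p == 0 are skipped: the limit of p * log2(1/p) as p -> 0 is 0.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


print(shannon_entropy([0.5, 0.5]))  # a fair coin: 1.0 bit
print(shannon_entropy([1.0]))       # a certain message: 0.0 bits
print(shannon_entropy([0.25] * 4))  # four equally likely messages: 2.0 bits
```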

Roughly speaking, entropy is the best possible “score” in a game of “20 questions”: the minimum average number of yes/no questions needed to identify a message not yet known. One bit is the quantity of information gained (uncertainty reduced) by one yes/no question about two equally likely possibilities.
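The “20 questions” picture can be checked in the simplest case. In this sketch (the function name is hypothetical), the asker repeatedly asks “is the message in the lower half?”. When the \(N\) messages are equally likely and \(N\) is a power of two, every message is found in exactly \(\log_2 N\) questions, which equals the entropy; \(2^{20}\) equally likely messages need exactly 20 questions, i.e. 20 bits. For unequal probabilities an optimal strategy (for example one derived from Huffman coding) asks fewer questions about likely messages, and its average stays within one question of the entropy.

```python
import math


def questions_to_identify(target: int, n_messages: int) -> int:
    """Count yes/no questions a halving strategy uses to find `target`
    among n_messages equally likely messages numbered 0..n_messages-1."""
    low, high = 0, n_messages  # remaining candidates are low..high-1
    questions = 0
    while high - low > 1:
        mid = (low + high) // 2
        questions += 1  # ask: "is the message below mid?"
        if target < mid:
            high = mid
        else:
            low = mid
    return questions


n = 2 ** 20  # about a million equally likely messages
counts = {questions_to_identify(t, n) for t in (0, 12345, n - 1)}
print(counts, math.log2(n))  # {20} 20.0
```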

In population genetics, a key process of interest is the propagation of populations. In this process, what are the messages, senders and receivers?

Gametic messages

The propagation of a population requires new members to be born. For sexually reproducing species, a new birth requires genetic information to be passed down from a mother and a father in the population. Each half of this genetic information is stored and transmitted by a gamete. It is at the level of the gamete that we clearly see a complete message, sent by each of the two parents (the senders) to the new offspring (the receiver).

For the next possible birth in a population, there is a probability distribution over the possible genetic messages carried by each gamete. The Shannon entropy of this distribution of possible messages is the gametic entropy of the population. One can roughly think of gametic entropy as the number of yes/no questions offspring need to ask per parent to find out which alleles they are to inherit: “Hey parent, do I get an Rh+ or an Rh- allele from you?”
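One way to see the “distribution of possible messages” at a single locus is sketched below, under assumptions that are ours rather than the text's: the parent of the next birth is drawn uniformly from a list of diploid genotypes, and that parent transmits either of its two alleles with probability 1/2 (Mendelian segregation). The genotypes listed are hypothetical, not real Rh data, and the function names are illustrative.

```python
import math
from collections import Counter
from fractions import Fraction


def gamete_distribution(genotypes):
    """Distribution of the allele carried by one gamete at a single locus.

    Assumes the parent is drawn uniformly from `genotypes` (pairs of alleles)
    and transmits either of its two alleles with probability 1/2.
    """
    dist = Counter()
    for a, b in genotypes:
        dist[a] += Fraction(1, 2 * len(genotypes))
        dist[b] += Fraction(1, 2 * len(genotypes))
    return dist


def entropy_bits(dist):
    """Shannon entropy in bits of a {message: probability} mapping."""
    return sum(float(p) * math.log2(1 / float(p)) for p in dist.values() if p > 0)


# Hypothetical parental genotypes at the Rh locus (illustrative, not real data).
parents = [("Rh+", "Rh+"), ("Rh+", "Rh-"), ("Rh-", "Rh-"), ("Rh+", "Rh-")]
dist = gamete_distribution(parents)
print(dict(dist))          # {'Rh+': Fraction(1, 2), 'Rh-': Fraction(1, 2)}
print(entropy_bits(dist))  # 1.0
```

Using `Fraction` keeps the message probabilities exact, so it is easy to check that they sum to 1 before taking logarithms.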

When only looking at the entropy of autosomal genetic information, the distinction between maternal and paternal does not matter, but when considering the information in sex chromosomes, the gametic entropy can be specifically maternal or paternal. The unqualified gametic entropy is a per-gamete entropy, and thus the average of the maternal and paternal gametic entropies, since every birth requires one of each kind of gametic message.

Simple illustrations

Pure Unicorns

We consider a simple example of a sexually reproducing population with the lowest possible gametic entropy. This population consists of “pure unicorns”, which are all genetically identical except for males and females having XY and XX chromosomes, respectively. The maternal gametic entropy is zero because there is no uncertainty about what genetic information is communicated. But the paternal gametic entropy is 1 bit, since there is an equal chance of an X or a Y chromosome being communicated. Thus the gametic entropy of the pure unicorn population is \(0.5\) bits, the average of the parent-specific entropies.
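The pure unicorn arithmetic can be written out as a short sketch (the helper name is illustrative), treating the sex chromosome carried by each gamete as the whole gametic message:

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


# Pure unicorns: the only genetic variation a gamete can carry is its sex chromosome.
maternal = entropy_bits([1.0])        # every egg carries an X
paternal = entropy_bits([0.5, 0.5])   # a sperm carries X or Y with equal probability
gametic = (maternal + paternal) / 2   # average over the two gametes of a birth
print(maternal, paternal, gametic)    # 0.0 1.0 0.5
```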

Linkage Disequilibrium

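We now consider an impure unicorn population that, in addition to the sex chromosomes, varies at an Rh gene and a pigmentation gene. We assume that each gene has two equally likely alleles. If the two genes are autosomal and unlinked, each adds 1 bit to both the maternal and paternal gametic entropies, so the gametic entropy of the population is \(0.5 + 2 = 2.5\) bits. If instead the two genes are completely linked and in complete linkage disequilibrium, so that only two haplotypes occur, each equally likely, together they add only 1 bit, and the gametic entropy of the population is \(0.5 + 1 = 1.5\) bits.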
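The two cases above are the endpoints of a continuum. In this sketch we parameterize the two autosomal genes by the standard linkage disequilibrium coefficient \(D = p_{AB} - p_A p_B\); with both allele frequencies at 1/2, the haplotype frequencies are \(1/4 + D\), \(1/4 - D\), \(1/4 - D\), \(1/4 + D\). The allele labels A/a and B/b, the function names, and the assumption that the two genes are inherited independently of the sex chromosome are ours, for illustration only.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


def two_locus_haplotype_freqs(d):
    """Haplotype frequencies (AB, Ab, aB, ab) for two biallelic loci whose
    alleles are each at frequency 1/2, with linkage disequilibrium coefficient d."""
    if not -0.25 <= d <= 0.25:
        raise ValueError("with allele frequencies of 1/2, d must lie in [-0.25, 0.25]")
    return [0.25 + d, 0.25 - d, 0.25 - d, 0.25 + d]


def impure_unicorn_gametic_entropy(d):
    """Average of maternal and paternal gametic entropies, in bits.

    Assumes the two genes are autosomal and inherited independently of the
    sex chromosome, so their haplotype entropy adds to both kinds of gamete.
    """
    h_genes = entropy_bits(two_locus_haplotype_freqs(d))
    maternal = 0.0 + h_genes  # eggs always carry X
    paternal = 1.0 + h_genes  # sperm carry X or Y with equal probability
    return (maternal + paternal) / 2


print(impure_unicorn_gametic_entropy(0.0))   # linkage equilibrium: 2.5
print(impure_unicorn_gametic_entropy(0.25))  # complete disequilibrium: 1.5
print(impure_unicorn_gametic_entropy(0.1))   # partial disequilibrium: between 1.5 and 2.5
```

Note that \(D = -0.25\) also gives 1.5 bits: only the other pair of haplotypes (Ab and aB) occurs.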

Interplanetary IVF (in vitro fertilization)

Reflecting on DNA sequencing technology, maternal spindle transfer in “three-parent” IVF [6] [7] and artificial synthesis of an entire (bacterial) genome [8], it is not hard to imagine the completely dematerialized transmission of the genetic information carried by gametes. This transmission can be over any of the channels described by information theory, including telegraph schemes.

We can now propose an answer to the following question posed by CT Bergstrom & M Rosvall:

“Is there a clean mapping from informational processes in biology onto the telegraph schema?” [9]

The practical utility of such telegraphy, if any, is hard to imagine today. But as an entertaining thought experiment in science fiction, we can imagine the interplanetary transmission of the genetic information in donor gametes for IVF. Rather than physically transporting gametes between planets, which could take months to years, the pure data could be transmitted in minutes or hours. All of the facts about transmission rates in telegraphy apply to a hypothetical interplanetary IVF system. For a given population of equally likely possible donors, the maternal (paternal) gametic entropy, measured in bits, is the best possible rate, that is, the minimum average number of bits per gamete, for transmitting the genetic information of an egg (sperm) donor.
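To show a code approaching that best possible rate, here is a sketch that uses Huffman coding. That is our choice of technique; the text does not prescribe any particular code. A Huffman code is a prefix-free binary code whose average length is within one bit of the entropy, and it equals the entropy when every probability is a power of 1/2. The gametic message distribution below (two-locus haplotypes) is hypothetical, as are the function names.

```python
import heapq
import math
from itertools import count


def huffman_code(probs):
    """Build a binary prefix code (Huffman) mapping each message to a bit string."""
    if len(probs) < 2:
        raise ValueError("need at least two possible messages")
    tie = count()  # tie-breaker so heapq never has to compare the dict payloads
    heap = [(p, next(tie), {msg: ""}) for msg, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)  # merge the two least likely groups
        p1, _, code1 = heapq.heappop(heap)
        merged = {m: "0" + c for m, c in code0.items()}
        merged.update({m: "1" + c for m, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def encode(messages, code):
    """Concatenate the codewords of a batch of messages into one bit string."""
    return "".join(code[m] for m in messages)


def decode(bits, code):
    """Split a bit string back into messages; valid because the code is prefix-free."""
    lookup = {c: m for m, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in lookup:
            out.append(lookup[buf])
            buf = ""
    return out


# Hypothetical distribution of gametic messages (haplotypes) among donors.
gamete_probs = {"AB": 0.5, "ab": 0.25, "Ab": 0.125, "aB": 0.125}
code = huffman_code(gamete_probs)
avg_bits = sum(p * len(code[m]) for m, p in gamete_probs.items())
entropy = sum(p * math.log2(1 / p) for p in gamete_probs.values())
print(avg_bits, entropy)  # 1.75 1.75

batch = ["AB", "ab", "AB", "aB"]
bits = encode(batch, code)
print(bits, decode(bits, code) == batch)
```

Coding long blocks of gametes together, rather than one gamete at a time, lets the average number of bits per gamete approach the entropy even when the probabilities are not powers of 1/2.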

Degenerate case of clonal populations

In the case of asexually reproducing clonal populations, there is no distinction between the genetic information in a gamete and that in a parent. The “gametic message” is simply the entire genome of a parent, which is sent in its entirety to its single-parent offspring. The gametic entropy of a clonal population is the same as the entropy of the distribution of distinct genomes in the population.
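A minimal sketch of the clonal case, assuming each listed member is equally likely to be the parent of the next birth, so that a genome's probability is its frequency in the list. The toy “genomes” and the function name are hypothetical.

```python
import math
from collections import Counter


def clonal_gametic_entropy(genomes):
    """Entropy (bits) of the distribution of distinct genomes in a clonal population.

    Assumes each member is equally likely to be the parent of the next birth,
    so a genome's probability is its frequency among the members listed.
    """
    counts = Counter(genomes)
    n = len(genomes)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical clonal population of six members with short toy "genomes".
population = ["ACGT", "ACGT", "ACGT", "ACGA", "ACGA", "TCGT"]
print(clonal_gametic_entropy(population))  # about 1.459 bits
```

A population of identical clones gives 0 bits, as expected when there is no uncertainty about the genome an offspring receives.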

References

1. Shannon CE, Weaver W (1998) The mathematical theory of communication. Univ. of Illinois Press, Urbana

2. Lewontin RC (1972) The Apportionment of Human Diversity. In: Dobzhansky T, Hecht MK, Steere WC (eds) Evolutionary Biology. Springer US, New York, NY, pp 381–398

3. Kimura M (1961) Natural selection as the process of accumulating genetic information in adaptive evolution. Genetical Research 2:127–140. https://doi.org/10.1017/S0016672300000616

4. Latter BDH (1980) Genetic Differences Within and Between Populations of the Major Human Subgroups. The American Naturalist 116:220–237. https://doi.org/10.1086/283624

5. Bergstrom CT, Rosvall M (2011) The transmission sense of information. Biology & Philosophy 26:159–176. https://doi.org/10.1007/s10539-009-9180-z

6. Amato P, Tachibana M, Sparman M, Mitalipov S (2014) Three-parent in vitro fertilization: Gene replacement for the prevention of inherited mitochondrial diseases. Fertility and Sterility 101:31–35. https://doi.org/10.1016/j.fertnstert.2013.11.030

7. Zhang J, Liu H, Luo S, et al (2017) Live birth derived from oocyte spindle transfer to prevent mitochondrial disease. Reproductive BioMedicine Online 34:361–368. https://doi.org/10.1016/j.rbmo.2017.01.013

8. Gibson DG, Glass JI, Lartigue C, et al (2010) Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome. Science 329:52–56. https://doi.org/10.1126/science.1190719

9. Bergstrom CT, Rosvall M (2011) Response to commentaries on “The Transmission Sense of Information”. Biology & Philosophy 26:195–200. https://doi.org/10.1007/s10539-011-9257-3