Introduction

Lineal admixture time [1] [2] is a microscale measure of admixture timing. In this document, we derive an estimator of the average lineal admixture time of a population. This derivation is based on a simple class of haploid lineage processes [2].

Notation

  • $\mathbb{N}_0$ represents the non-negative integers (0, 1, 2, 3, …).

  • $f : D \mapsto I$ denotes that a function $f$ maps domain $D$ to image $I$.

Simple Model

We consider a simple model of population migration and reproduction as follows:

  • discrete regular time steps,

  • lifespans of only one time step,

  • non-overlapping generations (like Wright-Fisher model),

  • geographic “islands”: a single interior zone and multiple isolated peripheral zones,

  • constant expected flow of migrants from the isolated zones to the interior zone,

  • each time step consists of migration followed by mating, and

  • mating is random within each zone (following migration).

Haploid Lineage Process

We use a haploid lineage process [2] to mathematically model population migration and reproduction. The motivation for this mathematical model is to analytically derive an estimator of lineal admixture time.

A haploid lineage process is defined in terms of a fertilization function. We choose a fertilization function $\mathrm{Fert}$ whose set of time points $\mathrm{Tim}$ is the set of integers $\mathbb{Z}$.

Formally, geography is modeled via a function $\mathrm{Geo}: \mathrm{Dip} \mapsto \mathbb{N}_0$, which maps diploids to the geographic zone in which they were fertilized. Zero indexes the interior zone, where admixture can occur. Positive integers index the isolated geographic zones, where only non-admixed diploids are found.

ISSUE: Need to justify why admixture time with the following categorization function makes sense.

Similarly, we define a categorization function $\mathrm{Cat}: \mathrm{Dip} \mapsto \mathbb{N}_0$ for lineal admixture time, where zero indexes admixed diploids and positive integers index non-admixed diploids. Diploids fertilized in an isolated zone are non-admixed, with a category matching their zone: $\mathrm{Geo}(d) \not= 0$ implies $\mathrm{Geo}(d) = \mathrm{Cat}(d)$ for all $d \in \mathrm{Dip}$.

ISSUE: Relying on zero to represent the interior zone does not seem explicit enough.

ISSUE: The following formal mathematical details should move into the mathematical definition of lineal admixture time.

We only consider haploid lineage processes which satisfy the following requirements regarding the categorization function for lineal admixture time [3]:

  1. For all child haploids $(d, s) \in \operatorname{dom}\mathrm{Par}$, if $\mathrm{Cat}(d) \not= 0$ then $\mathrm{Cat}(\mathrm{Par}((d,s))) = \mathrm{Cat}(d)$, and

  2. if $\mathrm{Cat}(\mathrm{Par}((d,0))) = \mathrm{Cat}(\mathrm{Par}((d,1)))$ then $\mathrm{Cat}(d) = \mathrm{Cat}(\mathrm{Par}((d,0)))$.
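Together, requirements 1 and 2 determine a child's category from its parents' categories: parents sharing a category pass it on, and any mismatch forces an admixed child (category 0), because requirement 1 forbids a non-admixed child whose parents differ from it. A minimal Python sketch of that rule follows; the function name `child_category` is hypothetical and not taken from [2] or [3].

```python
def child_category(cat_mother: int, cat_father: int) -> int:
    """Child category implied by requirements 1 and 2.

    Category 0 means admixed; a positive integer i means non-admixed
    from isolated source i.
    """
    if cat_mother == cat_father:
        # Requirement 2: a shared parental category is passed on
        # (this includes two admixed parents, category 0).
        return cat_mother
    # Requirement 1: a non-admixed child must share its category with
    # both parents, so mismatched parents force an admixed child.
    return 0


print(child_category(1, 1))  # 1
print(child_category(1, 2))  # 0
print(child_category(0, 3))  # 0
print(child_category(0, 0))  # 0
```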

Random individuals and lineages

We model the population of interior-zone individuals living at time $t$ as
$$\mathrm{Pop}_t := \{ d : (d,s) \in \operatorname{dom}\mathrm{Par}_{t-1} \text{ and } \mathrm{Geo}(d) = 0 \}$$
since $\operatorname{dom}\mathrm{Par}_{t-1}$ is the set of child haploids fertilized one time step earlier, when the previous generation was living.

We define $\Delta_t$ to be a random variable that takes each member of $\mathrm{Pop}_t$ with equal probability. Formally, given outcome space $\Omega$, for every $t \in \mathrm{Tim}$, $\omega \in \Omega$, and diploid $d \in \mathrm{Pop}_t(\omega)$,
$$\operatorname{\mathbb{P}}\big( \{ \omega' : \Delta_t(\omega') = d \} \mid \{ \omega' : \mathrm{Pop}_t(\omega') = \mathrm{Pop}_t(\omega) \} \big) = \left| \mathrm{Pop}_t(\omega) \right|^{-1} \text{ .}$$

We define $S \in \{0, 1\}$ to be a Bernoulli random variable representing a random gamete or sex.

We define $\Lambda \in \mathrm{Loc}$ to be a random genomic location.

Formal model assumptions

Formally, the mathematical assumptions are:

  • immigrants drawn in proportion $\alpha_i$ from the $i$-th ancestral isolated population,

  • immigration such that a fraction $\phi$ of the interior population consists of new non-admixed immigrants, and

  • a stationary distribution of lineal admixture times across generations.

Generations are non-overlapping: for all $t \in \mathrm{Tim}$ and $h \in \mathrm{Pop}_t \times \{0, 1\}$,
$$\mathrm{Par}(h) \in \mathrm{Pop}_{t-1} \text{ .}$$

We assume that all mating occurs within a single geographic zone:
$$\mathrm{Geo}(\mathrm{Par}((d,0))) = \mathrm{Geo}(\mathrm{Par}((d,1)))$$
for all $d \in \mathrm{Dip}$.

Immigration is expressed through the zone in which a random interior individual's parents were fertilized:
$$\operatorname{\mathbb{P}}\big( \mathrm{Geo}(\mathrm{Par}((\Delta_t,0))) = 0 \big) = 1 - \phi$$
and, for all isolated zones $i > 0$,
$$\operatorname{\mathbb{P}}\big( \mathrm{Geo}(\mathrm{Par}((\Delta_t,0))) = i \big) = \phi \alpha_i \text{ .}$$

Main Result

Formal notation

We define the random lineal admixture time at time $t$ as
$$M_t := \mathrm{Lat}_t(\mathrm{Lin}(\Lambda, (\Delta_t, S))) \text{ .}$$

Base Facts

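For $i > 0$, a random interior individual has category $i$ exactly when its parents immigrated from zone $i$ or both of its interior parents have category $i$ (requirements 1 and 2). With random mating the two interior parents' categories are independent, so
$$\operatorname{\mathbb{P}}\left( \mathrm{Cat}(\Delta_t) = i \right) = \phi \alpha_i + (1 - \phi) \operatorname{\mathbb{P}}\left( \mathrm{Cat}(\Delta_{t-1}) = i \right)^2 \text{ .}$$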

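From the definition of lineal admixture time,
$$\begin{aligned} \operatorname{E}\!\left[ M_{t+1} \right] & = \left( \operatorname{E}\!\left[ M_t \mid M_t>0 \right] + 1 \right) \operatorname{\mathbb{P}}(M_t>0)^2 (1-\phi) \\ & + 2 \left( \frac{1}{2} \operatorname{E}\!\left[ M_t \mid M_t>0 \right] + 1 \right) \operatorname{\mathbb{P}}(M_t>0) \operatorname{\mathbb{P}}(M_t=0) (1-\phi) \\ & + \left( \operatorname{\mathbb{P}}(M_t=0)^2 - \sum_i \operatorname{\mathbb{P}}(\mathrm{Cat}(\Delta_t)=i)^2 \right) (1-\phi) \end{aligned}$$
Every term carries the factor $1 - \phi$ because children of immigrants have lineal admixture time zero. The first term covers two parents with $M_t > 0$, where the lineage gains one generation. The second covers exactly one such parent: with probability one half the lineage passes through it and gains one generation, and otherwise it passes through the parent with $M_t = 0$, giving a time of one. The third covers two parents with $M_t = 0$ but different categories, whose child is admixed at fertilization and has a time of one.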
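As a sanity check on this recursion, the Python sketch below simulates a large, randomly mating interior population. It is a simplified stand-in for a full haploid lineage process, assumed for illustration only: each individual stores just its category and the lineal admixture time of one sampled lineage, and an admixed child's lineage passes through a parent chosen with probability one half, mirroring the case analysis above. The function name `simulate_interior`, the population size, the seed, and the parameters ($\phi = 0.2$, two isolated zones with $\alpha_1 = \alpha_2 = 0.5$) are hypothetical choices.

```python
import random


def simulate_interior(phi, alphas, n=4000, generations=80, burn_in=40, seed=1):
    """Monte Carlo sketch of the interior zone under the simple model.

    Each individual is a pair (category, m): category 0 means admixed and
    m is the lineal admixture time of one sampled lineage. Returns the
    category frequencies x_i (isolated zones i = 1..k) and the mean of m,
    both averaged over the generations after burn_in.
    """
    rng = random.Random(seed)
    zones = list(range(1, len(alphas) + 1))

    def child_of_immigrants():
        # Both parents migrated from isolated zone i (weight alphas[i-1]);
        # the child is non-admixed with lineal admixture time 0.
        return (rng.choices(zones, weights=alphas)[0], 0)

    pop = [child_of_immigrants() for _ in range(n)]
    freq_sums = [0.0] * len(alphas)
    m_sum = 0.0
    for gen in range(generations):
        new_pop = []
        for _ in range(n):
            if rng.random() < phi:
                new_pop.append(child_of_immigrants())
                continue
            # Random mating within the previous interior generation.
            mother, father = rng.choice(pop), rng.choice(pop)
            if mother[0] == father[0] and mother[0] != 0:
                # Same non-admixed category: the child is non-admixed.
                new_pop.append((mother[0], 0))
            else:
                # Admixed child: the lineage passes through one parent,
                # chosen with probability 1/2, and gains one generation.
                through = mother if rng.random() < 0.5 else father
                new_pop.append((0, through[1] + 1))
        pop = new_pop
        if gen >= burn_in:
            for k, zone in enumerate(zones):
                freq_sums[k] += sum(1 for cat, _ in pop if cat == zone) / n
            m_sum += sum(m for _, m in pop) / n
    kept = generations - burn_in
    return [s / kept for s in freq_sums], m_sum / kept


xs, m_bar = simulate_interior(phi=0.2, alphas=[0.5, 0.5])
print(xs, m_bar)
```

With these parameters the averages should land near the stationary values given by the closed forms in the Derivation, roughly $x_i \approx 0.110$ and $\mu \approx 3.90$, up to Monte Carlo error.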

Derivation

Given the assumptions of stationarity, we can define:

Let $x_i := \operatorname{\mathbb{P}}( \mathrm{Cat}(\Delta_t) = i )$, so that
$$x_i = \phi \alpha_i + (1-\phi) x_i^2 \text{ .}$$

By Theorem 1, when $\alpha_i < 1$ the only admissible root of this quadratic is

$$x_i = \frac{ 1 - \sqrt{1 - 4 \phi (1- \phi) \alpha_i} }{ 2(1-\phi) } \text{ .}$$
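A quick numerical check of this root, sketched in Python: `stationary_x` evaluates the closed form and `iterate_x` repeatedly applies the base-fact recursion starting from $x = 0$. Both names are hypothetical, the parameter values are illustrative, and the sketch assumes $0 \le \phi < 1$ because the closed form divides by $1 - \phi$.

```python
import math


def stationary_x(phi, alpha):
    """Smaller root of (1 - phi) x^2 - x + phi * alpha = 0 (requires phi < 1)."""
    return (1 - math.sqrt(1 - 4 * phi * (1 - phi) * alpha)) / (2 * (1 - phi))


def iterate_x(phi, alpha, steps=1000, x0=0.0):
    """Apply the recursion x <- phi * alpha + (1 - phi) * x^2 for `steps` rounds."""
    x = x0
    for _ in range(steps):
        x = phi * alpha + (1 - phi) * x * x
    return x


phi, alpha = 0.2, 0.5
x_closed = stationary_x(phi, alpha)
x_iter = iterate_x(phi, alpha)
print(x_closed, x_iter)  # both approximately 0.10961
```

The iteration settles on the smaller root here because the recursion's slope at that root, $2(1-\phi)x_i = 1 - \sqrt{1 - 4\phi(1-\phi)\alpha_i}$, is below one.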

Let $q := \operatorname{\mathbb{P}}(M_t=0)$; thus
$$q = \phi + (1 - \phi) \sum_i x_i^2 \text{ .}$$

We define $\mu := \operatorname{E}\!\left[ M_t \right]$, which is the expected lineal admixture time (measured in generations).

Given the base facts and stationarity, so that $\operatorname{E}\!\left[ M_{t+1} \right] = \operatorname{E}\!\left[ M_t \right] = \mu$, we make the following deduction using the newly defined variables $\mu$, $q$, $\phi$, and $x_i$. Since $M_t \ge 0$ and $\operatorname{\mathbb{P}}(M_t > 0) = 1 - q$, we have $\operatorname{E}\!\left[ M_t \mid M_t>0 \right] (1-q) = \mu$, which gives the second equality; the last steps use the definition of $q$:
$$\begin{aligned} \mu & = \left( \operatorname{E}\!\left[ M_t \mid M_t>0 \right] + 1 \right) (1-q)^2 (1-\phi) \\ & + 2 \left( \frac{1}{2} \operatorname{E}\!\left[ M_t \mid M_t>0 \right] + 1 \right) (1-q) q (1-\phi) \\ & + \left( q^2 - \sum_i x_i^2 \right) (1-\phi) \\ & = \mu (1-q) (1-\phi) + (1-q)^2 (1-\phi) \\ & + \mu q (1-\phi) + 2 (1-q) q (1-\phi) \\ & + \left( q^2 - \sum_i x_i^2 \right) (1-\phi) \\ & = \mu (1-\phi) + ((1 - q) + q)^2 (1-\phi) - (1-\phi) \sum_i x_i^2 \\ 0 & = - \mu \phi + (1-\phi) \left( 1 - \sum_i x_i^2 \right) \\ & = - \mu \phi + 1 - q \\ \mu & = \frac{1-q}{\phi} \end{aligned}$$

Replacing $q$ using its definition gives
$$\mu = \frac{1-\phi}{\phi} \left( 1 - \sum_i x_i^2 \right) \text{ .}$$
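The closed form can be evaluated directly once $\phi$ and the $\alpha_i$ are given. A minimal Python sketch follows, with the hypothetical function name `expected_lineal_admixture_time` and illustrative values $\phi = 0.2$ and two isolated zones with $\alpha_1 = \alpha_2 = 0.5$; it assumes $0 < \phi < 1$ and every $\alpha_i < 1$, the case covered by Theorem 1.

```python
import math


def expected_lineal_admixture_time(phi, alphas):
    """mu = (1 - phi) / phi * (1 - sum_i x_i^2), for 0 < phi < 1 and alpha_i < 1."""
    # Stationary category frequencies x_i from the admissible quadratic root.
    xs = [(1 - math.sqrt(1 - 4 * phi * (1 - phi) * a)) / (2 * (1 - phi))
          for a in alphas]
    return (1 - phi) / phi * (1 - sum(x * x for x in xs))


mu = expected_lineal_admixture_time(0.2, [0.5, 0.5])
print(round(mu, 3))  # 3.904
```

Equivalently, with $q = \phi + (1-\phi)\sum_i x_i^2 \approx 0.219$ for these values, the same result follows from $\mu = (1-q)/\phi$.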

We conjecture that this formula serves as a consistent maximum likelihood estimator.

Estimation of $\phi$

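Let $\ddot{\alpha}_{i,j}$ denote the frequency of diploids whose maternal ancestral source is $i$ and whose paternal ancestral source is $j$.

Under random mating, a diploid has source $i$ on both sides either because it is a child of immigrants from source $i$ (probability $\phi \alpha_i$) or because its two interior parents independently transmit source-$i$ haploids (probability $(1-\phi)\alpha_i^2$). Thus
$$\begin{aligned} \ddot{\alpha}_{i,i} & = \phi \alpha_i + (1-\phi) \alpha_i^2 \\ & = \phi \alpha_i (1 - \alpha_i) + \alpha_i^2 \\ \phi & = \frac{ \ddot{\alpha}_{i,i} - \alpha_i^2 }{ \alpha_i (1 - \alpha_i) } \end{aligned}$$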
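A round-trip check of this inversion, sketched in Python: generate $\ddot{\alpha}_{i,i}$ from assumed values $\phi = 0.3$ and $\alpha_i = 0.4$, then recover $\phi$. The helper name `phi_from_homozygote` is hypothetical, and the formula requires $0 < \alpha_i < 1$.

```python
def phi_from_homozygote(a_ii, alpha_i):
    """phi = (a_ii - alpha_i^2) / (alpha_i (1 - alpha_i)); needs 0 < alpha_i < 1."""
    return (a_ii - alpha_i ** 2) / (alpha_i * (1 - alpha_i))


# Round trip with assumed values: build a_ii from the model, then recover phi.
phi, alpha = 0.3, 0.4
a_ii = phi * alpha + (1 - phi) * alpha ** 2  # approximately 0.232
print(phi_from_homozygote(a_ii, alpha))  # approximately 0.3
```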

Consider the case of only two ancestral sources. With $\beta := \ddot{\alpha}_{0,1} + \ddot{\alpha}_{1,0}$ we deduce that
$$\ddot{\alpha}_{0,0} + \ddot{\alpha}_{1,1} = 1 - \beta \text{ ,}$$
$$\alpha_1 = 1 - \alpha_0 \text{ ,}$$
$$\alpha_0^2 + \alpha_1^2 = 1 - 2 \alpha_0 (1 - \alpha_0) \text{ .}$$
Averaging the expressions for $\phi$ obtained from $i = 0$ and $i = 1$, whose denominators are equal because $\alpha_1 (1 - \alpha_1) = \alpha_0 (1 - \alpha_0)$, gives

$$\begin{aligned} \phi & = \frac{\phi + \phi}{2} \\ & = \frac{ \ddot{\alpha}_{0,0} + \ddot{\alpha}_{1,1} - \alpha_0^2 - \alpha_1^2 }{ 2 \alpha_0 (1 - \alpha_0) } \\ & = 1 - \frac{\beta}{ 2 \alpha_0 (1 - \alpha_0) } \end{aligned}$$

This has the same form as the inbreeding coefficient, $F = 1 - H_{\mathrm{obs}} / H_{\mathrm{exp}}$, but with the ancestral source, rather than the haplotype, serving as the allele state.
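In practice the $\ddot{\alpha}_{i,j}$ would be estimated from counts. The Python sketch below computes $\phi$ from hypothetical counts of diploids classified by (maternal, paternal) ancestral source, chosen to match $\phi = 0.3$ and $\alpha_0 = 0.4$. It assumes two sources labelled 0 and 1 and, as an added assumption not stated above, estimates $\alpha_0$ as the frequency of source 0 among all haploids. The function name `phi_two_sources` is hypothetical.

```python
def phi_two_sources(n00, n01, n10, n11):
    """Estimate phi from counts of diploids by (maternal, paternal) source.

    Assumes exactly two ancestral sources, labelled 0 and 1, and (as an
    added assumption) estimates alpha_0 as the frequency of source 0
    among all haploids.
    """
    total = n00 + n01 + n10 + n11
    beta = (n01 + n10) / total                     # mixed-source frequency
    alpha0 = (2 * n00 + n01 + n10) / (2 * total)   # source-0 haploid frequency
    return 1 - beta / (2 * alpha0 * (1 - alpha0))


# Hypothetical counts chosen to match phi = 0.3 and alpha_0 = 0.4.
print(phi_two_sources(232, 168, 168, 432))  # approximately 0.3
```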

Theorem 1

Under the stationarity assumptions and $0 < \phi < 1$, the solution for $x_i$ cannot be
$$x_i = \frac{ 1 + \sqrt{1 - 4 \phi (1- \phi) \alpha_i} }{ 2(1-\phi) }$$
when $\alpha_i < 1$.

PROOF

Assume the contrary. Since $0 < \phi < 1$ and $x_i \le 1$, we have
$$\begin{aligned} 1 & \ge \frac{ 1 + \sqrt{1 - 4 \phi (1- \phi) \alpha_i} }{ 2(1-\phi) } \\ 2(1-\phi) & \ge 1 + \sqrt{1 - 4 \phi (1- \phi) \alpha_i} \\ 1- 2 \phi & \ge \sqrt{1 - 4 \phi (1- \phi) \alpha_i} \\ 1 - 4 \phi + 4 \phi^2 & \ge 1 - 4 \phi (1- \phi) \alpha_i \\ - 4 \phi (1- \phi) & \ge - 4 \phi (1- \phi) \alpha_i \\ 1 & \le \alpha_i \end{aligned}$$
where squaring preserves the inequality because both sides are non-negative, and dividing by $-4\phi(1-\phi) < 0$ reverses it. This cannot be true given $\alpha_i < 1$.
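The inequality in this proof can also be spot-checked numerically. The Python sketch below evaluates the "+" root on a grid with $0 < \phi < 1$ and $0 \le \alpha_i < 1$ and collects every point where it fails to exceed 1; a grid check illustrates, but does not replace, the proof. The helper name `larger_root` and the grid spacing are hypothetical choices.

```python
import math


def larger_root(phi, alpha):
    """The '+' root of (1 - phi) x^2 - x + phi * alpha = 0 ruled out by Theorem 1."""
    return (1 + math.sqrt(1 - 4 * phi * (1 - phi) * alpha)) / (2 * (1 - phi))


# Grid with phi in 0.01..0.99 and alpha in 0.00..0.99; collect points where the
# '+' root is at most 1 (and so could still be a probability).
violations = [
    (phi, alpha)
    for phi in (i / 100 for i in range(1, 100))
    for alpha in (j / 100 for j in range(100))
    if larger_root(phi, alpha) <= 1
]
print(len(violations))  # 0
```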

References

1. Ellerman EC. Lineal admixture time: An interdisciplinary definition. 2023. Available: https://perm.pub/D9qSdCY6GPrxthT3ZnFouEU35ow
2. Ellerman EC. Haploid lineage process. Available: https://castedo.com/doc/153
3. Ellerman EC. Lineal admixture time: A mathematical definition. Available: https://castedo.com/doc/cP