Deduce estimator of lineal admixture time based on simple class of haploid lineage processes
Introduction
Lineal admixture time[1][2] is a microscale measure of admixture timing.
In this document, we derive an estimator of the average lineal admixture time of a
population.
This derivation is based on a simple class of
haploid lineage processes[2].
Notation
N0 represents the non-negative integers (0, 1, 2, 3, …).
f:D↦I denotes a function f that maps domain D to image I.
Simple Model
We consider a simple model of population migration and reproduction as follows:
geographic “islands” consisting of a single interior zone and multiple peripheral isolated zones,
constant expected flow of migrants from isolated zones to interior zone,
each time step consists of migration followed by mating, and
mating is random within each zone (following migration).
Haploid Lineage Process
We use a haploid lineage process [2] to mathematically model
population migration and reproduction. The motivation for this mathematical model
is to analytically derive an estimator of lineal admixture time.
A haploid lineage process is defined in terms of a fertilization function.
We choose a fertilization function Fert whose image of time points Tim is the set
of integers Z.
Formally, geography is modeled via a function Geo:Dip↦N0
which maps diploids to the geographic zone in which they were fertilized.
Zero indexes the interior zone where admixture can occur.
Positive integers index the isolated geographic zones
where only non-admixed diploids are found.
ISSUE:
Need to justify why admixture time with the following categorization function
makes sense.
Similarly, we define a categorization function Cat:Dip↦N0
for lineal admixture time where zero indexes admixed diploids
and positive integers index non-admixed diploids.
Diploids fertilized in an isolated zone are non-admixed, and their category matches that zone:
Geo(d)≠0 implies Geo(d)=Cat(d)
for all d∈Dip.
ISSUE:
Relying on zero to represent the interior zone does not seem explicit enough.
ISSUE:
The following formal mathematical details should move into the mathematical definition of lineal
admixture time.
We only consider haploid lineage processes which satisfy the following
requirements regarding the categorization function for lineal admixture time [3]:
For all child haploids (d,s)∈dom Par: if Cat(d)≠0 then Cat(Par((d,s)))=Cat(d), and
if Cat(Par((d,0)))=Cat(Par((d,1))) then Cat(d)=Cat(Par((d,0))).
Random individuals and lineages
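We model the population of interior zone individuals living at time $t$ as
$$\mathrm{Pop}_t := \{\, d : (d,s) \in \operatorname{dom} \mathrm{Par}_{t-1} \text{ and } \mathrm{Geo}(d) = 0 \,\}$$
since $\operatorname{dom} \mathrm{Par}_{t-1}$ is the set of children fertilized one time step
prior, when the previous generation was living.
We define $\Delta_t$ to be a random variable which is any member of $\mathrm{Pop}_t$
with equal probability. Formally, given outcome space $\Omega$,
for every $t \in \mathrm{Tim}$, $\omega \in \Omega$, and diploid $d \in \mathrm{Pop}_t(\omega)$,
$$P\big(\{\omega' : \Delta_t(\omega') = d\} \mid \{\omega' : \mathrm{Pop}_t(\omega') = \mathrm{Pop}_t(\omega)\}\big) = |\mathrm{Pop}_t(\omega)|^{-1}$$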
We define S∈{0,1} to be a Bernoulli random variable representing a random
gamete or sex.
We define Λ∈Loc to be a random genomic location.
Formal model assumptions
Formally, the mathematical assumptions are:
a proportion $\alpha_i$ of immigrants comes from the $i$-th ancestral isolated population,
immigration is such that a fraction $\phi$ of the interior population is new non-admixed immigrants, and
the distribution of lineal admixture times is stationary across generations.
Generations are non-overlapping:
for all $t \in \mathrm{Tim}$ and $h \in \mathrm{Pop}_t \times \{0,1\}$,
$$\mathrm{Par}(h) \in \mathrm{Pop}_{t-1}.$$
We assume that all mating occurs within a single geographic zone:
Geo(Par((d,0)))=Geo(Par((d,1)))
for all d∈Dip.
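The zone in which the parents of a random interior individual were fertilized is distributed as follows:
$$P\big(\mathrm{Geo}(\mathrm{Par}((\Delta_t,0))) = 0\big) = 1-\phi$$
and for all isolated zones $i>0$,
$$P\big(\mathrm{Geo}(\mathrm{Par}((\Delta_t,0))) = i\big) = \phi\,\alpha_i .$$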
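As a rough illustration of these assumptions, the following Python sketch simulates only the category labels of a finite interior population. The function name `simulate_categories`, the finite population size, the all-admixed starting population, and the rule that a child whose parents have different categories is admixed (category 0) are assumptions made for this sketch, not part of the formal model.

```python
import random


def simulate_categories(phi, alphas, pop_size, generations, seed=0):
    """Simulate category labels (0 = admixed, i > 0 = non-admixed from zone i).

    Hypothetical sketch: each child has, with probability phi, two immigrant
    parents from the same isolated zone i (chosen with weight alphas[i - 1]);
    otherwise it has two parents drawn at random from the previous interior
    generation.
    """
    rng = random.Random(seed)
    zones = list(range(1, len(alphas) + 1))
    population = [0] * pop_size  # assumed starting point: everyone admixed
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            if rng.random() < phi:
                # Both parents are immigrants from one isolated zone.
                category = rng.choices(zones, weights=alphas)[0]
            else:
                mother = rng.choice(population)
                father = rng.choice(population)
                # Matching parental categories are inherited; otherwise the
                # child is treated as admixed (assumption of this sketch).
                category = mother if mother == father else 0
            children.append(category)
        population = children
    return population


if __name__ == "__main__":
    pop = simulate_categories(phi=0.1, alphas=[0.5, 0.5], pop_size=5000, generations=30)
    print("frequency of category 1:", pop.count(1) / len(pop))
```

With a reasonably large population, the simulated frequency of each non-admixed category should settle near a fixed value after a few generations, which is the behaviour the stationarity assumption relies on.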
Main Result
Formal notation
We define random lineal admixture time at time t as
$$M_t := \mathrm{Lat}_t\big(\mathrm{Lin}(\Lambda, (\Delta_t, S))\big)$$
Base Facts
For $i>0$,
$$P(\mathrm{Cat}(\Delta_t) = i) = \phi\,\alpha_i + (1-\phi)\,P(\mathrm{Cat}(\Delta_{t-1}) = i)^2$$
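The recursion above can be iterated numerically to see how quickly the category frequency settles. This is a minimal sketch; the function name `iterate_category_frequency`, the starting value `x0`, and the parameter values in the usage example are illustrative assumptions.

```python
def iterate_category_frequency(phi, alpha_i, x0=0.0, steps=50):
    """Iterate x <- phi * alpha_i + (1 - phi) * x**2 and return all iterates.

    x stands for P(Cat(Delta_t) = i); x0 is an assumed starting frequency.
    """
    x = x0
    history = [x]
    for _ in range(steps):
        x = phi * alpha_i + (1.0 - phi) * x * x
        history.append(x)
    return history


if __name__ == "__main__":
    for step, value in enumerate(iterate_category_frequency(0.1, 0.5, steps=5)):
        print(step, value)
```

For small values of ϕ the iterates change very little after the first few steps, which makes the stationary value a natural quantity to work with.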
From the definition of lineal admixture time
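$$
\begin{aligned}
E[M_{t+1}] ={}& \big(E[M_t \mid M_t>0] + 1\big)\,P(M_t>0)^2\,(1-\phi) \\
&+ 2\big(\tfrac{1}{2}E[M_t \mid M_t>0] + 1\big)\,P(M_t>0)\,P(M_t=0)\,(1-\phi) \\
&+ \Big(P(M_t=0)^2 - \sum_i P(\mathrm{Cat}(\Delta_t)=i)^2\Big)(1-\phi)
\end{aligned}
$$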
Derivation
Given the assumptions of stationarity, we can define:
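Let $x_i := P(\mathrm{Cat}(\Delta_t) = i)$ so that
$$x_i = \phi\,\alpha_i + (1-\phi)\,x_i^2 .$$
By Theorem 1, the only admissible root of this quadratic in $x_i$ (when $\alpha_i<1$) is
$$x_i = \frac{1 - \sqrt{1 - 4\phi(1-\phi)\alpha_i}}{2(1-\phi)}$$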
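As a quick numerical check, the closed-form root can be evaluated and substituted back into the stationarity equation. This is a sketch that assumes 0 < ϕ < 1 and 0 ≤ α_i ≤ 1; the function name `stationary_frequency` is illustrative.

```python
import math


def stationary_frequency(phi, alpha_i):
    """Smaller root of (1 - phi) * x**2 - x + phi * alpha_i = 0.

    Assumes 0 < phi < 1 and 0 <= alpha_i <= 1, so the discriminant is
    non-negative and the denominator is non-zero.
    """
    discriminant = 1.0 - 4.0 * phi * (1.0 - phi) * alpha_i
    return (1.0 - math.sqrt(discriminant)) / (2.0 * (1.0 - phi))


if __name__ == "__main__":
    phi, alpha_i = 0.1, 0.5
    x = stationary_frequency(phi, alpha_i)
    # Residual of the fixed-point equation; should be close to zero.
    print(x, x - (phi * alpha_i + (1.0 - phi) * x * x))
```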
Let $q := P(M_t = 0)$, thus
$$q = \phi + (1-\phi)\sum_i x_i^2$$
We define
$$\mu := E[M_t]$$
which is the expected lineal admixture time (and generation number).
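Given the base facts, we make the following deduction using the newly
defined variables $\mu$, $q$ and $x_i$. By stationarity $E[M_{t+1}] = E[M_t] = \mu$,
and the second line uses $E[M_t \mid M_t>0]\,(1-q) = \mu$.
$$
\begin{aligned}
\mu &= \big(E[M_t \mid M_t>0] + 1\big)(1-q)^2(1-\phi) + 2\big(\tfrac{1}{2}E[M_t \mid M_t>0] + 1\big)(1-q)\,q\,(1-\phi) + \Big(q^2 - \sum_i x_i^2\Big)(1-\phi) \\
&= \mu(1-q)(1-\phi) + (1-q)^2(1-\phi) + \mu q(1-\phi) + 2(1-q)q(1-\phi) + \Big(q^2 - \sum_i x_i^2\Big)(1-\phi) \\
&= \mu(1-\phi) + \big((1-q)+q\big)^2(1-\phi) - (1-\phi)\sum_i x_i^2 \\
0 &= -\mu\phi + 1 - q \\
\mu &= \frac{1-q}{\phi}
\end{aligned}
$$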
Substituting for $q$ gives
$$\mu = \frac{1-\phi}{\phi}\Big(1 - \sum_i x_i^2\Big)$$
We conjecture that this formula serves as a consistent maximum likelihood
estimator.
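To make the formula concrete, here is a minimal Python sketch that evaluates μ from ϕ and the proportions α_i, using the smaller quadratic root for each x_i. The function names and the example parameter values are illustrative assumptions, and 0 < ϕ < 1 is assumed throughout.

```python
import math


def stationary_frequencies(phi, alphas):
    """Stationary x_i for each alpha_i (smaller root of the quadratic)."""
    return [
        (1.0 - math.sqrt(1.0 - 4.0 * phi * (1.0 - phi) * a)) / (2.0 * (1.0 - phi))
        for a in alphas
    ]


def expected_lineal_admixture_time(phi, alphas):
    """Evaluate mu = (1 - phi) / phi * (1 - sum_i x_i**2)."""
    sum_of_squares = sum(x * x for x in stationary_frequencies(phi, alphas))
    return (1.0 - phi) / phi * (1.0 - sum_of_squares)


if __name__ == "__main__":
    # Two isolated source populations contributing equally (assumed values).
    print(expected_lineal_admixture_time(0.1, [0.5, 0.5]))
```

In this sketch a smaller ϕ yields a larger μ, as expected from the 1/ϕ factor: with fewer new immigrants per generation, admixed lineages trace back further on average.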
Estimation of ϕ
Let $\ddot{\alpha}_{i,j}$ denote the frequency of a diploid genotype
with an $i$-th maternal ancestral source and a $j$-th paternal ancestral
source.
This form is the same as the inbreeding coefficient but with
ancestral source as the allele state rather than haplotype.
Theorem 1
The solution to $x_i$ given stationarity etc…
cannot be
$$x_i = \frac{1 + \sqrt{1 - 4\phi(1-\phi)\alpha_i}}{2(1-\phi)}$$
when $\alpha_i<1$.
PROOF
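Assume the contrary. Since $\phi<1$ and $x_i \le 1$, we have
$$
\begin{aligned}
1 &\ge \frac{1 + \sqrt{1 - 4\phi(1-\phi)\alpha_i}}{2(1-\phi)} \\
2(1-\phi) &\ge 1 + \sqrt{1 - 4\phi(1-\phi)\alpha_i} \\
1 - 2\phi &\ge \sqrt{1 - 4\phi(1-\phi)\alpha_i} \\
1 - 4\phi + 4\phi^2 &\ge 1 - 4\phi(1-\phi)\alpha_i \\
-4\phi(1-\phi) &\ge -4\phi(1-\phi)\alpha_i \\
1 &\le \alpha_i
\end{aligned}
$$
which cannot be true given $\alpha_i<1$.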
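The claim can also be spot-checked numerically. The sketch below evaluates the larger root on a grid of parameter values and confirms that it exceeds 1, so it cannot be a probability. The function name `plus_root` and the grid are illustrative assumptions; the check is a sanity test, not a substitute for the proof.

```python
import math


def plus_root(phi, alpha_i):
    """Larger root of (1 - phi) * x**2 - x + phi * alpha_i = 0 (0 < phi < 1)."""
    discriminant = 1.0 - 4.0 * phi * (1.0 - phi) * alpha_i
    return (1.0 + math.sqrt(discriminant)) / (2.0 * (1.0 - phi))


def plus_root_always_exceeds_one(grid_size=20):
    """Check plus_root > 1 for phi in (0, 1) and alpha_i in [0, 1) on a grid."""
    for k in range(1, grid_size):  # phi = 1/grid_size, ..., (grid_size-1)/grid_size
        for j in range(0, grid_size):  # alpha_i = 0, ..., (grid_size-1)/grid_size
            if plus_root(k / grid_size, j / grid_size) <= 1.0:
                return False
    return True


if __name__ == "__main__":
    print(plus_root_always_exceeds_one())
```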