



This is an exam in Statistical Science whose listed topics include Stochastic Differential Equation, Brownian Motion, Solution, Measurable Function, Markov Process, Starting, Bounded Functions, Local Martingale, First Time etc. Key points are: Statistical and Population Genetics, Problem, Coalescent Process, Approximates, Population, Generations, Segment Per Generation, Number of Mutations, Coalescent Tree, Expected Value.
Friday 4 June, 2004 13:30 to 15:
Attempt THREE questions. There are four questions in total.
The questions carry equal weight.
1 This problem concerns the coalescent process that approximates the evolution of the ancestry of a sample of n chromosomal segments from a population of large but constant size N. Time is measured in units of N generations, and θ = 2Nu is the mutation parameter, u being the mutation rate per segment per generation. The effects of recombination in the segment may be ignored. Let S be the number of mutations that occur on the coalescent tree of the sample.
(i) Show that the expected value of S is given by
$$\mathbb{E}S = \theta \sum_{i=1}^{n-1} \frac{1}{i}.$$
(ii) Find an expression for the variance of S.
(iii) Using the result of (i), write down an unbiased estimator $\hat{\theta}$ (say) of θ, and show that it is asymptotically consistent as n → ∞.
(iv) For j = 2, 3, ..., n, let $Y_j$ be the number of mutations that arise on the coalescent tree while there are j distinct ancestors of the sample. Show that the distribution of $Y_j$ is geometric.
Note: if X has a Poisson distribution with parameter λ, then the probability generating function of X is
$$\mathbb{E}\,s^{X} = e^{-\lambda(1-s)}, \qquad 0 \le s \le 1.$$
(v) Using (iv) or otherwise, establish that the quantity
$\sqrt{\log n}\,(\hat{\theta} - \theta)$ is asymptotically Normally distributed as n → ∞, and identify the variance.
(vi) What are the practical implications of the result in (v)?
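As a rough numerical check on part (i), the following Python sketch simulates S directly from the setting described in the question. It assumes the standard coalescent time scale: while there are j ancestors, the tree waits an exponential time with rate j(j-1)/2 (in units of N generations), and mutations fall on the j branches as a Poisson process with total rate jθ/2. The function names, the seed, and the values n = 10 and θ = 2 are illustrative choices, not part of the question.

```python
import random


def harmonic_sum(n):
    """Return sum_{i=1}^{n-1} 1/i, the factor multiplying theta in E[S]."""
    return sum(1.0 / i for i in range(1, n))


def simulate_mutation_count(n, theta, rng):
    """Draw one realisation of S for a sample of n segments (theta > 0)."""
    total = 0
    for j in range(n, 1, -1):
        # Assumed time scale: with j ancestors the waiting time to the next
        # coalescence is exponential with rate j(j-1)/2.
        t_j = rng.expovariate(j * (j - 1) / 2)
        # Mutations arrive on the j branches at total rate j*theta/2;
        # count the arrivals that occur before time t_j.
        rate = j * theta / 2
        clock = rng.expovariate(rate)
        while clock < t_j:
            total += 1
            clock += rng.expovariate(rate)
    return total


rng = random.Random(2004)            # fixed seed, for reproducibility only
n, theta, reps = 10, 2.0, 20000      # illustrative values
draws = [simulate_mutation_count(n, theta, rng) for _ in range(reps)]
mean_s = sum(draws) / reps
print("simulated mean of S:", mean_s)
print("theta * sum 1/i    :", theta * harmonic_sum(n))
```

The two printed numbers should agree to within Monte Carlo error; the same list of draws can be reused to look at the sample variance for part (ii).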
2 One of the major problems in statistical and population genetics is to understand linkage disequilibrium (LD). Write an essay on this topic. You should include brief descriptions of the patterns of LD across chromosome 21, the ancestral recombination graph, its role in understanding LD, and its role in fine-scale mapping.
Statistical and Population Genetics
3 [Pedigree diagram: father (gf) and mother (gm), with their affected child (gc) below.]
The above diagram shows a pedigree drawing for a trio consisting of a father, mother and affected child, with genotypes at a single genetic locus denoted gf, gm, gc respectively.
In a genetic association study, twelve such families are collected with genotypes as tabulated below (where ‘?’ denotes unknown genotypes).
Family   gf    gm    gc
  1      2/2   1/1   1/2
  2      1/2   1/2   1/1
  3      1/2   2/2   1/2
  4      1/2   1/2   1/2
  5      1/1   2/2   1/2
  6      1/2   1/2   1/1
  7      1/2   ?/?   1/2
  8      1/2   1/2   2/2
  9      1/2   1/2   1/1
 10      1/2   ?/?   1/1
 11      1/2   1/2   1/2
 12      2/2   1/2   1/2
i) For each family, calculate the contribution that it would make to the cells of the following transmission table and thus calculate the values of the counts a, b, c, d in the table.
                      Untransmitted allele
Transmitted allele       1       2
        1                a       b
        2                c       d
ii) Calculate the value of the transmission disequilibrium test (TDT) statistic from this table. Is there any evidence for genetic association? (You may need to know that the percentage points for the upper 5% level are 1.64 for the standard normal distribution and 3.84 for a χ^2 distribution on 1 df.)
iii) Convert the data in the transmission table to the following table based on unmatched transmissions:
Marker allele   Transmitted   Untransmitted
      1              w              y
      2              x              z
and use the data in the cells of this table to calculate the haplotype relative risk (GHRR) odds ratio, and test of association.
iv) Prove that for such a trio, the probability of the child's genotype, given the parents' genotypes and the event D that the child is affected with disease, may be written as
$$P(g_c \mid g_m, g_f, D) = \frac{R_{g_c}}{\sum_{g^* \in G} R_{g^*}}$$
where $R_g$ is the relative risk for genotype g relative to some arbitrary baseline genotype, and the sum in the denominator is over the set G of the four possible offspring genotypes that the parents can produce.
v) Thus prove that the likelihood contribution from the 10 families with both parents genotyped is
$$\frac{R_{1/1}^{3}\, R_{1/2}^{4}\, R_{2/2}}{\left(R_{1/1} + 2R_{1/2} + R_{2/2}\right)^{6} \left(R_{1/2} + R_{2/2}\right)^{2}}$$
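The bookkeeping asked for in parts i) and ii) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: only the ten families with both parents genotyped are used (how to treat families 7 and 10 is left open here), cell b is read as "transmitted 1, untransmitted 2" and c as the reverse, following the table layout, and the TDT statistic is taken in its usual McNemar form (b - c)^2 / (b + c). All function and variable names are hypothetical.

```python
from collections import Counter
from itertools import product

# (father, mother, child) genotypes for the families with both parents typed.
# Leaving out families 7 and 10 is an assumption of this sketch.
TRIOS = {
    1: ((2, 2), (1, 1), (1, 2)),
    2: ((1, 2), (1, 2), (1, 1)),
    3: ((1, 2), (2, 2), (1, 2)),
    4: ((1, 2), (1, 2), (1, 2)),
    5: ((1, 1), (2, 2), (1, 2)),
    6: ((1, 2), (1, 2), (1, 1)),
    8: ((1, 2), (1, 2), (2, 2)),
    9: ((1, 2), (1, 2), (1, 1)),
    11: ((1, 2), (1, 2), (1, 2)),
    12: ((2, 2), (1, 2), (1, 2)),
}


def trio_cells(father, mother, child):
    """Return one (transmitted, untransmitted) pair per parent."""
    options = set()
    # Try every choice of which allele each parent passed on and keep
    # the choices that reproduce the child's genotype.
    for i, j in product((0, 1), repeat=2):
        if sorted((father[i], mother[j])) == sorted(child):
            cells = [(father[i], father[1 - i]), (mother[j], mother[1 - j])]
            options.add(tuple(sorted(cells)))
    if len(options) != 1:
        raise ValueError("transmissions are ambiguous or inconsistent")
    return options.pop()


def transmission_table(trios):
    """Count (transmitted, untransmitted) pairs over all parents."""
    counts = Counter()
    for father, mother, child in trios.values():
        counts.update(trio_cells(father, mother, child))
    return counts


def tdt_statistic(counts):
    """McNemar-type statistic using the heterozygous-parent cells only."""
    b, c = counts[(1, 2)], counts[(2, 1)]
    return (b - c) ** 2 / (b + c)


table = transmission_table(TRIOS)
for cell, label in [((1, 1), "a"), ((1, 2), "b"), ((2, 1), "c"), ((2, 2), "d")]:
    print(label, "=", table[cell])
print("TDT statistic:", tdt_statistic(table))
```

The printed statistic can then be compared with the 3.84 critical value quoted in part ii). Summing the rows and columns of the same table gives the transmitted and untransmitted totals needed for part iii).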