Biostatistical Theory and Models for Survival Analysis, Exercises of Biostatistics

Product-Limit Estimator - Survival Analysis

Typology: Exercises

2019/2020

Uploaded on 03/12/2020

richard-ngaya 🇺🇸

BIOS 924 Biostatistical Theory and Models for Survival Analysis
Homework 2
Due by 11:59 PM on Wednesday, 03/15/2020
Student Name: ____________________________________________
Total points: ______(out of 105)
N.B. (i) The values (in points) are shown for each problem, and for some individual parts. It is
difficult for this HW to provide the value of each portion of the individual parts, but the rule for
grading will be that the more essential work you show, the more credit you will get.
(ii) Please answer each question by adding text to this MS Word file, immediately after the
respective question, using BLUE INK. Please don’t use red ink or black ink, with one exception -
the R/SAS code and output could be in BLACK INK, but all explanations should be in BLUE INK.
(iii) For each problem (when relevant), include ALL of your R (and, if applicable, SAS) code, but
only the relevant parts of the R (and/or SAS) output.
(iv) When saving this file (before submitting it on Canvas), include your LAST_NAME, HW#, and
DATE in the file name (for us not to confuse your work with someone else’s).
Example: DOE-HW#2-031520
(v) You could use LaTeX (rather than MS Word) as your text editor, but you should provide your
original LaTeX file, in addition to a PDF.
1. [15 pts] Suppose that a continuous, positive random variable $T$ has a hazard function defined by
$h(t) = h_j$ when $a_{j-1} \le t < a_j$, $j = 1, 2, \ldots, k$,
where $a_0 = 0$ and $a_k = \infty$, $a_0 < a_1 < \cdots < a_k$, and $h_j > 0$, $j = 1, 2, \ldots, k$.
(a) Express $S(t)$, the survivor function of $T$, in terms of the $h_j$'s.
(b) Express $f(t)$, the probability density function of $T$, in terms of the $h_j$'s.
(c) If you observe a random sample of size $n$ of the form $(t_1, \delta_1), \ldots, (t_n, \delta_n)$, where $\delta_i = 1$ if $t_i$ is a
failure time, and 0 if $t_i$ is a right-censoring time, derive the maximum likelihood estimates of the
$h_j$'s and, hence, the maximum likelihood estimate of $S(t)$.
(d) What happens to the maximum likelihood estimate of $S(t)$ as $k$ becomes “large” and the distances
$a_j - a_{j-1}$ get “small”?
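For orientation only, the piecewise-constant hazard above can be evaluated numerically. The sketch below is an illustration, not the requested derivation: it relies on the standard identities $S(t) = \exp\{-H(t)\}$ and $f(t) = h(t)\,S(t)$, where $H$ is the cumulative hazard. The function names, the argument layout (`cuts` holding the finite cut points $a_1, \ldots, a_{k-1}$) and the example values are assumptions made here.

```python
import math


def _pieces(cuts, hazards):
    """Pair each interval [a_{j-1}, a_j) with its hazard h_j (a_0 = 0, a_k = inf)."""
    if len(hazards) != len(cuts) + 1:
        raise ValueError("need exactly one hazard per interval")
    lefts = [0.0] + list(cuts)
    rights = list(cuts) + [math.inf]
    return list(zip(lefts, rights, hazards))


def hazard(t, cuts, hazards):
    """h(t): the constant h_j of the interval that contains t."""
    if t < 0:
        raise ValueError("t must be non-negative")
    for left, right, h in _pieces(cuts, hazards):
        if left <= t < right:
            return h
    raise ValueError("t must be finite")


def cumulative_hazard(t, cuts, hazards):
    """H(t): area under the step function h between 0 and t."""
    if t < 0:
        raise ValueError("t must be non-negative")
    total = 0.0
    for left, right, h in _pieces(cuts, hazards):
        if t < right:
            return total + h * (t - left)  # partial last interval
        total += h * (right - left)        # full interval before t
    raise ValueError("t must be finite")


def survivor(t, cuts, hazards):
    """S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(t, cuts, hazards))


def density(t, cuts, hazards):
    """f(t) = h(t) * S(t)."""
    return hazard(t, cuts, hazards) * survivor(t, cuts, hazards)


# Hypothetical example with k = 3 intervals: [0, 1), [1, 3), [3, inf).
example_cuts = [1.0, 3.0]
example_hazards = [0.5, 0.2, 1.0]
print(survivor(2.0, example_cuts, example_hazards))
```

With these made-up values, $H(2) = 0.5 \cdot 1 + 0.2 \cdot 1 = 0.7$, so the printed value is $e^{-0.7}$.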
SOLUTION:
2. [15 pts] (a) Formulate the Cochran-Mantel-Haenszel (CMH) test.
(b) The log-rank test (sometimes called Cox-Mantel test) for comparison of two survivor curves is
obtained by constructing a 2x2 table at each distinct death time, and comparing the death rates
between the two groups, conditioned on the number at risk in the groups. The tables are then combined using the Cochran-Mantel-Haenszel test. Thus, if the $j$th table (with fixed marginals) is

                 Die/Fail
Group      Yes         No                     Total
0          $d_{0j}$    $r_{0j} - d_{0j}$      $r_{0j}$
1          $d_{1j}$    $r_{1j} - d_{1j}$      $r_{1j}$
Total      $d_j$       $r_j - d_j$            $r_j$

where the $d$'s are the numbers of dead at the particular death time and the $r$'s are the numbers at risk at the particular death time, write down the log-rank test statistic in terms of the $d$'s and the $r$'s and the number of death times.
(c) What is the distribution of the log-rank test statistic? Why? What assumptions are necessary for its validity? Do censoring patterns matter for its validity?
(d) Based on the motivation for the log-rank test, which of the survival-related quantities (survivor functions, hazard functions, or cumulative hazard functions) are we comparing at each death time point?
(e) Does the log-rank test depend on the concrete event times or, rather, on what? [HINT: Log-what test?]
(f) Analogous to the CMH test for a series of tables at different levels of a confounder, the log-rank test is most powerful when “odds ratios” are constant over time intervals. That is, it is most powerful for “proportional hazards”. What should be done when the hazards are not proportional?
(g) What is the variance of the log-rank test statistic? Show a formula. Where does it come from? Comment on its derivation.
SOLUTION:
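The 2x2-table construction described in part (b) can be sketched in code. This is an illustration under stated assumptions, not the course's reference solution: it uses the usual CMH form, in which the observed deaths in group 1 are compared with their hypergeometric expectation $d_j r_{1j} / r_j$ and the hypergeometric variances are summed across death times. The function name `logrank` and the toy data are hypothetical.

```python
def logrank(times, events, groups):
    """Log-rank statistic built from one 2x2 table per distinct death time.

    times  : observed times
    events : 1 = death, 0 = right-censored
    groups : 0 or 1
    Returns (observed minus expected deaths in group 1, variance, chi-square).
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    o_minus_e = 0.0
    variance = 0.0
    for t in death_times:
        # Numbers at risk just before t: overall (r) and in group 1 (r1).
        r = sum(1 for u in times if u >= t)
        r1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        # Deaths at t: overall (d) and in group 1 (d1).
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        o_minus_e += d1 - d * r1 / r
        if r > 1:  # a table with a single subject at risk carries no variance
            variance += d * (r1 / r) * (1 - r1 / r) * (r - d) / (r - 1)
    if variance == 0:
        raise ValueError("no informative death times")
    return o_minus_e, variance, o_minus_e ** 2 / variance


# Toy data (made up for illustration): four deaths, alternating groups.
stat = logrank(times=[1, 2, 3, 4], events=[1, 1, 1, 1], groups=[1, 0, 1, 0])
print(stat)
```

Only the ordering of the times enters the calculation: replacing the times 1, 2, 3, 4 by any other increasing sequence leaves the three returned values unchanged.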

3. [20 pts] The Kaplan-Meier (KM) estimator distributes the total mass (of 1.0 or 100%) among the different death/event times, but no masses are assigned at the (right-) censored times. That is, no jumps are present in the survivor curve at censored times, but only at event times. Assume no ties, and consider the following algorithm for mass distribution:
(i) order all $n$ observed times (both censored and event times) in an increasing order from left to right;
(ii) distribute the total mass equally among all $n$ observed times, i.e. assign $1/n$ to each of the $n$ observed times;
(iii) start moving from left to right and, once reaching the leftmost censored time with non-zero mass (say, the $j$th censored time from left to right), remove its mass, and distribute it equally between all event and censored times that are to the right of that censored time; the reasoning for that is, since no event did occur at that censored time, it could only occur to the right of it and, most likely at one of those observed times, which are all equally likely to be the true event time for that censored observation; note that the censored time has no mass after its mass gets distributed; note also that after redistributing the mass from the first censored time, the second smallest censored time will have mass $1/n + (1/n)/(n - j) = 1/n + 1/[n(n - j)]$, which will have itself to be redistributed, according to step (iv) next;
(iv) repeat step (iii) until all censored times have zero mass, and the total mass (of 1.0) is distributed to the event times only.
Prove that the above algorithm is equivalent to the KM estimator.
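Steps (i)-(iv) can be checked numerically before attempting the proof. The sketch below is a sanity check on one small data set, not a proof: it redistributes mass to the right using exact fractions and compares the result with the jumps of the product-limit estimate computed from its usual formula. The function names and the toy data are assumptions made here, and one case the problem statement leaves open is handled by assumption: if the largest observed time is censored, its mass is left where it is.

```python
from fractions import Fraction


def redistribute_to_the_right(times, events):
    """Masses at the ordered times after steps (i)-(iv); assumes no ties."""
    order = sorted(range(len(times)), key=lambda i: times[i])  # step (i)
    n = len(order)
    delta = [events[i] for i in order]
    mass = [Fraction(1, n)] * n                                # step (ii)
    for pos in range(n):                                       # steps (iii)-(iv)
        if delta[pos] == 0 and mass[pos] > 0:
            n_right = n - pos - 1
            if n_right == 0:
                break  # assumption: nothing to the right, mass stays put
            share = mass[pos] / n_right
            for k in range(pos + 1, n):
                mass[k] += share
            mass[pos] = Fraction(0)
    return mass


def kaplan_meier_jumps(times, events):
    """Jump of the KM survivor curve at each ordered time (0 at censored times)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    n = len(order)
    surv = Fraction(1)
    jumps = []
    for pos, i in enumerate(order):
        at_risk = n - pos
        if events[i] == 1:
            jump = surv / at_risk  # S(t-) - S(t) = S(t-) * (1 / at_risk)
            surv -= jump
        else:
            jump = Fraction(0)
        jumps.append(jump)
    return jumps


# Toy data (made up): events at 1, 3, 5 and censoring at 2, 4.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 0, 1, 0, 1]
print(redistribute_to_the_right(toy_times, toy_events))
print(kaplan_meier_jumps(toy_times, toy_events))
```

For this toy data both functions give masses 1/5, 0, 4/15, 0, 8/15, which sum to 1.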

8. [ 5 pts] Consider PROC LIFETEST in SAS/STAT 12.3, Example 52.3 Life-Table Estimates for Males with Angina Pectoris, from SAS Help and Documentation. Repeat the analyses from this example, but using R.