


Product-Limit Estimator - Survival Analysis
BIOS 924 Biostatistical Theory and Models for Survival Analysis
Homework 2
Due by 11:59 PM on Wednesday, 03/15/

Student Name: ____________________________________________
Total points: ______ (out of 105)

N.B.
(i) The values (in points) are shown for each problem, and for some individual parts. It is difficult for this HW to provide the value of each portion of the individual parts, but the rule for grading will be that the more essential work you show, the more credit you will get.
(ii) Please answer each question by adding text to this MS Word file, immediately after the respective question, using BLUE INK. Please don't use red ink or black ink, with one exception: the R/SAS code and output could be in BLACK INK, but all explanations should be in BLUE INK.
(iii) For each problem (when relevant), include ALL of your R (and, if applicable, SAS) code, but only the relevant parts of the R (and/or SAS) output.
(iv) When saving this file (before submitting it on Canvas), include your LAST_NAME, HW#, and DATE in the file name (for us not to confuse your work with someone else's). Example: DOE-HW#2-
(v) You could use LaTeX (rather than MS Word) as your text editor, but you should provide your original LaTeX file, in addition to a PDF.
1. [15 pts] Suppose that a continuous, positive random variable T has a hazard function defined by

    h(t) = h_j  when  a_{j-1} ≤ t < a_j,  j = 1, 2, …, k,

where a_0 = 0 and a_k = ∞, a_0 < a_1 < … < a_k, and h_j > 0, j = 1, 2, …, k.
(a) Express S(t), the survivor function of T, in terms of the h_j's.
(b) Express f(t), the probability density function of T, in terms of the h_j's.
(c) If you observe a random sample of size n of the form (t_1, δ_1), …, (t_n, δ_n), where δ_i = 1 if t_i is a failure time and δ_i = 0 if t_i is a right-censoring time, derive the maximum likelihood estimates of the h_j's and, hence, the maximum likelihood estimate of S(t).
(d) What happens to the maximum likelihood estimate of S(t) as k becomes "large" and the distances a_j - a_{j-1} get "small"?
SOLUTION:

2. [15 pts]
(a) Formulate the Cochran-Mantel-Haenszel (CMH) test.
(b) The log-rank test (sometimes called the Cox-Mantel test) for comparison of two survivor curves is obtained by constructing a 2x2 table at each distinct death time, and comparing the death rates between the two groups, conditioned on the number at risk in the groups. The tables are then combined using the Cochran-Mantel-Haenszel test. Thus, if the j-th table (with fixed marginals) is

                 Die/Fail
    Group     Yes      No              Total
    0         d_0j     r_0j - d_0j     r_0j
    1         d_1j     r_1j - d_1j     r_1j
    Total     d_j      r_j - d_j       r_j

where the d's are the numbers of dead at the particular death time and the r's are the numbers at risk at the particular death time, write down the log-rank test statistic in terms of the d's, the r's, and the number of death times.
(c) What is the distribution of the log-rank test statistic? Why? What assumptions are necessary for its validity? Do censoring patterns matter for its validity?
(d) Based on the motivation for the log-rank test, which of the survival-related quantities (survivor functions, hazard functions, or cumulative hazard functions) are we comparing at each death time point?
(e) Does the log-rank test depend on the actual event times or, rather, on something else? [HINT: Log-what test?]
(f) Analogous to the CMH test for a series of tables at different levels of a confounder, the log-rank test is most powerful when the "odds ratios" are constant over time intervals. That is, it is most powerful for "proportional hazards". What should be done when the hazards are not proportional?
(g) What is the variance of the log-rank test statistic? Show a formula. Where does it come from? Comment on its derivation.
SOLUTION:
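The table construction in part (b) can be illustrated numerically. The following is a minimal Python sketch, not a model answer and not the R/SAS code the assignment asks for. It assumes the common form of the statistic, (sum of observed minus expected deaths in group 1)^2 divided by the sum of hypergeometric variances over the tables. The toy data and the names `Table`, `build_tables`, and `logrank_statistic` are made up for illustration.

```python
from collections import namedtuple

# One 2x2 table per distinct death time: deaths (d) and numbers at risk (r) by group.
Table = namedtuple("Table", "d0 r0 d1 r1")


def build_tables(times, events, groups):
    """Build the 2x2 table at each distinct death time (events: 1 = death, 0 = censored)."""
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    tables = []
    for s in death_times:
        r0 = sum(1 for t, g in zip(times, groups) if t >= s and g == 0)
        r1 = sum(1 for t, g in zip(times, groups) if t >= s and g == 1)
        d0 = sum(1 for t, e, g in zip(times, events, groups) if t == s and e == 1 and g == 0)
        d1 = sum(1 for t, e, g in zip(times, events, groups) if t == s and e == 1 and g == 1)
        tables.append(Table(d0, r0, d1, r1))
    return tables


def logrank_statistic(tables):
    """Combine the tables CMH-style: (sum of O - E)^2 / (sum of variances)."""
    o_minus_e = 0.0
    variance = 0.0
    for tb in tables:
        d = tb.d0 + tb.d1          # total deaths at this death time
        r = tb.r0 + tb.r1          # total at risk at this death time
        o_minus_e += tb.d1 - d * tb.r1 / r
        if r > 1:                  # a table with one subject at risk contributes no variance
            variance += d * (r - d) * tb.r0 * tb.r1 / (r ** 2 * (r - 1))
    return o_minus_e ** 2 / variance


# Hypothetical toy data: three subjects per group, one censored observation in group 0.
times = [1, 3, 5, 2, 4, 6]
events = [1, 1, 0, 1, 1, 1]
groups = [0, 0, 0, 1, 1, 1]

tables = build_tables(times, events, groups)
stat = logrank_statistic(tables)
print(len(tables), stat)
```

Note that only the ordering of the times enters the tables, so replacing the times by their ranks leaves `stat` unchanged.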
3. [20 pts] The Kaplan-Meier (KM) estimator distributes the total mass (of 1.0 or 100%) among the different death/event times, but no masses are assigned at the (right-) censored times. That is, no jumps are present in the survivor curve at censored times, but only at event times. Assume no ties, and consider the following algorithm for mass distribution:
(i) order all n observed times (both censored and event times) in increasing order from left to right;
(ii) distribute the total mass equally among all n observed times, i.e. assign 1/n to each of the n observed times;
(iii) start moving from left to right and, once reaching the leftmost censored time with non-zero mass (say, the j-th of the n ordered times), remove its mass, and distribute it equally among all event and censored times that are to the right of that censored time. The reasoning for that is, since no event occurred at that censored time, it could only occur to the right of it and, most likely, at one of those observed times, which are all equally likely to be the true event time for that censored observation. Note that the censored time has no mass after its mass gets distributed. Note also that after redistributing the mass from the first censored time, the second smallest censored time will have mass 1/n + (1/n)/(n - j) = 1/n + 1/[n(n - j)], which must itself be redistributed, according to step (iv) next;
(iv) repeat step (iii) until all censored times have zero mass, and the total mass (of 1.0) is distributed to the event times only.
Prove that the above algorithm is equivalent to the KM estimator.
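A numerical check is not a proof, but it can help fix ideas before attempting one. The sketch below, in Python rather than the R or SAS the assignment asks for, runs steps (i)-(iv) on a small made-up sample and compares the resulting masses with the jumps of the KM curve. The function names and the sample are hypothetical, and exact fractions are used so the comparison is not affected by rounding. The sketch assumes the largest observed time is an event. If it is censored, there is nothing to the right of it, and the code simply leaves its mass in place.

```python
from fractions import Fraction


def redistribute_to_the_right(events):
    """Steps (i)-(iv): events are 0/1 flags (1 = event, 0 = censored) for
    observations already sorted by time, with no ties. Returns the final masses."""
    n = len(events)
    mass = [Fraction(1, n)] * n              # step (ii): 1/n at every observed time
    for j, is_event in enumerate(events):    # step (iii): sweep from left to right
        if is_event:
            continue
        n_right = n - j - 1                  # observed times to the right of position j
        if n_right == 0:
            break                            # assumption: nothing to redistribute to
        share = mass[j] / n_right
        for k in range(j + 1, n):
            mass[k] += share
        mass[j] = Fraction(0)
    return mass


def kaplan_meier_jumps(events):
    """Jump of the KM survivor curve at each ordered time (zero at censored times)."""
    n = len(events)
    surv = Fraction(1)
    jumps = []
    for j, is_event in enumerate(events):
        at_risk = n - j                      # no ties, so the risk set shrinks by one each time
        if is_event:
            new_surv = surv * (1 - Fraction(1, at_risk))
            jumps.append(surv - new_surv)
            surv = new_surv
        else:
            jumps.append(Fraction(0))
    return jumps


# Hypothetical ordered sample of n = 5: event, censored, event, censored, event.
sample = [1, 0, 1, 0, 1]
masses = redistribute_to_the_right(sample)
jumps = kaplan_meier_jumps(sample)
print(masses == jumps, [str(m) for m in masses])
```

In this sample the first censored time is the 2nd of the 5 ordered times, so the second censored time carries mass 1/5 + 1/(5 * 3) = 4/15 before its own redistribution, in line with the formula in step (iii).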
8. [ 5 pts] Consider PROC LIFETEST in SAS/STAT 12.3, Example 52.3 Life-Table Estimates for Males with Angina Pectoris, from SAS Help and Documentation. Repeat the analyses from this example, but using R.