
Applied Statistics Examination: M. Phil. in Statistical Science - June 2006, Exams of Statistics

This document contains the instructions and data sets for an applied statistics examination held in June 2006. The examination covers topics including statistical analysis, data visualization and hypothesis testing. Candidates are required to answer three questions based on the provided data sets, which include motor insurance premiums and MRSA bacteraemia rates. The examination is an 'open book' event, and candidates are allowed to use the Statistical Laboratory's network of workstations.

Uploaded on 02/26/2013 by dharmanand

M. PHIL. IN STATISTICAL SCIENCE
9am Tuesday 13 June to 1pm Friday 16 June 2006
APPLIED STATISTICS
Attempt THREE questions. There are FOUR questions in total.
Marks for each question are indicated on the paper in square brackets.
Each question is worth a total of 20 marks.
This is an ‘Open Book’ examination, involving the use of the Statistical Laboratory’s
network of workstations. Candidates will receive this paper at 9.00am on Tuesday
13 June, and must hand in their scripts to the Chairman of Examiners by 1.00pm
on Friday 16 June.
The data sets will be emailed to candidates on Tuesday 13 June.
(The Statistical Laboratory Computer Officer and Examiner will normally be available
for consultation if required between 9.00am and 4.30pm on these four days.)
Each candidate should submit his/her script with a signed statement that the work
has been carried out without any collaboration with others.
The scripts may be handwritten. Candidates are requested to submit at most 25
pages in total. They are advised that the total work set should take between 4 and 6
hours.
STATIONERY REQUIREMENTS          SPECIAL REQUIREMENTS
Cover sheet                      None
Treasury Tag
Script paper
You may not start to read the questions
printed on the subsequent pages until
instructed to do so by the Invigilator.

1 The table below appeared in The Times (17 October 2005) under the heading “Caught on camera: the effect of speeding convictions upon insurance premiums (in £)”.

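                                                  Number of points
Policyholder          Policy                        0     3     6     9
21 year old male      Third party fire and theft  306   384   384   409
                      Comprehensive               500   555   555   605
21 year old female    Third party fire and theft  266   304   279   287
                      Comprehensive               435   430   464   478
30 year old female    Third party fire and theft  177   177   177   213
                      Comprehensive               320   325   325   368
40 year old male      Third party fire and theft  154   162   162   189
                      Comprehensive               230   230   230   295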

The table shows the motor insurance premiums for various categories of policyholders with 0, 3, 6 or 9 points on their driving licences (note that 3 points are incurred for each speeding conviction). For each category of policyholder, the top row gives the premiums to be paid for third party fire and theft only policies, and the bottom row gives the premiums to be paid for comprehensive policies.

(i) Carry out initial plots to illustrate how the “response variable”, premium paid, depends on the four factors Age, Sex, Type of Policy and Number of Points. [3]

(ii) Fit an additive model for this dependence and summarise it. Test the hypothesis that premiums are not affected by the first two speeding convictions. [8]

(iii) What happens if you now include two-factor interactions? How might you improve your model by transforming the response? Summarise how premiums depend on the four factors. [9]
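Below is a minimal sketch of the additive fit in (ii) and the test of the first two convictions. It assumes the table is read row-wise, as the question text describes: within each category, the first row is third party fire and theft ("TPFT" is an abbreviation introduced here) and the second row is comprehensive.

The sketch uses a single four-level Category factor in place of Age and Sex. This is a deliberate substitution. Age (three levels) and Sex (two levels) together span the same three degrees of freedom over these four categories, so the fitted values and residual sum of squares should be identical.

Because every category, policy and points combination appears exactly once, the main effects reduce to marginal-mean deviations. The null hypothesis "the first two convictions do not matter" is fitted by merging the 0, 3 and 6 point levels. The F statistic on (2, 24) degrees of freedom can then be compared with an F(2, 24) reference distribution (5% point roughly 3.40).

The names `TABLE`, `long_format` and `additive_fit` are hypothetical.

```python
from statistics import mean

POINTS = [0, 3, 6, 9]
# Premiums in pounds, transcribed from the Question 1 table:
# (third party fire and theft row, comprehensive row) for each category.
TABLE = {
    "21 year old male": ([306, 384, 384, 409], [500, 555, 555, 605]),
    "21 year old female": ([266, 304, 279, 287], [435, 430, 464, 478]),
    "30 year old female": ([177, 177, 177, 213], [320, 325, 325, 368]),
    "40 year old male": ([154, 162, 162, 189], [230, 230, 230, 295]),
}


def long_format():
    """One row per premium: (category, policy, points, premium)."""
    rows = []
    for category, (tpft, comp) in TABLE.items():
        for policy, premiums in (("TPFT", tpft), ("Comprehensive", comp)):
            for points, premium in zip(POINTS, premiums):
                rows.append((category, policy, points, premium))
    return rows


def additive_fit(rows, points_level):
    """Least-squares fitted values for category + policy + points.

    points_level maps a points value to a factor level, so merging
    levels gives a restricted model. Cell counts are proportional
    across all pairs of factors, so the least-squares main effects are
    deviations of marginal means from the grand mean.
    """
    grand = mean(r[3] for r in rows)

    def effects(key):
        groups = {}
        for r in rows:
            groups.setdefault(key(r), []).append(r[3])
        return {level: mean(ys) - grand for level, ys in groups.items()}

    cat = effects(lambda r: r[0])
    pol = effects(lambda r: r[1])
    pts = effects(lambda r: points_level[r[2]])
    return [grand + cat[r[0]] + pol[r[1]] + pts[points_level[r[2]]] for r in rows]


def rss(rows, fitted):
    """Residual sum of squares."""
    return sum((r[3] - f) ** 2 for r, f in zip(rows, fitted))


rows = long_format()
full = {p: p for p in POINTS}                    # four points levels
merged = {0: "0-6", 3: "0-6", 6: "0-6", 9: "9"}  # H0: 0, 3 and 6 points alike
rss_full = rss(rows, additive_fit(rows, full))
rss_merged = rss(rows, additive_fit(rows, merged))
df_resid = len(rows) - (1 + 3 + 1 + 3)  # intercept, category, policy, points
f_stat = ((rss_merged - rss_full) / 2) / (rss_full / df_resid)
print(f"RSS full = {rss_full:.1f}, RSS merged = {rss_merged:.1f}, "
      f"F(2, {df_resid}) = {f_stat:.2f}")
```

For part (iii), you could refit on the log scale. One way is to replace `premium` with `math.log(premium)` in `long_format` (importing `math`). The effects then become approximately multiplicative.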

Applied Statistics

between 2001-2 and 2002-3, and between 2002-3 and 2003-4. [3]

(iii) London Hospital Trusts are indicated by an asterisk. Use nonparametric methods to investigate whether there is a difference in rates between London and non-London Hospital Trusts. [6]

(iv) Using Poisson models for the number of reports, investigate the dependence of the rates of MRSA on the year and on whether or not the Trust is in London. Comment briefly on the fit of such models. [8]
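The MRSA data are not reproduced in this preview, so the sketch below takes plain lists as input. It covers two tools.

For (iii), it computes a Wilcoxon rank-sum (Mann-Whitney) statistic, which is one common nonparametric choice for comparing London with non-London rates (for example, within a single year). It uses a tie-corrected normal approximation.

For (iv), it checks fit via the residual deviance and Pearson dispersion, as an aid to commenting on the models. The exposure denominator behind the "rates" is an assumption and is called `exposure` here. With a single factor (for instance, every year and London combination, which is the model with interaction), the Poisson maximum-likelihood rate per group has a closed form. The additive year + London model needs iterative fitting, for example R's `glm` with `family = poisson` and an `offset`.

All function names are hypothetical.

```python
import math


def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Mann-Whitney U for sample x against sample y.

    Normal approximation with a tie correction; returns (U, z, two-sided p).
    """
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, z, math.erfc(abs(z) / math.sqrt(2))


def poisson_group_fit(counts, exposure, groups):
    """Fitted means for log(mu) = log(exposure) + group effect.

    With one factor and an offset, the maximum-likelihood rate in each
    group is (total count) / (total exposure).
    """
    tot_y, tot_e = {}, {}
    for y, e, g in zip(counts, exposure, groups):
        tot_y[g] = tot_y.get(g, 0) + y
        tot_e[g] = tot_e.get(g, 0) + e
    return [e * tot_y[g] / tot_e[g] for e, g in zip(exposure, groups)]


def poisson_fit_summary(counts, fitted, n_params):
    """Residual deviance, Pearson X^2 and dispersion estimate X^2 / df."""
    dev = 2 * sum((y * math.log(y / mu) if y > 0 else 0.0) - (y - mu)
                  for y, mu in zip(counts, fitted))
    pearson = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted))
    df = len(counts) - n_params
    return dev, pearson, pearson / df
```

A dispersion estimate well above 1 suggests overdispersion relative to the Poisson model.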

3 The data below are taken from the British Medical Journal (March 2005). They show the numbers of cases and deaths over a three-year period for 24 surgeons carrying out a particular type of operation in four hospitals, with low- and high-risk patients.

                   Low risk       High risk
Surgeon  Hospital  Cases  Deaths  Cases  Deaths
      1         1    349       1     76       4
      2         2    223       2     35       1
      3         2    248       2     42       3
      4         2    347       3     53       6
      5         3    415       5    112       8
      6         3    469       4     98       4
      7         1    379       1     69       1
      8         3    252       6     56       2
      9         3    230       3     63       8
     10         4    311       5     51       2
     11         4    349       1     64       1
     12         2    247       1     19       0
     13         2    191       1     48       2
     14         4    275       3     53       3
     15         3    412       4     76       4
     16         1    419       5     84       6
     17         4    286       5     51       4
     18         3    149       2     48       5
     19         4    375       7     63       7
     20         3    406       3    107       5
     21         3    290       5     81       4
     22         1    229       1     51       0
     23         2    330       4     56       2
     24         2    323       2     65       1

Carry out preliminary summaries and plots for this data set. [4]

By fitting a suitable model and interpreting the output, investigate the mortality rates for these surgeons, taking account of the risk status of the patients. [9]

Investigate whether any observed differences are adequately explained as differences between hospitals. [7]
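A natural "suitable model" would be a binomial (logistic) or Poisson GLM with risk status and surgeon as factors. As a lighter first look, the sketch below instead uses indirect standardisation, which is a swapped-in technique rather than the model the question necessarily intends.

Each surgeon's expected deaths E come from the pooled low-risk and high-risk death rates applied to that surgeon's case mix. The ratio O/E then compares observed with expected deaths, and the same totals aggregated by hospital give a rough view of the last part. With only a handful of deaths per surgeon, these ratios are noisy.

The names `DATA`, `observed_expected` and `by_hospital` are hypothetical. The numbers are transcribed from the table above.

```python
# Transcribed from the Question 3 table:
# (surgeon, hospital, low-risk cases, low-risk deaths, high-risk cases, high-risk deaths)
DATA = [
    (1, 1, 349, 1, 76, 4), (2, 2, 223, 2, 35, 1), (3, 2, 248, 2, 42, 3),
    (4, 2, 347, 3, 53, 6), (5, 3, 415, 5, 112, 8), (6, 3, 469, 4, 98, 4),
    (7, 1, 379, 1, 69, 1), (8, 3, 252, 6, 56, 2), (9, 3, 230, 3, 63, 8),
    (10, 4, 311, 5, 51, 2), (11, 4, 349, 1, 64, 1), (12, 2, 247, 1, 19, 0),
    (13, 2, 191, 1, 48, 2), (14, 4, 275, 3, 53, 3), (15, 3, 412, 4, 76, 4),
    (16, 1, 419, 5, 84, 6), (17, 4, 286, 5, 51, 4), (18, 3, 149, 2, 48, 5),
    (19, 4, 375, 7, 63, 7), (20, 3, 406, 3, 107, 5), (21, 3, 290, 5, 81, 4),
    (22, 1, 229, 1, 51, 0), (23, 2, 330, 4, 56, 2), (24, 2, 323, 2, 65, 1),
]


def risk_rates(data):
    """Pooled death rates for low-risk and for high-risk patients."""
    low = sum(r[3] for r in data) / sum(r[2] for r in data)
    high = sum(r[5] for r in data) / sum(r[4] for r in data)
    return low, high


def observed_expected(data):
    """(surgeon, hospital, observed deaths, expected deaths given case mix)."""
    low, high = risk_rates(data)
    return [(s, h, ld + hd, lc * low + hc * high)
            for s, h, lc, ld, hc, hd in data]


def by_hospital(oe):
    """Sum observed and expected deaths within each hospital."""
    totals = {}
    for _, h, o, e in oe:
        o_sum, e_sum = totals.get(h, (0, 0.0))
        totals[h] = (o_sum + o, e_sum + e)
    return totals


oe = observed_expected(DATA)
for s, h, o, e in oe:
    print(f"surgeon {s:2d} (hospital {h}): O = {o:2d}, E = {e:5.2f}, O/E = {o / e:.2f}")
for h, (o, e) in sorted(by_hospital(oe).items()):
    print(f"hospital {h}: O = {o:3d}, E = {e:6.2f}, O/E = {o / e:.2f}")
```

Because the rates are estimated from these same data, the expected deaths sum to the observed total across surgeons. The O/E ratios therefore show relative, not absolute, performance.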


4 Shown below is a subset of a dataset on 500 patients from a GP practice, in which the body mass index (bmi), age and sex of the patients were recorded.

id   sex  age  bmi
1    1    18   23.
2    0    47   21.
3    0    30   30.
4    0    60   18.
...
499  0    40   24.
500  1    25   24.

id = Patient’s anonymous identifier

sex = Sex of patient (0 corresponds to female; 1 to male)

age = Age of patient (in years)

bmi = Body mass index (kg/m²)

The researchers who collected the data are interested in determining how bmi depends on age and sex. They also want to know whether patients, after taking into account their age and sex, can be clustered into two different groups based on their body mass index. They try lm(bmi ~ age + sex), and then realize that they do not know how to proceed with the clustering. They approach you with their data.

(a) Plot a histogram of the residuals from the regression, and fit a suitable parametric mixture distribution to them. You need to provide all relevant plots and details of the mixture model to justify the number of groups you have decided upon. [15]

(b) The researchers also wish to know how they would go about classifying a female patient aged 20 with a body mass index of 23 into one of the groups that you have derived. Use the parametric mixture model that you have derived in (a) to demonstrate how best to allocate this female patient to a group. You need to justify the allocation rule that you use. [5]
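The full data set is not reproduced here, so the sketch below assumes you already have the list of residuals from lm(bmi ~ age + sex). A two-component normal mixture is one possible "suitable parametric mixture". It is fitted by EM, starting from the lower and upper quartiles, and BIC compares it with a single normal. A plain likelihood-ratio test is not used for this comparison, because the usual chi-squared reference distribution does not hold when testing the number of mixture components.

For (b), the female patient has sex = 0, age = 20 and bmi = 23. Her residual is 23 minus the fitted value from the regression. The coefficient names `b0`, `b_age` and `b_sex` are hypothetical placeholders for whatever lm returns. She is then allocated to the component with the larger posterior probability. This is the Bayes rule, which minimises the probability of misclassification if the fitted mixture is taken as correct.

All function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def mixture_loglik(x, p, mu1, sd1, mu2, sd2):
    return sum(math.log(p * normal_pdf(v, mu1, sd1) + (1 - p) * normal_pdf(v, mu2, sd2))
               for v in x)


def one_normal_loglik(x):
    """Log-likelihood of a single normal at its maximum-likelihood estimates."""
    mu, sd = fmean(x), pstdev(x)
    return sum(math.log(normal_pdf(v, mu, sd)) for v in x)


def em_two_normals(x, iters=500, tol=1e-8):
    """EM for p*N(mu1, sd1^2) + (1 - p)*N(mu2, sd2^2).

    Starts at the lower and upper quartiles; returns (p, mu1, sd1, mu2, sd2, loglik).
    """
    xs = sorted(x)
    n = len(xs)
    p, mu1, mu2 = 0.5, xs[n // 4], xs[(3 * n) // 4]
    sd1 = sd2 = pstdev(xs)
    old = new = mixture_loglik(xs, p, mu1, sd1, mu2, sd2)
    for _ in range(iters):
        # E-step: posterior probability that each residual comes from component 1
        w = []
        for v in xs:
            a = p * normal_pdf(v, mu1, sd1)
            b = (1 - p) * normal_pdf(v, mu2, sd2)
            w.append(a / (a + b))
        # M-step: weighted proportion, means and standard deviations
        s1 = sum(w)
        s2 = n - s1
        p = s1 / n
        mu1 = sum(wi * v for wi, v in zip(w, xs)) / s1
        mu2 = sum((1 - wi) * v for wi, v in zip(w, xs)) / s2
        sd1 = max(math.sqrt(sum(wi * (v - mu1) ** 2 for wi, v in zip(w, xs)) / s1), 1e-6)
        sd2 = max(math.sqrt(sum((1 - wi) * (v - mu2) ** 2 for wi, v in zip(w, xs)) / s2), 1e-6)
        new = mixture_loglik(xs, p, mu1, sd1, mu2, sd2)
        if new - old < tol:  # EM never decreases the log-likelihood
            break
        old = new
    return p, mu1, sd1, mu2, sd2, new


def bic(loglik, n_params, n):
    """Smaller is better: 2 parameters for one normal, 5 for two."""
    return -2 * loglik + n_params * math.log(n)


def residual_for(bmi, age, sex, b0, b_age, b_sex):
    """Residual from a fitted bmi ~ age + sex line (coefficients are inputs)."""
    return bmi - (b0 + b_age * age + b_sex * sex)


def allocate(residual, p, mu1, sd1, mu2, sd2):
    """Bayes rule: choose the component with the larger posterior probability."""
    a = p * normal_pdf(residual, mu1, sd1)
    b = (1 - p) * normal_pdf(residual, mu2, sd2)
    post1 = a / (a + b)
    return (1 if post1 >= 0.5 else 2), post1
```

Usage, under these assumptions: fit `em_two_normals(residuals)` and compare `bic(loglik, 5, n)` with `bic(one_normal_loglik(residuals), 2, n)`. Then call `allocate(residual_for(23, 20, 0, b0, b_age, b_sex), p, mu1, sd1, mu2, sd2)`. Reporting the posterior probability alongside the group shows how clear-cut the allocation is.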

END OF PAPER
