





Description: About frequency data and how to analyze it with RStudio (includes code).
Typology: Assignments
Section 1: Introduction (10 pts)

The frequency of a specific measurement in a sample is the number of observations having that value of the measurement. The frequency distribution shows how often each value of the variable occurs in the sample. We use the frequency distribution of a sample to learn about the distribution of the variable in the population from which the sample was drawn. It also shows the shape of the data, which lets us visualize a single variable and detect patterns.

Relative frequency is the proportion of observations having a given measurement, calculated as the frequency divided by the total number of observations. The relative frequency distribution gives the proportion of occurrences of each value in the data set.

One learning outcome is to be able to calculate a confidence interval for a proportion. Other learning outcomes include performing hypothesis tests about proportions, fitting frequency data to a model, and testing for a fit to a Poisson distribution.

Section 2: Question 1-4 (R scripts & graphs & answers to individual questions) (Q1 10pts, Q2 20pts, Q3 20 pts, Q4 30 pts)
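As a small illustration of the definitions of frequency and relative frequency, the sketch below uses the stocking-position counts from Question 1 (6, 9, 16, 21). The variable names `counts`, `n`, and `relative_frequency` are chosen here for illustration and do not come from the assignment.

```r
# Observed frequencies by position (counts from Question 1)
counts <- c("Far-Left" = 6, "Left-Middle" = 9,
            "Right-Middle" = 16, "Far-Right" = 21)

# Relative frequency = frequency / total number of observations
n <- sum(counts)
relative_frequency <- counts / n

# prop.table() performs the same division on a vector of counts
relative_frequency_check <- prop.table(counts)

print(round(relative_frequency, 3))
```

The relative frequencies always sum to 1, which makes them comparable across samples of different sizes.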
b. Use chisq.test() to test the null hypothesis that the selection of the stockings was independent of position.

The X^2 test statistic is given by

$$X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where O_i is the observed frequency and E_i is the expected frequency at the ith position. The following table helps us get the values:

Position       Subjects   Expected   (O-E)   (O-E)^2   Chi-sq
Far-Left       6          13         -7      49        3.
Left-Middle    9          13         -4      16        1.
Right-Middle   16         13         3       9         0.
Far-Right      21         13         8       64        4.
Total          52         52                           10.

The critical value of chi-square at 3 df is 7.8147 (at the 0.05 significance level). Since the calculated value of X^2 is greater than the critical value, we reject the null hypothesis and conclude that the selection of stockings is not independent of position.

c. (Optional) The function chisq.test() can take the data either as a data frame, as above, or as a vector of the observed counts, passed as a parameter called x:

    chisq.test(x = c(6,9,16,21), p = c(0.25,0.25,0.25,0.25))

Try it using the specification of the counts, to see that you get the same answer as in (b).

    chisq.test(x = c(6,9,16,21), p = c(0.25,0.25,0.25,0.25))

    Chi-squared test for given probabilities
    data: c(6, 9, 16, 21)
    X-squared = 10.615, df = 3, p-value = 0.
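The hand calculation in (b) can be reproduced step by step in base R. This is a minimal sketch, assuming a 0.05 significance level (which matches the critical value 7.8147 quoted in the answer); the variable names are illustrative.

```r
# Observed counts per position and equal expected counts under the null
observed <- c(6, 9, 16, 21)
expected <- rep(sum(observed) / length(observed), length(observed))  # 13 each

# X^2 = sum over positions of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus one
df <- length(observed) - 1

# Critical value at the assumed 0.05 level, and the P-value
critical_value <- qchisq(0.95, df = df)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(c(chi_sq = chi_sq, critical_value = critical_value, p_value = p_value))
```

Comparing `chi_sq` with `critical_value` and comparing `p_value` with 0.05 are two ways of stating the same decision.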
circumstances in which birth month can have a strong effect on later life? One prediction is that elite athletes will disproportionately have been born in the months just after the age cutoff used to separate levels for young players of the sport. The reasoning is that the oldest athletes within an age group benefit from being relatively older, and will therefore gain more confidence and attract more coaching attention than the relatively younger players in the same group. As a result, they may be more likely to dedicate themselves to the sport and do well later. In the case of soccer, the cutoff for different age groups is generally August.

a. The birth months (by three-month interval) of soccer players competing in the Under-20s World Tournament are recorded in the data file “soccer_birth_quarter.csv” (from Barnsley et al. 1992). Plot these data. Do you see a pattern?

    getwd()           # confirm the working directory contains DataForLabs
    library(readr)    # read_csv()
    library(ggplot2)  # plotting
    soccer_birth_quarter <- read_csv("DataForLabs/soccer_birth_quarter.csv")
    View(soccer_birth_quarter)
    # geom_bar() counts the rows in each birth quarter
    ggplot(data = soccer_birth_quarter, aes(x = birth_quarter)) +
      geom_bar()
    Cardiactable
    # Load the weekly counts under the name used in the later steps
    cardiac_arrests_out_of_hospital <- read.csv("DataForLabs/cardiac arrests out of hospital.csv", stringsAsFactors = TRUE)
    Cardiactable

     0  1  2  3  4  5  6
    36 79 60 41 28 10  7

b. What is the mean number of heart attacks per week?

    lambda <- mean(cardiac_arrests_out_of_hospital$out_of_hospital_cardiac_arrests)
    lambda
c. For the mean you just calculated, use dpois() to calculate the probability of 0 heart attacks in a week assuming a Poisson distribution. Multiply that probability by the number of data points to calculate the expected frequency of 0 in these data under the null hypothesis of a Poisson distribution.

    dpois(x = 0, lambda = 2.015326, log = FALSE)

    # Total number of cardiac arrests over all weeks; the number of data points (weeks) is 261
    sum(cardiac_arrests_out_of_hospital)
    526

    261*0.
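The quantities in parts (b) and (c) can be recovered from the frequency table printed above, without the CSV file. This is a sketch under that assumption; `weekly_counts`, `observed_weeks`, `n_weeks`, and `total_arrests` are illustrative names, not objects from the original script.

```r
# Frequency table from the assignment: number of weeks with 0, 1, ..., 6 events
weekly_counts  <- 0:6
observed_weeks <- c(36, 79, 60, 41, 28, 10, 7)

# Number of data points (weeks) versus total number of events
n_weeks       <- sum(observed_weeks)                  # number of weeks
total_arrests <- sum(weekly_counts * observed_weeks)  # events summed over weeks

# Mean events per week, the Poisson parameter estimate
lambda <- total_arrests / n_weeks

# Expected number of weeks with 0 events under a Poisson distribution
expected_zero <- n_weeks * dpois(0, lambda)

print(c(n_weeks = n_weeks, total_arrests = total_arrests,
        lambda = lambda, expected_zero = expected_zero))
```

The distinction matters because the expected frequency is the probability multiplied by the number of weeks, not by the total number of events.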
d. Here is a table of the expected frequencies under the null hypothesis. (The expected frequency for zero heart attacks should match your calculation above.) Are these frequencies acceptable for use in a χ^2 goodness of fit test?

Number of heart attacks   Expected
0                         34.
1                         70.
2                         70.
3                         47.
4                         23.
5                         9.
6 or more

    expected_frequency <- 261 * dpois(x = 0:6, lambda = 2.015326, log = FALSE)
    expected_frequency
    34.785283 70.103686 70.640891 47.454808 23.909227 9.636977 3.
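One way to answer (d) is to compute the "6 or more" category as an upper tail probability and then apply a common rule of thumb for the χ^2 approximation: no expected frequency below 1, and no more than 20% of categories below 5. The sketch below is a variant, not the calculation used in the script above, which uses dpois() at exactly 6 for the last category. The rule of thumb and all variable names are assumptions made for illustration.

```r
n_weeks <- 261
lambda  <- 2.015326

# Expected frequencies for exactly 0..5 events
expected_0_to_5 <- n_weeks * dpois(0:5, lambda)

# "6 or more" is the upper tail: P(X > 5)
expected_6_plus <- n_weeks * ppois(5, lambda, lower.tail = FALSE)

expected <- c(expected_0_to_5, expected_6_plus)
names(expected) <- c(0:5, "6+")

# Rule-of-thumb checks (assumed thresholds: 1 and 5, at most 20% below 5)
none_below_1  <- all(expected >= 1)
share_below_5 <- mean(expected < 5)
acceptable    <- none_below_1 && share_below_5 <= 0.2

print(round(expected, 3))
print(acceptable)
```

With the tail category defined this way, the expected frequencies sum to the number of weeks, so no rescaling is needed.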
e. Create vectors in R for the observed and expected frequencies.

    observed_frequency <- Cardiactable
    observed_frequency

     0  1  2  3  4  5  6
    36 79 60 41 28 10  7

f. Calculate the χ^2 for this hypothesis test, using chisq.test()$statistic.

    chisq.test(observed_frequency, p = expected_frequency, rescale.p = TRUE)$statistic
    X-squared
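To see what rescale.p = TRUE does in part (f), the statistic can also be computed by hand. This sketch assumes the observed counts from the table and the expected frequencies from part (d); the `rescaled_expected` and `chi_sq_*` names are illustrative. The result should agree with the value 8.693435 passed to pchisq() in part (h).

```r
observed_frequency <- c(36, 79, 60, 41, 28, 10, 7)
expected_frequency <- 261 * dpois(0:6, lambda = 2.015326)

# rescale.p = TRUE rescales p to sum to 1, so the expected
# counts add up to the observed total
rescaled_expected <- sum(observed_frequency) *
  expected_frequency / sum(expected_frequency)

# X^2 = sum of (O - E)^2 / E over the categories
chi_sq_by_hand <- sum((observed_frequency - rescaled_expected)^2 /
                        rescaled_expected)

# Same statistic from chisq.test(); the small-expected-count warning is suppressed
chi_sq_from_test <- suppressWarnings(
  chisq.test(observed_frequency, p = expected_frequency,
             rescale.p = TRUE)$statistic
)

print(c(by_hand = chi_sq_by_hand, from_test = unname(chi_sq_from_test)))
```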
g. How many degrees of freedom should this χ^2 goodness of fit test have?

Degree of freedom=

h. Calculate the P-value for this test, using pchisq().

    pchisq(q = 8.693435, df = 5, lower.tail = FALSE)
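The df = 5 used in the pchisq() call follows from the usual goodness-of-fit formula: number of categories, minus 1, minus the number of parameters estimated from the data (here the Poisson mean). A short sketch, assuming a 0.05 significance level for the final decision; the variable names are illustrative.

```r
n_categories           <- 7  # 0, 1, 2, 3, 4, 5, and 6 events
n_estimated_parameters <- 1  # the Poisson mean was estimated from the data

# Degrees of freedom for the goodness-of-fit test
df <- n_categories - 1 - n_estimated_parameters

# Upper-tail probability of the chi-square distribution at the observed statistic
p_value <- pchisq(q = 8.693435, df = df, lower.tail = FALSE)

# Decision at the assumed 0.05 level
reject_null <- p_value < 0.05

print(c(df = df, p_value = p_value))
print(reject_null)
```

A P-value above 0.05 means the data give no grounds to reject the Poisson model at that level.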