


This study guide explains how to use t-tests to determine the probability that a population mean is larger or smaller than a given value based on a single sample, or to compare the means of two samples. It covers one-tailed and two-tailed tests for single samples, as well as two-tailed tests for two samples. It also discusses the importance of checking the normality of the sample distribution before running the test, and includes R and SAS code examples for conducting t-tests.
As we saw previously, z-scores are an easy way to see where a single value falls in a population distribution when we know the mean and standard deviation. Similarly, t-statistics tell us where a value falls within a sampling distribution (which we can also calculate with a mean and standard error). Therefore, we can use a t-test to determine the probability that a true population mean is greater/less than a given value when we only have a single sample from that population. Also, we can use a t-test to compare two samples to each other. This is usually to determine if they came “from the same population”, which is another way to ask if the two samples are different from one another in a statistically significant way. T-tests are among the simplest and most powerful statistical analyses that we can do.
11.1 One-tailed T-test for a single sample
A one-tailed T-test for a single sample works like calculating a confidence interval in reverse. It compares the mean of a sample to some known value (usually the mean of the population). The test returns the probability that the mean of a sample is above or below (not both, of course) a known threshold value (hence, it is a one-tailed test).
All t-tests assume that the sample data is normally distributed. You should always check the normality of the sample distribution before running the t-test. Remember from Lab 8, you can check normality visually with histograms and boxplots and statistically with a Shapiro-Wilk test.
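The normality check described above can be sketched as follows. The vector yield_demo and its values are made up for illustration and are not the lab's lentil data. With only a handful of observations, the Shapiro-Wilk test has little power, so treat the result as a rough screen rather than proof of normality.

```r
# Hypothetical yields, for illustration only (not the lab's lentil data)
yield_demo <- c(668, 702, 655, 690)

# Visual checks; only drawn when R is running interactively
if (interactive()) {
  hist(yield_demo, main = "Histogram of yield_demo", xlab = "Yield")
  boxplot(yield_demo, horizontal = TRUE, xlab = "Yield")
}

# Shapiro-Wilk test: the null hypothesis is that the data are normal
sw <- shapiro.test(yield_demo)
sw$statistic   # the W statistic
sw$p.value     # a p-value above 0.05 gives no evidence against normality
```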
Using data from a previous lab for lentil variety A (VarA) from Farm 1, determine the probability that the true population mean for this variety is larger than 650. You will need to import and attach the lentil data (again). You will recognize the first few lines of code as the same as for the confidence interval calculation from Lab 10:
H0: The true population mean is not larger than 650.
H1: The true population mean is larger than 650.

shapiro.test(VarA)        # Test your data for normality
xA = mean(VarA)           # The mean of the sample
xA
seA = sd(VarA)/sqrt(4)    # The SE of the sample (n = 4)
seA
tA = (xA - 650)/seA       # The t value, given a threshold of 650
tA                        # Actual t value
We can now use the pt() command in R to see where our t value falls in the t distribution. pt() returns the percentile of the t value (the area to its left), so the one-tailed p-value for "larger than 650" is the remaining area to its right. We can also use the qt() command to see what t value would be required to meet an alpha of 0.05, given that we have 3 degrees of freedom (n - 1 = 4 - 1). In the qt() command, we use 0.95 instead of 0.05 because we are interested in the right tail of the curve only:
pt(tA, 3)        # Percentile of tA: area to the left of the t value
1 - pt(tA, 3)    # One-tailed p-value: area to the right of the t value
qt(0.95, 3)      # Critical t value for alpha = 0.05 (right tail)
Because the calculated t value is larger than our critical t value (and our p-value is subsequently less than 0.05), we reject the null hypothesis and conclude that the mean is significantly larger than 650.
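The link between the critical-value rule and the p-value rule can be sketched with a made-up t value. Here t_demo and df_demo are hypothetical and do not come from the lentil data. The point is only that comparing t to qt(0.95, df) and comparing the upper-tail area to 0.05 lead to the same decision.

```r
# Hypothetical t value and degrees of freedom, for illustration only
t_demo  <- 2.8
df_demo <- 3

# Upper-tail area: chance of a t value at least this large if H0 is true
p_upper <- pt(t_demo, df_demo, lower.tail = FALSE)

# The same quantity computed from the lower-tail percentile
p_check <- 1 - pt(t_demo, df_demo)

# Critical t value for alpha = 0.05 in the right tail
t_crit <- qt(0.95, df_demo)

# Both decision rules give the same answer
(t_demo > t_crit) == (p_upper < 0.05)
```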
Just as with confidence intervals, however, R has a useful shortcut. So, you can check the results you developed above from first principles (i.e. math) with the proper R function for a t-test. In this case, we must specify our threshold value (mu=) and whether we are testing (alternative=) if the sample mean is higher (greater), lower (less), or just different (two.sided) than this value. Also have a look at the help file for more options:
t.test(VarA, mu=650, alternative="greater")
?t.test
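As a check that the shortcut and the first-principles calculation agree, here is a minimal sketch on made-up numbers. The vector yield_demo is hypothetical and stands in for VarA; the threshold of 650 is the one used in this section.

```r
# Hypothetical sample of four yields (not the lab's lentil data)
yield_demo <- c(668, 702, 655, 690)
n <- length(yield_demo)

# Manual calculation: standard error, t value, and upper-tail p-value
se_demo  <- sd(yield_demo) / sqrt(n)
t_manual <- (mean(yield_demo) - 650) / se_demo
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)

# Built-in one-sample, one-tailed t-test
res <- t.test(yield_demo, mu = 650, alternative = "greater")

# Both comparisons should return TRUE
all.equal(unname(res$statistic), t_manual)
all.equal(res$p.value, p_manual)
```

Using length(yield_demo) rather than a hard-coded 4 keeps the sketch valid if the sample size changes.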
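You can also use SAS for this. In this case, we are testing if the mean of the variable “Variable1” is significantly different from 650. Note: by default, proc ttest runs a two-tailed test (newer SAS releases also accept a sides= option, e.g. sides=U for an upper-tailed test; check the documentation for your version). With the default output, you can divide the p-value by two to get the one-tailed p-value, as long as the sample mean lies on the side of 650 named in your alternative hypothesis (i.e. you are asking if your sample is significantly larger than 650, not significantly different from 650):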
proc ttest data=yourdata H0=650;
  var Variable1;
run;
11.2 Two-tailed T-test for two samples
Now, let’s expand the T-test to two samples where we ask if the true means of two populations are significantly different. Remember that a simple comparison of two means isn’t that meaningful (of course they’re at least slightly different!) unless we also have some idea of the variation (spread) of each sample. Again, we likely don’t know the population parameters, so we are using statistics from samples to infer about the populations.
We will modify the code above to also test lentil varieties A versus C and B versus C. It will be helpful to make R objects for each of the varieties, as well as the means for each variety. Remember, you also need to test each of your samples for normality:
VarA=data[data$FARM=="Farm1" & data$VARIETY=="A","YIELD"]
VarB=data[data$FARM=="Farm1" & data$VARIETY=="B","YIELD"]
VarC=data[data$FARM=="Farm1" & data$VARIETY=="C","YIELD"]
xA=mean(VarA)
xB=mean(VarB)
xC=mean(VarC)
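To see what this kind of subsetting returns, here is a minimal sketch on a small made-up data frame. The column names FARM, VARIETY, and YIELD come from the code above; lentil_demo and all of its values are hypothetical.

```r
# Hypothetical stand-in for the lentil data (values are made up)
lentil_demo <- data.frame(
  FARM    = rep(c("Farm1", "Farm2"), each = 4),
  VARIETY = rep(c("A", "B"), times = 4),
  YIELD   = c(668, 640, 702, 661, 655, 629, 690, 655)
)

# A logical row filter plus a single column name returns a numeric vector
a_demo <- lentil_demo[lentil_demo$FARM == "Farm1" &
                        lentil_demo$VARIETY == "A", "YIELD"]
a_demo         # the Farm1, variety A yields
mean(a_demo)   # their sample mean
```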
Now, we will compute a t-statistic, comparing Variety A to B, using the formula:
tAB=(xA-xB)/sqrt(var(VarA)/4+var(VarB)/4)
tAB
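In symbols, the statistic computed above appears to be the following, where the sample sizes n_A = n_B = 4 are taken from the divisors in the R code and s^2 denotes a sample variance (what var() returns):

```latex
t_{AB} = \frac{\bar{x}_A - \bar{x}_B}
              {\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}},
\qquad n_A = n_B = 4
```

The numerator is the difference between the two sample means, and the denominator is the standard error of that difference.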
Our threshold is set at α=0.05. We can use qt() to check the critical t-value for this threshold, with 6 degrees of freedom (4 + 4 - 2). Remember that we must divide 0.05 in half because this is a two-tailed test. We can check the critical t-value for either 0.975 or 0.025, as they return the same magnitude (one positive and one negative, on either tail of the distribution). We can then compare this critical t-value to our calculated t-statistic above (tAB). Instead of comparing the critical t and t-statistic manually, we can also use pt() to determine the percentile of our t-statistic; the two-tailed p-value is twice the tail area beyond the t-statistic.
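qt(0.975, 6)          # Critical t value for a two-tailed alpha of 0.05
pt(tAB, 6)            # Percentile of tAB
2*pt(-abs(tAB), 6)    # Two-tailed p-value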
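The manual two-sample calculation can be checked against R's built-in t.test(). The sketch below uses made-up vectors a_demo and b_demo in place of VarA and VarB. It assumes equal variances (var.equal = TRUE), which is what gives n1 + n2 - 2 = 6 degrees of freedom; by default, t.test() instead runs Welch's version with adjusted degrees of freedom.

```r
# Hypothetical yields for two varieties (not the lab's lentil data)
a_demo <- c(668, 702, 655, 690)
b_demo <- c(640, 661, 629, 655)

# Manual t statistic, following the formula used above
t_manual <- (mean(a_demo) - mean(b_demo)) /
  sqrt(var(a_demo) / length(a_demo) + var(b_demo) / length(b_demo))

# Degrees of freedom and two-tailed p-value
df_demo  <- length(a_demo) + length(b_demo) - 2
p_manual <- 2 * pt(-abs(t_manual), df_demo)

# Built-in two-sample test with pooled variance
res <- t.test(a_demo, b_demo, var.equal = TRUE)

# Both comparisons should return TRUE when the sample sizes are equal
all.equal(unname(res$statistic), t_manual)
all.equal(res$p.value, p_manual)
```

The agreement relies on the two samples having the same size; with unequal sizes, the pooled standard error differs from the one in the manual formula.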