T-Tests: One-Tailed and Two-Tailed Tests for Single and Two Samples

This study guide shows how to use t-tests to determine the probability that a population mean is larger or smaller than a given value based on a single sample, and how to compare the means of two samples. It covers one-tailed and two-tailed tests for a single sample and a two-tailed test for two samples, discusses the importance of checking the normality of the sample distribution before running a test, and includes R and SAS code examples for conducting t-tests.

Typology: Study Guides, Projects, Research

2021/2022

Uploaded on 09/27/2022

jacksonhh
jacksonhh 🇬🇧

4.2

(23)

251 documents

1 / 4

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
11. T-Tests
As we saw previously, z-scores are an easy way to see where a single value falls in a population
distribution when we know the mean and standard deviation. Similarly, t-statistics tell us where a value
falls within a sampling distribution (which we can also describe with a mean and a standard error).
Therefore, we can use a t-test to determine the probability that a true population mean is greater or less
than a given value when we only have a single sample from that population. We can also use a t-test to
compare two samples to each other, usually to determine whether they came “from the same population”,
which is another way of asking whether the two samples differ from one another in a statistically significant
way. T-tests are among the simplest and most powerful statistical analyses that we can do.
11.1 One-tailed T-test for a single sample
A one-tailed t-test for a single sample works like calculating a confidence interval in reverse. It
compares the mean of a sample to some known value (usually a hypothesized population mean).
The test returns the probability that the true mean is above or below (not both, of course) a
known threshold value (hence, it is a one-tailed test).
All t-tests assume that the sample data are normally distributed, so you should always check the normality
of the sample distribution before running the t-test. Remember from Lab 8 that you can check normality
visually with histograms and boxplots, and statistically with a Shapiro-Wilk test.
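As a quick reminder, a minimal sketch of those checks (assuming the lentil data have already been imported and attached, so that VarA is available as in the code below):
hist(VarA) # Visual check: is the sample roughly bell-shaped?
boxplot(VarA) # Visual check: roughly symmetric, with no extreme outliers?
shapiro.test(VarA) # Statistical check: a p-value above 0.05 means normality is plausible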
Using data from a previous lab for lentil variety A (VarA) from Farm 1, determine the probability that
the true population mean for this variety is larger than 650. You will need to import and attach the
lentil data (again). You will recognize the first few lines of code as the same as for the confidence
interval calculation from Lab 10:
H0: The true population mean is not larger than 650.
H1: The true population mean is larger than 650.
shapiro.test(VarA) # Test your data for normality
xA=mean(VarA) # The mean of the sample
xA
seA=sd(VarA)/sqrt(4) # The SE of the sample (n = 4 observations)
seA
tA=(xA-650)/seA # The t value, given a threshold of 650
tA # Actual t value
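A small, optional variation (not in the original lab): rather than hard-coding the sample size of 4, you can compute it from the data so the same code works for other samples:
nA=length(VarA) # Number of observations in the sample
seA=sd(VarA)/sqrt(nA) # Same SE as above when nA is 4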
We can now determine the probability that the true mean is actually larger than 650 using the
pt() command in R. We can also use the qt() command to see what t value would be required to
meet an alpha of 0.05, given that we have 3 degrees of freedom (n - 1 = 4 - 1 = 3). In the qt() command,
we use 0.95 instead of 0.05 because we are interested in the right tail of the curve only:
pt(tA,3) # Probability that the true mean > 650 (the percentile of tA)
qt(0.95,3) # Critical t value for alpha = 0.05 (right tail)
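If you prefer to see the p-value itself rather than the percentile, a minimal equivalent (same 3 degrees of freedom) is:
1-pt(tA,3) # One-tailed p-value for H1: true mean > 650; reject H0 if this is below 0.05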
Because the calculated t value is larger than our critical t value (and our p-value is subsequently less
than 0.05), we reject the null hypothesis and conclude that the mean is significantly larger than 650.

Just as with confidence intervals, however, R has a useful shortcut, so you can check the results you developed above from first principles (i.e. by hand) with the built-in R function for a t-test. In this case, we must specify our threshold value (mu=) and whether we are testing (alternative=) if the sample mean is higher than (greater), lower than (less), or simply different from (two.sided) this value. Also have a look at the help file for more options:

t.test(VarA, mu=650, alternative="greater") # One-sample, one-tailed t-test against 650
?t.test # Help file for t.test()
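The function prints the t value, degrees of freedom, and p-value directly. As an aside (standard R, though not part of the original lab code), you can also store the result and pull out individual pieces:
res=t.test(VarA, mu=650, alternative="greater") # Save the test result as an object
res$statistic # The t value (should match tA above)
res$p.value # The one-tailed p-value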

You can also use SAS for this. In this case, we are testing whether the mean of the variable “Variable1” is significantly different from 650. Note: in SAS, proc ttest only reports two-tailed tests, but you can divide the resulting p-value by two to get the one-tailed version (i.e. to ask whether your sample mean is significantly larger than 650, not just significantly different from 650). This shortcut only makes sense when the sample mean actually falls on the side you are testing:

proc ttest data=yourdata H0=650;
  var Variable1;
run;
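To see the halving logic back in R (a quick sketch, assuming the VarA sample from above, whose mean lies above 650):
t.test(VarA, mu=650)$p.value/2 # Half the two-tailed p-value...
t.test(VarA, mu=650, alternative="greater")$p.value # ...equals the one-tailed p-value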

11.2 Two-tailed T-test for two samples

Now, let's expand the t-test to two samples, where we ask whether the true means of two populations are significantly different. Remember that a simple comparison of two means isn't very meaningful (of course they're at least slightly different!) unless we also have some idea of the variation (spread) of each sample. Again, we likely don't know the population parameters, so we are using statistics from samples to make inferences about the populations.

We will modify the code above to compare two varieties at a time; the worked example below compares varieties A and B, and you should also test A versus C and B versus C in the same way. It will be helpful to make R objects for each of the varieties, as well as for their means. Remember, you also need to test each of your samples for normality (a minimal sketch of those checks follows the code below):

VarA=data[data$FARM=="Farm1" & data$VARIETY=="A","YIELD"]
VarB=data[data$FARM=="Farm1" & data$VARIETY=="B","YIELD"]
VarC=data[data$FARM=="Farm1" & data$VARIETY=="C","YIELD"]
xA=mean(VarA)
xB=mean(VarB)
xC=mean(VarC)
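The normality checks from Lab 8, repeated for each variety (at a minimum the Shapiro-Wilk test; histograms and boxplots are also worth a look):
shapiro.test(VarA) # Normality check for variety A
shapiro.test(VarB) # Normality check for variety B
shapiro.test(VarC) # Normality check for variety C
boxplot(VarA, VarB, VarC, names=c("A","B","C")) # Side-by-side look at the spread of each sample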

Now, we will compute a t-statistic comparing variety A to variety B, using the formula t = (xA - xB) / sqrt(var(A)/nA + var(B)/nB), where each variety has nA = nB = 4 observations:

tAB=(xA-xB)/sqrt(var(VarA)/4+var(VarB)/4) # t-statistic comparing varieties A and B
tAB

Our threshold is set at α=0.05. We can use qt() to check the critical t-value for that threshold. Remember that we must divide 0.05 in half because this is a two-tailed test, and that the test has 6 degrees of freedom (nA + nB - 2 = 4 + 4 - 2). We can check the critical t-value at either 0.975 or 0.025, as they return the same value (one positive and one negative, on either tail of the distribution). We can then compare this critical t-value to our calculated t-statistic above (tAB). Instead of comparing the critical t and the t-statistic manually, we can also use pt() to find the percentile of our t-statistic; the two-tailed p-value is then twice the area beyond it.

qt(0.975,6) # Critical t value for alpha = 0.05, two-tailed, with 6 df
pt(tAB,6) # Percentile of tAB; compare it to 0.975 (or 0.025)
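As with the one-sample case, you can check the hand calculation against R's built-in function. A minimal sketch, assuming equal variances in the two samples (which matches the pooled 6 degrees of freedom used above):
t.test(VarA, VarB, var.equal=TRUE) # Two-sample, two-tailed t-test
2*(1-pt(abs(tAB),6)) # Two-tailed p-value from the manual t-statistic; should match t.test()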

CHALLENGE:

1. The time needed to drive from city A to city B is normally distributed with a mean of 180 minutes and a standard deviation of 20 minutes.
   a. What is the probability that a person will drive from city A to city B in three hours or more?
   b. What is the probability that a person will drive from city A to city B in more than 140 minutes?
   c. What is the probability that a person will drive from city A to city B in exactly three hours?
   d. What is the probability that a person will drive from city A to city B in less than 2.5 hours?
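These can all be answered with the normal distribution functions from earlier labs. As an illustration of the approach only (a sketch, not the full answer set), part (a) translates into a single pnorm() call:
# P(drive time >= 180 minutes), with X ~ Normal(mean = 180 min, sd = 20 min); three hours = 180 minutes
1-pnorm(180, mean=180, sd=20)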