General Statistics
Statistical Inference
Comparing two population means
(large sample size)

Comparing two population means - large independent samples

Comparing the sample means of two large independent samples is similar to comparing a sample mean against a population mean (Chapter 7): the z-score (z-statistic) of the standard normal distribution is used to evaluate the test. The only difference is in the parameter values used to compute the statistic.

Hypothesis tests involving two means study the distribution of the difference between the sample means, $\bar{x}_1 - \bar{x}_2$.

1. Know the basic general test statistics used for comparing two population means.

If we have two populations or sample distributions, the following basic statistics can be obtained from each:

| Population or sample | Sample size | Sample mean | Population mean | Sample standard deviation | Population standard deviation |
| 1 | $n_1$ | $\bar{x}_1$ | $\mu_1$ | $s_1$ | $\sigma_1$ |
| 2 | $n_2$ | $\bar{x}_2$ | $\mu_2$ | $s_2$ | $\sigma_2$ |

Large-sample studies use the standard normal z statistic; small-sample studies use Student's t statistic.

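Let $\bar{x}_1 - \bar{x}_2$ be the observed difference between the sample means, $\mu_1 - \mu_2$ the difference between the population means under the null hypothesis, and $s_{\bar{x}_1 - \bar{x}_2}$ the combined standard deviation (standard error) of the difference for the two sample distributions or data sets. Then:

For large sample sizes (by the usual rule of thumb, $n_1 \ge 30$ and $n_2 \ge 30$), the test statistic in a hypothesis test is

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$

the z-score.

For small sample sizes ($n_1 < 30$ or $n_2 < 30$), the test statistic in a hypothesis test is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$

Student's t. The degrees of freedom depend on the two-sample method used; for the pooled-variance test, df = $n_1 + n_2 - 2$.

For large sample sizes the standard deviation and test statistic are:

Standard deviation:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Also, when the population standard deviations are unknown, the sample standard deviations are used in their place:

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Test statistic, z:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$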
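As a minimal sketch of the large-sample standard deviation and z statistic described above, the calculation can be written in a few lines of Python. The function name `two_sample_z` and its parameter names are hypothetical, chosen only for illustration, and the sample standard deviations are assumed to stand in for the population values. The numbers in the usage lines are the summary statistics from Problem 1 later in this section.

```python
from math import sqrt


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Return (standard_error, z) for two large independent samples.

    Hypothetical helper: the sample standard deviations are used in place
    of the population values, which is the large-sample assumption.
    """
    # Combined standard deviation (standard error) of the difference.
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Observed difference minus the difference assumed under H0, in SE units.
    z = ((mean1 - mean2) - hypothesized_diff) / standard_error
    return standard_error, z


# Summary statistics from Problem 1 (Car A vs. Car B).
se, z = two_sample_z(7.4, 1.5, 40, 7.1, 1.8, 40)
print(f"standard error = {se:.4f}, z = {z:.2f}")
```

Passing the summary statistics rather than raw data keeps the sketch aligned with how the problems in this section are stated.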

2. Know how to use appropriate statistics to test if two sample means are equal to each other or if their difference = 0 (large sample size).

Three types of tests for comparing two means:

When comparing the two means, there are three questions to consider:

Question 1: Is $\mu_1 \ne \mu_2$? Ha: $\mu_1 - \mu_2 \ne 0$ (two-tailed test)

Question 2: Is $\mu_1 > \mu_2$? Ha: $\mu_1 - \mu_2 > 0$ (right-tailed test)

Question 3: Is $\mu_1 < \mu_2$? Ha: $\mu_1 - \mu_2 < 0$ (left-tailed test)
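The three kinds of test differ only in where the critical region sits. The following sketch computes, for each kind, the interval of z values for which H0 is not rejected. The function name `non_rejection_interval` and the `tail` labels are hypothetical, and Python's `statistics.NormalDist` is used here in place of the printed reference table. The significance levels in the loop are the ones used in the three problems of this section.

```python
from math import inf
from statistics import NormalDist


def non_rejection_interval(alpha, tail):
    """Return (low, high): H0 is not rejected when low <= z <= high.

    Hypothetical helper; tail is 'two', 'right', or 'left'.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # alpha is split equally between the two tails.
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        return (-z_crit, z_crit)
    if tail == "right":
        # All of alpha sits in the upper tail.
        return (-inf, std_normal.inv_cdf(1 - alpha))
    if tail == "left":
        # All of alpha sits in the lower tail.
        return (std_normal.inv_cdf(alpha), inf)
    raise ValueError("tail must be 'two', 'right', or 'left'")


for alpha, tail in [(0.02, "two"), (0.05, "right"), (0.01, "left")]:
    low, high = non_rejection_interval(alpha, tail)
    print(f"alpha={alpha}, {tail}-tailed: do not reject H0 if {low:.3f} <= z <= {high:.3f}")
```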

Question 1: Is $\mu_1 \ne \mu_2$? Ha (two-tailed test)

Is $\mu_1 - \mu_2 \ne 0$? This is Ha.

Are the two means the same or not the same?

If the two population means are the same, the difference between the sample means should be approximately 0.

Here we use a two-tailed test: we compute the critical values $\pm z_{1-\alpha/2}$ (equivalently, the confidence interval) at the level of significance $\alpha$.

If the test statistic z falls outside this interval, we decide that the means differ and choose the alternative hypothesis, Ha.

Otherwise we have no reason to think that they differ, and we do not reject H0.

By Examples:

Problem 1. Two types of cars are compared for acceleration rate. 40 test runs are recorded for each car, and the results for the mean elapsed time are recorded below:

| | Sample mean | Sample standard deviation | Sample size |
| Car A ($x_1$) | 7.4 | 1.5 | 40 |
| Car B ($x_2$) | 7.1 | 1.8 | 40 |

Construct a 98% CI for the difference in the mean elapsed time for the two types of cars. Using this CI, determine whether there is a difference in the mean elapsed times.

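Given: difference $\bar{x}_1 - \bar{x}_2 = 7.4 - 7.1 = 0.3$, $n_1 = n_2 = 40$ (large, so the normal approximation and z-score can be used).

Step 1. Hypothesis: The claim that $\mu_1 = \mu_2$ is the null hypothesis.

The alternative hypothesis is that $\mu_1 \ne \mu_2$.

H0: $\mu_1 - \mu_2 = 0$

Ha: $\mu_1 - \mu_2 \ne 0$

Step 2. Select level of significance: This is given as $\alpha = 0.02$ (2% = 100% - 98%).

So for the two-tailed test, $\alpha/2 = 0.01$ in each tail.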

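Step 3. Test statistic and observed value.

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{7.4 - 7.1}{\sqrt{\frac{1.5^2}{40} + \frac{1.8^2}{40}}} = \frac{0.3}{0.3705} = 0.81$$

Step 4. Determine the critical region (favors Ha).

For $\alpha = 0.02$ split between both tails (cumulative probabilities 0.01 and 0.99), $z_{\alpha/2} = -2.326$ and $z_{1-\alpha/2} = 2.326$.

The critical region is $z < -2.326$ and $z > 2.326$ (reference table).

A 98% confidence interval for the difference is $0.3 \pm 2.326 \times 0.3705 = 0.3 \pm 0.86$, or (-0.56 to 1.16), shown in the graph above.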

Step 5. Make decision.

Do not reject the null hypothesis if $-2.326 \le z \le 2.326$, or equivalently $|z| \le 2.326$.

The observed z = 0.81, and since 0.81 < 2.326 it is not in the critical region, so we have no reason to reject H0 in favor of Ha.

Note also that the hypothesized difference of 0 lies inside the confidence interval of -0.56 to 1.16 (the blue region, where the null hypothesis is not rejected).

Therefore there is no significant evidence that the difference between the two means is other than 0.
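The five steps of Problem 1 can be checked numerically. This is a sketch under the same large-sample assumption, with variable names chosen for illustration and `statistics.NormalDist` standing in for the reference table.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the Problem 1 table.
mean_a, sd_a, n_a = 7.4, 1.5, 40
mean_b, sd_b, n_b = 7.1, 1.8, 40
alpha = 0.02  # 98% confidence, two-tailed

diff = mean_a - mean_b
se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)

# Two-tailed critical value and the matching confidence interval.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci_low, ci_high = diff - z_crit * se, diff + z_crit * se

# Observed test statistic under H0: mu1 - mu2 = 0.
z_obs = diff / se
reject_h0 = abs(z_obs) > z_crit

print(f"z = {z_obs:.2f}, critical value = {z_crit:.3f}")
print(f"98% CI for the difference: ({ci_low:.2f}, {ci_high:.2f})")
print("reject H0" if reject_h0 else "do not reject H0")
```

The two checks agree by construction: the interval contains 0 exactly when the observed z lies between the two critical values.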

Question 2: Is $\mu_1 > \mu_2$? Ha (right-tailed test)

Is $\mu_1 - \mu_2 > 0$? This is Ha.

Is the mean of sample 1 greater than the mean of sample 2?

If so, then their difference will be significantly greater than zero.

Here we use a one-tailed test: we compute the critical value $z_{1-\alpha}$ at the level of significance $\alpha$.

If the test statistic z is greater than this critical value, we decide that the first mean is greater and choose the alternative hypothesis, Ha.

Otherwise we have no reason to think that it is greater, and we do not reject H0.

By Examples:

Problem 2. The personnel officer of a large corporation claimed that college graduates applying for jobs with the firm in the current year tended to have higher grade point averages than those applying in the previous year. Samples from the two groups of applicants gave the following results:

| | Sample mean | Sample standard deviation | Sample size |
| Current year ($x_1$) | 2.98 | 0.4 | 60 |
| Previous year ($x_2$) | 2.8 | 0.5 | 52 |

Is there sufficient evidence to justify the claim at a 5% level of significance?

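Given: difference $\bar{x}_1 - \bar{x}_2 = 2.98 - 2.80 = 0.18$, $n \ge 52$ for both samples (large, so the normal approximation and z-score can be used).

Step 1. Hypothesis: The statement that $\mu_1 = \mu_2$ (no difference between the two years) is the null hypothesis.

The alternative hypothesis, which is the personnel officer's claim, is that $\mu_1 > \mu_2$.

H0: $\mu_1 - \mu_2 = 0$

Ha: $\mu_1 - \mu_2 > 0$

Step 2. Select level of significance: This is given as $\alpha = 0.05$ (5%).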

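Step 3. Test statistic and observed value.

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{2.98 - 2.80}{\sqrt{\frac{0.4^2}{60} + \frac{0.5^2}{52}}} = \frac{0.18}{0.0865} = 2.08$$

Step 4. Determine the critical region (favors Ha).

For $\alpha = 0.05$ in the upper tail (cumulative probability 0.95), $z_{1-\alpha} = 1.645$ (often rounded to 1.65).

The critical region is $z > 1.645$ (reference table).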

Step 5. Make decision.

Do not reject the null hypothesis if $z \le 1.645$.

The observed z = 2.08, and since 2.08 > 1.645 it is in the critical region (red), so we reject the null hypothesis that grades are the same in both years, in favor of Ha.

Therefore we conclude, at the 5% level of significance, that college graduates applying in the current year have a higher mean grade point average than those applying in the previous year.
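A short numerical check of Problem 2 as a right-tailed test follows. It is a sketch only: the variable names are illustrative, and the critical value is computed with `statistics.NormalDist` rather than read from the reference table, so it appears as 1.645 instead of the rounded 1.65.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the Problem 2 table.
mean_cur, sd_cur, n_cur = 2.98, 0.4, 60     # current year
mean_prev, sd_prev, n_prev = 2.80, 0.5, 52  # previous year
alpha = 0.05

se = sqrt(sd_cur**2 / n_cur + sd_prev**2 / n_prev)
z_obs = (mean_cur - mean_prev) / se

# Right-tailed test: all of alpha sits in the upper tail.
z_crit = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z_obs > z_crit

print(f"z = {z_obs:.2f}, critical value = {z_crit:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```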

Question 3: Is $\mu_1 < \mu_2$? Ha (left-tailed test)

Is $\mu_1 - \mu_2 < 0$? This is Ha.

Is the mean of sample 1 less than the mean of sample 2?

If so, then their difference will be significantly less than zero.

Here we use a one-tailed test: we compute the critical value $z_{\alpha}$ at the level of significance $\alpha$.

If the test statistic z is less than this critical value, we decide that the first mean is smaller and choose the alternative hypothesis, Ha.

Otherwise we have no reason to think that it is smaller, and we do not reject H0.

By Examples:

Problem 3. A biologist suspected that females age 20 - 24 have a lower mean systolic blood pressure than males in the same age group. Independent random samples produced the following results for systolic pressure:

| | Sample mean | Sample standard deviation | Sample size |
| Female ($x_1$) | 117 | 12.1 | 41 |
| Male ($x_2$) | 125 | 13.9 | 31 |

Is there sufficient evidence to justify the claim at a 1% level of significance?

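Given: difference $\bar{x}_1 - \bar{x}_2 = 117 - 125 = -8$, $n \ge 31$ for both samples (large, so the normal approximation and z-score can be used).

Step 1. Hypothesis: The statement that $\mu_1 = \mu_2$ (no difference between females and males) is the null hypothesis.

The alternative hypothesis, which is the biologist's claim, is that $\mu_1 < \mu_2$.

H0: $\mu_1 - \mu_2 = 0$

Ha: $\mu_1 - \mu_2 < 0$

Step 2. Select level of significance: This is given as $\alpha = 0.01$ (1%).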

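Step 3. Test statistic and observed value.

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{117 - 125}{\sqrt{\frac{12.1^2}{41} + \frac{13.9^2}{31}}} = \frac{-8}{3.131} = -2.56$$

Step 4. Determine the critical region (favors Ha).

For $\alpha = 0.01$ in the lower tail (cumulative probability 0.01), $z_{\alpha} = -2.326$.

The critical region is $z < -2.326$ (reference table).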

Step 5. Make decision.

Do not reject the null hypothesis if $z \ge -2.326$.

The observed z = -2.56, and since -2.56 < -2.326 it is in the critical region (red), so we reject the null hypothesis that female and male systolic pressures are the same, in favor of Ha.

Therefore we conclude, at the 1% level of significance, that the mean systolic pressure of females is lower than that of males of the same age (20 - 24).
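Problem 3 can be verified the same way as a left-tailed test. In this sketch the male sample standard deviation is taken as 13.9, the value that reproduces the observed z = -2.56 quoted in the decision step. Variable names are illustrative, and `statistics.NormalDist` stands in for the reference table.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the Problem 3 table (male sd assumed to be 13.9).
mean_f, sd_f, n_f = 117, 12.1, 41
mean_m, sd_m, n_m = 125, 13.9, 31
alpha = 0.01

se = sqrt(sd_f**2 / n_f + sd_m**2 / n_m)
z_obs = (mean_f - mean_m) / se

# Left-tailed test: all of alpha sits in the lower tail,
# so the critical value is negative.
z_crit = NormalDist().inv_cdf(alpha)
reject_h0 = z_obs < z_crit

print(f"z = {z_obs:.2f}, critical value = {z_crit:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Note that the comparison is against the negative critical value: the observed z must fall below -2.326, not merely below +2.326, for H0 to be rejected.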

