Comparing two population means

General Statistics

Statistical Inference
Comparing two population means
(large sample size)

Comparing two population means - large independent samples

Comparing the sample means of two large independent samples is similar to comparing a sample mean against a population mean (Chapter 7): the z-score, or z-statistic, of the standard normal distribution is used to evaluate the test. The only difference is in the parameter values used to compute the statistic.

Hypothesis tests involving two different means study the distribution of their difference, $\bar{x}_1 - \bar{x}_2$.

1. Know the basic general test statistics used for comparing two population means.

If we have two populations or sample distributions the following basic statistics can be obtained from each:

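| Population or sample | Sample size | Sample mean | Population mean | Sample standard deviation | Population standard deviation |
| --- | --- | --- | --- | --- | --- |
| 1 | $n_1$ | $\bar{x}_1$ | $\mu_1$ | $s_1$ | $\sigma_1$ |
| 2 | $n_2$ | $\bar{x}_2$ | $\mu_2$ | $s_2$ | $\sigma_2$ |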

Studies with large sample sizes use the standard normal z-score statistic, and studies with small sample sizes use the Student's t statistic.

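If we let $\bar{d} = \bar{x}_1 - \bar{x}_2$ be the difference in sample means, $\mu_d = \mu_1 - \mu_2$ the difference in population means, and $\sigma_d$ (estimated by $s_d$) a combined standard deviation for both sample distributions or data sets, then

For large sample sizes ($n \ge 30$) the test statistic in a hypothesis test is:
$z = \dfrac{\bar{d} - \mu_d}{\sigma_d}$, the z-score
For small sample sizes ($n < 30$) the test statistic in a hypothesis test is:
$t = \dfrac{\bar{d} - \mu_d}{s_d}$, the Student's t, df = n - 1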

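For large sample sizes the standard deviation and test statistic are:

Standard deviation: $\sigma_d = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

Also, when the population standard deviations are unknown, the sample standard deviations are used: $s_d = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

Test statistic, z: $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_d}$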
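As a minimal sketch of the large-sample calculation, the Python function below combines the two sample standard deviations into a single standard deviation for the difference and then forms the z statistic. The function name `two_sample_z`, its parameter names, and the input numbers are illustrative assumptions, not part of the course material.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Return (combined standard deviation, z) for two independent large samples."""
    # Combined standard deviation of the difference in sample means
    sd_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Observed difference minus hypothesized difference, in standard-deviation units
    z = ((mean1 - mean2) - hypothesized_diff) / sd_diff
    return sd_diff, z

# Made-up inputs chosen only to illustrate the call
sd_diff, z = two_sample_z(50.0, 10.0, 100, 48.0, 10.0, 100)
print(round(sd_diff, 4), round(z, 4))
```

The `hypothesized_diff` argument defaults to 0 because the tests in this section ask whether the two means are equal.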

2. Know how to use appropriate statistics to test if two sample means are equal to each other or if their difference = 0 (large sample size).

3 Types of tests in comparing two sample means:

When comparing the sample means, there are 3 questions to consider:

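Question 1: Is $\mu_1 \ne \mu_2$? H_a: $\mu_1 - \mu_2 \ne 0$ (Two-tailed test)

Question 2: Is $\mu_1 > \mu_2$? H_a: $\mu_1 - \mu_2 > 0$ (Right-tailed test)

Question 3: Is $\mu_1 < \mu_2$? H_a: $\mu_1 - \mu_2 < 0$ (Left-tailed test)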
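The three questions differ only in where the critical region sits on the standard normal curve. The sketch below shows one way to express that decision rule in Python using the standard library's `statistics.NormalDist`. The function name `reject_null` and the tail labels `"two"`, `"right"`, and `"left"` are hypothetical names chosen for this illustration, and the z values are made up.

```python
from statistics import NormalDist

def reject_null(z, alpha, tail):
    """Return True if the observed z falls in the critical region.

    tail is "two", "right", or "left" (labels assumed for this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # alpha is split between both tails
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        return abs(z) > z_crit
    if tail == "right":
        # all of alpha sits in the upper tail
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        # all of alpha sits in the lower tail
        return z < std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right', or 'left'")

print(reject_null(1.5, 0.05, "two"))    # 1.5 lies inside +/-1.96
print(reject_null(1.8, 0.05, "right"))  # 1.8 lies above 1.645
print(reject_null(-1.8, 0.05, "left"))  # -1.8 lies below -1.645
```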

Question 1: Is $\mu_1 \ne \mu_2$? (Two-tailed test)

H_a: $\mu_1 \ne \mu_2$
Are the two means the same or not the same?
If the two means are the same, then their difference is approximately 0.
Here we use a two-tailed test, computing the confidence interval for the test at the level of significance $\alpha$.
If the test statistic z falls outside this interval, we decide that the means differ and choose the alternate hypothesis, H_a.
Otherwise we have no reason to think that they differ, and we do not reject H₀.

By Examples:

Problem 1. Two types of cars are compared for acceleration rate. 40 test runs are recorded for each car, and the results for the mean elapsed time are recorded below:

| | Sample mean | Sample standard deviation | Sample size |
| --- | --- | --- | --- |
| Car A (x1) | 7.4 | 1.5 | 40 |
| Car B (x2) | 7.1 | 1.8 | 40 |

Construct a 98% CI for the difference in the mean elapsed time for the two types of cars. Using this CI, determine whether there is a difference in the mean elapsed times.

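Given difference $\bar{x}_1 - \bar{x}_2 = 7.4 - 7.1 = 0.3$, n = 40 (large, so the normal approximation and z-score can be used).

Step 1. Hypothesis: The claim is that $\mu_1 = \mu_2$, the null hypothesis.

The alternate hypothesis is that $\mu_1 \ne \mu_2$.

H₀: $\mu_1 - \mu_2 = 0$

H_a: $\mu_1 - \mu_2 \ne 0$

Step 2. Select level of significance: This is given as $\alpha = 0.02$ (2% = 100% - 98%).

So for a two-tailed test: $\alpha/2 = 0.01$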

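Step 3. Test statistic and observed value.

$s_d = \sqrt{\dfrac{1.5^2}{40} + \dfrac{1.8^2}{40}} \approx 0.3705$

$z = \dfrac{(7.4 - 7.1) - 0}{0.3705} \approx 0.81$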

Step 4. Determine the critical region (favors H_a)

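For $\alpha = 0.02$ split between both ends of the interval (cumulative probabilities 0.01 and 0.99), $z_{\alpha/2} = -2.326$ and $z_{1-\alpha/2} = 2.326$.

The critical region is $z < -2.326$ and $z > 2.326$ (reference table).

A 98% confidence interval for the difference is $(\bar{x}_1 - \bar{x}_2) \pm z_{1-\alpha/2}\, s_d = 0.3 \pm 2.326 \times 0.3705$, where 0.3705 is the combined standard deviation $\sqrt{1.5^2/40 + 1.8^2/40}$. This gives a difference of -0.56 to 1.16 (shown in the graph above).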

Step 5. Make decision.

Do not reject the null hypothesis if $-2.326 \le z \le 2.326$, or equivalently if $|z| \le 2.326$.

The observed z = 0.81, and since 0.81 < 2.326 it is not in the critical region, so we have no reason to reject H₀ in favor of H_a.

Note also that the hypothesized difference of 0 lies within the confidence interval of -0.56 to 1.16 (the blue region, where the null hypothesis is not rejected).

Therefore there is no significant evidence that the two means differ.
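The arithmetic in Problem 1 can be checked with a short Python sketch. It uses only the figures given in the problem table; the variable names are illustrative, and the critical value comes from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
import math
from statistics import NormalDist

# Problem 1 data: Car A is sample 1, Car B is sample 2
mean_a, sd_a, n_a = 7.4, 1.5, 40
mean_b, sd_b, n_b = 7.1, 1.8, 40
alpha = 0.02  # corresponds to a 98% confidence interval

diff = mean_a - mean_b
# Combined standard deviation of the difference in sample means
sd_diff = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
z = diff / sd_diff  # hypothesized difference is 0

# Two-tailed critical value: alpha is split between both tails
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci_low = diff - z_crit * sd_diff
ci_high = diff + z_crit * sd_diff
reject = abs(z) > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
print(f"98% CI: {ci_low:.2f} to {ci_high:.2f}")
print("reject H0" if reject else "do not reject H0")
```

Run as written, this reproduces the observed z of 0.81 and the interval from -0.56 to 1.16, and it reports that H₀ is not rejected.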

Question 2: Is $\mu_1 > \mu_2$? (Right-tailed test)

H_a: $\mu_1 > \mu_2$
Is the mean of sample 1 greater than the mean of sample 2?
If so, then their difference will be significantly greater than zero.
Here we use a one-tailed test, computing the critical value for the test at the level of significance $\alpha$.
If the test statistic z falls above this critical value, in the upper tail, we decide that the first mean is greater and choose the alternate hypothesis, H_a.
Otherwise we have no reason to think that the first mean is greater, and we do not reject H₀.

By Examples:

Problem 2. The personnel officer of a large corporation claimed that college graduates applying for jobs with their firm in the current year tended to have higher grade point averages than those applying in the previous year. Samples from the group of applicants gave the following results:

| | Sample mean | Sample standard deviation | Sample size |
| --- | --- | --- | --- |
| Current year (x1) | 2.98 | 0.4 | 60 |
| Previous year (x2) | 2.8 | 0.5 | 52 |

Is there sufficient evidence to justify the claim at a 5% level of significance?

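Given difference $\bar{x}_1 - \bar{x}_2 = 2.98 - 2.8 = 0.18$, with both sample sizes at least 52 (large, so the normal approximation and z-score can be used).

Step 1. Hypothesis: The null hypothesis is that $\mu_1 = \mu_2$, that is, the mean grade point average is the same in both years.

The alternate hypothesis, which is the personnel officer's claim, is that $\mu_1 > \mu_2$.

H₀: $\mu_1 - \mu_2 = 0$

H_a: $\mu_1 - \mu_2 > 0$

Step 2. Select level of significance: This is given as $\alpha = 0.05$ (5%).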

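Step 3. Test statistic and observed value.

$s_d = \sqrt{\dfrac{0.4^2}{60} + \dfrac{0.5^2}{52}} \approx 0.0865$

$z = \dfrac{(2.98 - 2.8) - 0}{0.0865} \approx 2.08$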

Step 4. Determine the critical region (favors H_a)

For $\alpha = 0.05$ at the upper end of the interval (cumulative probability 0.95), $z_{1-\alpha} = 1.65$.

The critical region is $z > 1.65$ (reference table).

Step 5. Make decision.

Do not reject the null hypothesis if $z \le 1.65$.

The observed z = 2.08, and since 2.08 > 1.65 it is in the critical region (red), so we reject H₀, the hypothesis that grades are the same in both years, in favor of H_a.

Therefore we conclude that college graduates applying in the current year tend to have higher grade point averages than those applying in the previous year.
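The right-tailed test in Problem 2 can be checked the same way. This is a sketch using only the figures from the problem table; the variable names are illustrative. Note that `statistics.NormalDist` gives the unrounded critical value of about 1.645, which the text rounds to 1.65.

```python
import math
from statistics import NormalDist

# Problem 2 data: current year is sample 1, previous year is sample 2
mean_cur, sd_cur, n_cur = 2.98, 0.4, 60
mean_prev, sd_prev, n_prev = 2.8, 0.5, 52
alpha = 0.05

diff = mean_cur - mean_prev
# Combined standard deviation of the difference in sample means
sd_diff = math.sqrt(sd_cur**2 / n_cur + sd_prev**2 / n_prev)
z = diff / sd_diff  # hypothesized difference is 0

# Right-tailed critical value: all of alpha sits in the upper tail
z_crit = NormalDist().inv_cdf(1 - alpha)
reject = z > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
print("reject H0" if reject else "do not reject H0")
```

Run as written, this gives z = 2.08 against a critical value of 1.645, so H₀ is rejected, in agreement with the decision above.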

Question 3: Is $\mu_1 < \mu_2$? (Left-tailed test)

H_a: $\mu_1 < \mu_2$
Is the mean of sample 1 less than the mean of sample 2?
If so, then their difference will be significantly less than zero.
Here we use a one-tailed test, computing the critical value for the test at the level of significance $\alpha$.
If the test statistic z falls below this critical value, in the lower tail, we decide that the first mean is smaller and choose the alternate hypothesis, H_a.
Otherwise we have no reason to think that the first mean is smaller, and we do not reject H₀.

By Examples:

Problem 3. A biologist suspected that females age 20 - 24 have a lower mean systolic blood pressure than males in the same age group. Independent random samples produced the following results for systolic pressure.

| | Sample mean | Sample standard deviation | Sample size |
| --- | --- | --- | --- |
| Female (x1) | 117 | 12.1 | 41 |
| Male (x2) | 125 | 13.9 | 31 |