Thursday, November 9, 2017

Assignment 4: Hypothesis Testing

Introduction

The goal of this assignment is to work with some key concepts related to hypothesis testing.  It demonstrates knowledge of Z and T tests, including how to decide which test applies and the calculations involved in each.  The assignment also applies the steps of hypothesis testing to real-world data to make decisions about null and alternative hypotheses, and connects the calculated statistics to geography.

Part 1: T and Z tests

Part 1 of this assignment demonstrates a basic knowledge of Z and T tests by answering a few short questions.

Question 1: The first question involved completing the table in Figure 1, where only the first four columns were given.  Three fields needed to be filled in: α (alpha), whether a Z or a T test applies, and the critical Z or T value for the test.  The α column was calculated by subtracting the confidence level from 100 and moving the decimal point two places to the left to convert the percentage to a decimal (for example, a 95% confidence level gives α = 0.05).  Whether a Z or T test should be used was determined by the sample size (n): if n is larger than 30 a Z test should be used, and if n is less than 30 a T test should be used.

Lastly, the critical Z or T value was found using the T score and Z score charts.  The T score chart is in Figure 2 and the Z score chart is in Figure 3.  If the test was two-tailed, α had to be divided by two to account for the two tails.  For a two-tailed Z test, half of α was added to the confidence level, and that cumulative probability was located in the body of the Z score chart; the corresponding Z score was then read from the chart's row and column headings.  The T score chart was used by first calculating the degrees of freedom (n - 1) and then reading the T value in the column for the appropriate α at the top of the chart.  For two-tailed T tests the column for α/2 was used to account for the second tail.
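The lookup procedure above can also be reproduced numerically.  The sketch below is a minimal illustration, not the method used in the assignment (which read the printed charts in Figures 2 and 3): it computes α from the confidence level, applies the assignment's n > 30 rule of thumb, and finds the critical value from the standard normal distribution (Python's `statistics.NormalDist`) or from Student's t distribution.  Because the standard library has no t distribution, `t_pdf`, `t_cdf`, `t_ppf`, and `critical_value` are hypothetical helpers written for this example; the t CDF is integrated with Simpson's rule and inverted by bisection.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): 0.5 plus the area from 0 to |x| (Simpson's rule), using symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p: float, df: int) -> float:
    """Value x with t_cdf(x, df) == p, for 0.5 <= p < 1, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def critical_value(confidence_pct: float, n: int, tails: int) -> tuple[float, str, float]:
    """Return (alpha, test type, critical value) for one row of a Figure 1 style table."""
    alpha = round(1 - confidence_pct / 100, 4)  # e.g. 95% -> 0.05
    tail_area = alpha / tails                    # two-tailed tests split alpha in half
    if n > 30:                                   # assignment's rule of thumb: large n -> Z test
        return alpha, "Z", NormalDist().inv_cdf(1 - tail_area)
    return alpha, "T", t_ppf(1 - tail_area, n - 1)  # degrees of freedom = n - 1
```

For example, `critical_value(95, 23, 2)` returns a T critical value of about 2.074 (22 degrees of freedom), and `critical_value(95, 52, 1)` returns a Z critical value of about 1.645, matching the printed tables to three decimals.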
     
Figure 1. The first four columns of the table were given and the last three columns were calculated for Question 1.

Figure 2. T score chart used to determine critical values.

Figure 3. Z score chart used to determine critical values.

Question 2: The second question worked with estimates from the Department of Agriculture and Livestock Development in Kenya for three main crops grown in the country.  The estimates give the production levels of groundnuts, cassava, and beans that districts should aim to approach, and they were calculated from national averages for Kenya.  A survey of 23 farmers in Kenya provided a sample mean (μ) and a sample standard deviation (σ) for each of the three crops.  The data provided for the question can be seen in Figure 4.

Figure 4. The data provided in Question 2.  The estimated yield was based on national averages for Kenya, and the sample mean (μ) and sample standard deviation (σ) were calculated from a survey of 23 farmers.

This question asked for a significance test.  The null hypothesis is that there is no significant difference between the estimated yield and the actual (surveyed) yield of each of the three crops.  The alternative hypothesis is that there is a significant difference between the estimated yield and the actual yield of each crop.  The hypotheses were tested with a T test for each crop instead of a Z test because the sample is small (n = 23, fewer than 30).  The T test equation in Figure 5 was applied to each crop as a two-tailed test at a 95% confidence level.  The estimated yield was used as the hypothesized mean in the equation, and there were 23 observations in this study.

Figure 5. T test equation. 
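The equation image for Figure 5 is not reproduced in the text.  Assuming it shows the standard one-sample form, the T statistic and its degrees of freedom are:

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1
```

Here x̄ is the sample mean (labeled μ in Figure 4), s is the sample standard deviation (labeled σ), μ₀ is the hypothesized mean (the estimated yield), and n is the number of observations (23).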

The calculated results of this question can be seen in Figure 6.  The critical values for each crop were determined using the T score chart in Figure 2.  The degrees of freedom were calculated by subtracting 1 from the number of observations (23 - 1 = 22), and because a two-tailed T test was being used, the 0.025 column of the chart was used, giving critical values of ±2.074.  For groundnuts the null hypothesis cannot be rejected: the calculated T value falls between the two critical values, so there is not a significant difference between the estimated yield of groundnuts and the actual average yield.  For cassava the null hypothesis can be rejected: the calculated T value falls outside the range between the critical values, so there is a significant difference between the estimated yield of cassava and the actual average yield.  For beans the null hypothesis cannot be rejected: the calculated T value falls between the critical values, so there is not a significant difference between the estimated yield of beans and the actual average yield.
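The decision rule used for each crop can be written as a short function.  This is a minimal sketch: `one_sample_t` and `two_tailed_decision` are hypothetical helper names, the critical value 2.074 is the 0.025 column of a T table at 22 degrees of freedom, and the example inputs are invented for illustration because the Figure 4 values are not reproduced in the text.

```python
import math

T_CRIT = 2.074  # two-tailed, alpha = 0.05, df = 22 (0.025 column of a T table)


def one_sample_t(sample_mean: float, sample_sd: float, n: int, hypothesized_mean: float) -> float:
    """T statistic for one sample: (sample mean - hypothesized mean) / (sd / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))


def two_tailed_decision(t_stat: float, t_crit: float = T_CRIT) -> str:
    """Fail to reject H0 when t lies between -t_crit and +t_crit; otherwise reject."""
    if -t_crit <= t_stat <= t_crit:
        return "fail to reject H0"
    return "reject H0"


# Hypothetical crop numbers (NOT the Figure 4 data), with n = 23 farmers
for mean in (520.0, 450.0):
    t = one_sample_t(sample_mean=mean, sample_sd=90.0, n=23, hypothesized_mean=500.0)
    print(f"sample mean {mean}: t = {t:.3f} -> {two_tailed_decision(t)}")
# prints:
# sample mean 520.0: t = 1.066 -> fail to reject H0
# sample mean 450.0: t = -2.664 -> reject H0
```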

Figure 6. Calculated results of Question 2.

Using the probability chart from the textbook in Figure 7, the probability of obtaining or exceeding each crop's test statistic was determined.  To use the chart, the degrees of freedom had to be calculated by subtracting 1 from the sample size; in this example there are 22 degrees of freedom.  The probability was then found using the T statistic and the degrees of freedom.  If the T statistic was negative, the probability given by the chart was subtracted from 1.  The chart provided by the textbook only went up to 20 degrees of freedom, not 22, so 20 degrees of freedom was used to approximate the probability in this example.  The probability for groundnuts was determined to be 0.27762, for cassava 0.1062, and for beans 0.95652.
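The chart lookup can also be approximated numerically, which avoids substituting 20 for 22 degrees of freedom.  This sketch is an illustration under stated assumptions rather than the textbook's procedure: Figure 7 is not reproduced here, so it is not certain whether the chart lists the cumulative probability P(T ≤ t) or an upper-tail area, and both are computed.  The t CDF is integrated with Simpson's rule because Python's standard library has no t distribution; `t_cdf`, `upper_tail`, and `two_tailed_p` are hypothetical helper names.

```python
import math


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for Student's t with df degrees of freedom (Simpson's rule from 0 to |x|)."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return math.exp(log_coef) * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def upper_tail(t_stat: float, df: int) -> float:
    """P(T >= t_stat): probability of a statistic this large or larger."""
    return 1 - t_cdf(t_stat, df)


def two_tailed_p(t_stat: float, df: int) -> float:
    """P(|T| >= |t_stat|): the p-value for a two-tailed test."""
    return 2 * upper_tail(abs(t_stat), df)


# Compare the exact 22 degrees of freedom with the 20 available in the chart
t_example = 1.5  # hypothetical statistic, not one of the crop results
for df in (20, 22):
    print(df, round(t_cdf(t_example, df), 4), round(two_tailed_p(t_example, df), 4))
```

A two-tailed p-value below α = 0.05 gives the same reject decision as a T statistic falling outside the critical values, so either comparison can be used.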


Figure 7. Probability chart used to find probability of calculated T statistics.


Question 3: In Question 3, significance testing was used to determine whether a stream's pollutant level is higher than the allowable limit of 4.4 mg/L.  Seventeen samples were taken, with a mean pollutant level of 6.8 mg/L and a standard deviation of 4.2 mg/L.  A T test was used rather than a Z test because the number of samples was less than 30, and the test was one-tailed because the question asks only whether the level is higher than the limit.  A 95% confidence level (α = 0.05) was used.

The null hypothesis of this scenario is that the mean pollutant level of the water samples is not significantly higher than the allowable pollutant limit.  The alternative hypothesis is that the mean pollutant level of the water samples is significantly higher than the allowable pollutant limit.

Using the equation in Figure 5, the T statistic was calculated to be 2.356.  The chart in Figure 2 was used to determine the one-tailed critical value of 1.746 (16 degrees of freedom, α = 0.05).  The calculated T statistic is larger than the critical value, so it falls in the rejection region and the null hypothesis can be rejected.  The mean pollutant level of the stream samples is significantly higher than the allowable limit, so it can be concluded that the stream's pollutant level is over the allowable limit.
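The Question 3 arithmetic can be checked with a few lines of Python.  This minimal sketch uses only the values stated above; the critical value 1.746 is taken from the T table (Figure 2) rather than computed, and the variable names are illustrative.

```python
import math

# Question 3 inputs as stated in the text
n = 17
sample_mean = 6.8      # mg/L
sample_sd = 4.2        # mg/L
allowable_limit = 4.4  # mg/L, used as the hypothesized mean
t_crit = 1.746         # one-tailed, alpha = 0.05, df = 16 (T table)

t_stat = (sample_mean - allowable_limit) / (sample_sd / math.sqrt(n))
reject_h0 = t_stat > t_crit  # upper-tail test: only a large positive t rejects H0

print(f"t = {t_stat:.3f}, critical value = {t_crit}, reject H0: {reject_h0}")
# prints: t = 2.356, critical value = 1.746, reject H0: True
```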

Using the probability chart in Figure 7, the probability of obtaining or exceeding the test statistic was determined.  To use the chart, the degrees of freedom had to be calculated by subtracting 1 from the sample size; in this example there are 16 degrees of freedom.  Using the T statistic (2.356) and the degrees of freedom, the probability of the calculated statistic was determined to be 0.96945.

Part 2: Real World Scenario

Part 2 of the lab posed a real-world spatial question that relied on a hypothesis test to answer.  A hypothesis test was conducted to determine whether the average home value of the block groups in the city of Eau Claire is significantly different from that of the block groups in Eau Claire County.

The null hypothesis is that there is no significant difference between the average home value of block groups in the city of Eau Claire and the average home value of block groups in Eau Claire County.  The alternative hypothesis is that there is a significant difference between the average home value of block groups in the city of Eau Claire and that of block groups in Eau Claire County.  A Z test was chosen for the average home values in the city of Eau Claire because 52 block-group home values were recorded for the city (n = 52), and when a sample size is larger than 30 a Z test should be used instead of a T test.  The equation for a Z test is shown in Figure 8.  A 95% confidence level is standard practice when working with U.S. Census Bureau data, so a one-tailed Z test at a 95% confidence level was used.  The data in the table in Figure 9 were used to calculate the Z test statistic; all of the values in this table were obtained from the census data in the shapefiles provided in the assignment.  The Z test statistic was calculated to be -2.548.  The critical value for a one-tailed Z test at a 95% confidence level was selected from Figure 1.  The critical value is -1.64; the negative critical value was selected because the sample mean is smaller than the hypothesized mean in this example.  The Z test statistic is less than the critical value, so the null hypothesis can be rejected.  There is a significant difference between the average home value at the block-group level in the city of Eau Claire and the average home value at the block-group level in Eau Claire County.  The probability of obtaining a Z statistic this low or lower for the city's block-group average home value is 0.0055, found by looking up the calculated Z statistic in the chart in Figure 3.  This means the city of Eau Claire's block-group sample falls at about the 0.55th percentile, which is extremely low.
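The critical value and probability above can be reproduced with Python's standard-library `statistics.NormalDist`.  This is a sketch that starts from the reported Z statistic of -2.548, since the Figure 9 inputs are not reproduced in the text; it assumes a lower-tail test at α = 0.05, as described above.

```python
from statistics import NormalDist

std_normal = NormalDist()           # standard normal: mean 0, standard deviation 1

z_stat = -2.548                     # Z statistic reported in the text
alpha = 0.05                        # one-tailed test, 95% confidence level
z_crit = std_normal.inv_cdf(alpha)  # lower-tail critical value
p_lower = std_normal.cdf(z_stat)    # P(Z <= z_stat)

print(f"critical value = {z_crit:.3f}")           # critical value = -1.645
print(f"reject H0: {z_stat < z_crit}")            # reject H0: True
print(f"lower-tail probability = {p_lower:.4f}")  # lower-tail probability = 0.0054
```

The small difference from the chart value of 0.0055 comes from reading the printed Z table at two decimal places.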

Figure 8. Z test equation.  
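The equation image for Figure 8 is not reproduced in the text.  Assuming it shows the standard Z statistic for comparing a sample mean with a population mean, it reads:

```latex
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
```

Here x̄ is the mean home value of the 52 city block groups, μ is the county block-group mean used as the hypothesized mean, n = 52, and σ is presumably the standard deviation of the county block-group home values (an assumption, since the contents of Figure 9 are not reproduced).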

Figure 9. Data collected from census data to complete Z test.

A map in Figure 10 was created to compare the average home value by block group between the city of Eau Claire and Eau Claire County.  The average block-group home value of the county is shown in green and the average block-group home value of the city is shown in purple.  Many of the block groups in the inner city of Eau Claire have lower home values than those in the rest of the county.  The county block groups appear to have noticeably larger home values than the block groups in the city of Eau Claire.
   
Figure 10. Map comparing the average block group home values in the city of Eau Claire and Eau Claire County.