Module 2: Hypothesis Testing (chi-square tests)
Module Overview
In Module 1, we focused on hypothesis testing with t-tests, which are used for continuous data. In this module, we'll explore hypothesis testing with chi-square tests, which are used for categorical data.
Chi-square tests allow us to determine whether there is a significant association between categorical variables or whether observed categorical data matches what we would expect under a certain hypothesis. These tests are essential when analyzing survey responses, demographic information, or any data where variables are measured in categories rather than continuous values.
Learning Objectives
- Explain the purpose of a Chi-square test
- Set up a chi-square test for independence on two categorical variables
- Use a chi-square test and the p-value to evaluate the null and alternative hypotheses
Objective 01 - Explain the Purpose of a Chi-square Test and Identify Applications
Overview
In this section, we're going to discuss a new statistical test called the chi-square test. It's sometimes written using the Greek letter chi, which looks like a wavy capital X: χ.
So why do we need yet another statistical test? Well, we can't apply a t-test to all situations. In some cases, we need to compare populations in different ways to determine how they are or are not related.
For example, we might have two or more populations and want to compare how they fall into two or more response categories. Say we are looking at the proportion of men and women who say their Facebook viewing time increases during specific months of the year. Here we are not comparing sample means; we are asking whether the proportion giving each response differs significantly between the groups.
Another application of the chi-square test is to determine if two categorical variables are independent. An example might be to look at the association between texting while driving and car accidents. How can we determine if these two variables are related to each other?
This type of test is called the chi-square test of independence: it asks whether the two variables being tested are independent of each other. Let's move right into an example!
Follow Along
Chi-square Statistic
To complete a chi-square test on our sample populations, we need to set up our variables in a "contingency table." It's called this because we're testing whether the number of cases in a category of one variable is contingent upon (that is, depends on) the other variable.
Contingency Table
For this example, we'll look at some made-up data about cats and dogs and if they prefer treats or toys. Then, based on our chi-square analysis, we'll be able to make a statement about the preferences of the animals based on statistics.
| | Cats | Dogs | Row total |
|---|---|---|---|
| Treats | 200 | 290 | 490 |
| Toys | 400 | 910 | 1310 |
| Column total | 600 | 1200 | 1800 |
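Based on these numbers, we can calculate the expected value for each cell, which is the count we would expect if pet type and preference were independent. For each cell, multiply its column total by its row total and divide by the grand total.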
Expected Value
| | Cats | Dogs |
|---|---|---|
| Treats | (600 × 490) / 1800 = 163.33 | (1200 × 490) / 1800 = 326.67 |
| Toys | (600 × 1310) / 1800 = 436.67 | (1200 × 1310) / 1800 = 873.33 |
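The expected-value rule (row total times column total, divided by the grand total) can be sketched in NumPy as follows. This is a minimal illustration using the made-up pet data; the variable names are our own choices, not part of any library.

```python
import numpy as np

# Observed counts (rows: Treats, Toys; columns: Cats, Dogs)
observed = np.array([[200, 290],
                     [400, 910]])

row_totals = observed.sum(axis=1)   # totals for Treats and Toys
col_totals = observed.sum(axis=0)   # totals for Cats and Dogs
grand_total = observed.sum()        # all animals in the sample

# Expected count for each cell = (row total * column total) / grand total
expected = np.outer(row_totals, col_totals) / grand_total

print(expected.round(2))
```

`np.outer` builds every row-total and column-total product in one step, so the same lines work for a table of any size.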
We now have a table of observed counts and the expected values for cats' and dogs' preferences for either treats or toys. But how do we know if any of these results are statistically significant? This is where the chi-square statistic comes in. The following formula calculates the chi-square statistic:
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
Taking this formula, we'll calculate the chi-square statistic for our pet data.
Chi-square statistic: calculation
| | Cats | Dogs |
|---|---|---|
| Treats | (200 - 163.33)^2 / 163.33 = 8.23 | (290 - 326.67)^2 / 326.67 = 4.12 |
| Toys | (400 - 436.67)^2 / 436.67 = 3.08 | (910 - 873.33)^2 / 873.33 = 1.54 |
And after summing up the value in each cell, we'll have the chi-square statistic: 8.23 + 3.08 + 4.12 + 1.54 = 16.97. 16.97 is our observed chi-square value. The final step is to compare the observed value we calculated to the critical chi-square value. The critical chi-square value depends on the degrees of freedom in your data set and determines if your results are statistically significant.
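The cell-by-cell calculation can be reproduced in a few lines of NumPy. This is a sketch under the same made-up data; `contributions` and `chi_square` are illustrative names.

```python
import numpy as np

observed = np.array([[200, 290],
                     [400, 910]])

# Expected counts under independence: (row total * column total) / grand total
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Contribution of each cell: (observed - expected)^2 / expected
contributions = (observed - expected) ** 2 / expected

# The chi-square statistic is the sum of the contributions over all cells
chi_square = contributions.sum()

print(contributions.round(2))
print(round(float(chi_square), 2))
```

Looking at `contributions` before summing shows which cells drive the statistic; here the Cats/Treats cell contributes the most.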
A 2 × 2 table has (2 - 1) × (2 - 1) = 1 degree of freedom. For one degree of freedom and an alpha level of 0.05, a chi-square critical value table gives a critical value of 3.84. Our calculated chi-square of 16.97 is greater than 3.84, so we reject the null hypothesis that pet type and preference are independent: a difference this large would be unlikely to arise by chance alone. In this sample, a larger share of cats than dogs chose treats (200 of 600, about 33%, versus 290 of 1200, about 24%). (Remember, this is manufactured data; your dog or cat may not fit this pattern.)
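Instead of looking the critical value up in a printed table, it can be computed from the chi-square distribution in `scipy.stats`. The snippet below is a sketch that assumes the 2 × 2 pet table and the statistic of 16.97 from the hand calculation.

```python
from scipy.stats import chi2

alpha = 0.05

# Degrees of freedom for a table with r rows and c columns: (r - 1) * (c - 1)
n_rows, n_cols = 2, 2
dof = (n_rows - 1) * (n_cols - 1)

# Critical value: the point that leaves probability alpha in the upper tail
critical_value = chi2.ppf(1 - alpha, dof)

# p-value: probability of a statistic at least this large if the null is true
chi_square = 16.97
p_value = chi2.sf(chi_square, dof)

print(f"dof = {dof}, critical value = {critical_value:.2f}")
print(f"p-value = {p_value:.6f}")
print("Reject the null hypothesis:", chi_square > critical_value)
```

Comparing the statistic to the critical value and comparing the p-value to alpha always lead to the same decision; they are two views of the same tail probability.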
Challenge
Now it's your turn to practice calculating a chi-square statistic. Using the above examples, create your contingency table using some data that interests you. You can search for a "contingency table" and see if some small example tables have data in a suitable format. Use the following steps to calculate your chi-square value:
- Put your data in a table format similar to the one shown above.
- Calculate the expected value for each cell in the table.
- Calculate the chi-square contribution, (observed - expected)^2 / expected, for each cell.
- Add up the contributions from all cells to get the chi-square statistic.
- Use a chi-square critical value table to find the critical value for your degrees of freedom and alpha level.
- Finally, determine whether your result is statistically significant.
Objective 02 - Set Up a Chi-square Test for Independence on Two Categorical Variables
Overview
In the previous objective, we learned about the chi-square statistic. We worked out the chi-square value by hand using a contingency table. For this next objective, we're going to use the magic of SciPy and the scipy.stats module to compute the chi-square statistic.
Follow Along
We'll look at our previous contingency table example so that we can compare our scipy.stats results to our manual calculation.
Cats, Dogs, and Treats
Remember our contingency table from earlier in the module?
Contingency Table: Cats & Dogs
| | Cats | Dogs | Row total |
|---|---|---|---|
| Treats | 200 | 290 | 490 |
| Toys | 400 | 910 | 1310 |
| Column total | 600 | 1200 | 1800 |
Using these values, we calculated a chi-square statistic of 16.97. Next, we'll put these same values into the SciPy stats chi2_contingency function, which will perform a chi-square test of the independence of the variables in the given contingency table.
# Import the libraries
import numpy as np
from scipy.stats import chi2_contingency
# Create the table as a NumPy array (rows: Treats, Toys; columns: Cats, Dogs)
table = np.array([[200, 290], [400, 910]])
# Print out the table to double-check
print('Contingency table:\n', table)
# Perform the chi-square test (no continuity correction, to match the hand calculation)
stat, p, dof, expected = chi2_contingency(table, correction=False)
# Print out the stats in a nice format
print('Expected values:\n', expected.round(2))
print(f'The chi-square statistic is: {stat:.3f}')
print(f'The p-value is: {p:.6f}')
Contingency table:
 [[200 290]
 [400 910]]
Expected values:
 [[163.33 326.67]
 [436.67 873.33]]
The chi-square statistic is: 16.965
The p-value is: 0.000038
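The call above passes `correction=False`. For tables with one degree of freedom, `chi2_contingency` applies Yates' continuity correction by default, which gives a slightly smaller statistic than the hand calculation. The sketch below runs the same table both ways so the difference is visible; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[200, 290], [400, 910]])

# Without the continuity correction: matches the hand calculation
stat_plain, p_plain, dof, _ = chi2_contingency(table, correction=False)

# With Yates' continuity correction (SciPy's default; only applied when dof == 1)
stat_yates, p_yates, _, _ = chi2_contingency(table, correction=True)

print(f"correction=False: statistic = {stat_plain:.3f}, p = {p_plain:.6f}")
print(f"correction=True:  statistic = {stat_yates:.3f}, p = {p_yates:.6f}")
```

With counts this large, both versions lead to the same decision; the correction matters more for small samples.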
Challenge
Using the above example as a guide, choose an example contingency table data set and re-create it in Python. For this table, do the following:
- Enter the table using NumPy arrays
- Print it out to check that it's correct
- Use the chi2_contingency() function to calculate the chi-square statistic
- Compare your results to any published results for that data set (if there are any)
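If your data starts as one record per observation rather than as a finished table, the counts have to be tallied first. The sketch below does this with NumPy alone, using a tiny hypothetical sample invented for illustration (`pets` and `prefs` are not from the lesson data). If you use pandas, `pandas.crosstab` produces the same kind of table.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical raw observations, one entry per animal (made up for illustration)
pets = np.array(["cat", "cat", "dog", "dog", "dog", "cat", "dog", "cat", "dog", "dog"])
prefs = np.array(["toy", "treat", "toy", "toy", "treat", "toy", "toy", "treat", "treat", "toy"])

# Map each label to an integer index (labels come back sorted alphabetically)
pet_labels, pet_idx = np.unique(pets, return_inverse=True)
pref_labels, pref_idx = np.unique(prefs, return_inverse=True)

# Tally each (preference, pet) combination into a contingency table
table = np.zeros((len(pref_labels), len(pet_labels)), dtype=int)
np.add.at(table, (pref_idx, pet_idx), 1)

print("rows:", pref_labels, "columns:", pet_labels)
print(table)

# A sample this small is only for showing the mechanics, not for drawing conclusions
stat, p, dof, expected = chi2_contingency(table, correction=False)
print(f"statistic = {stat:.3f}, p = {p:.3f}, dof = {dof}")
```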
Objective 03 - Use a Chi-square Test p-value to Draw the Correct Conclusion About the Null and Alternative Hypothesis
Overview
We've already covered the p-value and how we apply it to the null and alternative hypotheses. But let's go over a quick review.
When we perform a hypothesis test, we calculate a p-value. Using the significance level we decided on before performing our test, we then have enough information to either 1) reject or 2) fail to reject the null hypothesis.
- p-value < alpha: reject the null hypothesis
- p-value > alpha: fail to reject the null hypothesis
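The decision rule above can be written as a small helper. This is only a sketch; `decide` is a hypothetical function name, and the second p-value is made up to show the other branch.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the hypothesis-test decision for a p-value and significance level."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.000038))  # p-value from the cats-and-dogs example
print(decide(0.30))      # a hypothetical larger p-value
```

Note that alpha is an argument chosen by the analyst before the test is run, not something the test computes.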
Example: Dice Roll
We can use a chi-square test on a collection of dice rolls to determine if the dice are fair or if the random number generator we are using is random (well, as far as we can detect).
Using dice roll statistics as our data set, we're going to work through the whole process of stating the null hypothesis, performing a chi-square test, deciding on the significance level, determining the p-value, and then making a decision on the null hypothesis.
Follow Along
We already know the expected frequency of each number when we roll a die. For a fair six-sided die, each number should occur 1/6 of the time, or about 16.67%. We can check how closely a sample of rolls matches that expectation by simulating the rolls with a random number generator.
Let's decide on the null hypothesis and the significance level.
Null Hypothesis
For this situation, it would make sense to choose the null hypothesis to simply be: "the dice are fair".
Generated Dice Rolls
We used the random number generator in Python to simulate the dice-rolling results. We "rolled" five dice (A through E) 50 times each. Here are the results, along with the total for each value from 1 to 6:
| Value | A | B | C | D | E | Total |
|---|---|---|---|---|---|---|
| 1 | 13 | 7 | 10 | 5 | 13 | 48 |
| 2 | 5 | 7 | 4 | 12 | 9 | 37 |
| 3 | 5 | 9 | 14 | 0 | 10 | 38 |
| 4 | 12 | 13 | 8 | 7 | 7 | 47 |
| 5 | 7 | 10 | 9 | 13 | 6 | 45 |
| 6 | 8 | 4 | 5 | 13 | 5 | 35 |
Each value should come up 1/6 of the time; the total number of rolls is 250, and 250/6 = 41.67. The totals are fairly close to that number for most values; the largest departures are for one (48, a little high) and six (35, a little low).
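The lesson does not show how the rolls were generated, so the following is only one way it could be done, using NumPy's random generator. The seed is an arbitrary choice for reproducibility, and the counts it produces will differ from the table above.

```python
import numpy as np

# Seeded generator so the simulation is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=42)

n_dice, n_rolls = 5, 50

# rolls[i, j] is the j-th roll of die i; the upper bound is exclusive, so use 7
rolls = rng.integers(1, 7, size=(n_dice, n_rolls))

# Count how often each face (1-6) came up for each die, then transpose
# so rows are face values and columns are dice, giving a (6, 5) table
counts = np.array([np.bincount(die, minlength=7)[1:] for die in rolls]).T

print(counts)
print("Totals per face:", counts.sum(axis=1))
```

Each column of `counts` sums to 50, one column per die, which matches the layout of the table used in the test that follows.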
Let's put the data in NumPy arrays and run a chi-square test on them.
# Import the libraries
import numpy as np
from scipy.stats import chi2_contingency
# Create the array for each die value (one count per die, A through E)
a1 = [13, 7, 10, 5, 13]
a2 = [5, 7, 4, 12, 9]
a3 = [5, 9, 14, 0, 10]
a4 = [12, 13, 8, 7, 7]
a5 = [7, 10, 9, 13, 6]
a6 = [8, 4, 5, 13, 5]
# Combine them into a (6, 5) array
dice = np.array([a1, a2, a3, a4, a5, a6])
# Perform the chi-square test
stat, p, dof, expected = chi2_contingency(dice, correction=False)
# Print out the stats in a nice format
print('Expected values:\n', expected.round(2))
print('The degrees of freedom:', dof)
print(f'The chi-square statistic is: {stat:.3f}')
print(f'The p-value is: {p:.6f}')
Expected values:
 [[9.6 9.6 9.6 9.6 9.6]
 [7.4 7.4 7.4 7.4 7.4]
 [7.6 7.6 7.6 7.6 7.6]
 [9.4 9.4 9.4 9.4 9.4]
 [9.  9.  9.  9.  9. ]
 [7.  7.  7.  7.  7. ]]
The degrees of freedom: 20
The chi-square statistic is: 40.375
The p-value is: 0.004477
Interpret the result - computer generated
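Now we need to use a chi-square probability table and a significance level to interpret our result.
Let's choose an alpha level of 0.05. With 20 degrees of freedom, the critical value at this level is 31.410, and our calculated chi-square of 40.375 is greater than that. Equivalently, our calculated p-value of 0.0045 is less than 0.05, so we reject the null hypothesis. One caution about what this result says: applied to a 6 × 5 table, chi2_contingency tests whether the distribution of face values is the same for all five dice, and its expected counts come from the observed row totals rather than from the 1/6 probability of a fair die. The rejection therefore tells us that the five simulated dice did not behave alike in this sample (die D, for example, never rolled a 3), and it is in that sense that the computer appears to be using a "rigged" die.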
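To test the fairness of the pooled rolls directly against the 1/6 expectation, one option is a chi-square goodness-of-fit test on the face totals. This goes slightly beyond the walkthrough above and is offered as a complementary sketch, using `scipy.stats.chisquare` with the totals from the simulated-dice table.

```python
import numpy as np
from scipy.stats import chisquare

# Total count of each face (1-6) across all 250 simulated rolls
totals = np.array([48, 37, 38, 47, 45, 35])

# With no expected frequencies given, chisquare assumes every category
# is equally likely, which is exactly the "fair die" hypothesis
stat, p = chisquare(totals)

print(f"statistic = {stat:.3f}, p = {p:.3f}, dof = {len(totals) - 1}")
```

On these totals the goodness-of-fit test does not reject fairness at alpha = 0.05. That is not a contradiction: it asks whether the pooled face frequencies match 1/6 each, while the contingency-table test asks whether the individual dice differ from one another.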
Physical Dice
Let's look at the rolls from a random assortment of actual, physical dice. We set up the number of rolls and dice the same way as for the random number generator. Here are the results of rolling five dice 50 times each.
| Value | A | B | C | D | E | Total |
|---|---|---|---|---|---|---|
| 1 | 4 | 3 | 5 | 11 | 4 | 27 |
| 2 | 9 | 15 | 10 | 4 | 11 | 49 |
| 3 | 7 | 10 | 8 | 6 | 8 | 39 |
| 4 | 13 | 6 | 8 | 9 | 12 | 48 |
| 5 | 9 | 9 | 7 | 11 | 6 | 42 |
| 6 | 8 | 7 | 12 | 9 | 9 | 45 |
# Create the array for each die value (one count per die, A through E)
a1 = [4, 3, 5, 11, 4]
a2 = [9, 15, 10, 4, 11]
a3 = [7, 10, 8, 6, 8]
a4 = [13, 6, 8, 9, 12]
a5 = [9, 9, 7, 11, 6]
a6 = [8, 7, 12, 9, 9]
# Combine them into a (6, 5) array
dice = np.array([a1, a2, a3, a4, a5, a6])
# Perform the chi-square test
stat, p, dof, expected = chi2_contingency(dice, correction=False)
# Print out the stats in a nice format
print('Expected values:\n', expected.round(2))
print(f'The chi-square statistic is: {stat:.3f}')
print(f'The p-value is: {p:.6f}')
Expected values:
 [[5.4 5.4 5.4 5.4 5.4]
 [9.8 9.8 9.8 9.8 9.8]
 [7.8 7.8 7.8 7.8 7.8]
 [9.6 9.6 9.6 9.6 9.6]
 [8.4 8.4 8.4 8.4 8.4]
 [9.  9.  9.  9.  9. ]]
The chi-square statistic is: 21.989
The p-value is: 0.341086
Interpret the result - human generated
Again, we'll use a chi-square probability table and a significance level to interpret our result.
For this trial, we'll use an alpha level of 0.05. Our calculated chi-square of 21.989 is less than 31.410, the critical value for 20 degrees of freedom. As with the example above, we can also use the calculated p-value. In this case, our p-value of 0.34 is greater than our alpha of 0.05, and we fail to reject the null hypothesis.
Our results are consistent with what we would expect if the physical dice were fair. Failing to reject the null hypothesis does not prove the dice are fair; it means this sample gives no significant evidence against it.
Both conclusions depend on the particular sample: a different set of rolls, simulated or physical, could lead to a different result.
Challenge
You may take the opportunity to generate your own dice-rolling data and see how your results compare to the computer-generated ones. You can use fewer dice (and roll more than one at a time) to collect your sample. Once you have some data, construct a contingency table and calculate your chi-square statistic. Then compare your results using your preferred significance level. Are your dice fair?
Guided Project
Open DC_122_Chi2_Tests.ipynb in the GitHub repository below to follow along with the guided project:
Guided Project Video
Module Assignment
Complete the Module 2 assignment to practice the chi-square testing techniques you've learned.