The chi-square test (χ²) is a non-parametric statistical test used to analyze categorical data. It helps determine whether the observed frequencies of a variable (e.g., the number of people in different categories) differ significantly from what would be expected based on a specific hypothesis.
Here’s a breakdown of the chi-square test, worked through step by step on an example.
Remember: Statistical software can automate most of these calculations, making the chi-square test easier to perform.
Scenario: We want to test whether the color preference of children is independent of their gender. We surveyed a group of 115 children and asked them their favorite color (red, blue, or green) and their gender (boy or girl). The null hypothesis (H₀) is that color preference and gender are independent.
Step 1: Create a Contingency Table:
Color | Boy | Girl | Total |
---|---|---|---|
Red | 20 | 15 | 35 |
Blue | 25 | 20 | 45 |
Green | 10 | 25 | 35 |
Total | 55 | 60 | 115 |
Step 2: Calculate Expected Frequencies:
The expected frequency for each cell is calculated by multiplying that cell's row total by its column total and dividing by the grand total (115).
Color | Boy | Girl | Total |
---|---|---|---|
Red | (35 * 55) / 115 = 16.74 | (35 * 60) / 115 = 18.26 | 35 |
Blue | (45 * 55) / 115 = 21.52 | (45 * 60) / 115 = 23.48 | 45 |
Green | (35 * 55) / 115 = 16.74 | (35 * 60) / 115 = 18.26 | 35 |
Total | 55 | 60 | 115 |
Step 3: Calculate Chi-Square Statistic:
For each cell, subtract the expected frequency (Ei) from the observed frequency (Oi), square the difference, and then divide by the expected frequency. Finally, sum these values over all six cells to get the chi-square statistic (χ²).
Cell | Oi | Ei | Oi – Ei | (Oi – Ei)² | (Oi – Ei)² / Ei |
---|---|---|---|---|---|
Red, Boy | 20 | 16.74 | 3.26 | 10.63 | 0.6352 |
Red, Girl | 15 | 18.26 | -3.26 | 10.63 | 0.5823 |
Blue, Boy | 25 | 21.52 | 3.48 | 12.10 | 0.5621 |
Blue, Girl | 20 | 23.48 | -3.48 | 12.10 | 0.5153 |
Green, Boy | 10 | 16.74 | -6.74 | 45.42 | 2.7132 |
Green, Girl | 25 | 18.26 | 6.74 | 45.42 | 2.4871 |
Total (χ²) | | | | | 7.4952 |
Step 4: Determine Degrees of Freedom:
df = (rows – 1) * (columns – 1) = (3 – 1) * (2 – 1) = 2
Step 5: Find the Critical Chi-Square Value:
Using a chi-square distribution table with 2 degrees of freedom and a significance level of 0.05 (α = 0.05), the critical chi-square value is 5.991.
Step 6: Compare Chi-Square Statistic to Critical Value:
Our calculated chi-square statistic (7.4952) is greater than the critical value (5.991).
Step 7: Interpret the Results:
Since the chi-square statistic is greater than the critical value, we reject the null hypothesis (H₀) at the 0.05 significance level. This means the data provide evidence that the color preference of children is not independent of their gender. Most of the statistic comes from the Green row, where fewer boys and more girls chose green than independence would predict.
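The steps above can be sketched in plain Python. This is a minimal illustration rather than a definitive implementation: the function name `chi_square_independence` and the variable names are hypothetical, and the grand total is derived from the cell counts instead of being typed in, so it always agrees with the row and column totals.

```python
def chi_square_independence(observed):
    """Return (chi2, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Step 2: expected count = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Step 3: sum (O - E)^2 / E over every cell
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Step 4: degrees of freedom = (rows - 1) * (columns - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Observed counts from Step 1 (rows: Red, Blue, Green; columns: Boy, Girl)
observed = [[20, 15], [25, 20], [10, 25]]
chi2, df, expected = chi_square_independence(observed)

CRITICAL_VALUE = 5.991  # df = 2, alpha = 0.05, taken from Step 5
print(f"chi-square = {chi2:.4f}, df = {df}")
print("reject H0" if chi2 > CRITICAL_VALUE else "fail to reject H0")
```

Because the totals are computed from the cells, a quick sanity check is that the expected frequencies add up to the same grand total as the observed counts.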
Note: This is a simplified example. In real-world scenarios, you would use statistical software to perform the calculations and obtain a p-value to assess the statistical significance of the results.
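For the 2 degrees of freedom in this example, the p-value has a simple closed form, because a chi-square distribution with df = 2 is an exponential distribution with mean 2, so p = exp(-χ²/2). The sketch below relies on that fact; the helper names are hypothetical, and the formulas do not hold for any other number of degrees of freedom.

```python
import math


def p_value_df2(chi2_stat):
    """Upper-tail probability P(X >= chi2_stat) for a chi-square variable with 2 df."""
    return math.exp(-chi2_stat / 2)


def critical_value_df2(alpha):
    """Statistic whose upper-tail probability equals alpha (2 df only)."""
    return -2 * math.log(alpha)


# Sanity check against the table value used in Step 5
print(round(critical_value_df2(0.05), 3))  # 5.991
print(round(p_value_df2(5.991), 3))        # 0.05
```

A statistic above the critical value corresponds to a p-value below α, so both decision rules agree. For other degrees of freedom, a statistics library is the safer route; in SciPy, for example, `scipy.stats.chi2_contingency` takes the observed table and returns the statistic, p-value, degrees of freedom, and expected frequencies.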
Hi, I am Vishal Jaiswal. I have about a decade of experience working in MNCs like Genpact, Savista, and Ingenious. Currently I am working at EXL as a senior quality analyst. Using my writing skills, I want to share the experience I have gained and help as many people as I can.