How close are you: Chi-Square

From Stepping Up

A chi square (X²) statistic is used to investigate whether distributions of categorical data differ from one another. Categorical variables provide data in categories. Responses to questions like "what is your favorite subject" or "do you own a car" are categorical, because they yield answers such as "math" or "no". In contrast, responses to questions like "what is your weight" or "what is your GPA" are numerical.

The chi square statistic compares the tallies/counts of categorical responses between two (or more) independent groups. The chi square test is applied only to actual counts, NOT to percentages, means, etc.

2 by 2 Contingency Table

There are several types of chi square tests, depending on the way the data were collected and the hypothesis being tested. The first example is a 2 by 2 contingency table (see below). The cells are labeled a, b, c and d.

             Data Type 1   Data Type 2   Total
Category 1   a             b             a + b
Category 2   c             d             c + d
Total        a + c         b + d         a + b + c + d = N

For a 2 x 2 contingency table the chi square statistic is calculated by the formula:

Chi square = N(ad - bc)^2 / [(a + b)(c + d)(a + c)(b + d)]

Here's an example of when to use this type of chi square test. Suppose we are looking at infection rates in a group of animals in Ontario and Quebec. We first come up with a null hypothesis and an alternate hypothesis. They might look something like this:

Null hypothesis: There is no difference in infection rates between Ontario and Quebec.

Alternate hypothesis: The infection rate is associated with the province, that is, it differs between Ontario and Quebec.

We collect the following data on the number of animals that are infected and not infected in each province.

          Infected   Not Infected   Total
Ontario   36         14             50
Quebec    30         25             55
Total     66         39             105

Applying the formula above we get:

Chi square = 105[(36)(25) - (14)(30)]^2 / [(50)(55)(39)(66)] = 3.418
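The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `chi_square_2x2` is made up for illustration, and the arguments follow the a, b, c, d cell labels of the 2 by 2 table.

```python
def chi_square_2x2(a, b, c, d):
    """Chi square statistic for a 2 x 2 table, using the shortcut formula
    N(ad - bc)^2 / [(a + b)(c + d)(a + c)(b + d)]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Counts from the Ontario/Quebec example:
# a = 36 (Ontario, infected), b = 14 (Ontario, not infected),
# c = 30 (Quebec, infected),  d = 25 (Quebec, not infected).
chi_sq = chi_square_2x2(36, 14, 30, 25)
print(round(chi_sq, 3))  # 3.418
```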

Before we interpret what this means, we need to know how many degrees of freedom we have. In this case, the degrees of freedom equal (# of columns minus one) x (number of rows minus one), not counting the totals. From our data this gives 1 x 1 = 1.

We now have our chi square statistic (X² = 3.418) and our degrees of freedom (df = 1). We also know that the conventionally accepted significance level is 0.05.

Now we need to consult a chi square distribution table! This can be found at the back of any statistics textbook. We look along the row that has "1 degree of freedom" on the chi square distribution table. We find that our value of X² (3.418) lies between 2.706 and 3.841. The corresponding probability (look at the column headings) is 0.05 < P < 0.10 (between 0.05 and 0.10).

This is higher than the conventionally accepted significance level of 0.05. Therefore, we cannot reject the null hypothesis. The data do not show a significant difference in infection rates between Ontario and Quebec for this group of animals. Note that this does not prove the two rates are the same; it only means the evidence against the null hypothesis is not strong enough.

Put another way, IF the computed X² statistic exceeds the critical value in the table for a 0.05 probability level, we can reject the null hypothesis. In this case, since our X² statistic (3.418) did not exceed the critical value for the 0.05 probability level (3.841), we fail to reject the null hypothesis that the infection rate is independent of location.
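The table lookup can also be cross-checked numerically. The sketch below works only for 1 degree of freedom, where a chi square variable is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function from Python's standard `math` module. The helper name `p_value_df1` is hypothetical.

```python
import math


def p_value_df1(chi_sq):
    """Upper-tail probability of a chi square statistic with 1 degree of freedom.

    For df = 1, P(X² >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)),
    where Z is a standard normal variable. Not valid for other df.
    """
    return math.erfc(math.sqrt(chi_sq / 2.0))


p = p_value_df1(3.418)
print(0.05 < p < 0.10)  # True: P lies between 0.05 and 0.10
print(p < 0.05)         # False: not significant at the 0.05 level
```

For tables with more than 1 degree of freedom this shortcut does not apply, and a printed table or a statistics package is needed instead.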

Level 2: Chi-Square Test of Independence

This is for a contingency table that has r rows and c columns. It tests for independence between groups.

Null hypothesis: The two categorical variables are not related (they are independent). Alternate hypothesis: The two categorical variables are related.

The chi square statistic = sum over all cells of (observed - expected)^2 / (expected)

Here is the general table:

                Category 1   Category 2   Category 3   Row Totals
Sample A        a            b            c            a+b+c
Sample B        d            e            f            d+e+f
Sample C        g            h            i            g+h+i
Column Totals   a+d+g        b+e+h        c+f+i        a+b+c+d+e+f+g+h+i=N

Now we need to calculate the expected value for each cell in the table. The expected value for a cell is its row total times its column total, divided by the grand total (N). For example, for cell a the expected value would be (a+b+c)(a+d+g)/N.

Calculate the expected values for each cell.
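The expected-value rule (row total times column total, divided by N) can be sketched in Python for a table of any size. The function name `expected_counts` is made up for illustration, and the data are the Ontario/Quebec counts from the 2 by 2 example, reused here so that no new numbers are introduced.

```python
def expected_counts(observed):
    """Return the expected count for every cell of a contingency table.

    `observed` is a list of rows, each row a list of counts
    (row and column totals are NOT included in the input).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count for a cell = row total * column total / grand total.
    return [[r * c / n for c in col_totals] for r in row_totals]


observed = [[36, 14],   # Ontario: infected, not infected
            [30, 25]]   # Quebec:  infected, not infected
for row in expected_counts(observed):
    print([round(e, 2) for e in row])
# [31.43, 18.57]
# [34.57, 20.43]
```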

Then fill in the values for the following table:

Observed (O)   Expected (E)   O - E   (O - E)^2   (O - E)^2 / E

Compute the chi square statistic by adding all of your (observed - expected)^2 / (expected) values together.

Then determine your Degrees of Freedom = (# columns - 1)*(# rows - 1)

Then follow the same steps as for the 2 x 2 contingency table to determine whether your X² value is greater than the critical value for alpha = 0.05. If it is, reject your null hypothesis. A significant result tells you that there is a relationship between your categories and samples, but that is all it says. It only indicates whether the variables are associated, not how strongly or why.
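Putting the steps together, a test of independence might be sketched as follows. The function name `chi_square_independence` and the small `CRITICAL_05` lookup are assumptions made for this illustration; the lookup holds the standard table values at alpha = 0.05 for 1 to 4 degrees of freedom only, so larger tables would need more entries from a full chi square table. The data are again the Ontario/Quebec counts.

```python
# Critical chi square values at alpha = 0.05, keyed by degrees of freedom
# (standard table values; only df 1 to 4 are listed in this sketch).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_independence(observed):
    """Return (chi square statistic, degrees of freedom) for an r x c table.

    `observed` is a list of rows of counts, without the totals.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi_sq = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count
            chi_sq += (o - e) ** 2 / e             # (O - E)^2 / E

    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return chi_sq, df


chi_sq, df = chi_square_independence([[36, 14], [30, 25]])
reject_null = chi_sq > CRITICAL_05[df]
print(round(chi_sq, 3), df, reject_null)  # 3.418 1 False
```

On a 2 x 2 table the general sum gives the same value as the shortcut formula, which is a handy check that both were applied correctly.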

This article was written by:

Aaron Hakim