What If Your Data Isn’t Playing by the Rules?
You run a chi-square goodness of fit test. The p-value is 0.03, so you reject the null hypothesis. Great—your data doesn’t fit the expected distribution. But how doesn’t it fit? That’s where most people stop. They see “significant” and move on. But if you’re only checking whether something is “off” without asking in what way it’s off, you’re missing the whole point. The alternative hypothesis isn’t just a rejection of the null; it’s a specific claim about how the observed frequencies differ from what you expected. And getting that right changes everything.
What Is the Alternative Hypothesis in a Goodness of Fit Test?
Let’s back up. A goodness of fit test asks: “Do the observed counts in my categories match the counts I expected under a certain theory?” The null hypothesis (H₀) says yes, they match. The alternative hypothesis (Hₐ) says no, they don’t.
But here’s the catch: “No, they don’t” is not a single idea. The alternative hypothesis is really a statement that at least one category’s observed frequency differs from its expected frequency. It doesn’t tell you which one, or by how much, or in which direction. It’s a whole category of possibilities. That’s by design—the test is global, not specific.
In practice, though, researchers often have a directional guess. Maybe you suspect a die is loaded toward sixes. Or a plant’s flower color isn’t following Mendelian ratios; maybe purple is underrepresented. The standard chi-square test doesn’t care about direction; it just flags any deviation. But when you have that kind of specific expectation, you’re dealing with a composite alternative hypothesis, and if you want to dig deeper, you need to think beyond the basic “not equal” alternative.
The Two Flavors: Two-Sided vs. One-Sided Alternatives
Most goodness of fit tests are two-sided by default. The alternative is simply that the distribution differs. But you can set up a one-sided alternative if you have a strong prior reason to expect a specific kind of deviation, for example: “The proportion of defective items is greater than 5%.” That’s a one-sided test. If you’re only looking at one proportion (as in a binomial test), one-sided makes sense. In a categorical goodness of fit context, though, one-sided alternatives are rare because you’re usually comparing multiple categories. For multi-category tests, the two-sided version is standard.
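To make the one-sided case concrete, here is a minimal sketch in Python of an exact one-sided binomial test for the defective-items scenario; the counts (9 defectives out of 100 inspected) are hypothetical:

```python
from math import comb

def binomial_upper_tail(x, n, p):
    """Exact one-sided p-value: P(X >= x) under Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

# Hypothetical inspection: 9 defectives in 100 items.
# H0: p = 0.05, Ha (one-sided): p > 0.05
p_value = binomial_upper_tail(9, 100, 0.05)
print(round(p_value, 4))
```

The alternative "greater than 5%" dictates which tail you sum: only counts at least as large as the one observed count as evidence against H₀.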
Why It Matters / Why People Care
Why does this nuance matter? Because the alternative hypothesis shapes your interpretation. If you run a test and get a significant result, you know something is off. But without a clear alternative, you’re left guessing. Is category A too high? Is category B too low? Are several categories shifted?
This matters in real research. Imagine you’re testing if a new teaching method changes the distribution of student grades (A, B, C, D, F) compared to the historical average. A significant chi-square tells you the distribution isn’t the same. But your alternative hypothesis—if you thought it through—might be: “The new method increases the proportion of A’s and decreases the proportion of F’s.” That’s actionable. Without that clarity, you just know “it’s different,” which isn’t as useful.
Also, the alternative influences your choice of follow-up tests. If you have a specific deviation in mind, you might use standardized residuals to see which categories contribute most to the chi-square statistic. That’s how you move from “it’s significant” to “here’s what’s driving it.”
How It Works (or How to Do It)
Here’s the practical side. You start with expected proportions, maybe from theory, a previous study, or a null model. You collect observed counts. You calculate the chi-square statistic: Σ[(Observed - Expected)² / Expected]. You compare it to a critical value from the chi-square distribution with (k-1) degrees of freedom, where k is the number of categories.
But the alternative hypothesis is baked into those expected values. If your Hₐ is “the distribution is not 9:3:3:1,” then your expected proportions come from Mendelian genetics. If your Hₐ is “the distribution is not 1:1:1,” then your expected proportions are 1/3 for each category. The test doesn’t care how they differ, only that they differ.
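As a sketch, here is that statistic computed by hand in Python for a hypothetical Mendelian 9:3:3:1 cross; the observed counts are made up for illustration, and 7.815 is the well-known χ² critical value for df = 3 at α = 0.05:

```python
def chi_square_stat(observed, expected):
    """Compute the chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical cross of 160 plants against a 9:3:3:1 ratio
observed = [95, 30, 28, 7]
n = sum(observed)                                  # 160
expected = [n * r / 16 for r in (9, 3, 3, 1)]      # [90.0, 30.0, 30.0, 10.0]

stat = chi_square_stat(observed, expected)
# df = 4 - 1 = 3; critical value at alpha = 0.05 is 7.815
print(round(stat, 3))  # → 1.311
```

Here the statistic falls well below the critical value, so these hypothetical counts would not reject the 9:3:3:1 model.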
Step-by-Step: Setting Up the Test
- Define your null proportions clearly. Write them down. “Under H₀, 25% of flowers are red, 50% pink, 25% white.”
- State your alternative in words. “Hₐ: The observed distribution of flower colors differs from the expected 1:2:1 ratio.”
- Check assumptions. Expected counts should generally be 5 or more per category. If not, combine categories or use an exact test.
- Run the test. Use software or a calculator. Get the chi-square value and p-value.
- Interpret with your alternative in mind. If p < α, you reject H₀. But then ask: “Which categories differ?” Look at residuals.
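The steps above can be sketched end to end in Python using the 1:2:1 flower-color example; the observed counts are hypothetical, and 5.991 is the χ² critical value for df = 2 at α = 0.05:

```python
# Steps 1-2: null proportions (1:2:1) and hypothetical observed counts
observed = {"red": 28, "pink": 58, "white": 14}
ratio = {"red": 1, "pink": 2, "white": 1}
n = sum(observed.values())                        # 100
expected = {c: n * r / 4 for c, r in ratio.items()}

# Step 3: assumption check; every expected count should be at least 5
assert all(e >= 5 for e in expected.values())

# Step 4: compute the chi-square statistic
stat = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)

# Step 5: compare to the critical value (df = k - 1 = 2, alpha = 0.05)
reject_H0 = stat > 5.991
print(round(stat, 3), reject_H0)  # → 6.48 True
```

With these made-up counts the test rejects H₀, and the next question is which category drives that: exactly what the residuals in the next section answer.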
Using Residuals to Understand the Alternative
After a significant test, examine the standardized residuals: (O - E) / √E. Values greater than 2 or less than -2 suggest that category’s deviation is larger than chance alone would explain. This helps you translate the vague “not equal” alternative into specific findings. Maybe the residual for “red” is +2.5, meaning you have more red flowers than expected. That’s a concrete insight from your test.
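A quick sketch of that residual check in Python, reusing hypothetical 1:2:1 flower counts:

```python
from math import sqrt

def standardized_residuals(observed, expected):
    """(O - E) / sqrt(E) per category; |r| > 2 flags a notable deviation."""
    return [(o - e) / sqrt(e) for o, e in zip(observed, expected)]

# Hypothetical counts against 1:2:1 expectations for n = 100
observed = [28, 58, 14]
expected = [25, 50, 25]
for color, r in zip(["red", "pink", "white"],
                    standardized_residuals(observed, expected)):
    flag = "  <- larger than chance alone would explain" if abs(r) > 2 else ""
    print(f"{color}: {r:+.2f}{flag}")
```

In this made-up example only the “white” residual exceeds 2 in magnitude (a deficit of white flowers), which turns a global “the distribution differs” into a specific, reportable finding.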
Common Mistakes / What Most People Get Wrong
The biggest mistake? Treating the alternative as an afterthought. People run the test, see it’s significant, and stop. But the alternative hypothesis is the reason you ran the test. If you don’t have a clear idea of how the data might deviate, why are you testing?
Another error: ignoring small expected counts. If one category has an expected count of 2, the chi-square approximation is shaky. You might get a significant result that’s an artifact of the approximation, not a real deviation. Always check expected counts.
Also, people sometimes think a non-significant result “proves”
the null hypothesis. It doesn't. A non-significant result simply means your data were not inconsistent with the expected proportions at your chosen significance level. The null might still be wrong; you just didn't have enough evidence to say so. This is especially important in small samples, where the test has low power.
A third frequent error is testing the wrong alternative altogether. If you set your expected proportions to 1:1:1 but your real scientific question is whether a 1:2:1 ratio fits, then you've answered a question nobody was asking. The test will give you a result, but it won't be the result that matters for your research.
When the Chi-Square Test Isn't Enough
There are situations where the goodness-of-fit chi-square test falls short. If your categories are ordered—for example, small, medium, large—you may want a test that accounts for the direction of deviation, not just the magnitude. The chi-square test treats all departures from the null equally, whether you have too many small and too many large or just an excess of medium. An alternative like the Kolmogorov–Smirnov test or a likelihood-ratio test with ordered categories may be more appropriate.
Similarly, if you have repeated measurements on the same subjects, your observations are not independent. The standard chi-square goodness-of-fit test assumes independence, and violating that assumption can inflate your Type I error rate. In those cases, you need a different framework—perhaps a generalized linear mixed model—that can handle the dependence structure.
Finally, when sample sizes are very small, an exact test—such as Fisher's exact test adapted for goodness of fit—gives more reliable p-values than the chi-square approximation.
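One way to get an exact p-value for a very small sample is to enumerate every possible multinomial outcome and sum the probabilities of those at least as extreme (by the chi-square statistic) as the one observed. The sketch below is this brute-force enumeration, not Fisher's method specifically, and it is only feasible for tiny samples; the counts are hypothetical:

```python
from math import factorial

def compositions(n, k):
    """All ways to split n observations into k ordered categories."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def exact_gof_pvalue(observed, probs):
    """Exact multinomial goodness-of-fit p-value, ordering outcomes by
    their chi-square statistic. Enumeration: small n only."""
    n = sum(observed)
    expected = [n * p for p in probs]

    def stat(counts):
        return sum((c - e) ** 2 / e for c, e in zip(counts, expected))

    def pmf(counts):
        coef = factorial(n)
        for c in counts:
            coef //= factorial(c)
        prob = float(coef)
        for c, p in zip(counts, probs):
            prob *= p ** c
        return prob

    s_obs = stat(observed)
    return sum(pmf(c) for c in compositions(n, len(probs))
               if stat(c) >= s_obs - 1e-12)

# Hypothetical tiny sample: 10 flowers against a 1:2:1 ratio
p = exact_gof_pvalue([5, 3, 2], [0.25, 0.5, 0.25])
print(round(p, 4))
```

With only 10 observations, every expected count is below 5, so the chi-square approximation is unreliable; here the exact calculation replaces it entirely.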
Wrapping Up
The chi-square goodness-of-fit test is one of the most accessible tools in the statistician's toolkit, but its simplicity can be deceptive. The test itself is mechanical: compute a statistic, look up a p-value, make a decision. But the decisions that matter—what your null proportions should be, what alternative you're actually testing, whether your data meet the assumptions—require careful thought before any calculation begins.
The alternative hypothesis is not a box to check on a worksheet: it is the question your test exists to answer. If you state it clearly, choose expected proportions that reflect the model you want to challenge, check your assumptions, and interpret residuals when the result is significant, the chi-square goodness-of-fit test becomes a powerful and transparent way to ask whether reality matches your expectations. Done well, it connects a simple numerical procedure to a substantive scientific claim, and that is what good statistics is supposed to do.