Example of a Goodness of Fit Test: What You Need to Know
You're staring at a dataset. You've collected hundreds, maybe thousands, of data points, and now you want to know if they follow a normal distribution, a Poisson distribution, or something else entirely. How do you actually figure that out?
That's where a goodness of fit test comes in. It's the statistical tool that answers the question: "Does my data match what I'd expect if it came from a specific theoretical distribution?"
Here's the thing: this isn't just an academic exercise. Whether you're a quality control engineer checking if defect rates are random, a researcher testing if survey responses follow expected patterns, or a data scientist validating assumptions before building a model, goodness of fit tests show up everywhere. They're one of those foundational concepts that, once you understand them, you start noticing in all kinds of work.
What Is a Goodness of Fit Test?
At its core, a goodness of fit test is a statistical procedure that compares your observed data to what you'd expect to see if the data followed a specific probability distribution. You're essentially asking: "How well does my data fit this theoretical model?"
The idea is pretty intuitive once you break it down. Say you expect a fair six-sided die to land on each number about 16.7% of the time. You roll it 600 times and record the results. If you see each number appear roughly 100 times, the data fits your expectation well. If you see "6" come up 200 times while "1" only appears 50 times, something's off, and a goodness of fit test quantifies exactly how far off.
The Chi-Square Test: The Most Common Example
When people talk about goodness of fit tests, the chi-square test is usually the first one that comes up. It's the go-to example because it's relatively straightforward and works with categorical data.
Here's how it works in practice. You have your observed frequencies (what actually happened) and your expected frequencies (what you predicted). For each category, you take the difference between them, square it to get rid of negatives, and divide by the expected value; then you sum those terms across all categories. That sum is your chi-square statistic: χ² = Σ (O - E)² / E, where O is an observed frequency and E is the matching expected frequency.
Then you compare that statistic to a critical value based on your chosen significance level and the degrees of freedom (the number of categories minus 1, minus one more for each distribution parameter you estimated from the data). If your calculated chi-square is larger than the critical value, you reject the null hypothesis: your data doesn't fit the expected distribution. If it's smaller, you don't have enough evidence to reject the fit.
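Here is a minimal Python sketch of that calculation and decision rule, applied to the die example. The helper names are made up for illustration, both sets of counts are hypothetical (the skewed one simply fills in the faces the example leaves unspecified so the total is 600), and 11.07 is the usual table value for 5 degrees of freedom at the 0.05 level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_expected(observed, expected, critical_value):
    """True means we fail to reject the null hypothesis of a good fit."""
    return chi_square_statistic(observed, expected) <= critical_value


# 600 rolls of a supposedly fair die -> 100 expected rolls per face.
expected = [600 / 6] * 6

# Hypothetical counts, made up for illustration (each sums to 600).
roughly_fair = [98, 103, 101, 97, 100, 101]
suspicious = [50, 90, 90, 85, 85, 200]  # face 1 low, face 6 high

# Table critical value for df = 6 - 1 = 5 at the 0.05 significance level.
CRITICAL_DF5_ALPHA05 = 11.07

print(chi_square_statistic(roughly_fair, expected))  # about 0.24
print(chi_square_statistic(suspicious, expected))    # 131.5
print(fits_expected(roughly_fair, expected, CRITICAL_DF5_ALPHA05))  # True
print(fits_expected(suspicious, expected, CRITICAL_DF5_ALPHA05))    # False
```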
The Kolmogorov-Smirnov Test
The chi-square test works great for categorical or binned data, but what if you have continuous data and want to test if it follows a specific distribution?
That's where the Kolmogorov-Smirnov (K-S) test shines. Instead of binning your data, it compares the empirical cumulative distribution function of your sample to the cumulative distribution function of the distribution you're testing against. The test statistic is the maximum vertical distance between these two curves.
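One thing worth knowing: the K-S test responds to differences in location, spread, and shape, but it is most sensitive near the center of the distribution and less sensitive in the tails. It also assumes the reference distribution is fully specified in advance. If you estimate its parameters from the same data, the standard K-S critical values are too large, so the test rarely rejects even when it should; a modified version such as the Lilliefors test is needed in that case.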
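To make the "maximum vertical distance" concrete, here's a small Python sketch of the D statistic against a fully specified CDF. It computes only the statistic, not a p-value, and the function names and sample values are illustrative assumptions. Because the empirical CDF is a step function, the gap is checked just before and just after each jump.

```python
import math


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov D: the largest vertical gap between the empirical
    CDF of `sample` and a fully specified theoretical `cdf` function."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def uniform01_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


# Hypothetical samples for illustration.
print(ks_statistic([0.1, 0.4, 0.7], uniform01_cdf))  # 0.3, up to rounding
print(ks_statistic([-1.2, -0.3, 0.1, 0.8, 1.5], normal_cdf))
```

In practice you would compare D to a K-S table value (or let software compute a p-value); the sketch only shows where the number comes from.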
The Shapiro-Wilk Test
If you're specifically testing for normality — and let's be honest, that's the most common use case — the Shapiro-Wilk test is often more powerful than the K-S test. It's designed specifically to detect departures from normality and tends to be better at catching the kinds of deviations that matter in practice.
The test works by comparing the ordered data to what you'd expect from a normally distributed sample of the same size. It calculates a W statistic between 0 and 1: values close to 1 are consistent with normality, and smaller values indicate greater departure from normality.
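Shapiro-Wilk's exact coefficients are fiddly to compute by hand, so the sketch below swaps in the closely related Shapiro-Francia statistic W', which is the squared correlation between the sorted data and approximate expected normal order statistics (Blom's approximation). It illustrates the same idea (values near 1 look normal), but it returns no p-value and is not the Shapiro-Wilk W that statistical software reports. The function name and example data are hypothetical.

```python
from statistics import NormalDist, fmean


def shapiro_francia(sample):
    """Shapiro-Francia W': squared correlation between the sorted sample and
    approximate expected standard normal order statistics. Values near 1 are
    consistent with normality. No p-value is computed here."""
    xs = sorted(sample)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    std_normal = NormalDist()
    # Blom's approximation to the expected normal order statistics.
    m = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    x_bar, m_bar = fmean(xs), fmean(m)
    sxm = sum((x - x_bar) * (mi - m_bar) for x, mi in zip(xs, m))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    smm = sum((mi - m_bar) ** 2 for mi in m)
    if sxx == 0:
        raise ValueError("all observations are identical")
    return sxm ** 2 / (sxx * smm)


# Hypothetical samples: one roughly symmetric, one with a single large outlier.
print(shapiro_francia([4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 5.0]))
print(shapiro_francia([1, 1, 1, 1, 10]))  # noticeably further below 1
```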
Why Goodness of Fit Tests Matter
Here's why you should care: many statistical methods assume something about your data's distribution. Regression analysis, t-tests, ANOVA: they all have assumptions. If those assumptions are seriously violated, your results can be misleading or even meaningless.
Real talk: I've seen researchers run sophisticated analyses on data that clearly didn't meet the underlying assumptions, then report results with far more confidence than they deserved. Goodness of fit tests are your safety check. They tell you whether the statistical tools you're using are even appropriate.
But it's not just about validation. Understanding how well your data fits a distribution can reveal insights about the underlying process. If your wait times don't fit an exponential distribution, that tells you something about how the system works. If your defect counts don't follow a Poisson distribution, there's probably a pattern or dependency you need to investigate.
In quality control, goodness of fit tests help determine whether a process is stable. In finance, they check if returns actually follow the normal distribution that many models assume (spoiler: they often don't). In healthcare, they verify if patient arrivals follow expected patterns for staffing purposes.
How Goodness of Fit Tests Work: A Step-by-Step Example
Let's walk through a concrete example so you can see how this plays out in practice.
Suppose you run a call center and want to know if customer calls are evenly distributed across the five business days of the week. You collect data for several months and find:
- Monday: 245 calls
- Tuesday: 210 calls
- Wednesday: 235 calls
- Thursday: 190 calls
- Friday: 120 calls
Total: 1,000 calls
If calls were evenly distributed, you'd expect 200 calls per day (1,000 ÷ 5). That's your expected frequency for each day.
Now you calculate the chi-square statistic:
- Monday: (245 - 200)² / 200 = 45² / 200 = 10.125
- Tuesday: (210 - 200)² / 200 = 10² / 200 = 0.5
- Wednesday: (235 - 200)² / 200 = 35² / 200 = 6.125
- Thursday: (190 - 200)² / 200 = 10² / 200 = 0.5
- Friday: (120 - 200)² / 200 = 80² / 200 = 32
Sum: 10.125 + 0.5 + 6.125 + 0.5 + 32 = 49.25
Your chi-square statistic is 49.25.
Now you need to compare this to a critical value. Degrees of freedom = categories - 1 = 5 - 1 = 4. At a significance level of 0.05, the critical value from the chi-square distribution table is about 9.49.
Since 49.25 > 9.49, you reject the null hypothesis: the calls are not evenly distributed across the week. Looking at the data, Friday stands out dramatically, with far fewer calls than expected; its term alone contributes 32 of the 49.25. This might tell you something useful about customer behavior or give you data to justify different staffing levels.
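The arithmetic above is easy to check in a few lines of Python. Because the degrees of freedom here are even, this sketch also computes a p-value from the closed-form chi-square tail probability for even df, P(X > x) = exp(-x/2) Σ (x/2)^j / j! with j running from 0 to k/2 - 1, rather than relying on a statistics library. The helper name is illustrative, and the shortcut does not apply to odd degrees of freedom.

```python
import math

# Observed call counts from the example (Monday through Friday).
observed = {"Mon": 245, "Tue": 210, "Wed": 235, "Thu": 190, "Fri": 120}
total = sum(observed.values())
expected = total / len(observed)  # 1000 / 5 = 200 calls per day

# Per-day terms (O - E)^2 / E, then the chi-square statistic.
contributions = {day: (o - expected) ** 2 / expected for day, o in observed.items()}
chi2 = sum(contributions.values())
df = len(observed) - 1  # 5 categories - 1 = 4


def chi2_sf_even_df(x, k):
    """P(X > x) for a chi-square variable with an EVEN number of degrees of
    freedom k: exp(-x/2) * sum over j < k/2 of (x/2)^j / j!."""
    if k <= 0 or k % 2:
        raise ValueError("this shortcut only works for positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k // 2))


p_value = chi2_sf_even_df(chi2, df)
print(contributions)  # Friday contributes 32.0 of the total
print(chi2, df)       # 49.25 4
print(p_value)        # far below 0.05
```

Evaluating the same formula at 9.49 with 4 degrees of freedom gives a tail probability of about 0.05, which is where the table's critical value comes from.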
Testing Normality: A Continuous Example
Now let's look at a different scenario. You have a dataset of 50 measurements and want to test if they follow a normal distribution. This is very common in practice.
You could use the Shapiro-Wilk test. Most statistical software will run it for you and report a p-value.
If the p-value is greater than your significance level (usually 0.05), you fail to reject the null hypothesis — meaning there's no evidence your data deviates from normality. You can proceed with methods that assume normality.
If the p-value is less than 0.05, you've got evidence against normality. But this doesn't mean your data is "bad"; it just means you may need different statistical methods. Maybe you transform your data, or maybe you switch to non-parametric tests that don't assume normality.
Common Mistakes People Make
One of the biggest mistakes is running goodness of fit tests without thinking about whether the test suits the situation. The chi-square test needs sufficiently large expected frequencies; a common rule of thumb is that every expected frequency should be at least 5. With sparse data, the test becomes unreliable.
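One common fix for sparse categories is to merge adjacent ones until every expected count reaches the threshold. The Python sketch below does this left to right and folds any short tail into the last bin; it assumes the categories have a natural order (such as counts 0, 1, 2, and so on), and the function name and example numbers are hypothetical. Merging reduces the number of categories, and with it the degrees of freedom.

```python
def merge_sparse_bins(observed, expected, minimum=5.0):
    """Merge adjacent categories (left to right) until each merged bin has an
    expected count of at least `minimum`. A leftover tail that never reaches
    the minimum is folded into the last merged bin.
    Returns (merged_observed, merged_expected)."""
    merged_obs, merged_exp = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            merged_obs.append(acc_o)
            merged_exp.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_o or acc_e:  # categories left over at the end
        if merged_exp:
            merged_obs[-1] += acc_o
            merged_exp[-1] += acc_e
        else:
            merged_obs.append(acc_o)
            merged_exp.append(acc_e)
    return merged_obs, merged_exp


# Hypothetical counts with several tiny expected frequencies.
obs, exp = merge_sparse_bins([1, 4, 9, 2, 0], [2, 3, 10, 1, 1])
print(obs, exp)  # [5, 11] [5.0, 12.0]
```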
Another issue: people often test for normality on tiny samples. With very small samples, you often lack the power to detect meaningful departures from normality even when they exist. Conversely, with very large samples, you'll almost always reject normality because the test becomes sensitive to trivial deviations. Context matters more than the p-value.
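The large-sample effect is easiest to see with a chi-square test, whose statistic equals n · Σ (p̂ - p)² / p when the observed proportions p̂ stay fixed, so it grows linearly with sample size. This sketch holds a hypothetical one-percentage-point deviation constant and increases n; 9.49 is the 0.05 critical value for 4 degrees of freedom, the same one used in the call center example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical shares in percent: two categories are off by one point each.
observed_pct = [21, 19, 20, 20, 20]
CRITICAL_DF4_ALPHA05 = 9.49

results = {}
for n in (100, 1_000, 10_000, 100_000):
    observed = [pct * n // 100 for pct in observed_pct]
    expected = [n / 5] * 5  # equal shares under the null hypothesis
    stat = chi_square_statistic(observed, expected)
    results[n] = stat
    verdict = "reject" if stat > CRITICAL_DF4_ALPHA05 else "fail to reject"
    print(f"n = {n:>7}: chi-square = {stat:7.2f} -> {verdict}")
```

The deviation never changes, yet the verdict flips from "fail to reject" to "reject" once n reaches 10,000.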
People also forget that these tests have assumptions themselves. The K-S test, for example, assumes the distribution you're testing against is fully specified, parameters and all. If you estimate parameters from the data, you need to account for that or use a modified version of the test, such as the Lilliefors test when checking normality.
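A sketch of why the correction matters: if you fit the normal's mean and standard deviation to each sample before computing D, the distribution of D shifts toward smaller values. The Python below estimates the 5% critical value for n = 20 by Monte Carlo simulation, which is the idea behind the Lilliefors test; for comparison, the standard K-S table value for n = 20 is about 0.294. The function names, seed, and number of simulations are arbitrary choices, and simulated values will wobble slightly from run to run settings.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and `cdf`, checked on both sides of each step."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))


def ks_vs_fitted_normal(sample):
    """K-S distance to a normal whose mean and SD are estimated from `sample`."""
    mu, sigma = statistics.fmean(sample), statistics.stdev(sample)
    return ks_statistic(sample, lambda x: normal_cdf(x, mu, sigma))


def simulated_critical_value(n, alpha=0.05, n_sims=2000, seed=1):
    """Monte Carlo critical value for the fitted-normal K-S statistic:
    simulate normal samples, refit each one, and take the (1 - alpha)
    quantile of the resulting D values."""
    rng = random.Random(seed)
    ds = sorted(
        ks_vs_fitted_normal([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_sims)
    )
    return ds[math.ceil((1 - alpha) * n_sims) - 1]


crit = simulated_critical_value(20)
print(crit)  # roughly 0.19, well below the standard table value of about 0.294
```

Using the standard table value after fitting parameters would make the test far too reluctant to reject.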
And here's one that trips up a lot of people: statistical significance doesn't equal practical significance. A goodness of fit test might tell you that your data doesn't perfectly match a normal distribution, but if the deviation is small enough that it doesn't affect your analysis, who cares? The p-value tells you whether to keep looking, not necessarily whether there's a real problem.
Practical Tips for Using Goodness of Fit Tests
Start by visualizing your data before running any formal test. Histograms, Q-Q plots, density overlays — these give you a sense of what's going on that numbers alone can't capture. A Q-Q plot is particularly useful for checking normality because it shows you exactly where the deviations occur.
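If you want Q-Q coordinates without a plotting library, this Python sketch computes them for a normal reference distribution. It uses the plotting positions (i - 0.5)/n, one of several conventions, plus a reference line at mean + SD × theoretical quantile; points that stray from that line show where the deviations are. The function name and data are hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def normal_qq_points(sample):
    """(theoretical quantile, observed value) pairs for a normal Q-Q plot,
    using plotting positions (i - 0.5) / n."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


# Hypothetical measurements with one suspiciously large value.
data = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 6.8, 4.8]
points = normal_qq_points(data)
mu, sd = fmean(data), stdev(data)

print(" theory  observed   line")
for q, x in points:
    # Under normality, observed values should sit close to mu + sd * q.
    print(f"{q:7.2f}  {x:8.2f}  {mu + sd * q:5.2f}")
```

Feeding `points` to any scatter-plot tool gives the usual Q-Q plot.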
Choose your test based on your data type and question. Categorical data? Chi-square. Continuous data tested against a specific distribution? K-S or Anderson-Darling. Specifically testing normality? Shapiro-Wilk or Shapiro-Francia. Each test has strengths and weaknesses, so match the tool to the job.
Consider the practical consequences of the result. If you're planning to use a statistical method that assumes normality and your data fails a normality test, you have options: transform the data, use a robust method that doesn't assume normality, or take a bootstrap approach. The test result is information, not a verdict.
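As one example of the bootstrap route, here's a minimal percentile-bootstrap confidence interval for a mean in Python. It resamples the data with replacement instead of assuming normality; the function name, data, seed, and number of resamples are illustrative assumptions, and the percentile method is only one of several bootstrap interval types.

```python
import random
from statistics import fmean


def bootstrap_mean_ci(sample, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean: resample with
    replacement, compute each resample's mean, and take the middle `level`
    share of those means."""
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(fmean(rng.choices(sample, k=n)) for _ in range(n_resamples))
    tail = (1.0 - level) / 2.0
    lo = means[int(tail * n_resamples)]
    hi = means[int((1.0 - tail) * n_resamples) - 1]
    return lo, hi


# Hypothetical right-skewed data (e.g. wait times in minutes).
waits = [1.2, 0.4, 3.8, 0.9, 2.2, 7.5, 0.3, 1.6, 0.8, 4.1, 0.6, 1.1]
low, high = bootstrap_mean_ci(waits)
print(f"mean = {fmean(waits):.2f}, 95% bootstrap CI = ({low:.2f}, {high:.2f})")
```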
Document what you're testing and why: "Data was tested for normality using the Shapiro-Wilk test (W = 0.98, p = 0.34), supporting the assumption of normality for subsequent analysis." That's the kind of sentence that makes reviewers and future-you happy.
Frequently Asked Questions
What's the simplest goodness of fit test to use?
The chi-square test is the most accessible and widely understood. It works for categorical data and is built into almost every statistical software package. For continuous data where the question is normality, the Shapiro-Wilk test is straightforward and powerful.
What does a p-value mean in a goodness of fit test?
A small p-value (typically below 0.05) means your data provides evidence against the hypothesized distribution: the fit is poor. A large p-value means you don't have enough evidence to reject the null hypothesis that the data follows the expected distribution.
Can I use goodness of fit tests on small samples?
You can, but be cautious. With small samples, tests often lack the power to detect real deviations. With very large samples, they'll detect tiny deviations that may not matter practically. Consider both the statistical result and the practical context.
What's the difference between chi-square and Kolmogorov-Smirnov?
Chi-square works with categorical or binned data and compares observed to expected frequencies in categories. K-S works with continuous data and compares cumulative distribution functions directly, so it uses every observation instead of bin counts. For continuous data, that often makes K-S more powerful than a binned chi-square test, although it is less sensitive to differences in the tails.
Do I need to test for normality before running a t-test?
It's a good practice, especially with small to moderate samples. Some statistical software packages will check this for you and warn you if assumptions are violated. But remember: these tests aren't perfect, and the t-test itself is fairly robust to mild violations of normality, particularly as the sample size grows.
The Bottom Line
Goodness of fit tests are one of those fundamental tools that show up constantly in real work. They bridge the gap between "I have data" and "I can make valid inferences from this data." The key is understanding not just how to run them, but when to use which test, how to interpret the results in context, and what to do when your data doesn't fit.
Start with visualization, choose your test wisely, and always remember that the goal isn't to pass a test — it's to understand your data well enough to make sound decisions.