As we saw in Statistics Defined, inferential statistics may be defined as the science of **data interpretation**. Data interpretation involves drawing one or more conclusions about a population underlying a data set accompanied by statements about the reliability of such conclusions.

The process begins with a claim about some numerical feature of a population (a population parameter), based on data collected from that population. One might suggest, for instance, that two samples with different standard deviations were drawn from separate populations. Such a claim is called a statistical **hypothesis**.

A hypothesis may or may not be supported as a result of a statistical test. A **statistical test** is a mathematical procedure that allows one to determine, to some specified degree of confidence, whether or not the results of a particular study support a hypothesis.

For example, an educational psychologist might use a t-test to determine the effect of a new study technique on the learning of a grade 12 math course. Specifically, on the basis of data collected from students using the new technique, he might claim that the math learning of these students was not significantly improved. Because of its negative formulation, the claim being tested here is called the **null hypothesis, H_{0}**.

The **alternative hypothesis, H_{a}**, is the claim to be accepted if the null hypothesis is rejected. In the example being considered, H_{a} would be that the math learning of the grade 12 students was significantly improved. It is possible to reject a null hypothesis when in fact it best describes the population being studied: this is called a **type 1 error**. The **significance level, α,** of a statistical test is the probability of committing a type 1 error, and is usually set at .05 or .01.
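To make the significance level concrete, here is a minimal sketch in Python. The experiment (20 tosses of a coin, testing H_{0}: "the coin is fair") and the rejection rule (reject when 5 or fewer, or 15 or more, heads appear) are illustrative assumptions; the point is that α is simply the probability, computed under H_{0}, of landing in the rejection region.

```python
from math import comb

# Hypothetical test of H0: "the coin is fair," based on n = 20 tosses.
# Illustrative rejection rule: reject H0 if we see <= 5 or >= 15 heads.
n = 20
rejection_region = [k for k in range(n + 1) if k <= 5 or k >= 15]

# Under H0, the number of heads k occurs with probability C(n, k) / 2^n,
# so alpha is the total probability of the rejection region.
alpha = sum(comb(n, k) for k in rejection_region) / 2**n
print(f"alpha = {alpha:.4f}")  # prints "alpha = 0.0414"
```

With this rule the probability of a type 1 error is about .041, just under the conventional α = .05 cutoff.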

Statistical tests may be divided into small and large sample classes. A **small sample test** involves samples of size n less than 30. Large sample tests, covered in Large Sample Inferences, assume that n is greater than or equal to 30. Here we introduce a small sample test related to the chi-square distribution; small sample tests involving the t- and F-distributions are covered elsewhere.

An important application of the chi-square distribution is the **chi-squared test**. In its simplest form, this test is used to assess the difference between observed (o) and expected (e) event frequencies when a multinomial experiment is executed.

For example, if a fair coin is tossed 20 times, we would expect to observe a “head” (H) 10 times, because the theoretical probability is P(H) = 1/2 and the expected frequency is therefore 20 × 1/2 = 10. In practice, we would often see 7H, 8H, 9H, 11H, 12H or 13H, rather than exactly 10, due to chance variation. Thus the observed frequency of a chance event will very often differ from its expected frequency.
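The chance variation described above can be quantified exactly with the binomial distribution. A short sketch, using only the Python standard library:

```python
from math import comb

# Exact probabilities for the number of heads in 20 tosses of a fair coin.
# Each count k occurs with probability C(20, k) / 2^20.
n = 20
p_exactly_10 = comb(n, 10) / 2**n                         # the most likely single count
p_7_to_13 = sum(comb(n, k) for k in range(7, 14)) / 2**n  # counts 7 through 13

print(f"P(exactly 10 heads) = {p_exactly_10:.3f}")  # prints "P(exactly 10 heads) = 0.176"
print(f"P(7 to 13 heads)    = {p_7_to_13:.3f}")     # prints "P(7 to 13 heads)    = 0.885"
```

Even though 10 heads is the most likely outcome, it occurs in fewer than a fifth of such experiments; some count between 7 and 13 occurs almost 9 times in 10.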

In relation to the discrepancy between observed and expected event frequencies, it is one of the **tasks of statistical inference** to ask and answer the question: “is the difference significant?” In the above coin experiment, for instance, we would only occasionally see zero or 20 “heads,” since these frequencies represent significant differences from the predictions of theory.

The observations in our coin-tossing experiment may be classified in exactly two ways: “heads” or “tails.” In such situations the binomial and/or normal distributions may be used to assess the significance of deviations from expected frequencies.

Often, however, the results of a chance experiment fall into more than two classes. Such experiments are called multinomial. In tossing a fair, 4-sided die, for instance, one can observe a “1,” “2,” “3” or “4.”

If this die is tossed 100 times we would expect, on the basis of theoretical probability, exactly 25 1’s, 25 2’s, 25 3’s and 25 4’s.

Suppose we actually observe 19 1’s, 29 2’s, 30 3’s and 22 4’s. The student of inferential statistics may then be asked: “do these frequencies represent a significant difference from the predictions of theory?” Here, because there are more than two outcome classes, we cannot use the binomial and/or normal distribution to test for significance.

Instead, we use the chi-squared test. The chi-squared test involves calculation of the **chi-squared statistic**, X^{2}, a quantity which depends on the observed (o_{i}) and expected (e_{i}) frequencies of the k population classes being studied. Specifically,

X^{2} = Σ_{i=1}^{k} (o_{i} − e_{i})^{2}/e_{i}.

As described in “Testing Categorical Data,” a large value for this chi-squared statistic suggests the observed data comes from a distribution other than that assumed under the null hypothesis.
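Putting the die example together, the test can be sketched in a few lines of Python. One assumption here is the critical value 7.815, the standard chi-squared table entry for 3 degrees of freedom (4 classes minus 1) at α = .05:

```python
# Chi-squared test for the 4-sided die example: 100 tosses, with an
# expected frequency of 25 for each face under the fair-die hypothesis.
observed = [19, 29, 30, 22]
expected = [25, 25, 25, 25]

# X^2 = sum of (o_i - e_i)^2 / e_i over the outcome classes.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"X^2 = {chi2:.2f}")  # prints "X^2 = 3.44"

# Standard tabled critical value for df = 3, alpha = .05.
critical_value = 7.815

if chi2 > critical_value:
    print("Reject H0: the differences are significant.")
else:
    print("Fail to reject H0: the differences are consistent with chance.")
```

Since 3.44 falls well below 7.815, the observed frequencies do not differ significantly from the fair-die predictions at the .05 level.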