
Goodness of Fit

In many problems we count the number of individuals that fall into specified categories and compare this observed distribution with a theoretically expected distribution. The theoretical distribution depends on the problem at hand and the science of the situation. For example, we may wish to know whether, out of 140 tosses of a coin, the observed frequency of 60 heads and 80 tails is significantly different from the 1:1 ratio that we would expect from a fair coin. In a different situation, the theoretical distribution might be based on something such as Mendel's laws. We may want to know whether our observed values of 1,1981 fruit flies with white eyes and 7,712 fruit flies with red eyes differ so much from the values predicted by Mendelian genetics that we can publish a new theory of genetics. These kinds of problems represent the first two cases of the chi-square distribution to be discussed here.

Single Classification: Theoretical distribution specified in advance.

Suppose we have k categories and a random sample of n observations such that each observation must fall into one and only one of the k categories. We then count the observed frequency (number of cases) in each category and denote these O1, O2, O3, ..., Ok, where the sum of all O's is n.

We compare this to the theoretical or expected frequency (number of cases) in each category, denoted by E1, E2, E3, ..., Ek, where the sum of all E's is n. We use the following statistic to test the goodness of fit of our observed distribution to the expected distribution.

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
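The statistic can be computed directly from the two lists of frequencies. The following is a minimal sketch, not part of the original page; the function name `chi_square_statistic` is an illustrative choice, and the data are the coin-toss figures from the introduction (60 heads and 80 tails in 140 tosses, with 70 of each expected).

```python
def chi_square_statistic(observed, expected):
    """Return the sum over all categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Coin example from the introduction: 140 tosses, 1:1 ratio expected.
coin_chi2 = chi_square_statistic([60, 80], [70, 70])
print(round(coin_chi2, 3))
```

Each category contributes its own squared deviation scaled by its expected count, so categories with small expected counts weigh heavily in the total.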

We then compare our calculated χ² value to the critical value tabulated in a chi-square distribution table. To use the table, you must know the degrees of freedom (df), which equal k-1, and decide on the significance level for rejecting or accepting the null hypothesis that the distributions are the same. The decision rule is to reject the null hypothesis if the calculated value is greater than the value listed in the table for the appropriate degrees of freedom and the chosen significance level. A chi-square table can be found at:

Chi-square Table: probabilities
http://www.richland.cc.il.us/james/lecture/m170/tbl-chi.html/

Table of Chi-square statistics
http://www.ento.vt.edu/~sharov/PopEcol/tables/chisq.html
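The decision rule described above can be sketched in a few lines. This is an illustration under stated assumptions: the dictionary `CRITICAL_5_PERCENT` and the function `reject_null` are hypothetical names, and only the 5% critical values for 1 to 5 degrees of freedom (rounded to two decimals) are hard-coded, so a full table is still needed for other cases.

```python
# 5% critical values of the chi-square distribution for df = 1..5,
# rounded to two decimals (assumption: only this small excerpt is included).
CRITICAL_5_PERCENT = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07}


def reject_null(chi2_value, num_categories, critical_values=CRITICAL_5_PERCENT):
    """Apply the decision rule: reject if chi2 exceeds the tabulated value for df = k - 1."""
    df = num_categories - 1
    if df not in critical_values:
        raise KeyError(f"no critical value stored for df = {df}")
    return chi2_value > critical_values[df]


# Coin example from the introduction: 60 heads and 80 tails, 70 of each expected.
coin_chi2 = (60 - 70) ** 2 / 70 + (80 - 70) ** 2 / 70
print(reject_null(coin_chi2, num_categories=2))
```

For the coin data the statistic is about 2.86, which is below 3.84, so this sketch would not reject the 1:1 hypothesis at the 5% level.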

Please note that for this test to be valid, the expected number in each category should be greater than 4. If this is not true, you must increase the sample size n or redefine the categories to make the expected values (not the observed values) in each category greater than 4.
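A quick check of this condition might look like the sketch below. The function name `small_expected_categories` and the sample of 48 seeds are hypothetical and used only for illustration; the threshold of 4 is the one stated above.

```python
def small_expected_categories(expected, minimum=4):
    """Return the positions of categories whose expected count is not greater than minimum."""
    return [i for i, e in enumerate(expected) if e <= minimum]


# Hypothetical sample of 48 seeds under a 9:3:3:1 ratio: expected counts 27, 9, 9, 3.
too_small = small_expected_categories([27, 9, 9, 3])
print(too_small)  # the last category (position 3) has an expected count of only 3
```

If the returned list is not empty, the sample size should be increased or the flagged categories combined before the test is applied.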

Example 1: A wildlife biologist wants to know if there are equal numbers of male and female fawns in a certain deer population. If there is no differential mortality, a 50:50 sex ratio would be expected. A random sample of 1000 fawns from this population contained 800 females and 200 males.


               Females   Males   Total
O = Observed   800       200     1000
E = Expected   500       500     1000
O - E          300       -300
(O - E)²/E     180       180     360 = χ² value

The degrees of freedom for this example are 2-1 = 1. Examination of the χ² table shows that 360 is greater than the 5% value of 3.84. We therefore reject the null hypothesis that the sex ratio is 50:50. At the 5% significance level, we conclude that there are more female than male fawns in this population.

Example 2: Mendelian genetics predicts the proportion of seed types in a particular breeding experiment should be 9:3:3:1. We do the experiment and obtain the following distribution of seed types:


               Round and   Wrinkled and   Round and   Wrinkled and   Total
               Yellow      Yellow         Green       Green
O = Observed   315         101            108         32             556
E = Expected   312.75      104.25         104.25      34.75          556
O - E          2.25        -3.25          3.75        -2.75
(O - E)²/E     0.016       0.101          0.135       0.218          0.470 = χ²

The expected values are obtained by taking 9/16 of the total number as the expected number of round and yellow, 3/16 of the total as the expected frequency of each of wrinkled and yellow and round and green, and 1/16 of the total for wrinkled and green. The degrees of freedom = 4-1 or 3. Examination of a chi-square table for 3 degrees of freedom shows that our value (0.470) is less than the 5% value (7.81). We therefore accept the null hypothesis that our observed distribution matches the predicted distribution.
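The arithmetic of Example 2 can be reproduced as follows. This is a sketch rather than the original authors' procedure: the helper `expected_from_ratio` is a hypothetical name, and the observed count for round and yellow is taken as 315, the value consistent with the stated total of 556 and the O-E difference of 2.25.

```python
def expected_from_ratio(ratio, total):
    """Split total into expected counts proportional to the parts of ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


# Round/yellow, wrinkled/yellow, round/green, wrinkled/green
# (assumption: 315 round and yellow, which makes the counts sum to 556).
observed = [315, 101, 108, 32]
expected = expected_from_ratio((9, 3, 3, 1), sum(observed))

# Chi-square statistic: sum of (O - E)^2 / E over the four seed types.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 7.81  # 5% value for 3 degrees of freedom, as quoted in the text
print(expected)
print(round(chi2, 3), chi2 > critical_value)
```

Deriving the expected counts from the ratio and the sample total, instead of typing them in, guarantees that the E's sum to n as the method requires.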

For other examples of chi-square problems or more on goodness of fit, see:
http://www.physics.csbsju.edu/stats/chi-square.html
http://www.kursus.kvl.dk/shares/vetgen/_Popgen/genetik/applets/ki.htm
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
http://www.ruf.rice.edu/~bioslabs/tools/stats/chisquare.html

© 2002 Biological Science Institute
All Rights Reserved.