
Measurements of Relationships

1. Correlation Coefficients

The previous sections were concerned with questions of differences. This section looks at relationships between variables.

Correlation means co-relation, or the degree to which two variables "go together". Linear correlation means that they go together in a straight line. The correlation coefficient is a number that summarizes the direction and degree (closeness) of the linear relation between two variables. The correlation coefficient is also known as the Pearson Product-Moment Correlation Coefficient. The sample value is called r, and the population value is called ρ (rho). The correlation coefficient can take any value from -1 through 0 to +1. The sign (+ or -) of the correlation gives the direction of the relationship. When the correlation is positive (r > 0), as the value of one variable increases, so does the other. For example, on average, as height in people increases, so does weight. When the correlation is negative (r < 0), as the value of one variable increases, the other decreases.

To get a pictorial view of the relationship between two variables, a scattergram can be made. A scattergram plots the paired x and y data as points. See Figure 7.


Figure 7a Example (a) positively correlated data


Figure 7b Example (b) negatively correlated data.

There may also be no correlation between the variables at all, as the scattergram in Figure 8 shows.


Figure 8 Scattergram showing no relationship between variables.
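As a rough illustration of how the sign of r tracks the direction of a relationship, the sketch below computes r for three small made-up data sets. The function name `pearson_r` and the data values are assumptions chosen for illustration; they do not come from the figures above. The function uses the standard Pearson formula (sum of the products of deviations, divided by the square root of the product of the two sums of squared deviations).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the sample correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for this example only.
xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]   # y increases as x increases
falling = [-x for x in xs]         # y decreases as x increases
curved = [x * x for x in xs]       # symmetric curve, no linear trend

print(pearson_r(xs, rising))    # 1.0
print(pearson_r(xs, falling))   # -1.0
print(pearson_r(xs, curved))    # 0.0
```

The third data set is clearly related to x, yet r is 0, which is a reminder that r only measures how closely the points follow a straight line.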

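The formula for the correlation coefficient is given below; it is also built into many calculators.

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$

$x$ = x value of each point
$\bar{x}$ = mean of the x values
$y$ = y value of each point
$\bar{y}$ = mean of the y values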

Sample Problem

Given point   x values   y values
A                -2         -1
B                 2          0
C                 3          2
D                 5          3

Step 1: Determine the mean values of x and y

x̄ = 8/4 = 2        ȳ = 4/4 = 1

Step 2: Determine the following values

Point   (x - x̄)         (x - x̄)²   (y - ȳ)         (y - ȳ)²
A       -2 - 2 = -4       16        -1 - 1 = -2        4
B        2 - 2 =  0        0         0 - 1 = -1        1
C        3 - 2 =  1        1         2 - 1 =  1        1
D        5 - 2 =  3        9         3 - 1 =  2        4
Sum                       26                          10

Step 3: Determine the following values

Point   (x - x̄)   x   (y - ȳ)   =   (x - x̄)(y - ȳ)
A         -4      x     -2      =     8
B          0      x     -1      =     0
C          1      x      1      =     1
D          3      x      2      =     6
Sum                                  15

Step 4: Determine r using the formula above.

r = 15 / √(26 x 10) = 15 / 16.12 = 0.93

Using the formula above, r = 0.93. Remember that when two variables are perfectly correlated, r is +1 or -1. In the case of +1, as one measurement increases so does the other. In the case of -1, as one measurement increases the other decreases. In actual experience, it is rare to find perfect positive or perfect negative relationships. In our problem the correlation is both positive and strong.
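The four steps of the sample problem can be retraced in a few lines of Python. This is only a sketch of the hand calculation using points A to D from the problem; the variable names are chosen here for illustration.

```python
from math import sqrt

# Points A-D from the sample problem.
x = [-2, 2, 3, 5]
y = [-1, 0, 2, 3]
n = len(x)

# Step 1: means of x and y.
mean_x = sum(x) / n   # 8/4 = 2
mean_y = sum(y) / n   # 4/4 = 1

# Step 2: deviations from the means and their squares.
dx = [xi - mean_x for xi in x]   # -4, 0, 1, 3
dy = [yi - mean_y for yi in y]   # -2, -1, 1, 2
sum_dx2 = sum(d * d for d in dx)   # 26
sum_dy2 = sum(d * d for d in dy)   # 10

# Step 3: sum of the products of the paired deviations.
sum_dxdy = sum(a * b for a, b in zip(dx, dy))   # 15

# Step 4: correlation coefficient.
r = sum_dxdy / sqrt(sum_dx2 * sum_dy2)
print(round(r, 2))   # 0.93
```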

To view another solved problem, see: http://trochim.human.cornell.edu/kb/statcorr.htm


For more on correlation coefficients, see:

http://www.tufts.edu/~gdallal/corr.htm

http://web.indstate.edu/nurs/mary/N322/pearsonr.html

http://www.psychstat.smsu.edu/introbook/sbk29m.htm

http://www.psychstat.smsu.edu/introbook/sbk17m.htm

2. Linear regression

The most common relationship found in research is a linear one. In experimentation it is often found that a high value of the independent variable is associated with a high value of the dependent variable, and that a low value of the independent variable is associated with a low value of the dependent variable. The relationship may also be linear but inverse (a high value on one axis corresponds to a low value on the other axis). In still other cases there may be no relationship between the two values.

A straight line can represent a true linear relationship when the data are plotted on a graph. Due to individual differences in the experimental units (animals, water samples) or due to error, the tabulated results may not fall on a perfectly straight line. Therefore, it becomes necessary to calculate the best-fit line representing the data.

Mathematically, it is found that the extreme values tend to move or regress gradually toward the mean value for that group and do not tend to become more and more extreme. The measurement of this trend to regress toward the mean is termed linear regression.

Two formulas are combined to produce the formula for the best-fit line. The first formula calculates the slope of the line (b), and the second uses the slope to calculate the y-intercept (a). The third line lists the formula for the line in its most common form:

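$$b = \frac{N \sum xy - (\sum x)(\sum y)}{N \sum x^2 - (\sum x)^2}$$

$$a = \bar{y} - b\bar{x}$$

$$\hat{y} = bx + a$$

Here N is the number of data points and $\hat{y}$ is the value of y predicted by the line for a given x. Because $a = \bar{y} - b\bar{x}$, the line passes through the point of means, so that $\bar{y} = b\bar{x} + a$.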
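As a minimal sketch of how the slope and intercept formulas work in practice, the code below applies them to the four points (A to D) from the correlation sample problem. Reusing those points here, and the function name `best_fit_line`, are assumptions made for illustration. The slope is computed as b = [N Σxy - (Σx)(Σy)] / [N Σx² - (Σx)²] and the intercept as a = (mean of y) - b (mean of x).

```python
def best_fit_line(xs, ys):
    """Return (b, a): slope and y-intercept of the least-squares line y = b*x + a."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = [N*sum(xy) - sum(x)*sum(y)] / [N*sum(x^2) - (sum(x))^2]
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(y) - b * mean(x)
    a = sum_y / n - b * (sum_x / n)
    return b, a


# Points A-D from the correlation sample problem, reused here for illustration.
x = [-2, 2, 3, 5]
y = [-1, 0, 2, 3]

b, a = best_fit_line(x, y)
print(round(b, 3), round(a, 3))   # 0.577 -0.154

# Predicted y for a new x value using y = b*x + a.
print(round(b * 4 + a, 3))        # 2.154
```

For these points Σxy = 23, Σx = 8, Σy = 4 and Σx² = 42, so b = 60/104 and a = 1 - 2b.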

Please consult statistics textbooks or statisticians for other statistical tools that may fit your assumptions, hypotheses and types of variables. The following may help you decide on the statistical test for your data analysis.

Table 1. A schematic key to determining the type of statistical test to use for data analysis.

© 2002 Biological Science Institute
All Rights Reserved.