Graphical Analysis of Data
Finding functional relationships in physics
One of the major objectives
of an experimental science such as physics is to find functional relationships
between experimental variables. A functional relationship most often means
a mathematical equation that expresses an experimental variable in terms of
one or more other experimental variables. Some examples are:
,
,
,
,
,
, and
.
There are two major reasons
for looking for functional relationships between experimental variables in
a science such as physics: (1) to test the predictions of one or more theories
in order to decide which one Nature prefers, and (2) to provide clues that
will help in constructing theories about the way Nature works.
In testing a physical
hypothesis, data are gathered and graphed to see if these data are consistent
with the predictions of the hypothesis. Since the laws of physics seem to
be best expressed in mathematical form, this means that the experimenter must
try to extract the mathematical equation from the experimental data. If there
are particular numbers that must be determined (e.g., the gravitational constant,
G, or an energy level in an atom, etc.), the experimenter tries to
design the experiment in a way that will give the most exact answer possible.
If an experimenter is
working in an area of physics where the physical laws are unknown or uncertain,
it is often necessary to guess at mathematical relationships. Sometimes just
plotting the data on a graph will suggest a mathematical relationship, and
this might help to clarify what kind of a theory might explain the data.
When an equation relating experimental variables is obtained from experiment
alone and has no known basis in a theory, it is called an empirical
equation. These kinds of equations are also useful in engineering applications
provided they are not extended beyond the range of the data from which they
were derived. Such equations can be used to calibrate instruments. Sometimes
they are simply used for converting measurements in one kind of unit to values
in a different unit. A good example is a thermocouple curve. Thermocouples
produce a small voltage as they are heated. The voltage output of the thermocouple
can be converted to temperature by means of an empirical equation.
In the following sections,
we will discuss some techniques for extracting equations from the graphs of
experimental data. We will be interested primarily in Cartesian graphs, semi-log
graphs, and log-log graphs. Some other linearization techniques will
also be shown.
Cartesian Graphs
As you may have learned
from your mathematics courses, a Cartesian coordinate system consists of a
set of axes with uniformly spaced divisions. The independent variable
of an experiment is usually plotted as the horizontal coordinate, and the
dependent variable is usually plotted as the vertical coordinate. Each coordinate
pair locates a point on the graph, and the set of points from all of the data
will usually suggest some kind of “curve”. The simplest “curve” that might
be produced by a set of data is a straight line as shown in Figure 1.
The equation of a straight
line has the form
$y = mx + b$, where y is the name of the dependent variable, and x
is the independent variable. (Other letters, such as T for temperature
and V for volume, can also be used. The names given to the variables
depend on what is being measured.) This form of the equation of a straight
line is called the slope-intercept form. The slope, m, of the
graph tells how many units the graph rises or falls for each unit of horizontal
movement to the right. It is defined by
$$m = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1},$$
where $(x_1, y_1)$ and $(x_2, y_2)$ are two points on the line.
Whichever pair of points is chosen, the slope
is simply the change that occurs in the vertical direction divided
by the change that occurs in the horizontal direction as one moves
between the pair of points. These changes are represented by the dashed lines
in Figure 1.
The vertical intercept
is the point where the graph crosses the vertical axis, and it has the coordinates
$(0, b)$. This is the circled point on the vertical axis in Figure 1. Thus
we see that we can “read” the equation from the graph by calculating its slope
and noting where the graph crosses the vertical axis.
Figure 2 shows an example
of an experiment in which the distance, d, a body moves is plotted
against time, t. Notice that not all of the plotted points lie on
the line drawn through them. This is due to experimental errors in the measurements.
The line is meant to be the best fit to the trend of the data points.
(What is meant by “best fit” is discussed later, but for now, the line is
what is presumed would occur if the measurements were perfect for each point.)
Since we are claiming that the line is the best representation of the
data, then we can pick two points on the line and use them to calculate
the slope. To help reduce the uncertainty in the calculation, it is best
to pick points far apart, and, if convenient, the vertical intercept can be
one of the points. Having calculated the slope, we then read off the intercept
and write down the equation relating the distance to the time. Let’s do this
for the data in Figure 2.
We can use the two points
that are circled. Notice that these lie on the line, and that it is
convenient to choose one of the points to be the y-intercept at (0,
-23). The other is at (95, 230). Thus the slope is
$$m = \frac{230 - (-23)}{95 - 0} = \frac{253}{95} \approx 2.66.$$
Then the equation relating distance to time is
$$d = 2.66\,t - 23.$$
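The calculation above can be sketched in a few lines of Python. This is only an illustration of the two-point method: the function name `line_through` is our own invention, and the two points are the circled points quoted for Figure 2.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("the two points must have different x values")
    m = (y2 - y1) / (x2 - x1)   # rise divided by run
    b = y1 - m * x1             # value of y where the line crosses x = 0
    return m, b

# The two circled points quoted for Figure 2: (t, d) in seconds and meters.
m, b = line_through((0, -23), (95, 230))
print(f"d = {m:.2f} t + ({b:.0f})")
```

Because the points are read from the drawn line rather than taken from the data table, the result is only as good as the line itself and the care taken in reading the graph.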
Figure 1: Cartesian Plot of a Straight Line
Notes for Figure 1:
The circled point on the
vertical axis is the vertical intercept, b. The other two circled
points are used to calculate the slope, m. The dashed lines indicate
the rise (vertical change, $\Delta y$) and the run
(horizontal change, $\Delta x$).
If we call the left-hand point $(x_1, y_1)$
and the right-hand point $(x_2, y_2)$, then
the slope is calculated by
$$m = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}.$$
Figure 2: Cartesian Plot of Distance versus Time
Notes for Figure 2:
The y-intercept
is the circled point at the left, (0, -23). This and the other circled point,
(95, 230), are used to calculate the slope, m, which is
$$m = \frac{230 - (-23)}{95 - 0} \approx 2.66.$$
Therefore, the equation relating d and
t for this distance-versus-time plot is
$d = 2.66\,t - 23$. The distance is in meters, and the time is in seconds.
In the example shown in
Figure 2, the line slopes upward to the right, so the slope is
positive. Had the line sloped downward to the right, we would
have calculated a negative number for the slope, m.
Log-Log Plots
Many theories or experimental
situations present us with data that are related by a power law, which
is an equation of the form
$$y = b\,x^m,$$
where m and b are real numbers.
If we suspect we are dealing with such a relationship between the experimental
variables, we can use a linearizing technique that causes the data
to lie along a straight line when the points are plotted. Take the natural
logarithm of both sides of this power law equation. We get
$$\ln y = \ln(b\,x^m)$$
or
$$\ln y = m \ln x + \ln b.$$
Notice that this equation looks very similar to
the slope-intercept form of the equation for a straight line, except that
y has been replaced by ln(y), x by ln(x), and
b by ln(b). If we were instead to take the common logarithm
of both sides, we would get
$$\log y = m \log x + \log b.$$
Again we have an equation that looks like the
slope-intercept form of a straight line with y replaced by log(y),
x by log(x), and b by log(b). What this means
is that, if the variables are related by a power law, plotting the logarithms
(natural or common) of the variables instead of the variables themselves will
produce a straight line. Here are some examples of power law relationships.
Kepler’s third law says
that the square of the period of a satellite’s orbit is proportional to the
cube of the mean radius of the orbit, or
$$T^2 = \frac{4\pi^2}{GM}\,R^3.$$
G is the gravitational constant, and M is the mass
of the planet or star about which the satellite is orbiting. This can be
written in the power law form by taking the square root of both sides to get
$$T = \frac{2\pi}{\sqrt{GM}}\,R^{3/2}.$$
Thus we see that $b = 2\pi/\sqrt{GM}$ and $m = 3/2$.
The frequency of oscillation
(number of oscillations per second) of a mass on the end of a spring is given
by the equation
$$f = \frac{1}{2\pi}\sqrt{\frac{k}{M}},$$
where k is the spring constant, and M is the amount
of mass fastened to the end of the spring. If we are considering the frequency
and the mass to be the experimental variables, this can be rewritten as
$$f = \frac{\sqrt{k}}{2\pi}\,M^{-1/2}.$$
Thus $b = \sqrt{k}/(2\pi)$, and $m = -1/2$.
Newton’s law of gravitation
states that the gravitational force between two spherical masses, M
and m, is proportional to the product of the masses and inversely proportional
to the square of the distance, r, between their centers. Stated mathematically,
$$F = \frac{GMm}{r^2} = (GMm)\,r^{-2},$$
where we see that the coefficient is $b = GMm$, and the exponent is $-2$.
A similar force law holds for two electrical charges.
Log-log plots have a set
of horizontal and vertical axes with spacings proportional to the logarithm
of the numbers on both axes (see Figure 3). However, instead of writing
the logarithms of the numbers along the axes, the numbers themselves are written.
In addition to making the graphs easier to read, this means that simply placing
points on the graph at the locations of their numbered coordinates actually
locates them at the logarithms of their coordinates. This is equivalent to
taking the logarithms of the numbers and plotting them on a Cartesian graph.
Notice in Figure 3 that
the distances between powers of ten are uniform, but the spacings in between
these get closer together. Thus the first division mark after the 1 is 2,
the next is 3, etc. Similarly, the first division mark after the
10 is 20, the next 30, etc., and after 0.01 are 0.02 and 0.03, etc. If a
straight line is produced by plotting data on a log-log graph, then the data
are related by a power law. The coefficient, b, in
$y = b\,x^m$ is simply read off the graph on the vertical line passing through
x = 1 (note, ln(1) = 0), because the number instead of its logarithm
is actually displayed.
The slope, m, is
still the rise divided by the run, as in Cartesian plots, but the distances
in both the vertical and horizontal directions are logarithms of the numbers
plotted. Because the numbers themselves are printed along the axes, we read
the numbers and take their logarithms before using them to calculate the slope.
Therefore
$$m = \frac{\ln y_2 - \ln y_1}{\ln x_2 - \ln x_1}.$$
The graphs shown in Figures
3 and 4 are data from an experiment measuring the frequency of oscillation
of a mass on a spring for various masses. The Cartesian plot is highly non-linear;
however, when the data are plotted on a log-log graph, a straight line is
suggested. Taking the two circled points at the ends of the line through
the data points, we can estimate the slope as
$$m = \frac{\ln 0.5 - \ln 36}{\ln 520 - \ln 0.1} \approx -0.50.$$
The value of b is taken from the circled
point where the line crosses M = 1 (ln M = 0). From the graph
we read b = 11. Therefore the equation for this line is
$f = 11\,M^{-0.50}$. Compare this with the theoretical equation for the frequency given
by
$$f = \frac{1}{2\pi}\sqrt{\frac{k}{M}} = \frac{\sqrt{k}}{2\pi}\,M^{-1/2}.$$
We see that the power of M is very close to what the theory
predicts, and we can solve for k from the fact that
$\sqrt{k}/(2\pi) = 11$. To determine whether or not the data are consistent with the theory,
we would need to determine the range of values that the slope m
can have in the data shown. One tedious way to do this is to
estimate visually the range of values used to
calculate m, and then proceed to calculate m for these various
values taken from the graph. If the m’s that result bracket -0.5,
then we say that the data are consistent with the theory.
Figure 3.
Notes for Figure 3:
Plotting frequency versus mass
on this log-log plot does indeed suggest a power law,
$f = b\,M^m$. Assuming the line is the best representation of the data,
the slope is calculated using the two circled points, (520, 0.5) and
(0.1, 36), at the ends of the line. We get
$$m = \frac{\ln 0.5 - \ln 36}{\ln 520 - \ln 0.1} \approx -0.50.$$
The intercept occurs at the other circled point,
on the vertical line through M = 1 (recall that ln 1 = 0), and is approximately
11. Therefore the equation relating these experimental variables is
$f = 11\,M^{-0.50}$.
Figure 4: Cartesian Plot of Frequency versus Mass
Notes for Figure 4:
This plot of frequency versus mass
is very non-linear, making it difficult to find the equation relating the
two variables. It suggests a reciprocal relationship or possibly an exponential
decay. The reciprocal relationship is a power law, and this would suggest
plotting the points on a log-log graph. The previous figure shows that the
relationship is indeed a power law.
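As a check on the arithmetic, the slope and coefficient of a power law can be computed directly from two points read off a log-log plot. The sketch below is illustrative only: `power_law_through` is a name we made up, and the points are the two circled end points quoted for Figure 3, given as (M, f).

```python
import math

def power_law_through(p1, p2):
    """Return (m, b) for y = b * x**m through two points read off a log-log plot."""
    (x1, y1), (x2, y2) = p1, p2
    if min(x1, y1, x2, y2) <= 0:
        raise ValueError("log-log plots need strictly positive values")
    # Rise over run, measured in logarithms of the numbers printed on the axes.
    m = (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))
    # b is the value of y on the vertical line x = 1.
    b = y1 / x1**m
    return m, b

# Circled end points quoted for Figure 3, as (mass, frequency).
m, b = power_law_through((0.1, 36), (520, 0.5))
print(f"f = {b:.1f} * M^({m:.2f})")
```

The coefficient calculated this way comes out a little above the value of 11 read from the graph at M = 1, which is a reminder of how much reading error is involved. Repeating the call with slightly different readings of the end points is one way to carry out the bracketing test described above.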
Now it is obvious that
reading numbers from the graphs may not be very accurate, especially if the
graphs are small. Larger plots and finer divisions would make reading numbers
from the graph more precise. On the other hand, there are more systematic
methods that use the data themselves to find m and its uncertainty.
These methods come from the process of regression analysis which is,
in turn, a subset of techniques from statistical analysis. Some of these
techniques are part of the standard features built into today’s graphing calculators
and will be explored later.
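To give a flavor of what such a method looks like, here is a minimal sketch of an ordinary least-squares fit of a straight line to the logarithms of the data, together with the usual standard error of the slope. It is one common regression recipe, not necessarily the one a particular calculator uses. The function name `fit_power_law` and the data are our own: the frequencies are synthetic, generated from $f = 11\,M^{-0.5}$ with a few percent of made-up scatter, and are not the measurements plotted in Figure 3.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares line through (ln x, ln y); returns (m, b, se_m) for y = b * x**m."""
    u = [math.log(x) for x in xs]
    v = [math.log(y) for y in ys]
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    m = s_uv / s_uu                      # best-fit slope
    ln_b = v_bar - m * u_bar             # best-fit intercept (at ln x = 0)
    resid_sq = sum((vi - (ln_b + m * ui)) ** 2 for ui, vi in zip(u, v))
    se_m = math.sqrt(resid_sq / (n - 2) / s_uu)   # standard error of the slope
    return m, math.exp(ln_b), se_m

# Synthetic illustration only: f = 11 * M**-0.5 with made-up scatter factors.
masses = [0.1, 0.5, 1, 5, 10, 50, 100, 500]
scatter = [1.03, 0.98, 1.01, 0.97, 1.02, 0.99, 1.04, 0.96]
freqs = [11 * M ** -0.5 * s for M, s in zip(masses, scatter)]

m, b, se_m = fit_power_law(masses, freqs)
print(f"m = {m:.3f} +/- {se_m:.3f}, b = {b:.2f}")
```

If the interval m ± a few standard errors contains -0.5, the data are consistent with the theoretical exponent. This replaces the visual bracketing with a number calculated from all of the points.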
The next plot (Figure
5) shows a log-log plot of the periods of the planets as a function of their
distances from the sun. Analyze the data to see if they are consistent with
Kepler’s third law.
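One way to carry out this analysis numerically is sketched below. The distances (in astronomical units) and periods (in years) are rounded standard reference values that we supply for illustration; they are not read from Figure 5. The helper name `loglog_slope` is our own.

```python
import math

# Approximate mean distance from the sun (AU) and orbital period (years).
# Rounded reference values supplied for illustration, not read from Figure 5.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus":   (0.723, 0.615),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn":  (9.537, 29.46),
    "Uranus":  (19.19, 84.01),
    "Neptune": (30.07, 164.8),
    "Pluto":   (39.48, 248.0),
}

def loglog_slope(p1, p2):
    """Slope between two (R, T) points as measured on a log-log plot."""
    (r1, t1), (r2, t2) = p1, p2
    return (math.log(t2) - math.log(t1)) / (math.log(r2) - math.log(r1))

# Slope from Mercury to each of the other planets.
names = list(planets)
slopes = {name: loglog_slope(planets["Mercury"], planets[name]) for name in names[1:]}
for name, s in slopes.items():
    print(f"Mercury to {name:8s} slope = {s:.3f}")
```

Every pair gives a slope very close to 3/2, which is the exponent Kepler’s third law predicts for T as a function of R.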
Figure 6 shows a Cartesian
plot of the planetary periods. Notice that all of the data for the inner
planets are crowded into the lower left-hand corner of the graph. It is not
obvious from this Cartesian plot what mathematical relationship applies to
the periods and the distances from the sun. Note however that the range of
the data on one axis is much larger than on the other. This could be a clue
that suggests something like a power law. Another function that produces
large ranges in the data is the exponential. This is handled by a semi-log
plot discussed in the next section.
If it happens that the
data are negative numbers, it is necessary to take the absolute values of
the numbers before taking the logarithms. This raises another issue, however,
if the measurements of an experimental variable take on both positive
and negative values. What would the data mean if one were to take logarithms
of these numbers? Taking the absolute value of the negative numbers before
taking logarithms distorts the graph of the data by flipping the plotted points
about an axis on the graph. What if one of the data points is zero? The
logarithm of zero is undefined (it tends to negative infinity as the number approaches zero).
These last observations
raise yet another issue. It is common practice to use computerized data acquisition
in modern laboratories. As with any measuring instrument, these devices have
limits on the sizes of the numbers they can measure. When measuring very
small values, in the vicinity of zero, it is possible for the “noise” (uncertainties)
in the data acquisition equipment to be as large as the signal being measured
(we say that the signal is “down into the noise” when this happens). If the
values being measured are supposed to be all positive, no matter how small,
the measuring uncertainties can make some of these measurements much closer
to zero, zero itself, or even negative. Consider, for example, a number that
is supposed to be on the order of $10^{-3}$, but the measuring uncertainties
put it closer to $10^{-6}$, in other words, three orders of magnitude
smaller. What would this do to a log plot of the data? Suppose the measuring
uncertainties make some of the measured values negative when they were supposed
to be positive. You need to take logarithms of the data for plotting. What
do you do with the data when these kinds of data acquisition uncertainties
affect the measurements? We leave it to you to ponder the consequences of
this for the moment.
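While you ponder, here is a minimal sketch of one possible bookkeeping step, offered only as an illustration and not as the recommended answer: separate the points that can be placed on a log-log plot from those that cannot, so that the excluded points are reported rather than silently lost. The function name `split_for_log_plot` and the readings are made up.

```python
import math

def split_for_log_plot(xs, ys):
    """Separate points that can go on a log-log plot from those that cannot."""
    plottable, rejected = [], []
    for x, y in zip(xs, ys):
        if x > 0 and y > 0:
            plottable.append((math.log10(x), math.log10(y)))
        else:
            rejected.append((x, y))   # zero or negative: the logarithm is undefined
    return plottable, rejected

# Made-up readings: a signal near 1e-3 whose noise drives some values to zero,
# below zero, or three orders of magnitude too small.
xs = [1, 2, 3, 4, 5]
ys = [1.2e-3, 9.0e-4, 0.0, -3.0e-4, 1.0e-6]

plottable, rejected = split_for_log_plot(xs, ys)
print("plottable (log10 x, log10 y):", plottable)
print("cannot be plotted:", rejected)
```

Notice that the reading of 1.0e-6 survives the test but lands three full decades below its neighbors on the plot. Dropping or keeping such points changes the line drawn through the data, so whatever is done with them should be stated along with the result.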
Figure 5: Log-Log Plot of Planetary Periods versus Distances From the Sun
Notes for Figure 5:
Because the log-log graph
looks like a straight line, the relationship between the periods of the nine
planets and their distances from the sun appears to be a power law,
$T = b\,R^m$. Draw a straight line through the points. Pick two well-spaced points
on this line and find the equation relating T and R. How does
it compare with Kepler’s third law?
Figure 6: Cartesian Plot of Planetary Periods versus Distances From the Sun
Notes for Figure 6:
The data for Mercury,
Venus, Earth, and Mars are all crowded into the lower left-hand corner of
this Cartesian plot. The trend of the data curves upward. Is it a power
law? If so, what power of distance is it? It is not obvious from this plot.
The log-log plot in Figure 5 can be used to find it.
Semi-log Plots
Another function often encountered in both theoretical
and experimental work is the exponential function, which has the form
$$y = b\,e^{mx}.$$
This function occurs in connection with phenomena
for which the rate at which a variable changes is proportional to the
amount of the variable that is present. Examples include population growths
of various organisms, radioactive decay of elements into other elements, and
the discharge of a capacitor through a resistor in an electronic circuit.
Taking the natural logarithm of both sides of this exponential function gives
$$\ln y = \ln(b\,e^{mx}),$$
or
$$\ln y = mx + \ln b.$$
Notice the similarity
to the slope-intercept form of a straight line,
$y = mx + b$. In this case, y is replaced by ln(y), and b is replaced
by ln(b). If the variables are related by an exponential equation,
then plotting the logarithm of the dependent variable directly against
the independent variable (not the logarithm of the independent variable)
will produce a straight line.
This kind of plot is called
a semi-log plot (also log-linear) because only one of the two
variables is plotted on a logarithmic scale. In most cases, the vertical
scale is logarithmic while the horizontal scale is linear. As in the case
of the Cartesian plot, the slope, m, of a line on a semi-log plot is
the rise divided by the run. However, the distances in the vertical direction
are logarithms. Since the numbers instead of their logarithms are written
along the vertical axis, we must take the logarithms of these numbers before
using them in the calculation of the slope. The numbers in the horizontal
direction can be used as they appear. Thus, for the two points
$(x_1, y_1)$ and $(x_2, y_2)$,
$$m = \frac{\ln y_2 - \ln y_1}{x_2 - x_1}.$$
The vertical intercept, b, can be read
off the vertical axis through x = 0 because the number b is
shown instead of its logarithm.
Figure 7 shows a Cartesian
plot of the discharge curve of a capacitor, while Figure 8 is a semi-log plot
of the same data. As in the previous examples, the solid line is meant to
be the best representation of the data. Using the two circled points, (0,
6) and (35, 0.032), in Figure 8, we find b = 6.0, and the slope to be
$$m = \frac{\ln 0.032 - \ln 6}{35 - 0} \approx -0.15.$$
Thus the equation relating the variables is
$V = 6.0\,e^{-0.15\,t}$. The theoretical curve relating these variables is
$V = V_0\,e^{-t/RC}$. R and C are the resistance and capacitance in the
discharging circuit.
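The same two-point calculation can be sketched in Python for a semi-log plot. The function name `exponential_through` and the variable `time_constant` are our own; the points are the two circled points quoted for Figure 8, given as (t, V).

```python
import math

def exponential_through(p1, p2):
    """Return (m, b) for y = b * exp(m * x) through two points read off a semi-log plot."""
    (x1, y1), (x2, y2) = p1, p2
    if y1 <= 0 or y2 <= 0:
        raise ValueError("the logarithmic axis needs strictly positive values")
    # Only the vertical distances are logarithms; the horizontal axis is linear.
    m = (math.log(y2) - math.log(y1)) / (x2 - x1)
    # b is the value of y on the vertical line x = 0.
    b = y1 * math.exp(-m * x1)
    return m, b

# Circled points quoted for Figure 8, as (time, voltage).
m, b = exponential_through((0, 6), (35, 0.032))
time_constant = -1 / m   # equals R*C if the decay follows V = V0 * exp(-t / (R*C))
print(f"V = {b:.1f} * exp({m:.3f} t), time constant = {time_constant:.2f}")
```

The time constant is in whatever units are used on the horizontal axis of the graph.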
Figure 7: Cartesian Plot of Capacitor Voltage versus Time
Notes for Figure 7:
This is a classic exponential
decay curve. The rate at which the voltage is changing with respect to time
is proportional to the value of the voltage. Since the curve also looks somewhat
like that of a reciprocal equation, one could be tempted to plot the data
on a log-log graph. It would soon be obvious that this would not produce
a straight line. We need to plot it on a semi-log (log-linear) graph before
it becomes evident that it is actually an exponential.
Figure 8: Semi-Log Plot of Capacitor Voltage versus Time
Notes for Figure 8:
The nearly straight line
on a semi-log plot suggests an exponential equation. If the solid line is
the best representation of the data, then we can choose the two circled points
to calculate the slope. It is convenient to pick one of these points to be
the vertical intercept, b = 6.0. The slope is
$$m = \frac{\ln 0.032 - \ln 6}{35 - 0} \approx -0.15.$$
Therefore the equation of the line is
$V = 6.0\,e^{-0.15\,t}$. The theoretical equation is
$V = V_0\,e^{-t/RC}$. This means that
$1/(RC) \approx 0.15$, or $RC \approx 6.7$ in the time units of the graph.
Other Techniques for Linearizing Data
Consider the equation
$d = \tfrac{1}{2} a t^2$. This is certainly an example of a power law equation,
$y = b\,x^m$, with $b = \tfrac{1}{2} a$ and $m = 2$. However, we can also think of
$t^2$ as a variable by letting $x = t^2$. Then the equation looks like
$d = \tfrac{1}{2} a\,x$, which is a linear equation in slope-intercept form,
$y = mx + b$, with $m = \tfrac{1}{2} a$ and $b = 0$. This would suggest that,
instead of plotting d versus t,
we should plot d versus $t^2$ to get a straight line
passing through zero. The slope of this line could then be calculated, and
this slope would allow us to find a from the fact that
$a = 2m$. This works especially well with an equation like
$d = d_0 + \tfrac{1}{2} a t^2$, because even though this is an equation containing a power of t,
you will discover that it does not plot as a straight line on a log-log graph.
It is not strictly in the form of a power law equation. (Try taking the logarithm
of both sides of this equation and see what you get.) However, plotting
d versus $t^2$ does produce a straight line with
intercept $d_0$ and slope $m = \tfrac{1}{2} a$.
Figures 9 through 12 illustrate
the above points. The distance-versus-time graph in Figure 9 has what appears
to be a vertical offset. If this represents an example of uniformly accelerated
motion, we would expect the curve to be a parabola. Since the slope of the
curve is zero at time zero, we suspect that this object was starting from
rest, but was already some distance away when the motion started. If this
is the case, the equation has the form
$d = d_0 + \tfrac{1}{2} a t^2$. Figure 10 shows what happens if we try to do a log-log plot:
the points do not fall on a straight line. On
the other hand, plotting distance versus the square of the time, as
in Figure 11, gives a straight line with an intercept at $d_0$
= 450 meters. Figure 12 illustrates yet another approach to analyzing the
data. Removing the offset, $d_0$, leaves the equation in the
form $d - d_0 = \tfrac{1}{2} a t^2$,
which is a power law; therefore, making a log-log plot of
$d - d_0$ versus t gives a straight line.
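The three plots can be compared numerically by looking at the slope between neighboring points: on a straight line this slope is the same everywhere. The sketch below uses synthetic, noise-free distances generated from the offset and acceleration quoted with Figure 11 (450 and 32.2). The times and the helper name `slopes` are our own choices.

```python
import math

D0, A = 450.0, 32.2                      # offset and acceleration quoted with Figure 11
times = [1, 2, 4, 8, 16]                 # arbitrary illustrative times
dists = [D0 + 0.5 * A * t**2 for t in times]

def slopes(us, vs):
    """Rise over run between each pair of neighboring points."""
    return [(v2 - v1) / (u2 - u1)
            for (u1, v1), (u2, v2) in zip(zip(us, vs), zip(us[1:], vs[1:]))]

ln_t = [math.log(t) for t in times]

# Log-log plot of d versus t: the slope keeps changing, so this is not a power law.
loglog_raw = slopes(ln_t, [math.log(d) for d in dists])
# Cartesian plot of d versus t^2: constant slope equal to a/2.
cartesian_t2 = slopes([t**2 for t in times], dists)
# Log-log plot of (d - d0) versus t: constant slope equal to the exponent, 2.
loglog_shifted = slopes(ln_t, [math.log(d - D0) for d in dists])

print("log-log, raw d:       ", [round(s, 2) for s in loglog_raw])
print("Cartesian, d vs t^2:  ", [round(s, 2) for s in cartesian_t2])
print("log-log, d - d0 vs t: ", [round(s, 2) for s in loglog_shifted])
```

With real measurements the slopes would scatter because of experimental error, but a steady drift like the one in the first list is the signature of an equation that is not a power law.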
(Another question to
ponder: What if subtracting the offset makes one of the data points zero,
which could easily happen if we subtract, say, the minimum value in the data
set from all of the data values in the set? How should this be handled
on a log plot?)
We showed earlier that
Kepler’s third law,
$T^2 = \frac{4\pi^2}{GM}\,R^3$, could be put into the form of a power law by taking the square root
of both sides of the equation. On the other hand, we could plot
$T^2$ versus $R^3$, and the result would be a straight line with a slope of
$4\pi^2/(GM)$ and an intercept of zero.
Similarly, for data obeying
an equation of the form
$f = \frac{1}{2\pi}\sqrt{k/M}$, which we saw plotted in Figure 4, it is also possible to plot f
versus $1/\sqrt{M}$ to get a straight line with slope
$\sqrt{k}/(2\pi)$ and intercept zero. We could also plot
$f^2$ versus $1/M$ to get a straight line with intercept zero and slope
$k/(4\pi^2)$.
To use these techniques,
you must already know, or have a pretty good idea of, the form of the equation
your data are following. The main reason for using these techniques is to
extract the actual numbers (such as the values of a and $d_0$ in
$d = d_0 + \tfrac{1}{2} a t^2$) from the collected data. These are values that are unique to the
particular experimental set-up, in which you are making use of a known law
to find them. For example, suppose you want to measure the gravitational
constant, G, in Newton’s law of gravitation,
$$F = \frac{GMm}{r^2}.$$
In principle, you could use two known masses and
vary the separation between them, each time measuring the force, F.
Then you could plot these data on a log-log plot from which you should get
a straight line with a slope of negative 2. The intercept would be
GMm on this log-log graph, and it would occur on the vertical line
through r = 1. You could also plot F versus
$1/r^2$ on a Cartesian graph. The slope of this graph would be GMm,
and since you know M and m, you could find G. In practice,
however, this experiment is difficult because the forces that are measured
are extremely small. One of these methods of data analysis may give
better results than the other, depending on the kinds of errors that creep
into the experiment.
(Question: The
gravitational constant, G, has been measured to be approximately
$6.67 \times 10^{-11}\ \mathrm{N\,m^2/kg^2}$. How do you suppose the data are handled in order to be able to see
this number on a graph?)
How one chooses what to
plot is determined by the equation connecting the variables and the kinds
of experimental errors that are part of the measurements. Some precautions
are necessary. All measured variables contain experimental errors. If you
choose to plot the square, cube, or some higher power of a set of measurements,
you magnify the plotting errors. Taking square-roots, or cube-roots, etc.,
of measured values reduces these errors. Exponentiating the measured values
can magnify or reduce plotting errors depending upon what part of the exponential
curve they are on. Taking logarithms reduces plotting errors, except for
taking logarithms of numbers between 0 and 1, which greatly magnifies the
plotting errors as the measured values approach zero.
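These statements can be made a little more concrete with a first-order estimate. The relations below are the standard small-uncertainty approximations from calculus, not a full error analysis. We assume a measured value $x > 0$ with a small uncertainty $\delta x$; the symbol $\delta$ is our own notation.

```latex
% First-order propagation of a small uncertainty \delta x in a measured value x > 0.
\begin{align*}
  \delta(x^n)   &\approx |n|\,x^{n-1}\,\delta x
    &&\Rightarrow\quad \frac{\delta(x^n)}{x^n} \approx |n|\,\frac{\delta x}{x}, \\
  \delta(\ln x) &\approx \frac{\delta x}{x}, \\
  \delta(e^{x}) &\approx e^{x}\,\delta x .
\end{align*}
```

With n = 2 the relative uncertainty doubles, and with n = 1/2 it is cut in half. For the logarithm, the spread on the plot is smaller than $\delta x$ when x is greater than 1, larger than $\delta x$ when x lies between 0 and 1, and it grows without limit as x approaches zero. For the exponential, the spread is smaller than $\delta x$ for negative x and larger for positive x.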
There is much to be said
for experience in handling, plotting, and analyzing data. There are also
many advanced statistical techniques for handling and analyzing data that
are available to an experimenter. Many of these can be found in standard
software packages for the computer, and most graphing calculators have some
of these techniques built into their internal firmware. We will discuss some
of these later, but do not expect that these advanced techniques will sweep
any of the experimental uncertainty issues under the rug. No matter what
techniques the researcher is using to extract results from experimental data,
it is important for the researcher to understand thoroughly the behaviors
of the equipment and the experimental data. Advanced methods of data transformation
and analyses make it possible to quantify and display experimental uncertainties
more clearly. In conjunction with modern computerized data acquisition equipment,
these techniques make it possible to work with far more data at faster speeds.
The result is that a more dynamic aspect of the data analysis becomes
evident, because data can be acquired rapidly enough that experimental adjustments
can be made quickly to see their effects on the experimental errors. This
often helps the researcher to develop better experimental designs almost on
the fly as rapidly repeated trials and adjustments quickly uncover what works
well and what doesn’t. These techniques are also used for quality control
in manufacturing.
After a quick summary
of the graphing techniques, we will discuss the issues of error analysis and
best fits to data. We will explore some of the methods for expressing the
uncertainties in measurements, and what kind of information is needed to calculate
these. You may have already noticed some of these issues as you looked at
the graphs. How does one decide where to draw a line through the data points?
Probably several lines could be drawn, each having a slightly different slope
and/or vertical intercept. These lines will, of course, give different results
for the experiment. Which one is the “best”? Is there a way to quantify
this? This issue is related to the issue of reporting a number that best
represents a series of measurements of the same quantity (such as the length
of a room). You have probably been told to take several measurements and
report the average. Why is the average the best number to report?
Why not the number that occurs most often? What do all of those other measurements
tell us? Many scientific and engineering measurements are reported as a number
plus or minus an uncertainty. How is the uncertainty in a measurement
calculated, and what does it mean? The sections on statistical techniques
will begin to deal with these questions.
Figure 9: Distance versus Time Graph With a Vertical Offset.
Notes for Figure 9:
If this is a case of uniformly
accelerated motion from rest, then the equation for this curve would be of
the form
$d = d_0 + \tfrac{1}{2} a t^2$. Even though it contains a power of t, this equation is not
a power law. Therefore, plotting the data on a log-log plot will not produce
a straight line, as the next figure shows.
Figure 10: Log-Log Plot of the Distance versus Time Graph of Figure 9.
Notes for Figure 10:
This is what happens when
one plots an equation that contains a power of the independent variable, but
also has an offset that displaces the graph vertically. Such an equation
is not a power law. Notice that taking the logarithm of the equation
produces
$$\ln d = \ln\left(d_0 + \tfrac{1}{2} a t^2\right),$$
which can also be written as
$$\ln d = \ln\left[d_0\left(1 + \frac{a t^2}{2 d_0}\right)\right].$$
This finally reduces to the form
$$\ln d = \ln d_0 + \ln\left(1 + \frac{a t^2}{2 d_0}\right),$$
which is not in the slope-intercept form
of a straight line.
Figure 11: Plot of Distance versus Time^2
Notes for Figure 11:
If we think of the $t^2$
in $d = d_0 + \tfrac{1}{2} a t^2$
as a variable in itself, and then plot d versus the square
of t, we get a straight line. Using the circled points, we find the
intercept, $d_0$ = 450, and the slope
$$m = \frac{\Delta d}{\Delta(t^2)} \approx 16.1.$$
Therefore the equation of this line is
$d = 450 + 16.1\,t^2$. According to the theory describing uniformly accelerated motion,
the coefficient, 16.1, is
$\tfrac{1}{2} a$; therefore, a = 32.2.
Figure 12: Log-Log Plot After Removal of Vertical Offset.
Notes for Figure 12:
This is an alternative
approach to linearizing the data in the previous three figures. If the equation
is of the form
$d = d_0 + \tfrac{1}{2} a t^2$, then we can subtract the offset, $d_0$ = 450, to get
a power-law equation,
$d - 450 = \tfrac{1}{2} a t^2$. Take the logarithms of both sides to get
$\ln(d - 450) = 2 \ln t + \ln\left(\tfrac{1}{2} a\right)$. Now this equation is in the slope-intercept form of a straight
line. We simply subtract the vertical offset in d from all of the
distances and then plot the data on a log-log graph. Using the circled points,
we find the intercept on t = 1 to be approximately 16. The slope is
calculated to be
$m \approx 2$. Therefore the equation of this line is
$\ln(d - 450) = 2 \ln t + \ln 16$, or
$d = 450 + 16\,t^2$. Compare this with the result in Figure 11.
Summary of Graphing Techniques
The following table summarizes
the graphing techniques used to linearize experimental data for extracting
the equation relating the experimental variables. These techniques are very
visual, useful, and fast. Every experimenter who deals with variables which
may be related by an equation uses these methods. Just plotting the data
can help the experimenter debug or improve the experiment.
Table 1: A Summary of Linearization Techniques
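| If plotting data gives a straight line on a | Then the form of the equation relating the variables is | The slope, m, is calculated by | and b is read from the vertical axis through |
|---|---|---|---|
| Cartesian Graph | a linear equation, $y = mx + b$ | $m = \dfrac{y_2 - y_1}{x_2 - x_1}$ | $x = 0$ |
| Log-Log Graph | a power-law equation, $y = b\,x^m$ | $m = \dfrac{\ln y_2 - \ln y_1}{\ln x_2 - \ln x_1}$ | $x = 1$ |
| Semi-log Graph | an exponential equation, $y = b\,e^{mx}$ | $m = \dfrac{\ln y_2 - \ln y_1}{x_2 - x_1}$ | $x = 0$ |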
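The table can be turned into a rough numerical screening test. The sketch below tries each of the three plots and reports the one on which the points lie closest to a straight line. Using the correlation coefficient as the measure of straightness is our own choice for illustration, not a method described in this section, and the function names `pearson_r` and `straightest_plot` are invented. It assumes the data are not all equal and is no substitute for actually looking at the graphs.

```python
import math

def pearson_r(us, vs):
    """Linear correlation coefficient of two equal-length lists (assumes neither is constant)."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    s_vv = sum((v - v_bar) ** 2 for v in vs)
    return s_uv / math.sqrt(s_uu * s_vv)

def straightest_plot(xs, ys):
    """Name the plot type from Table 1 on which the data look most nearly linear."""
    candidates = {"Cartesian": (xs, ys)}
    if all(y > 0 for y in ys):               # logarithms need positive values
        log_y = [math.log(y) for y in ys]
        candidates["semi-log"] = (xs, log_y)
        if all(x > 0 for x in xs):
            candidates["log-log"] = ([math.log(x) for x in xs], log_y)
    return max(candidates, key=lambda name: abs(pearson_r(*candidates[name])))

# Made-up, noise-free examples of each kind of equation.
xs = list(range(1, 11))
print(straightest_plot(xs, [2 * x + 1 for x in xs]))
print(straightest_plot(xs, [2 * x ** 1.5 for x in xs]))
print(straightest_plot(xs, [3 * math.exp(0.5 * x) for x in xs]))
```

With noisy data over a narrow range, two of the plots can look almost equally straight, so the answer should be treated as a hint about which graph to examine first.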