Graphical Analysis of Data

Finding functional relationships in physics

One of the major objectives of an experimental science such as physics is to find functional relationships between experimental variables.  A functional relationship most often means a mathematical equation that expresses an experimental variable in terms of one or more other experimental variables.  Some examples are:

  ,         ,        ,         ,   ,       , and  .

There are two major reasons for looking for functional relationships between experimental variables in a science such as physics: (1) to test the predictions of one or more theories in order to decide which one Nature prefers, and (2) to provide clues that will help in constructing theories about the way Nature works.

In testing a physical hypothesis, data are gathered and graphed to see if these data are consistent with the predictions of the hypothesis. Since the laws of physics seem to be best expressed in mathematical form, the experimenter must try to extract the mathematical equation from the experimental data. If there are particular numbers that must be determined (e.g., the gravitational constant, G, or an energy level in an atom), the experimenter tries to design the experiment to give the most precise value possible.

If an experimenter is working in an area of physics where the physical laws are unknown or uncertain, it is often necessary to guess at mathematical relationships. Sometimes just plotting the data on a graph will suggest a mathematical relationship, and this might help to clarify what kind of theory might explain the data. When an equation relating experimental variables is obtained from experiment alone and has no known basis in theory, it is called an empirical equation. Equations of this kind are also useful in engineering applications, provided they are not extended beyond the range of the data from which they were derived. Such equations can be used to calibrate instruments. Sometimes they are simply used to convert measurements in one kind of unit to values in a different unit. A good example is a thermocouple calibration curve. Thermocouples produce a small voltage as they are heated, and this voltage output can be converted to temperature by means of an empirical equation.

In the following sections, we will discuss some techniques for extracting equations from the graphs of experimental data.  We will be interested primarily in Cartesian graphs, semi-log graphs, and log-log graphs.  Some other linearization techniques will also be shown.


Cartesian Graphs

As you may have learned from your mathematics courses, a Cartesian coordinate system consists of a set of axes with uniformly spaced divisions.  The independent variable of an experiment is usually plotted as the horizontal coordinate, and the dependent variable is usually plotted as the vertical coordinate.  Each coordinate pair locates a point on the graph, and the set of points from all of the data will usually suggest some kind of “curve”.  The simplest “curve” that might be produced by a set of data is a straight line as shown in Figure 1.

The equation of a straight line has the form $y = mx + b$, where y is the dependent variable and x the independent variable. (Other letters, such as T for temperature and V for volume, can also be used; the names given to the variables depend on what is being measured.) This form of the equation of a straight line is called the slope-intercept form. The slope, m, of the graph tells how many units the graph rises or falls for each unit of horizontal movement to the right. It is defined by

$$m = \frac{\text{rise}}{\text{run}} = \frac{\Delta y}{\Delta x}.$$

Whichever pair of points is chosen, the slope is simply the change that occurs in the vertical direction divided by the change that occurs in the horizontal direction as one moves between the pair of points.  These changes are represented by the dashed lines in Figure 1.

The vertical intercept is the point where the graph crosses the vertical axis, and it has the coordinates (0, b). This is the circled point on the vertical axis in Figure 1. Thus we see that we can “read” the equation from the graph by calculating its slope and noting where the graph crosses the vertical axis.

Figure 2 shows an example of an experiment in which the distance, d, that a body moves is plotted against time, t. Notice that not all of the plotted points lie on the line drawn through them. This is due to experimental errors in the measurements. The line is meant to be the best fit to the trend of the data points. (What is meant by “best fit” is discussed later; for now, the line represents what we presume would be observed if every measurement were perfect.) Since we are claiming that the line is the best representation of the data, we can pick two points on the line and use them to calculate the slope. To help reduce the uncertainty in the calculation, it is best to pick points far apart, and, if convenient, the vertical intercept can be one of the points. Having calculated the slope, we then read off the intercept and write down the equation relating the distance to the time. Let’s do this for the data in Figure 2.

We can use the two points that are circled. Notice that these lie on the line, and that it is convenient to choose one of the points to be the y-intercept at (0, -23). The other is at (95, 230). Thus the slope is

$$m = \frac{230 - (-23)}{95 - 0} = \frac{253}{95} \approx 2.66\ \text{m/s}.$$

Then the equation relating distance to time is $d = 2.66\,t - 23$, with d in meters and t in seconds.
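The two-point calculation is easy to script. The sketch below is a minimal Python illustration rather than part of the original method; the helper name `line_through` is our own, and it assumes the two points have already been read off the best-fit line rather than taken from the raw data.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two points.

    Each point is an (x, y) pair read from the best-fit line, not a raw data point.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("the two points must have different x values")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1            # solve y1 = m*x1 + b for the intercept
    return m, b


# Circled points from Figure 2: (time in s, distance in m)
m, b = line_through((0, -23), (95, 230))
print(f"slope = {m:.2f} m/s, intercept = {b:.0f} m")
```

With the circled points from Figure 2 it prints a slope of 2.66 m/s and an intercept of -23 m, matching the hand calculation.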

Figure 1:  Cartesian Plot of a Straight Line

Notes for Figure 1:

The circled point on the vertical axis is the vertical intercept, b. The other two circled points are used to calculate the slope, m. The dashed lines indicate the rise (vertical change, Δy) and the run (horizontal change, Δx). If we call the left-hand point $(x_1, y_1)$ and the right-hand point $(x_2, y_2)$, then the slope is calculated by

$$m = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}.$$


Figure 2:  Cartesian Plot of Distance versus Time

Notes for Figure 2:

The y-intercept is the circled point at the left, (0, -23). This and the other circled point, (95, 230), are used to calculate the slope, m, which is

$$m = \frac{230 - (-23)}{95 - 0} \approx 2.66\ \text{m/s}.$$

Therefore, the equation relating d and t for this distance-versus-time plot is $d = 2.66\,t - 23$. The distance is in meters, and the time is in seconds.


In the example shown in Figure 2, the line slopes upward to the right, so the slope is positive. Had the line sloped downward to the right, we would have calculated a negative number for the slope, m.

Log-Log Plots

Many theories or experimental situations present us with data that are related by a power law, which is an equation of the form

$$y = b\,x^{m},$$

where m and b are real numbers (b, x, and y must be positive for the logarithms below to exist). If we suspect we are dealing with such a relationship between the experimental variables, we can use a linearizing technique that causes the data to lie along a straight line when the points are plotted. Take the natural logarithm of both sides of this power law equation. We get

$$\ln y = \ln\left(b\,x^{m}\right) = \ln b + m\ln x,$$

or

$$\ln y = m\ln x + \ln b.$$

Notice that this equation looks very similar to the slope-intercept form of the equation for a straight line, except that y has been replaced by ln(y), x by ln(x), and b by ln(b).  If we were instead to take the common logarithm of both sides, we would get

$$\log y = m\log x + \log b.$$

Again we have an equation that looks like the slope-intercept form of a straight line with y replaced by log(y), x by log(x), and b by log(b).  What this means is that, if the variables are related by a power law, plotting the logarithms (natural or common) of the variables instead of the variables themselves will produce a straight line.  Here are some examples of power law relationships.

Kepler’s third law says that the square of the period of a satellite’s orbit is proportional to the cube of the mean radius of the orbit, or $T^2 = \frac{4\pi^2}{GM}R^3$, where G is the gravitational constant and M is the mass of the planet or star about which the satellite is orbiting. This can be written in power-law form by taking the square root of both sides to get $T = \frac{2\pi}{\sqrt{GM}}R^{3/2}$. Thus we see that $b = \frac{2\pi}{\sqrt{GM}}$ and $m = \frac{3}{2}$.

The frequency of oscillation (number of oscillations per second) of a mass on the end of a spring is given by the equation $f = \frac{1}{2\pi}\sqrt{\frac{k}{M}}$, where k is the spring constant and M is the mass fastened to the end of the spring. If we consider the frequency and the mass to be the experimental variables, this can be rewritten as $f = \frac{\sqrt{k}}{2\pi}M^{-1/2}$. Thus $b = \frac{\sqrt{k}}{2\pi}$ and $m = -\frac{1}{2}$.

Newton’s law of gravitation states that the gravitational force between two spherical masses, M and m, is proportional to the product of the masses and inversely proportional to the square of the distance, r, between their centers. Stated mathematically, $F = G\frac{Mm}{r^2} = GMm\,r^{-2}$, where we see that the coefficient is $b = GMm$ and the exponent is $-2$. A similar force law holds for two electric charges.

Log-log plots have horizontal and vertical axes with spacings proportional to the logarithms of the numbers on both axes (see Figure 3). However, instead of the logarithms of the numbers, the numbers themselves are written along the axes. In addition to making the graphs easier to read, this means that simply placing points on the graph at the locations of their numbered coordinates actually locates them at the logarithms of their coordinates. This is equivalent to taking the logarithms of the numbers and plotting them on a Cartesian graph.

Notice in Figure 3 that the distances between powers of ten are uniform, but within each power of ten the division marks get closer together. Thus the first division mark after 1 is 2, the next is 3, and so on. Similarly, the first division mark after 10 is 20, the next 30, and so on, and after 0.01 come 0.02 and 0.03, etc. If a straight line is produced by plotting data on a log-log graph, then the data are related by a power law. The coefficient, b, in $y = bx^m$ is simply read off the graph on the vertical line passing through x = 1 (since ln 1 = 0, $\ln y = \ln b$ there), because the number itself, rather than its logarithm, is displayed.

The slope, m, is still the rise divided by the run, as in Cartesian plots, but the distances in both the vertical and horizontal directions are logarithms of the numbers plotted.  Because the numbers themselves are printed along the axes, we read the numbers and take their logarithms before using them to calculate the slope.  Therefore

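$$m = \frac{\ln y_2 - \ln y_1}{\ln x_2 - \ln x_1} = \frac{\log y_2 - \log y_1}{\log x_2 - \log x_1}.$$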

The graphs shown in Figures 3 and 4 are data from an experiment measuring the frequency of oscillation of a mass on a spring for various masses. The Cartesian plot (Figure 4) is highly nonlinear; however, when the data are plotted on a log-log graph (Figure 3), a straight line is suggested. Taking the two circled points at the ends of the line through the data points, we can estimate the slope as

$$m = \frac{\ln 0.5 - \ln 36}{\ln 520 - \ln 0.1} \approx \frac{-4.28}{8.56} \approx -0.50.$$
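Because only a ratio of logarithms appears, the base of the logarithm cancels. A short Python sketch (the function name `loglog_slope` is illustrative, not from any library) computes the slope from the two circled points in Figure 3 with both natural and common logarithms:

```python
import math


def loglog_slope(p1, p2):
    """Slope of a straight line on log-log axes through two (x, y) points.

    All four coordinates must be positive so that their logarithms exist.
    """
    (x1, y1), (x2, y2) = p1, p2
    return (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))


# Circled points from Figure 3, written as (M, f)
m = loglog_slope((0.1, 36), (520, 0.5))

# Same calculation with common logarithms: the base cancels in the ratio
m10 = (math.log10(0.5) - math.log10(36)) / (math.log10(520) - math.log10(0.1))

print(f"natural logs: {m:.2f}, common logs: {m10:.2f}")
```

Both values print as -0.50.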

The value of b is taken from the circled point where the line crosses M = 1 (ln M = 0). From the graph we read b ≈ 11. Therefore the equation for this line is $f = 11M^{-0.50}$. Compare this with the theoretical equation for the frequency, $f = \frac{\sqrt{k}}{2\pi}M^{-1/2}$. We see that the power of M is very close to what the theory predicts, and we can solve for k from the fact that $\frac{\sqrt{k}}{2\pi} = 11$, which gives $k = (22\pi)^2 \approx 4.8 \times 10^3$ in the units used for f and M. To determine whether or not the data are consistent with the theory, we would need to determine the range of values that the slope m can have in the data shown. One tedious way to do this is to estimate visually the range of lines that could reasonably be drawn through the data points, and then calculate m for each of these lines using points read from the graph. If the resulting values of m bracket -0.5, then we say that the data are consistent with the theory.


Figure 3.

Notes for Figure 3:

Plotting frequency versus mass on this log-log plot does indeed suggest a power law, $f = bM^m$. Assuming the line is the best representation of the data, the slope is calculated using the two circled points, (520, 0.5) and (0.1, 36), at the ends of the line. We get

$$m = \frac{\ln 0.5 - \ln 36}{\ln 520 - \ln 0.1} \approx -0.50.$$

The intercept occurs at the other circled point, on the vertical line through M = 1 (recall that ln 1 = 0), and is approximately 11. Therefore the equation relating these experimental variables is $f = 11M^{-0.50}$.


Figure 4:  Cartesian Plot of Frequency versus Mass

Notes for Figure 4:

This plot of frequency versus mass is very nonlinear, making it difficult to find the equation relating the two variables. It suggests a reciprocal relationship or possibly an exponential decay. The reciprocal relationship is a power law, and this would suggest plotting the points on a log-log graph. The previous figure shows that the relationship is indeed a power law.

Now it is obvious that reading numbers from the graphs may not be very accurate, especially if the graphs are small.  Larger plots and finer divisions would make reading numbers from the graph more precise.  On the other hand, there are more systematic methods that use the data themselves to find m and its uncertainty.  These methods come from the process of regression analysis which is, in turn, a subset of techniques from statistical analysis.  Some of these techniques are part of the standard features built into today’s graphing calculators and will be explored later.
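As a preview of such methods, here is a minimal sketch of one common approach: an ordinary least-squares straight-line fit applied to $(\ln x, \ln y)$. It is our illustration, not necessarily the procedure used later in these notes; the helper names `fit_line` and `fit_power_law` are hypothetical, and the data are synthetic values generated from $f = 11M^{-1/2}$, not the measurements behind Figure 3.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_mean - m * x_mean


def fit_power_law(xs, ys):
    """Fit y = b * x**m by least squares on (ln x, ln y); all values must be > 0."""
    m, ln_b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_b), m


# Synthetic, noise-free data generated from f = 11 * M**-0.5 (not the Figure 3 data)
masses = [0.1, 1.0, 10.0, 100.0, 500.0]
freqs = [11 * M ** -0.5 for M in masses]
b, m = fit_power_law(masses, freqs)
print(f"b = {b:.2f}, m = {m:.3f}")
```

Fitting in log space minimizes errors in $\ln y$ rather than in y, which weights the points roughly equally in relative terms; whether that is appropriate depends on how the measurement errors actually behave.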

The next plot (Figure 5) shows a log-log plot of the periods of the planets as a function of their distances from the sun.  Analyze the data to see if they are consistent with Kepler’s third law.

Figure 6 shows a Cartesian plot of the planetary periods.  Notice that all of the data for the inner planets are crowded into the lower left-hand corner of the graph.  It is not obvious from this Cartesian plot what mathematical relationship applies to the periods and the distances from the sun.  Note however that the range of the data on one axis is much larger than on the other.  This could be a clue that suggests something like a power law.  Another function that produces large ranges in the data is the exponential.  This is handled by a semi-log plot discussed in the next section.

If the data are negative numbers, it is necessary to take their absolute values before taking the logarithms. This raises another issue, however, if the measurements of an experimental variable take on both positive and negative values. What would the data mean if one were to take logarithms of these numbers? Taking the absolute value of the negative numbers before taking logarithms distorts the graph of the data by flipping the plotted points about an axis on the graph. What if one of the data points is zero? The logarithm of zero is undefined; ln y decreases without bound as y approaches zero.

These last observations raise yet another issue. Computerized data acquisition is common practice in modern laboratories. As with any measuring instrument, these devices have limits on the sizes of the numbers they can measure. When measuring very small values, in the vicinity of zero, it is possible for the “noise” (uncertainties) in the data acquisition equipment to be as large as the signal being measured (we say that the signal is “down into the noise” when this happens). If the values being measured are supposed to be all positive, no matter how small, the measuring uncertainties can make some of these measurements much closer to zero, zero itself, or even negative. Consider, for example, a number that is supposed to be on the order of $10^{-3}$, but the measuring uncertainties put it closer to $10^{-6}$, in other words, three orders of magnitude smaller. What would this do to a log plot of the data? Suppose the measuring uncertainties make some of the measured values negative when they were supposed to be positive, and you need to take logarithms of the data for plotting. What do you do with the data when these kinds of data acquisition uncertainties affect the measurements? We leave it to you to ponder the consequences of this for the moment.


Figure 5:  Log-Log Plot of Planetary Periods versus Distances From the Sun

Notes for Figure 5:

Because the log-log graph looks like a straight line, the relationship between the periods of the nine planets and their distances from the sun appears to be a power law, $T = bR^m$. Draw a straight line through the points. Pick two well-spaced points on this line and find the equation relating T and R. How does it compare with Kepler’s third law?


Figure 6:  Cartesian Plot of Planetary Periods versus Distances From the Sun

Notes for Figure 6:

The data for Mercury, Venus, Earth, and Mars are all crowded into the lower left-hand corner of this Cartesian plot.  The trend of the data curves upward.  Is it a power law?  If so, what power of distance is it?  It is not obvious from this plot.  The log-log plot in Figure 5 can be used to find it.


Semi-log Plots

Another function often encountered in both theoretical and experimental work is the exponential function which has the form

$$y = b\,e^{mx}.$$

This function occurs in connection with phenomena for which the rate at which a variable changes is proportional to the amount of the variable that is present.  Examples include population growths of various organisms, radioactive decay of elements into other elements, and the discharge of a capacitor through a resistor in an electronic circuit.  Taking the natural logarithm of both sides of this exponential function gives

$$\ln y = \ln\left(b\,e^{mx}\right) = \ln b + mx,$$

or

$$\ln y = mx + \ln b.$$

Notice the similarity to the slope-intercept form of a straight line, $y = mx + b$. In this case, y is replaced by ln(y), and b is replaced by ln(b). If the variables are related by an exponential equation, then plotting the logarithm of the dependent variable directly against the independent variable (not the logarithm of the independent variable) will produce a straight line.

This kind of plot is called a semi-log plot (also log-linear) because only one of the two variables is plotted on a logarithmic scale. In most cases, the vertical scale is logarithmic while the horizontal scale is linear. As in the case of the Cartesian plot, the slope, m, of a line on a semi-log plot is the rise divided by the run. However, the distances in the vertical direction are logarithms. Since the numbers instead of their logarithms are written along the vertical axis, we must take the logarithms of these numbers before using them in the calculation of the slope. The numbers in the horizontal direction can be used as they appear. Thus, for two points $(x_1, y_1)$ and $(x_2, y_2)$ on the line,

$$m = \frac{\ln y_2 - \ln y_1}{x_2 - x_1}.$$

Natural logarithms must be used here because the exponential in $y = b\,e^{mx}$ has base e; unlike the log-log case, the base of the logarithm does not cancel.

The vertical intercept, b, can be read off the vertical axis through x = 0 because the number b is shown instead of its logarithm. 

Figure 7 shows a Cartesian plot of the discharge curve of a capacitor, while Figure 8 is a semi-log plot of the same data. As in the previous examples, the solid line is meant to be the best representation of the data. Using the two circled points in Figure 8, (0, 6) and (35, 0.032), we find b = 6.0 and the slope to be

$$m = \frac{\ln 0.032 - \ln 6}{35 - 0} \approx -0.15.$$

Thus the equation relating the variables is $V = 6.0\,e^{-0.15t}$. The theoretical curve relating these variables is $V = V_0\,e^{-t/RC}$, where R and C are the resistance and capacitance in the discharging circuit.
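The semi-log calculation differs from the log-log one only in that the horizontal coordinates are used as they are. A minimal Python sketch with the circled points from Figure 8 (the helper name `semilog_slope` is hypothetical) also converts the slope to RC, assuming the theoretical form $V = V_0e^{-t/RC}$, so that $m = -1/RC$:

```python
import math


def semilog_slope(p1, p2):
    """Slope of a straight line on semi-log axes (logarithmic vertical, linear horizontal)."""
    (x1, y1), (x2, y2) = p1, p2
    return (math.log(y2) - math.log(y1)) / (x2 - x1)


# Circled points from Figure 8, written as (t, V)
m = semilog_slope((0, 6.0), (35, 0.032))
b = 6.0       # read directly from the vertical axis at t = 0
rc = -1 / m   # theory: V = V0 * exp(-t / RC), so the slope is m = -1/RC
print(f"V = {b} exp({m:.3f} t), RC = {rc:.1f}")
```

It gives a slope of about -0.150 and RC ≈ 6.7, in whatever time units are used on the horizontal axis.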


Figure 7:  Cartesian Plot of Capacitor Voltage versus Time

Notes for Figure 7:

This is a classic exponential decay curve. The rate at which the voltage changes with respect to time is proportional to the value of the voltage. Since the curve also looks somewhat like that of a reciprocal equation, one could be tempted to plot the data on a log-log graph. It would soon be obvious that this would not produce a straight line. We need to plot it on a semi-log (log-linear) graph before it becomes evident that it is actually an exponential.


Figure 8:  Semi-Log Plot of Capacitor Voltage versus Time

Notes for Figure 8:

The nearly straight line on a semi-log plot suggests an exponential equation. If the solid line is the best representation of the data, then we can choose the two circled points to calculate the slope. It is convenient to pick one of these points to be the vertical intercept, b = 6.0. The slope is

$$m = \frac{\ln 0.032 - \ln 6.0}{35 - 0} \approx -0.15.$$

Therefore the equation of the line is $V = 6.0\,e^{-0.15t}$. The theoretical equation is $V = V_0\,e^{-t/RC}$. This means that $V_0 = 6.0$ and $1/RC \approx 0.15$, so $RC \approx 6.7$ in the time units of the plot.

Other Techniques for Linearizing Data

Consider the equation $d = \frac{1}{2}at^2$. This is certainly an example of a power law equation, $y = bx^m$, with $b = \frac{1}{2}a$ and $m = 2$. However, we can also think of $t^2$ as a variable by letting $x = t^2$. Then the equation looks like $d = \frac{1}{2}a\,x$, which is a linear equation in slope-intercept form, $y = mx + b$, with $m = \frac{1}{2}a$ and $b = 0$. This would suggest that, instead of plotting d versus t, we should plot d versus $t^2$ to get a straight line passing through zero. The slope of this line could then be calculated, and this slope would allow us to find a from the fact that $m = \frac{1}{2}a$. This works especially well with an equation like $d = d_0 + \frac{1}{2}at^2$, because even though this equation contains a power of t, you will discover that it does not plot as a straight line on a log-log graph. It is not strictly in the form of a power law equation. (Try taking the logarithm of both sides of this equation and see what you get.) However, plotting d versus $t^2$ does produce a straight line with intercept $b = d_0$ and slope $m = \frac{1}{2}a$.

Figures 9 through 12 illustrate the above points. The distance-versus-time graph in Figure 9 has what appears to be a vertical offset. If this represents an example of uniformly accelerated motion, we would expect the curve to be a parabola. Since the slope of the curve is zero at time zero, we suspect that this object was starting from rest, but was already some distance away when the motion started. If this is the case, the equation has the form $d = d_0 + \frac{1}{2}at^2$. Figure 10 shows what happens if we try to do a log-log plot. On the other hand, plotting distance versus the square of the time, as in Figure 11, gives a straight line with an intercept at $d_0 = 450$ meters. Figure 12 illustrates yet another approach to analyzing the data. Removing the offset, $d_0$, leaves the equation in the form $d - d_0 = \frac{1}{2}at^2$, which is a power law; therefore, making a log-log plot of $d - d_0$ versus t gives a straight line.
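Treating $t^2$ as the independent variable turns the problem into an ordinary straight-line fit. A minimal sketch, assuming noise-free synthetic data generated from $d = 450 + 16.1t^2$ (not the actual Figure 9 data) and a hand-written least-squares helper, `fit_line`, of our own:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_mean - m * x_mean


# Synthetic, noise-free data from d = d0 + (1/2) a t**2 with d0 = 450, a = 32.2
times = [0, 1, 2, 3, 4, 5, 6, 7, 8]
dists = [450 + 16.1 * t ** 2 for t in times]

# d = (a/2) * (t**2) + d0 is linear in the new variable t**2
slope, d0 = fit_line([t ** 2 for t in times], dists)
a = 2 * slope
print(f"d0 = {d0:.1f}, slope = {slope:.2f}, a = {a:.1f}")
```

The fit recovers $d_0 = 450$, a slope of 16.1, and $a = 32.2$; with real measurements the recovered values would carry the experimental errors.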

(Another question to ponder: What if subtracting the offset makes one of the data points zero, which could easily happen if we subtract, say, the minimum value in the data set from all of the data values in the set.  How should this be handled on a log plot?)

We showed earlier that Kepler’s third law, $T^2 = \frac{4\pi^2}{GM}R^3$, could be put into the form of a power law by taking the square root of both sides of the equation. On the other hand, we could plot $T^2$ versus $R^3$, and the result would be a straight line with a slope of $\frac{4\pi^2}{GM}$ and an intercept of zero.

Similarly, for data obeying an equation of the form $f = \frac{1}{2\pi}\sqrt{\frac{k}{M}}$, which we saw plotted in Figure 4, it is also possible to plot f versus $1/\sqrt{M}$ to get a straight line with slope $\frac{\sqrt{k}}{2\pi}$ and intercept zero. We could also plot $f^2$ versus $1/M$ to get a straight line with intercept zero and slope $\frac{k}{4\pi^2}$.

To use these techniques, you must already know, or have a pretty good idea of, the form of the equation your data are following. The main reason for using these techniques is to extract the actual numbers (such as the values of a and $d_0$ in $d = d_0 + \frac{1}{2}at^2$) from the collected data. These values are unique to the particular experimental setup, and you are using a known law to find them. For example, suppose you want to measure the gravitational constant, G, in Newton’s law of gravitation

$$F = G\frac{Mm}{r^2}.$$

In principle, you could use two known masses and vary the separation between them, each time measuring the force, F. Then you could plot these data on a log-log plot, from which you should get a straight line with a slope of -2. The intercept would be GMm on this log-log graph, and it would occur on the vertical line through r = 1. You could also plot F versus $1/r^2$ on a Cartesian graph. The slope of this graph would be GMm, and since you know M and m, you could find G. In practice, however, this experiment is difficult because the measured forces are extremely small. One of these methods of data analysis may give better results than the other, depending on the kinds of errors that creep into the experiment.
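A sketch of the Cartesian approach, assuming hypothetical masses and synthetic, noise-free forces generated using $G = 6.674 \times 10^{-11}$ N·m²/kg². Because the line $F = GMm\,(1/r^2)$ should pass through the origin, the sketch fits a least-squares slope with the intercept fixed at zero, $\text{slope} = \sum x_iF_i / \sum x_i^2$ with $x_i = 1/r_i^2$:

```python
G_TRUE = 6.674e-11   # N m^2 / kg^2, used only to generate the synthetic data
M, m = 100.0, 1.0    # hypothetical known masses, kg

# Synthetic, noise-free force values at several separations (m)
radii = [0.05, 0.08, 0.10, 0.15, 0.20]
forces = [G_TRUE * M * m / r ** 2 for r in radii]

# Plot F against x = 1/r**2: a line through the origin with slope G*M*m.
xs = [1 / r ** 2 for r in radii]
slope = sum(x * F for x, F in zip(xs, forces)) / sum(x * x for x in xs)  # least squares through origin
G_est = slope / (M * m)
print(f"G = {G_est:.3e}")
```

With real data the forces would carry experimental errors, and the fitted G would inherit them.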

(Question: The gravitational constant, G, has been measured to be about $6.67 \times 10^{-11}\ \text{N·m}^2/\text{kg}^2$. How do you suppose the data are handled in order to be able to see this number on a graph?)

How one chooses what to plot is determined by the equation connecting the variables and the kinds of experimental errors that are part of the measurements. Some precautions are necessary. All measured variables contain experimental errors. If you choose to plot the square, cube, or some higher power of a set of measurements, you magnify the plotting errors. Taking square roots, cube roots, etc., of measured values reduces these errors. Exponentiating the measured values can magnify or reduce plotting errors depending on which part of the exponential curve they fall on. Taking logarithms reduces plotting errors, except for logarithms of numbers between 0 and 1, which magnify the plotting errors more and more as the measured values approach zero.
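These claims can be checked numerically. The sketch below is our illustration (the helper `half_spread` and all numbers are hypothetical): it moves a value by ±dx and measures how far each transformed value moves. For powers it compares relative errors; for the logarithm it reports the absolute error along a log axis, which is approximately dy/y.

```python
import math


def half_spread(f, x, dx):
    """How far an error of +/- dx in x moves f(x): half the width of f(x - dx)..f(x + dx)."""
    return abs(f(x + dx) - f(x - dx)) / 2


dx = 0.01  # hypothetical measurement uncertainty, the same for every value below

# Powers rescale the *relative* error: squaring roughly doubles it, a square root halves it.
x = 4.0
rel_x = dx / x
rel_sq = half_spread(lambda v: v ** 2, x, dx) / x ** 2
rel_sqrt = half_spread(math.sqrt, x, dx) / math.sqrt(x)
print(f"relative error: x {rel_x:.5f}, x^2 {rel_sq:.5f}, sqrt(x) {rel_sqrt:.5f}")

# On a log axis the error is about dy / y: smaller than dy when y > 1,
# larger when 0 < y < 1, and growing without bound as y approaches zero.
for y in (10.0, 1.0, 0.1, 0.02):
    print(f"y = {y:5}: error along the log axis = {half_spread(math.log, y, dx):.4f}")
```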

There is much to be said for experience in handling, plotting, and analyzing data.  There are also many advanced statistical techniques for handling and analyzing data that are available to an experimenter.  Many of these can be found in standard software packages for the computer, and most graphing calculators have some of these techniques built into their internal firmware.  We will discuss some of these later, but do not expect that these advanced techniques will sweep any of the experimental uncertainty issues under the rug.  No matter what techniques the researcher is using to extract results from experimental data, it is important for the researcher to understand thoroughly the behaviors of the equipment and the experimental data.  Advanced methods of data transformation and analyses make it possible to quantify and display experimental uncertainties more clearly.  In conjunction with modern computerized data acquisition equipment, these techniques make it possible to work with far more data at faster speeds.  The result is that a more dynamic aspect of the data analysis becomes evident, because data can be acquired rapidly enough that experimental adjustments can be made quickly to see their effects on the experimental errors.  This often helps the researcher to develop better experimental designs almost on the fly as rapidly repeated trials and adjustments quickly uncover what works well and what doesn’t.  These techniques are also used for quality control in manufacturing.

After a quick summary of the graphing techniques, we will discuss the issues of error analysis and best fits to data.  We will explore some of the methods for expressing the uncertainties in measurements, and what kind of information is needed to calculate these.  You may have already noticed some of these issues as you looked at the graphs.  How does one decide where to draw a line through the data points?  Probably several lines could be drawn, each having a slightly different slope and/or vertical intercept.  These lines will, of course, give different results for the experiment.  Which one is the “best”?  Is there a way to quantify this?  This issue is related to the issue of reporting a number that best represents a series of measurements of the same quantity (such as the length of a room).  You have probably been told to take several measurements and report the average.  Why is the average the best number to report?  Why not the number that occurs most often?  What do all of those other measurements tell us?  Many scientific and engineering measurements are reported as a number plus or minus an uncertainty.  How is the uncertainty in a measurement calculated, and what does it mean?  The sections on statistical techniques will begin to deal with these questions.


Figure 9: Distance versus Time Graph With a Vertical Offset.

Notes for Figure 9:

If this is a case of uniformly accelerated motion from rest, then the equation for this curve would be of the form $d = d_0 + \frac{1}{2}at^2$. Even though it contains a power of t, this equation is not a power law. Therefore, plotting the data on a log-log plot will not produce a straight line, as the next figure shows.


Figure 10:  Log-Log Plot of the Distance versus Time Graph of Figure 9.

Notes for Figure 10:

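This is what happens when one plots an equation that contains a power of the independent variable, but also has an offset that displaces the graph vertically. Such an equation is not a power law. Notice that taking the logarithm of the equation $d = d_0 + \frac{1}{2}at^2$ produces $\ln d = \ln\left(d_0 + \frac{1}{2}at^2\right)$, which can also be written as $\ln d = \ln\left[\frac{1}{2}at^2\left(1 + \frac{2d_0}{at^2}\right)\right]$. This finally reduces to the form

$$\ln d = 2\ln t + \ln\left(\tfrac{1}{2}a\right) + \ln\left(1 + \frac{2d_0}{at^2}\right),$$

which is not in the slope-intercept form of a straight line.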
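The curvature can also be seen numerically. The sketch below, using synthetic data from $d = 450 + 16.1t^2$ (illustrative values, not the plotted data), computes the slope of each segment joining neighbouring points on log-log axes. For a true power law every segment would have the same slope; here the local slope climbs toward 2 as the $t^2$ term swamps the offset.

```python
import math

# Synthetic, noise-free data from d = 450 + 16.1 t^2 (illustrative values only)
times = [1, 2, 4, 8, 16, 32]
dists = [450 + 16.1 * t ** 2 for t in times]

# Slope of the segment joining neighbouring points on log-log axes.
# A pure power law would give the same value for every segment.
local_slopes = []
for (t1, d1), (t2, d2) in zip(zip(times, dists), zip(times[1:], dists[1:])):
    s = (math.log(d2) - math.log(d1)) / (math.log(t2) - math.log(t1))
    local_slopes.append(s)
    print(f"t = {t1:>2} to {t2:>2}: local slope = {s:.3f}")
```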


Figure 11:  Plot of Distance versus Time^2

Notes for Figure 11:

If we think of the $t^2$ in $d = d_0 + \frac{1}{2}at^2$ as a variable in itself, and then plot d versus the square of t, we get a straight line. Using the circled points, we find the intercept, $d_0 = 450$, and the slope

$$m = \frac{\Delta d}{\Delta\left(t^2\right)} \approx 16.1.$$

Therefore the equation of this line is $d = 450 + 16.1t^2$. According to the theory describing uniformly accelerated motion, the coefficient, 16.1, is $\frac{1}{2}a$; therefore, a = 32.2.


Figure 12:  Log-Log Plot After Removal of Vertical Offset.

Notes for Figure 12:

This is an alternative approach to linearizing the data in the previous three figures. If the equation is of the form $d = d_0 + \frac{1}{2}at^2$, then we can subtract the offset, $d_0 = 450$, to get a power-law equation $d - d_0 = \frac{1}{2}at^2$. Take the logarithms of both sides to get $\ln(d - d_0) = 2\ln t + \ln\left(\frac{1}{2}a\right)$. Now this equation is in the slope-intercept form of a straight line. We simply subtract the vertical offset from all of the distances and then plot the data on a log-log graph. Using the circled points, we find the intercept on t = 1 to be approximately 16. The slope is calculated to be approximately 2. Therefore the equation of this line is $d - 450 = 16t^2$, or $d = 450 + 16t^2$. Compare this with the result in Figure 11.
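The offset-removal approach can be sketched the same way: subtract $d_0$, take logarithms, and fit a straight line. The data are again synthetic values from $d = 450 + 16.1t^2$, the helper `fit_line` is our own, and t = 0 is left out because $d - d_0$ is zero there and its logarithm is undefined.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_mean - m * x_mean


d0 = 450                                      # vertical offset, read from the d-versus-t^2 plot
times = [1, 2, 3, 4, 5, 6, 7, 8]              # t = 0 excluded: ln(d - d0) would be ln(0)
dists = [450 + 16.1 * t ** 2 for t in times]  # synthetic, noise-free data

# After removing the offset, d - d0 = (a/2) t^2 is a pure power law.
m, ln_coeff = fit_line([math.log(t) for t in times],
                       [math.log(d - d0) for d in dists])
coeff = math.exp(ln_coeff)
print(f"d - d0 = {coeff:.1f} t^{m:.2f}, a = {2 * coeff:.1f}")
```

The fitted exponent is 2 and the coefficient 16.1, giving a = 32.2 as in Figure 11.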


Summary of Graphing Techniques

The following table summarizes the graphing techniques used to linearize experimental data and extract the equation relating the experimental variables. These techniques are visual, useful, and fast, and they are standard tools for any experimenter working with variables that may be related by an equation. Just plotting the data can help the experimenter debug or improve the experiment.

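Table 1: A Summary of Linearization Techniques

| If plotting data gives a straight line on a | Then the form of the equation relating the variables is | The slope, m, is calculated by | and b is read from the vertical axis through |
|---|---|---|---|
| Cartesian graph | a linear equation, $y = mx + b$ | $m = \frac{y_2 - y_1}{x_2 - x_1}$ | $x = 0$ |
| Log-log graph | a power law, $y = bx^m$ | $m = \frac{\ln y_2 - \ln y_1}{\ln x_2 - \ln x_1}$ | $x = 1$ |
| Semi-log graph | an exponential, $y = be^{mx}$ | $m = \frac{\ln y_2 - \ln y_1}{x_2 - x_1}$ | $x = 0$ |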
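The three rows of Table 1 can be collected into one helper. This is a minimal sketch (the function `two_point_fit` and its `kind` labels are our own), assuming the two points lie on the drawn line. Unlike the table, it computes b from one of the points instead of reading it off the vertical axis.

```python
import math


def two_point_fit(kind, p1, p2):
    """Return (m, b) from two points on a straight line drawn on a
    'cartesian' (y = m x + b), 'loglog' (y = b x^m) or 'semilog' (y = b e^(m x)) graph."""
    (x1, y1), (x2, y2) = p1, p2
    if kind == "cartesian":
        m = (y2 - y1) / (x2 - x1)
        return m, y1 - m * x1                 # from y1 = m*x1 + b
    if kind == "loglog":
        m = (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))
        return m, y1 / x1 ** m                # from y1 = b * x1**m
    if kind == "semilog":
        m = (math.log(y2) - math.log(y1)) / (x2 - x1)
        return m, y1 / math.exp(m * x1)       # from y1 = b * exp(m * x1)
    raise ValueError(f"unknown plot type: {kind!r}")


print(two_point_fit("cartesian", (0, -23), (95, 230)))   # Figure 2
print(two_point_fit("loglog", (0.1, 36), (520, 0.5)))    # Figure 3
print(two_point_fit("semilog", (0, 6.0), (35, 0.032)))   # Figure 8
```

For the circled points of Figure 3 this gives b ≈ 11.4 rather than the 11 read from the graph, a reminder that values read from a graph are only approximate.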

© 2002 Biological Science Institute
All Rights Reserved.