Expressing a Number and its Uncertainty
Reporting the Best Representation of a Series of Measurements
Scientific experimentation
frequently involves making measurements with some kind of instrument, and
this most often means measuring something to as high a precision and
accuracy as is necessary to draw valid conclusions from an experiment.
Precision refers roughly to how tightly clustered a sequence of measurements
is. Accuracy refers to how closely the measurements agree with the actual
value of the quantity being measured. It is possible to have high precision
and poor accuracy. An analogy would be a tightly clustered bunch of bullet
holes way off to one side of a target. High accuracy and poor precision would
be analogous to a target peppered all over with holes, but with the center
of the distribution of holes in the bull’s eye. Obviously one would like
to have as much of both precision and accuracy as possible.
It is an experimental
fact that, as the measurement scale of an instrument gets finer, there is
eventually a point at which several measurements of a given quantity (a length,
for example) will give different values. The question then arises, which
value should be reported? The most frequently occurring value (the mode)?
The one that has half of the values above it and half below (the median)?
The arithmetic average or mean? There are actually good reasons
for giving any of these, but the context in which the measurement is made
and the kind of information desired determines which is used. These numbers
are often referred to as measures of central tendency because they
each represent a value about which the set of measurements tends to cluster.
When using instruments for measuring physical quantities such as length, mass,
voltage, time, etc., it is usually the mean that is reported. However,
along with this mean, there is reported another number that expresses the
uncertainty in the measurement. It is this additional number that gives a
great deal of information about the quality of the measurements made
during an experiment, and it allows an experimenter to judge whether or not
the measurements are adequate for drawing conclusions from an experiment.
We will discuss how this is done shortly, but first let’s consider the arguments
for reporting the mean value.
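As a small illustration of the three measures of central tendency named above, the following sketch computes the mode, median, and mean with Python's standard `statistics` module. The readings are hypothetical values invented for this example, not data from any experiment described here.

```python
import statistics

# Hypothetical repeated readings of one length, in centimeters.
readings = [12.1, 12.3, 12.2, 12.3, 12.4, 12.3, 12.2]

mode = statistics.mode(readings)      # most frequently occurring value
median = statistics.median(readings)  # middle value of the sorted readings
mean = statistics.mean(readings)      # arithmetic average

print(f"mode = {mode}, median = {median}, mean = {mean:.2f}")
```

For this particular list the mode and median coincide while the mean is slightly lower, which shows that the three measures need not agree.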
Suppose we take N
measurements of a quantity, and have before us N numbers that are different
from each other. Let’s call these N measurements
$x_1, x_2, \ldots, x_N$. We are
looking for a single number that best represents these N numbers.
Whatever value we choose to report and however we choose to calculate it,
let’s call this number Q. Now we can subtract this number,
Q, from each of the N measurements to form what are called deviations
(sometimes called residuals). Thus we obtain a set of numbers
$(x_1 - Q), (x_2 - Q), \ldots, (x_N - Q)$. Some
of these might be positive, some might be negative, and some zero. If we
square each of these deviations, then we will have a set of numbers that are
greater than or equal to zero. The sum of the squares of the deviations
is a measure of how much the individual measurements are spread about the
number, Q. If this sum is large, the measurements are spread out and
quite different from each other. If it is small, the measurements must be
tending to cluster more closely to each other. Let’s call the sum of the
squares of the deviations S, that is
$$S = (x_1 - Q)^2 + (x_2 - Q)^2 + \cdots + (x_N - Q)^2,$$
or, more compactly,
$$S = \sum_{i=1}^{N} (x_i - Q)^2.$$
We now ask, “What is the
number, Q, that minimizes the sum of the squares of the deviations?”
What we are suggesting here is that the number that best represents
this set of measurements is, in effect, the one about which the measurements
are most tightly clustered. Finding the number, Q, that minimizes
the sum, S, is a calculus exercise. If you have not had calculus yet,
don’t worry about the following calculation; it is the result of the calculation
that is important.
To minimize S with
respect to the variable Q, we simply take the derivative of S
with respect to Q, set this derivative equal to zero, and solve for
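Q. Therefore
$$\begin{aligned}
\frac{dS}{dQ} &= \frac{d}{dQ} \sum_{i=1}^{N} (x_i - Q)^2 \\
&= \sum_{i=1}^{N} 2 (x_i - Q)(-1) \\
&= -2 \left( \sum_{i=1}^{N} x_i - NQ \right) = 0.
\end{aligned}$$
From the last line we find that
$$\sum_{i=1}^{N} x_i = NQ$$
or
$$Q = \frac{1}{N} \sum_{i=1}^{N} x_i.$$
What we have shown is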
that the number, Q, that minimizes the sum of the squares of
the deviations is the number we get when we add up the N measurements
and divide by N, namely the mean or arithmetic average of the
measurements. In other words, the number that best represents the
set of measurements is the mean of the measurements (if what we imply by “best
represents” is the number about which all of the measurements are most tightly
clustered).
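For readers who prefer a numerical check to the calculus, the following sketch scans trial values of Q and confirms that the sum of squared deviations is smallest at the mean. The data are the six length measurements from this document's worked example; the helper name `sum_sq_dev` and the grid spacing are choices made only for this illustration.

```python
# The six length measurements (cm) from this document's worked example.
x = [17.15, 17.42, 17.34, 17.27, 17.19, 17.30]

def sum_sq_dev(q, data):
    """Sum of squared deviations S(Q) of the data about a trial value Q."""
    return sum((xi - q) ** 2 for xi in data)

mean = sum(x) / len(x)

# Scan trial values of Q on a fine grid and keep the one with the smallest S.
grid = [17.0 + 0.001 * k for k in range(501)]  # 17.000 ... 17.500
best_q = min(grid, key=lambda q: sum_sq_dev(q, x))

print(f"mean        = {mean:.4f}")
print(f"best grid Q = {best_q:.4f}")
```

The best grid value agrees with the mean to within the grid spacing, as the derivation predicts.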
There are various notations
for the mean of a series of numbers,
$x_1, x_2, \ldots, x_N$. The
most common are
$\bar{x}$, $x_{\text{avg}}$, and $\langle x \rangle$. The
last notation is often called the expectation value of x. In
any case, each of these usually refers to the sum of the measurements divided
by the number of measurements, $\frac{1}{N} \sum_{i=1}^{N} x_i$.
Reporting the Standard Deviation of a Sequence of Measurements
In scientific or engineering
measurements, one usually does more than just report the mean of a sequence
of measurements. An additional number is reported (a measure of dispersion)
that gives some indication of how much the measurements are spread out around
this mean, thus giving an indication of the precision of the measurement.
We already mentioned that
the sum of the squares of the deviations is a measure of how tightly clustered
individual measurements are to a number Q, and we found that the Q
that minimizes this sum is the mean of all of the measurements. Therefore,
to indicate how precisely the several measurements are clustered, it might
seem reasonable to report the sum of the squares of the deviations about the
mean. In other words, along with the mean, we might report
$$S = \sum_{i=1}^{N} (x_i - \bar{x})^2.$$
Here we have used
$\bar{x}$ as the
mean of the $x_i$’s. There are problems with using this sum,
however. Firstly, the number, S, gets larger and larger as the number
of measurements increases (this is as useless as quoting the sum of
a set of measurements instead of the mean of the measurements). Secondly,
the sum is made up of the squares of the deviations, and therefore
does not have the same units as the measurements themselves. Instead, it
has units which are the squares of the units in which the measurements are
made. Thus, if the measurement is in meters, S has units of square-meters.
A better choice would be to find the mean of the
squares of the deviations. This quantity is called the variance.
Thus
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2.$$
The variance is a useful number in that it does
tell us what the mean square deviation is, and this is
a measure of the spread about the mean. However, it still has units which
are the squares of the units in which the measurements are made, but if we
take the square-root of the variance, the units are consistent. Therefore
it has become customary to report the square-root of the variance as a measure
of the precision of a measurement. The square-root of the variance is called
the standard deviation, and has been given the Greek letter sigma ($\sigma$) as its most common designation. Sometimes you will see S.D. or
s.d. as the designation of the standard deviation. (Another name for the standard
deviation is the root-mean-square deviation or rms deviation.)
In summary, then, the standard deviation is given by
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}.$$
We will learn later that
there are two different standard deviations that can be calculated from a
series of measurements, one called the population standard deviation,
$\sigma_N$ or just
$\sigma$, and the
other called the sample standard deviation,
$\sigma_{N-1}$ or $s$.
The only difference is that N is replaced by N-1 in the denominator
of the calculation of the sample standard deviation. The important
distinction between a population and a sample of the population will be discussed
later. The different notations,
$\sigma_N$ or $\sigma$,
and
$\sigma_{N-1}$ or $s$,
are so common that it is important to get used to them. Generally
$\sigma_N$ and
$\sigma_{N-1}$ are used
together in some texts and applications, and $\sigma$ and $s$ are used
together in others.
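The two formulas can be compared side by side in a short sketch. The functions `population_sd` and `sample_sd` are names made up for this illustration, and the data are hypothetical. Python's standard `statistics` module provides the same two quantities as `pstdev` (N in the denominator) and `stdev` (N-1 in the denominator).

```python
import math
import statistics

def population_sd(data):
    """Square root of the mean squared deviation (N in the denominator)."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((xi - mean) ** 2 for xi in data) / n)

def sample_sd(data):
    """Same sum of squares, but divided by N - 1 instead of N."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((xi - mean) ** 2 for xi in data) / (n - 1))

# Hypothetical readings, chosen only to illustrate the two formulas.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(population_sd(data), statistics.pstdev(data))
print(sample_sd(data), statistics.stdev(data))
```

Because N-1 is smaller than N, the sample standard deviation is always the larger of the two for the same data.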
Summary of Reporting a Measurement and its Uncertainty
We have just seen that
the best representation of a series of measurements of a quantity is the mean
of the measurements. The justification for this is that the mean is the number
that minimizes the sum of the squares of the deviations. We have also seen
that the variance is the mean of the squares of the deviations, and the standard
deviation is the square-root of the variance. Thus the mean value of the
measurements also minimizes the standard deviation. This suggests that we
can capture the essence of a measurement of a quantity by quoting the mean
of a series of measurements of that quantity followed by another number
that represents the amount of spread about this mean. The standard way of
reporting this is as follows:
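$$x = \bar{x} \pm \sigma,$$
where the mean of the series of measurements is given by
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i,$$
and the standard deviation, $\sigma$, is either the population value
$$\sigma_N = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}$$
or the sample value
$$\sigma_{N-1} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}.$$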
When N is large, the population and sample
standard deviations do not differ by much. However when N is small,
they could be quite different. In fact, if only one measurement were made,
the population standard deviation would be
$\sqrt{(x_1 - \bar{x})^2 / 1} = 0$ (because the mean of a single measurement is the measurement itself), suggesting
that the measurement was infinitely precise. This doesn’t make sense, especially
if we know that subsequent measurements would vary. For a single measurement,
the sample standard deviation would be
$\sqrt{0/0}$ (which
is, at least, undefined) suggesting that no information about precision was
obtained.
This latter observation
suggests that the sample standard deviation is the one to use for a
series of measurements of a given quantity that is presumed to be fixed (such
as the length or weight of a single item). The justification for this is
that the measuring instrument and measuring process are giving us varying
results about an exact value which we do not really know but can only discover
approximately through our imperfect measuring instruments and procedures.
A single measurement gives us no information about how our measurements would
vary, and the sample standard deviation reflects this by being undetermined.
We shall say for the moment that the sample standard deviation (the
one with the N-1 in the denominator) is the one to use for expressing
the uncertainty in a measurement. However, the right choice depends on what we are measuring,
and we will need to explore this further after we discuss a few more topics.
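The single-measurement case can be seen directly in software. The sketch below uses Python's standard `statistics` module, which returns zero for the population formula and raises `StatisticsError` for the sample formula when given only one value. The reading itself is a hypothetical value.

```python
import statistics

single = [17.15]  # one hypothetical measurement, in centimeters

# Population formula: the lone deviation is zero, so the result is 0.0.
pop_sd = statistics.pstdev(single)
print(pop_sd)

# Sample formula: N - 1 = 0, so the library refuses to compute a value.
try:
    statistics.stdev(single)
    sample_defined = True
except statistics.StatisticsError as err:
    sample_defined = False
    print("sample standard deviation undefined:", err)
```

The error is the software's way of saying what the text says: one measurement carries no information about spread.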
An Example of Reporting a Measurement Properly
Suppose we have the following
set of measurements of a length in centimeters: 17.15, 17.42, 17.34, 17.27,
17.19, and 17.30. Table 2 shows the details of the calculations of the mean
and standard deviations. We have deliberately chosen some numbers that illustrate
the calculations but raise some additional issues as well.
Table 2: An Example of the Calculations of the Mean and Standard Deviations

| i | $x_i$ (cm) | $x_i - \bar{x}$ (cm) | $(x_i - \bar{x})^2$ (cm$^2$) |
|---|---|---|---|
| 1 | 17.15 | -0.128 | 0.016 |
| 2 | 17.42 | +0.142 | 0.020 |
| 3 | 17.34 | +0.062 | 0.004 |
| 4 | 17.27 | -0.008 | 0.000 |
| 5 | 17.19 | -0.088 | 0.008 |
| 6 | 17.30 | +0.022 | 0.000 |
| N = 6 | Sum: 103.67 | | Sum: 0.049 |
| | Mean: 17.278 | | Sum / N: 0.008 |
| | | | Sum / (N-1): 0.010 |
The first column of Table
2 contains the measurement number. The second column shows the individual
measurements, their total, and their mean. The third column shows the individual
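deviations from the mean. The fourth column shows the squares of the deviations
from the mean, the total of these, and that total divided by N (0.008) and by
N-1 (0.010), which are the population and sample variances. Their square roots,
0.090 and 0.099, are the population and sample standard deviations. According to
standard practice, the measurement of the length
should be reported as 17.28 ± 0.10 centimeters (that is,
$\bar{x} \pm s$, with the sample standard deviation rounded to two decimal places). However,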
a careful look at the numbers raises some questions about how many digits
to retain in the answer.
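The arithmetic behind Table 2 can be reproduced in a few lines. This is only a sketch of the calculation as described in the text; the variable names are chosen for this illustration.

```python
import math

x = [17.15, 17.42, 17.34, 17.27, 17.19, 17.30]  # lengths in cm, from Table 2
n = len(x)

total = sum(x)
mean = total / n
deviations = [xi - mean for xi in x]
squares = [d ** 2 for d in deviations]
sum_sq = sum(squares)

pop_var = sum_sq / n           # mean of the squared deviations
sample_var = sum_sq / (n - 1)  # same sum divided by N - 1
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

for i, (xi, d, sq) in enumerate(zip(x, deviations, squares), start=1):
    print(f"{i}  {xi:.2f}  {d:+.3f}  {sq:.3f}")
print(f"sum = {total:.2f}, mean = {mean:.3f}, sum of squares = {sum_sq:.3f}")
print(f"population: variance = {pop_var:.3f}, sd = {pop_sd:.3f}")
print(f"sample:     variance = {sample_var:.3f}, sd = {sample_sd:.3f}")
```

Rounded to three decimal places, the sample standard deviation is 0.099, which becomes the 0.10 quoted in the reported result.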
Notice that each measurement
is reported to four digits. This suggests that the instrument used to make
these measurements was capable of measuring to this precision, but note also
that the individual measurements are varying in the first decimal place
(the third digit). The object itself apparently has a roughness that is larger
than the resolution of the instrument. If this is the case, what is the meaning
of the second decimal place? How can we assume that the second decimal place
has anything to do with the length of the object if the first decimal place
is varying between 1 and 4? Notice also that the calculations were carried
out to three decimal places but rounded to two. If the first decimal
place is varying, certainly the third decimal place cannot have any meaning,
so we drop it. But how do we handle that second decimal place? We need to
discuss the concept of significant figures and how we decide what digits to
keep, but we can give a preliminary rationale for the answer we just stated.
One possibility would
be to acknowledge that the object itself has a roughness in its length that
affects the first decimal place in its measurement. Since it is this first
decimal place that is varying, we could argue that the measurement should
be reported as 17.3 ± 0.1 centimeters because
this does not give the impression that anything is known about the second
decimal place, and should thus be a fair report of the length of the object
and its variation. In fact, this is the way the measurement might be reported
if no other information were to be conveyed. However, by reporting
the measurement as 17.28 ± 0.10 centimeters, we are implying something additional about the measuring
process, namely that the resolution of the measuring instrument was more than
sufficient for the measurement at hand. We have gained some additional confidence
that the variation in the first decimal place is indeed real, and was easily
measured using this instrument.
We now begin to see how
much information can be conveyed in reporting a measurement, but we must also
be careful that we do not convey any false impressions about our measurements.
For example, if (by using the un-rounded calculations from the table) we were
to report the measurement as 17.278 ± 0.099 centimeters, we would be claiming that we could detect changes
in the third decimal place in our measurements, and this is not evident in
the individual measurements shown in the table. The additional digits are
artifacts of the calculations, especially the process of division. They do
not represent any knowledge beyond the second decimal place.
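The three ways of writing the result discussed above differ only in how many decimal places are kept. The sketch below uses a hypothetical helper, `format_measurement`, that prints the mean and the uncertainty to the same number of decimal places. The un-rounded sample standard deviation is entered as an approximate value.

```python
def format_measurement(mean, sd, decimals, unit="cm"):
    """Format mean and uncertainty to the same number of decimal places."""
    return f"{mean:.{decimals}f} ± {sd:.{decimals}f} {unit}"

mean = 103.67 / 6     # un-rounded mean from Table 2
sample_sd = 0.098675  # un-rounded sample standard deviation (approximate)

print(format_measurement(mean, sample_sd, 1))  # 17.3 ± 0.1 cm
print(format_measurement(mean, sample_sd, 2))  # 17.28 ± 0.10 cm
print(format_measurement(mean, sample_sd, 3))  # 17.278 ± 0.099 cm
```

The code will happily print any number of digits; deciding which of these lines is a fair report remains the experimenter's judgment.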
It may seem that the calculations
shown in Table 2 are tedious and time-consuming. However, modern graphing
calculators and computer software contain built-in procedures for doing these
routine kinds of calculations and much more. One merely enters the data into
appropriate lists and calls up the programs to do these calculations. Many
of these built-in computer and calculator routines do several standard calculations
simultaneously. So the process of generating this information about measurements
is not as difficult as it once was. The fact that so many data handling routines
are now standard features in the firmware and software of calculators and
computers attests to the frequency and importance of these kinds of calculations
in scientific and engineering applications.
Responsible reporting
of measurements is required of scientists and engineers, and they can get
very irritable when reviewing reported measurements that do not make sense.
Novices are easily spotted by the inconsistent ways they report their measurements,
and their data and experimental results immediately become suspect when this
happens. One of the biggest faux pas that marks a novice is the reporting
of more digits than are warranted by a measurement. This not only demonstrates
a lack of understanding of the measuring process on the part of the novice,
it also suggests that the novice is not sufficiently aware of limitations
and pitfalls in the experiment itself. Sometimes these novices believe they
are covering up sloppiness and laziness by reporting many digits, or that
these extra digits somehow make the experiment more accurate. However, there
is almost always an internal consistency in a good set of measurements that
is a strong indication of good experimentation. This is because the nature
and behaviors of measurements follow physical laws, and good experimenters
are keenly aware of this. Novices who report inconsistent measurements blatantly
reveal their ignorance of scientific experimentation.