Background
The Nature and Testing of Theories
What a Theory Means in Science
The word theory
has a much more precise meaning in science than it does in ordinary language.
In everyday language, a theory often suggests a guess or a hunch,
and the word is commonly treated as a synonym for hypothesis.
However, in science, a theory is a structure of cause-and-effect relationships
proposed to exist among observed phenomena in the physical universe.
A theory is constructed (by humans) to explain logically and completely
why various observations or phenomena occur in the natural world and
are related in the ways we find them to be. For example, in biology, the
theory of evolution provides the best current explanation of the diversity
of life on this planet and of the relationships that exist among various organisms
past and present. In physics, theories are highly mathematical structures
that attempt to explain the deepest known workings of the physical universe.
They have names such as electromagnetic theory, quantum theory, the special
and general theories of relativity, or elementary particle theory.
Theories can vary widely
in their complexities. Scientists prefer the simpler theoretical structures
over the complicated ones because the simple ones seem to give more explanations
with fewer assumptions. Nature seems to prefer simpler theories also, but
this is by no means certain. Generally, if scientists are choosing between
competing theories that purport to explain phenomena in the universe, they
usually feel the simpler theory is more likely to be correct. However, in
science, the ultimate arbiter of theoretical “disputes” is experiment. No
matter how simple and beautiful a theory is thought to be, if it does not
pass the experimental tests, it is rejected.
As theories become more
thoroughly tested and are found to have very specific explanatory power in
a broad range of circumstances, they sometimes become known as laws.
We then hear people referring to the “laws of physics”, the “laws of thermodynamics”,
“Newton’s laws”, “Lenz’s law”, etc. When theories are tentative and
do not yet have much experimental or observational support, they are more
appropriately called hypotheses (but physicists often call them theories
anyway because of the structure of relationships designed into them). Before
such hypotheses can be accepted and can advance to the level of being called
laws, they must run the gauntlet of extensive testing and verification by many
members of the scientific community. This process can take many decades.
Testing and Validating Theories
One of the hallmarks of
scientific practice is the requirement that all scientific hypotheses and
theories be testable and falsifiable. Obviously, if there is
no conceivable way to test a hypothesis or a theory, there is no way to find
out whether it is true or false. For a hypothesis or a theory to be falsifiable,
there must be a conceivable experiment or observation that could disprove
it. The hypothesis or theory must predict something that could be shown to
be wrong. In fact, a lot of effort goes into experiments that try to show
a theory is wrong, because these are among the better ways to test a theory.
A theory that continues to withstand such onslaughts has a higher probability
of being correct (unless it is an inane theory that doesn’t predict anything
or always offers an ad hoc reason why experiments fail to confirm or
refute it).
We can never completely
prove a theory in science; we can only show that data are consistent
with the theory and lend it support (note that another theory may explain
the same experimental observations). The broader the experimental and observational
support, the more confidence we have in a theory. The best theories make
precise predictions that are eventually verified by experiment, even though
it may take years of technological improvements in instrumentation before
this happens. And although we cannot completely prove a theory with many
experiments, it is logically possible to disprove a theory with one good
experiment. (“Good” means that the experiment is valid and definitive.)
A sloppy, incompetent experiment that does not observe what a theory predicts
cannot be considered a refutation of the theory. In fact, the broader the
experimental support for a theory, the more experimenters are obligated to
examine their experimental designs before declaring that their experiments
disprove a well-established theory.
However, if new information
that the theory cannot explain or predict eventually becomes available and
confirmed, then the time becomes ripe for an improvement or a replacement
of the theory. This is seen as an exciting time in science because it could
mean that a clearer picture of the workings of Nature is about to emerge.
Even though it can be confusing and exasperating, scientists love to be
part of this process; times like this are among the main reasons they
choose to become scientists.
It has been said that
science moves forward on the two legs of theory and experiment. These two
reinforce and refine each other, and lead us toward a deeper understanding
of Nature. As competing theories are tested, with some eventually verified
and others eliminated, the hope is that the few that survive are the ones
that give us the best understanding of the wide ranges of phenomena we see
in the universe around us. Good theories provide us with a logical web of
interrelationships that help us predict what will happen when we observe specific
circumstances in Nature. The ability to predict is one of the best indicators
of our level of scientific understanding.
In any case, whether in
confirming or disproving scientific theories, experimental science is challenging
and full of pitfalls and surprises. Being a research scientist requires years
of training, finely honed experimental skills, skepticism, high integrity,
and a willingness to let go of cherished hypotheses when these do not receive
experimental support and verification by colleagues.
Differences Between the Biological and Physical Sciences
While the underlying logical
structure and methodologies are similar across all of the sciences, there
are differences in the day-to-day activities of scientists and in the tools
that are most frequently used by scientists practicing in the various disciplines.
All scientists use statistics; however, some statistical tools are used more
frequently in some disciplines than in others. Nearly all scientists do experiments
that contain experimental and control groups, but some sciences use
these more than others.
We can make some comparisons
by considering some of the differences between biological experiments and
physics experiments. We choose biology and physics because these two sciences
lie at opposite ends of the spectrum of complexity in the systems they study:
biology studies the most complex systems, while physics seeks out the simplest.
They also differ greatly in the way they use mathematics. All of the other
sciences (chemistry, geology, astronomy, cosmology, etc.) fall somewhere between
the ends of this complexity spectrum. Each area of science has its characteristic
differences from the others, but each looks to the others for insights and
inspiration, especially in these modern times. Many scientists work on the
borders between these sciences where they draw upon the knowledge and insights
of several disciplines.
The Use of Control Groups in Biology Experiments
In biology, the systems
studied by the biologist (living organisms such as plants and animals) are among the most complex
systems in the universe. To do experiments with living organisms, one must
often measure the effects of an experiment by comparing an experimental
group with a control group. These are separate groups of organisms
that are supposedly “identical” except for an experimental change that is
imposed on the experimental group but not on the control group. The
idea is that, if everything else is the same in both groups, any differences
between the two groups that arise in the course of the experiment must be
a result of the experimental change imposed on the experimental group. While
this sounds easy, it is very challenging to do in practice. Living organisms
are extremely complicated and are not really identical in any characteristic
an experimenter may choose to investigate. They can also respond in unpredictable
ways to the most subtle of changes, and this can often invalidate an experiment.
Suppose a characteristic
being studied (e.g., height, weight, color, immunity) differs between the
experimental and control groups after an experiment has been done. The experimenter
must be able to show that the difference can be explained only by a
change (e.g., the kind of nourishment) deliberately introduced into the experimental
group. This is complicated by the fact that the groups will usually develop
differences in characteristics even if nothing is done by the experimenter.
The question that the experimenter must answer is, “Is the difference in question
significant, that is, is it more than would be expected if nothing
were done to the experimental group?”
To answer this question,
the experimenter must determine the ranges over which the characteristic will
normally vary all by itself. This information is an important part of the
experiment. Given this information, the experimenter then asks, “What
is the probability that the observed difference is due to normal chance
variations alone?” If the difference is more than would be expected
from chance variations alone (how much more can be measured statistically),
then this difference between groups may be due to the experimental
change introduced into the experimental group. (We say “may” because some
unforeseen or unnoticed phenomena could be responsible for the difference
between the groups.) If the difference between the two groups is no more
than would be expected from chance variations alone, then the experimenter
must report that the experiment had no significant effect on the experimental
group. (However, it is still possible that an effect would have occurred
but was nullified by an unknown or unforeseen phenomenon that crept
into the experiment despite precautions. Experimenting is not easy! Experienced
researchers are constantly on the alert for these complications so they can
eliminate them or compensate for them. Peer review is also very important
in research for just this reason.)
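The question about chance variation can be made concrete with a small sketch. The Python code below is one possible illustration, not a prescribed procedure: it uses a permutation test, which repeatedly reshuffles all the measurements into two arbitrary groups and counts how often the reshuffled difference in means is at least as large as the observed one. The function name, the number of shuffles, and the plant heights are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def permutation_p_value(experimental, control, n_shuffles=10000, seed=0):
    """Estimate the probability that a difference in group means at least
    as large as the observed one arises from chance regrouping alone."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(experimental) - mean(control))
    pooled = list(experimental) + list(control)
    n_exp = len(experimental)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the group labels were assigned at random
        diff = abs(mean(pooled[:n_exp]) - mean(pooled[n_exp:]))
        if diff >= observed:
            count += 1
    return count / n_shuffles


# Hypothetical plant heights in cm (made-up numbers, not real data).
experimental_heights = [21.5, 23.1, 22.8, 24.0, 23.5, 22.9]
control_heights = [20.1, 21.0, 19.8, 20.7, 21.3, 20.4]

p = permutation_p_value(experimental_heights, control_heights)
print(f"estimated probability of a chance difference this large: {p:.4f}")
```

A small value suggests that chance variation alone is an unlikely explanation, although, as noted above, it cannot rule out an unnoticed influence on one of the groups.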
The tools used to quantify
the answers to these questions (how much difference is significant?)
come from statistics and are called tests of significance. They have
names such as Student’s t-test, the F-test, and chi-square tests. These
will be discussed in the sections on statistical tests.
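As a rough preview of what such a test computes, the sketch below evaluates the statistic used in the two-sample Student’s t-test, under the assumption that both groups have the same underlying variance. The function name and the sample weights are hypothetical. Turning the statistic into a probability requires the t distribution (from a table or a statistics library), which is deliberately left out here.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for a two-sample Student's t-test
    that assumes both groups share the same underlying variance."""
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    # Pooled estimate of the common variance, weighted by sample size.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / dof
    # Standard error of the difference between the two group means.
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / std_err
    return t, dof


# Hypothetical weights in grams (made-up numbers, not real data).
treated = [31.2, 29.8, 33.1, 30.5, 32.0]
untreated = [28.9, 29.5, 27.8, 30.1, 28.4]

t_value, dof = pooled_t_statistic(treated, untreated)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

The larger the magnitude of t for a given number of degrees of freedom, the less plausible it is that the two groups differ by chance variation alone.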
Finding Mathematical Relationships in Physics Experiments
While physicists do controlled
experiments also, they are often testing theories which have a mathematical
structure. Such theories will often predict a specific mathematical relationship
between experimental variables, and the experimenter must determine whether
or not such a mathematical relationship really exists for a set of variables
being studied. To do this, the experimenter collects data, consisting of
pairs of independent and dependent variables, and plots them
on a graph. As in any experiment, there are experimental uncertainties.
These uncertainties will produce a plotted curve that has points scattered
about some trend line. Sometimes the trend will be clear, but at other times,
especially when working at the technological limits of the instruments, uncertainties
will be large enough that this trend can be vague. The experimenter must
find a mathematical curve that best represents the data. There may
be several mathematical formulas that produce curves similar to the one being
shown by the data. To decide among these, the experimenter can do what is
called a regression fit or a least-squares fit of each of the
mathematical curves to the data. There are quantitative statistical measures
that help in determining how well such curves fit the data. You will learn
of terms such as the correlation coefficient (r-value), chi-square,
chi-square minimization, and so on. In most
cases, there is at least one theory guiding the experiment, and it is often
some theory that prompts the experimenter on which mathematical curves to
try.
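To illustrate how a least-squares fit can help decide between candidate curves, here is a minimal Python sketch. It assumes two hypothetical candidates, y = a + b·x and y = a + b·x², each of which is a straight line in a suitably chosen variable, and compares the sum of squared residuals left by each fit. The data values and function name are invented for illustration; a real analysis would also weigh the measurement uncertainties.

```python
def fit_line(us, ys):
    """Least-squares fit of y = intercept + slope * u.
    Returns (slope, intercept, sum_of_squared_residuals)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    s_uu = sum((u - mean_u) ** 2 for u in us)
    s_uy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    slope = s_uy / s_uu
    intercept = mean_y - slope * mean_u
    ss_res = sum((y - (intercept + slope * u)) ** 2 for u, y in zip(us, ys))
    return slope, intercept, ss_res


# Hypothetical measurements (made-up numbers, not real data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.6, 2.1, 4.4, 8.1, 12.4]

# Candidate 1: y = a + b * x
_, _, ss_linear = fit_line(xs, ys)

# Candidate 2: y = a + b * x**2 (a straight line when plotted against x**2)
_, _, ss_quadratic = fit_line([x ** 2 for x in xs], ys)

print(f"sum of squared residuals, linear in x:    {ss_linear:.4f}")
print(f"sum of squared residuals, linear in x**2: {ss_quadratic:.4f}")
```

The candidate that leaves the smaller sum of squared residuals follows the data more closely, but a guiding theory should still have a say in which curve is physically reasonable.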
Often the data collected
by a physicist contain the results of multiple effects occurring in the experiment
simultaneously. A common example is when a physicist is attempting to measure
the radioactive half-life of an element that is decaying into another element
that is also radioactive. As the number of atoms of the original element
decreases, the number of atoms of the daughter element increases. If the
detector is measuring gamma rays from both elements, the simple exponential
decay curve expected from the original element is mixed together with the
decay curve of the daughter element. If the physicist is able to recognize
what is happening, she may need to use mathematical techniques to separate
these two curves. This is especially true if it is difficult or impossible
to monitor the gamma rays from the two elements separately.
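The mixing of the two decay curves can be sketched numerically. The Python code below is an illustration under simplifying assumptions that are not part of the text above: no daughter atoms are present at the start, the daughter decays to something stable, and the detector registers decays of both elements equally well. The function names, half-lives, and starting number of atoms are hypothetical.

```python
from math import exp, log


def decay_constant(half_life):
    """Convert a half-life into a decay constant (same time units)."""
    return log(2) / half_life


def decay_rates(t, n_parent_0, half_life_parent, half_life_daughter):
    """Return (parent_rate, daughter_rate, total_rate) in decays per unit
    time at time t, assuming no daughter atoms are present at t = 0."""
    lam_p = decay_constant(half_life_parent)
    lam_d = decay_constant(half_life_daughter)
    if lam_p == lam_d:
        raise ValueError("this formula requires two different half-lives")
    # Parent atoms decay as a simple exponential.
    n_parent = n_parent_0 * exp(-lam_p * t)
    # Daughter atoms are fed by the parent and drained by their own decay.
    n_daughter = (n_parent_0 * lam_p / (lam_d - lam_p)
                  * (exp(-lam_p * t) - exp(-lam_d * t)))
    parent_rate = lam_p * n_parent
    daughter_rate = lam_d * n_daughter
    return parent_rate, daughter_rate, parent_rate + daughter_rate


# Hypothetical values: half-lives of 10 and 2 hours, one million parent atoms.
for hours in (0.0, 2.0, 5.0, 10.0, 20.0):
    parent, daughter, total = decay_rates(hours, 1.0e6, 10.0, 2.0)
    print(f"t = {hours:5.1f} h  parent = {parent:10.1f}  "
          f"daughter = {daughter:10.1f}  total = {total:10.1f}")
```

The total rate printed in the last column is what a detector unable to distinguish the two elements would follow, and it is not a single simple exponential.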
Among the other graphing
tools used by a physicist are linearization techniques, which produce
a straight line on a graph of the experimental data. An example is a variable
that changes exponentially with respect to another variable. If the
logarithm of the dependent variable is plotted against the independent
variable (called a semi-log plot), a straight line is obtained. As another
example, if plotting the logarithm of one variable against the logarithm of
the other (a log-log plot) produces a straight line, then the experimental
variables are related by a power law. Sometimes plotting one variable
against the square of the other will produce a straight line. There are many
ways of doing the plotting depending on what is being sought, and depending
on what theory is guiding the experiment. From plots such as these, the equation
relating the two variables can be extracted. Some of these techniques are
discussed in the sections dealing with plotting data from physics experiments.
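The semi-log and log-log ideas described above can be tried out in a few lines of Python. This is only a sketch: the data are synthetic and noise-free, generated from an assumed exponential y = 5·exp(−0.3·x) and an assumed power law y = 2·x^1.5, so that the recovered slopes and intercepts can be compared with known values. Real data would scatter about the fitted line.

```python
from math import exp, log


def fit_line(xs, ys):
    """Least-squares straight line through (xs, ys).
    Returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Semi-log plot: if y = A * exp(k * x), then ln(y) = ln(A) + k * x,
# so ln(y) plotted against x is a straight line with slope k.
xs_exp = [0.0, 1.0, 2.0, 3.0, 4.0]
ys_exp = [5.0 * exp(-0.3 * x) for x in xs_exp]   # synthetic data
k, ln_amplitude = fit_line(xs_exp, [log(y) for y in ys_exp])
amplitude = exp(ln_amplitude)

# Log-log plot: if y = C * x**p, then ln(y) = ln(C) + p * ln(x),
# so ln(y) plotted against ln(x) is a straight line with slope p.
xs_pow = [1.0, 2.0, 4.0, 8.0]
ys_pow = [2.0 * x ** 1.5 for x in xs_pow]         # synthetic data
power, ln_coefficient = fit_line([log(x) for x in xs_pow],
                                 [log(y) for y in ys_pow])
coefficient = exp(ln_coefficient)

print(f"exponential: y = {amplitude:.3f} * exp({k:.3f} * x)")
print(f"power law:   y = {coefficient:.3f} * x**{power:.3f}")
```

In both cases the slope of the straight line gives the constant of most interest (the exponential rate or the power), and the intercept gives the multiplying constant.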