Most of the hypotheses we test in experiments are about causal
relationships between factors. Sometimes the causal argument is
implicit, but if one experiment doesn't address it, a subsequent
experiment usually will. Suppose the hypothesis is that an information
retrieval system, A, has a higher recall rate than a related system,
B. This hypothesis isn't explicitly causal, but unless it is
just a wild, unmotivated guess (``Gee whiz, perhaps A outperforms
B; one of them has to be best, right?'') there must be some
reasons for it. We must think A and B perform differently
because one has features the other lacks, or because they solve
different problems, or they are constrained differently by their
environments, and so on. An exploratory study might stumble
upon a difference--or a murky suggestion of a difference--without
intending to, but an experiment is designed to demonstrate a
difference and the reasons for it.
Although experimental hypotheses have the form, ``factor X affects
behavior Y,'' experiments rarely test such hypotheses directly.
Instead, factors and behaviors are represented by measured variables
x and y, and an experiment seeks some sort of dependency between
them. In a manipulation experiment, we manipulate x and record
the effects on y. For the simple hypothesis A outperforms
B, x is the identity of the system, A or B, and y
is the performance variable, in this case, recall. For the
hypothesis that A outperforms B because A has a larger
dictionary, x is the size of the dictionary and y is recall,
again. For the hypothesis that A outperforms B because A has
a parser that handles compound nouns better, we have two x
variables, the identity of the parser (x1) and the prevalence
of compound nouns in the test problems (x2). We expect the
effect of x1 on y to depend on x2; for example, it
shouldn't matter which parser is used if the test
set includes no compound nouns. Whatever the hypothesis, a
manipulation experiment tests it by manipulating x and recording the
effect on y.
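To make this concrete, here is a minimal sketch in Python of such a manipulation experiment. The recall model inside run_trial is invented purely for illustration (it simply builds in an advantage for parser A that grows with the prevalence of compound nouns); it is not real data from systems A or B.

```python
import random

# A minimal sketch of a manipulation experiment: we manipulate x1 (which
# parser is used) and x2 (the fraction of compound nouns in the test set),
# and record the effect on y (recall). The recall model is hypothetical.

def run_trial(parser, compound_noun_rate, rng):
    base = 0.70                      # baseline recall shared by both parsers
    if parser == "A":
        # Parser A handles compound nouns better, so its advantage grows
        # with the prevalence of compound nouns (an interaction effect).
        advantage = 0.15 * compound_noun_rate
    else:
        advantage = 0.0
    noise = rng.gauss(0, 0.02)       # trial-to-trial variation
    return base + advantage + noise

rng = random.Random(0)
for rate in (0.0, 0.25, 0.5):        # manipulate x2
    for parser in ("A", "B"):        # manipulate x1
        recalls = [run_trial(parser, rate, rng) for _ in range(100)]
        mean = sum(recalls) / len(recalls)
        print(f"compound-noun rate {rate:.2f}, parser {parser}: mean recall {mean:.3f}")
```

The structure of the experiment is visible in the loops: we set x1 and x2, and merely record y.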
In an observation experiment we again test a causal relationship between factor X and behavior Y, but we cannot find a variable x to manipulate directly. Your hypothesis might be that smoking causes lung cancer, but you cannot ethically manipulate whether or not people in a sample smoke. Your hypothesis might be that girls score higher on math tests than boys, but you cannot say to an individual, ``For today's experiment, I want you to be a boy.'' In an observation experiment, the observed variable x is used to classify individuals in a sample, then y is computed for each class and the values compared. (It is hoped that individuals who differ on x also differ on factor X, but it doesn't always work out. Recall, for example, how various Olympic committees struggled to find appropriate observable indicators of gender.)
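The bookkeeping in an observation experiment is correspondingly simple, as the following sketch suggests. The sample records are fabricated for illustration only; the point is that the observed x (here, whether a person smokes) classifies individuals, and y is computed within each class and compared.

```python
# A minimal sketch of an observation experiment: we cannot manipulate the
# observed variable x (whether a person smokes), so we use it to classify
# individuals and compare y (cancer incidence) across the classes.
# The records below are fabricated solely to illustrate the bookkeeping.

sample = [
    {"smokes": True,  "cancer": True},
    {"smokes": True,  "cancer": False},
    {"smokes": False, "cancer": False},
    {"smokes": False, "cancer": False},
    {"smokes": True,  "cancer": True},
]

classes = {True: [], False: []}
for person in sample:
    classes[person["smokes"]].append(person["cancer"])   # classify on x

for smokes, outcomes in classes.items():
    rate = sum(outcomes) / len(outcomes)                  # y computed per class
    label = "smokers" if smokes else "non-smokers"
    print(f"{label}: cancer rate {rate:.2f} (n={len(outcomes)})")
```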
In manipulation experiments, x variables are called independent variables and y variables dependent variables. This terminology reflects the purpose of the experiment--to demonstrate a causal dependence between x and y. The same terms also apply in some observation experiments, although often x is called a predictor and y a response variable. Again, the terminology reflects the purpose of the experiment: if the goal is to see whether x accurately predicts the value of y--whether, for example, the number of cigarettes smoked per day predicts the probability of lung cancer--then the predictor-response terminology is more descriptive.
Manipulation and observation experiments produce two kinds of
effects. Simple effects demonstrate that x influences
y, while interaction effects show that x1 and x2 in concert
influence y. For example, system A outperforms B (a simple
effect) and the magnitude of the difference depends on the prevalence of
compound nouns in test items (an interaction between the systems' parsers
and the test material).
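Under the assumption of invented cell means, a short sketch shows how a simple effect and an interaction effect would be read off a two-by-two table of results:

```python
# A minimal sketch of reading simple and interaction effects off a 2x2 table
# of cell means. The recall numbers are invented for illustration only.

# Mean recall, indexed by (system, compound-noun prevalence in the test items).
means = {
    ("A", "low"):  0.72, ("A", "high"): 0.80,
    ("B", "low"):  0.70, ("B", "high"): 0.71,
}

# Simple effect of the system: A vs. B, averaged over test materials.
effect_A = (means[("A", "low")] + means[("A", "high")]) / 2
effect_B = (means[("B", "low")] + means[("B", "high")]) / 2
print(f"simple effect (A - B): {effect_A - effect_B:+.3f}")

# Interaction: does the A - B difference depend on compound-noun prevalence?
diff_low  = means[("A", "low")]  - means[("B", "low")]
diff_high = means[("A", "high")] - means[("B", "high")]
print(f"A - B at low prevalence:  {diff_low:+.3f}")
print(f"A - B at high prevalence: {diff_high:+.3f}")
print(f"interaction (difference of differences): {diff_high - diff_low:+.3f}")
```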
Although most experiments vary only two or three independent variables at a time, many more factors influence the dependent variable, as you can see by comparing young Fred and Abigail in Figure 3.1.
Figure 3.1 Dependent variables are influenced by many factors.
Your hypothesis might be that gender influences math scores, but until you rule out the number of siblings, the parents' occupations, the child's height, and more besides, you can't be sure that gender--and not something else--is responsible for a simple effect on math scores. Nor can you be certain gender is the proximal influence; perhaps gender influences teachers' attitudes and the attention they pay to each child, which influences the child's confidence, and thus test-taking skills and math scores. Ruling out alternative explanations is the purpose of experimental control.