Most of the hypotheses we test in experiments are about causal
relationships between factors. Sometimes the causal argument is
implicit, but if one experiment doesn't address it, a subsequent
experiment usually will. Suppose the hypothesis is that an information
retrieval system, A, has a higher recall rate than a related system,
B. This hypothesis isn't explicitly causal, but unless it is
just a wild, unmotivated guess (``Gee whiz, perhaps A outperforms
B; one of them has to be best, right?'') there must be some
reasons for it. We must think A and B perform differently
because one has features the other lacks, or because they solve
different problems, or they are constrained differently by their
environments, and so on. An exploratory study might stumble
upon a difference--or a murky suggestion of a difference--without
intending to, but an experiment is designed to demonstrate a
difference and the reasons for it.
Although experimental hypotheses have the form, ``factor X affects
behavior Y,'' experiments rarely test such hypotheses directly.
Instead, factors and behaviors are represented by measured variables
x and y, and an experiment seeks some sort of dependency between
them. In a manipulation experiment, we manipulate x and record
the effects on y. For the simple hypothesis A outperforms
B, x is the identity of the system, A or B, and y
is the performance variable, in this case, recall. For the
hypothesis that A outperforms B because A has a larger
dictionary, x is the size of the dictionary and y is recall,
again. For the hypothesis that A outperforms B because A has
a parser that handles compound nouns better, we have two x
variables, the identity of the parser (x1) and the prevalence
of compound nouns in the test problems (x2). We expect the
effect of x1 on y to depend on x2; for example, it
shouldn't matter which parser is used if the test
set includes no compound nouns. Whatever the hypothesis, a
manipulation experiment tests it by manipulating x and recording the
effect on y.
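To make this concrete, here is a minimal sketch in Python of such a manipulation experiment. The recall model inside run_trial is invented purely for illustration (it simply builds in an advantage for parser A that grows with the prevalence of compound nouns); it is not real data from systems A or B.

```python
import random

# A minimal sketch of a manipulation experiment: we manipulate x1 (which
# parser is used) and x2 (the fraction of compound nouns in the test set),
# and record the effect on y (recall). The recall model is hypothetical.

def run_trial(parser, compound_noun_rate, rng):
    base = 0.70                      # baseline recall shared by both parsers
    if parser == "A":
        # Parser A handles compound nouns better, so its advantage grows
        # with the prevalence of compound nouns (an interaction effect).
        advantage = 0.15 * compound_noun_rate
    else:
        advantage = 0.0
    noise = rng.gauss(0, 0.02)       # trial-to-trial variation
    return base + advantage + noise

rng = random.Random(0)
for rate in (0.0, 0.25, 0.5):        # manipulate x2
    for parser in ("A", "B"):        # manipulate x1
        recalls = [run_trial(parser, rate, rng) for _ in range(100)]
        mean = sum(recalls) / len(recalls)
        print(f"compound-noun rate {rate:.2f}, parser {parser}: mean recall {mean:.3f}")
```

The structure of the experiment is visible in the loops: we set x1 and x2, and merely record y.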
In an observation experiment we again test a causal relationship between factor X and behavior Y, but we cannot find a variable x to manipulate directly. Your hypothesis might be that smoking causes lung cancer, but you cannot ethically manipulate whether or not people in a sample smoke. Your hypothesis might be that girls score higher on math tests than boys, but you cannot say to an individual, ``For today's experiment, I want you to be a boy.'' In an observation experiment, the observed variable x is used to classify individuals in a sample, then y is computed for each class and the values compared. (It is hoped that individuals who differ on x also differ on factor X, but it doesn't always work out. Recall, for example, how various Olympic committees struggled to find appropriate observable indicators of gender.)
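The bookkeeping in an observation experiment is correspondingly simple, as the following sketch suggests. The sample records are fabricated for illustration only; the point is that the observed x (here, whether a person smokes) classifies individuals, and y is computed within each class and compared.

```python
# A minimal sketch of an observation experiment: we cannot manipulate the
# observed variable x (whether a person smokes), so we use it to classify
# individuals and compare y (cancer incidence) across the classes.
# The records below are fabricated solely to illustrate the bookkeeping.

sample = [
    {"smokes": True,  "cancer": True},
    {"smokes": True,  "cancer": False},
    {"smokes": False, "cancer": False},
    {"smokes": False, "cancer": False},
    {"smokes": True,  "cancer": True},
]

classes = {True: [], False: []}
for person in sample:
    classes[person["smokes"]].append(person["cancer"])   # classify on x

for smokes, outcomes in classes.items():
    rate = sum(outcomes) / len(outcomes)                  # y computed per class
    label = "smokers" if smokes else "non-smokers"
    print(f"{label}: cancer rate {rate:.2f} (n={len(outcomes)})")
```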
In manipulation experiments, x variables are called independent variables and y variables dependent variables. This terminology reflects the purpose of the experiment--to demonstrate a causal dependence between x and y. The same terms also apply in some observation experiments, although often x is called a predictor and y a response variable. Again, the terminology reflects the purpose of the experiment: if the goal is to see whether x accurately predicts the value of y--whether, for example, the number of cigarettes smoked per day predicts the probability of lung cancer--then the predictor-response terminology is more descriptive.
Manipulation and observation experiments produce two kinds of
effects. Simple effects demonstrate that x influences
y, while interaction effects show that x1 and x2 in concert
influence y. For example, system A outperforms B (a simple
effect) and the magnitude of the difference depends on the prevalence of
compound nouns in test items (an interaction between the systems' parsers
and the test material).
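Under the assumption of invented cell means, a short sketch shows how a simple effect and an interaction effect would be read off a two-by-two table of results:

```python
# A minimal sketch of reading simple and interaction effects off a 2x2 table
# of cell means. The recall numbers are invented for illustration only.

# Mean recall, indexed by (system, compound-noun prevalence in the test items).
means = {
    ("A", "low"):  0.72, ("A", "high"): 0.80,
    ("B", "low"):  0.70, ("B", "high"): 0.71,
}

# Simple effect of the system: A vs. B, averaged over test materials.
effect_A = (means[("A", "low")] + means[("A", "high")]) / 2
effect_B = (means[("B", "low")] + means[("B", "high")]) / 2
print(f"simple effect (A - B): {effect_A - effect_B:+.3f}")

# Interaction: does the A - B difference depend on compound-noun prevalence?
diff_low  = means[("A", "low")]  - means[("B", "low")]
diff_high = means[("A", "high")] - means[("B", "high")]
print(f"A - B at low prevalence:  {diff_low:+.3f}")
print(f"A - B at high prevalence: {diff_high:+.3f}")
print(f"interaction (difference of differences): {diff_high - diff_low:+.3f}")
```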
Although most experiments vary only two or three independent variables at a time, many more factors influence the dependent variable, as you can see by comparing young Fred and Abigail in Figure 3.1.
Figure 3.1 Dependent variables are influenced by many factors.
Your hypothesis might be that gender influences math scores, but until you rule out the number of siblings, the parents' occupations, the child's height, and more besides, you can't be sure that gender--and not something else--is responsible for a simple effect on math scores. Nor can you be certain gender is the proximal influence; perhaps gender influences teachers' attitudes and the attention they pay to each child, which influences the child's confidence, and thus test-taking skills and math scores. Ruling out alternative explanations is the purpose of experimental control.