Thus far, we have been concerned with experiment design decisions that clarify or obscure the results of experiments. Our focus now shifts to the results themselves. There is a growing concern in some areas of AI that experimental results tend to be minor and uninteresting. In machine learning, for instance, where experimental methods dominate the scene, several editorials and articles have urged researchers not to subjugate their research interests to the available methods (Dietterich, 1990; Porter, 1991). Too often, we fail to recognize the distinction between research questions and experimental hypotheses. Consider a hypothetical dialog between two AI researchers:
A: What are you doing these days?
B: I am running an experiment to compare the performance of a genetic algorithm to the performance of a backpropagation algorithm.
A: Why are you doing that?
B: Well, I want to know which is faster.
A: Why?
B: Lots of people use each kind of algorithm, so I thought it would be worth learning which is faster.
A: How will these people use your result?
At this point in the conversation we will learn whether the experimenter has a reason for comparing the algorithms besides discovering which is faster. Clearly, the experimental question is which is faster, but what is the underlying research question? Contrast the previous dialog with this one:
A: What are you doing these days?
B: I am comparing the performance of identical twins reared apart to the performance of identical twins reared together, and comparing both to nonidentical twins and ordinary siblings.
A: Why are you doing that?
B: Because identical twins are genetically identical, so by comparing identical twins reared together and apart, we can get independent estimates of the genetic and social contributors to performance.
A: Why do you want to do that?
B: Because the role of genetics in behavior is one of the great unresolved questions.
Here, the experimental question is undergirded by a much more important research question: how much of our behavior is genetically influenced? One wouldn't attempt to answer the experimental question for its own sake, both because it would be an enormously expensive fishing expedition and because the experiment design makes sense only in the context of the research question.
Let us call experiments without underlying research questions face value experiments. They are not designed to provide evidence about research questions, but their results might be interpreted this way after the fact. Anything we learn about research questions from face value experiments is learned incidentally; uncovering such evidence is not the purpose of these experiments. The probability that incidental evidence will answer a research question is extremely small. Many things must be controlled, and much can go wrong in experiment design, so it is very unlikely that someone else's face value experiment would answer your research question. It follows that the results of face value experiments rarely interest anyone other than the people who ran them. Thus, generalization of the results of these experiments happens only by chance, after the fact. No wonder we hear the concern that experimental results are minor and uninteresting.
When you design an experiment, consider its purpose: if your results are not intended to provide evidence about a research question, what are they for?