If the maximum or minimum value of a dependent variable is known, then one can detect ceiling or floor effects easily. This strongly suggests that the dependent variable should not be open-ended; for example, it is easy to see a ceiling effect if y is a percentage score that approaches 100% in both the treatment and control conditions. But the mere fact that y is bounded does not ensure we can detect ceiling and floor effects. For example, the acreage burned by fires in Phoenix is bounded (no fewer than zero acres are ever burned by a fire), so if two versions of the Phoenix planner each lost approximately zero acres when they fought fires, we would recognize a ceiling effect (approached from above). But now imagine running each version of the planner (call them P and P′) on ten fires and calculating the mean acreage lost by each: 50 acres for P and 49.5 acres for P′. Does this result mean P′ is not really better than P, or have we set ten fires so challenging that 49.5 acres is nearly the best possible performance?
To resolve this question, that is, to detect a ceiling effect, it doesn't help to know that zero is the theoretical best bound on lost acreage; we need to know the practical best bound for the ten fires we set. If it is, say, 10 acres, then the second planner, P′, is no better than P. But if the practical best bound is, say, 47 acres, then the possible superiority of P′ is obscured by a ceiling effect. To tease these interpretations apart, we must estimate the practical best bound. A simple method is illustrated in table 3.1. For each fire, the least acreage lost by P and P′ is an upper bound on the practical minimum that would be lost by any planner given that fire. For example, P′ lost 15 acres to fire 1, so the practical minimum number of acres lost to fire 1 is at most 15. The average of these minima over all fires, 400/10 = 40, is an overestimate of the practical best performance. If this number were very close to the average areas lost by P and P′, we could claim a ceiling effect. In fact, it is about ten acres lower: for any fire in the sample, one planner or the other contains it with, on average, ten fewer acres lost than either planner's own average. So we simply cannot claim that the average areas lost by P or P′ are the practical minimum losses for these fires. In other words, there is no ceiling effect and no reason to believe the alleged superiority of P′ is obscured.
Table 3.1: The acreage lost by fires fought by P and P′, and the minimum acreage lost.
FIRE | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
P | 55 | 60 | 50 | 35 | 40 | 20 | 90 | 70 | 30 | 50 | 500 |
P′ | 15 | 50 | 50 | 75 | 65 | 40 | 60 | 65 | 40 | 35 | 495 |
Min | 15 | 50 | 50 | 35 | 40 | 20 | 60 | 65 | 30 | 35 | 400 |
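
The arithmetic behind table 3.1 can be checked with a short script. This is a minimal sketch, not part of the original analysis: the function and variable names (`mean`, `practical_best_bound`, `losses_p`, `losses_other`) are illustrative assumptions, and the data are copied from the two planner rows of the table.

```python
# Acreage lost on each of the ten fires, copied from table 3.1.
# losses_p is the row for planner P; losses_other is the second planner's row.
losses_p = [55, 60, 50, 35, 40, 20, 90, 70, 30, 50]
losses_other = [15, 50, 50, 75, 65, 40, 60, 65, 40, 35]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def practical_best_bound(*runs):
    """Average, over fires, of the least acreage any planner lost on that fire.

    Each per-fire minimum is an upper bound on the practical minimum for that
    fire, so the average overestimates the practical best performance.
    """
    per_fire_min = [min(losses) for losses in zip(*runs)]
    return mean(per_fire_min)


mean_p = mean(losses_p)
mean_other = mean(losses_other)
bound = practical_best_bound(losses_p, losses_other)

print(f"mean P: {mean_p}, mean second planner: {mean_other}, bound: {bound}")
```

Because the estimated bound sits well below both means, neither mean can be taken as the practical minimum for these fires. Adding more planners, or more runs of the same planner, to the call would only lower the estimate, since each per-fire minimum is taken over a larger set.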
A dramatic example of ceiling effects came to light when Robert Holte analyzed 14 datasets from a corpus that had become a mainstay of machine learning research. The corpus is maintained by the Machine Learning Group at the University of California, Irvine. All 14 sets involved learning classification rules, which map from vectors of features to classes. Each item in a dataset includes a vector and a classification, although features are sometimes missing, and both features and classifications are sometimes incorrect. All datasets were taken from real classification problems, such as classifying mushrooms as poisonous or safe, and classifying chess endgame positions as wins for white or black. Holte also included two other sets in his study that were not from the Irvine corpus.
Table 3.2: Average classification accuracies for two algorithms that learn classification rules, C4 and 1R*
Dataset | BC | CH | GL | G2 | HD | HE | HO | HY |
---|---|---|---|---|---|---|---|---|
C4 | 72 | 99.2 | 63.2 | 74.3 | 73.6 | 81.2 | 83.6 | 99.1 |
1R* | 72.5 | 69.2 | 56.4 | 77 | 78 | 85.1 | 81.6 | 97.2 |
Max | 72.5 | 99.2 | 63.2 | 77 | 78 | 85.1 | 83.6 | 99.1 |
Dataset | IR | LA | LY | MU | SE | SO | VO | VI | Mean |
---|---|---|---|---|---|---|---|---|---|
C4 | 93.8 | 77.2 | 77.5 | 100.0 | 97.7 | 97.5 | 95.6 | 89.4 | 85.9 |
1R* | 95.9 | 87.4 | 77.3 | 98.4 | 95 | 87 | 95.2 | 87.9 | 83.8 |
Max | 95.9 | 87.4 | 77.5 | 100.0 | 97.7 | 97.5 | 95.6 | 89.4 | 87.4 |
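
The Max row and Mean column of table 3.2 can be recomputed in the same spirit as the Min row of table 3.1, taking the better of the two accuracies on each dataset. The following is a minimal sketch with the accuracies copied from the table; the variable names are illustrative assumptions.

```python
# Average classification accuracies (percent), copied from table 3.2.
datasets = ["BC", "CH", "GL", "G2", "HD", "HE", "HO", "HY",
            "IR", "LA", "LY", "MU", "SE", "SO", "VO", "VI"]
c4 = [72, 99.2, 63.2, 74.3, 73.6, 81.2, 83.6, 99.1,
      93.8, 77.2, 77.5, 100.0, 97.7, 97.5, 95.6, 89.4]
one_r_star = [72.5, 69.2, 56.4, 77, 78, 85.1, 81.6, 97.2,
              95.9, 87.4, 77.3, 98.4, 95, 87, 95.2, 87.9]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Per-dataset maximum: the better of the two algorithms on each dataset.
best = [max(a, b) for a, b in zip(c4, one_r_star)]

mean_c4 = round(mean(c4), 1)
mean_1r = round(mean(one_r_star), 1)
mean_best = round(mean(best), 1)

print(f"C4: {mean_c4}  1R*: {mean_1r}  Max: {mean_best}")
```

Rounded to one decimal place, the three means agree with the Mean column of the table.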