If the maximum or minimum value of a dependent variable is known,
then one can detect ceiling or floor effects easily. This
strongly suggests that the dependent variable should not be
open-ended; for example, it is easy to see a ceiling effect if y
is a percentage score that approaches 100% in the treatment and
control conditions. But the mere fact that y is bounded does not
ensure we can detect ceiling and floor effects. For example, the
acreage burned by fires in Phoenix is bounded--no fewer than zero
acres are ever burned by a fire--so if two versions of the
Phoenix planner each lost approximately zero acres when they
fought fires, we would recognize a ceiling effect (approached from
above). But now imagine running each version of the planner (call
them P and P′) on ten fires and calculating the mean acreage
lost by each: 50 acres for P and 49.5 acres for P′.
Does this result mean P′ is not really better than P, or have we
set ten fires so challenging that 49.5 acres is nearly the best possible performance?
To resolve this question--to detect a ceiling effect--it doesn't
help to know that zero is the theoretical best bound on lost acreage;
we need to know the practical best bound for the 10 fires we
set. If it is, say, 10 acres, then both planners fall far short of it and P′ is no better than P.
But if the practical best bound is, say, 47 acres, then the possible
superiority of P′ is obscured by a ceiling effect. To
tease these interpretations apart, we must estimate the practical best
bound. A simple method is illustrated in table 3.1. For each fire, the smaller
of the acreages lost by P and P′ is an upper bound on the practical
minimum that would be lost by any planner given that fire. For
example, P′ lost 15 acres to fire 1, so the practical minimum
number of acres lost to fire 1 is at most 15. The average
of these minima over all fires, 400/10 = 40, is an overestimate of
the practical best performance. If this number were very close to the average
areas lost by P and P′, we could claim a ceiling effect.
In fact, it is not: one planner or the other can contain any fire in the sample with,
on average, about ten fewer acres lost than either planner's own mean
(40 acres versus 50 and 49.5). So we simply
cannot claim that the average areas lost by P or P′ are the
practical minimum losses for these fires. In other words, there is no
ceiling effect and no reason to believe the alleged superiority of
P′ is obscured.
Table 3.1: The acreage lost to each fire when fought by P and P′, and the minimum acreage lost.
Fire | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
P | 55 | 60 | 50 | 35 | 40 | 20 | 90 | 70 | 30 | 50 | 500 |
P′ | 15 | 50 | 50 | 75 | 65 | 40 | 60 | 65 | 40 | 35 | 495 |
Min | 15 | 50 | 50 | 35 | 40 | 20 | 60 | 65 | 30 | 35 | 400 |
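The bound estimate in table 3.1 can be reproduced in a few lines. The following is a minimal sketch, not part of the original analysis; the names `losses_p`, `losses_second`, `practical_best_bound`, and `gap_to_bound` are hypothetical, and the numbers are copied from the two planner rows of the table.

```python
# Acreage lost per fire, copied from the two planner rows of table 3.1.
losses_p = [55, 60, 50, 35, 40, 20, 90, 70, 30, 50]       # planner P
losses_second = [15, 50, 50, 75, 65, 40, 60, 65, 40, 35]  # second planner version


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def practical_best_bound(*planner_losses):
    """Per-fire minimum across planners.

    Each entry is an upper bound on the least acreage any planner
    could lose to that fire (hypothetical helper for illustration).
    """
    return [min(per_fire) for per_fire in zip(*planner_losses)]


def gap_to_bound(planner_losses, bound):
    """How far a planner's mean loss sits above the mean of the bound."""
    return mean(planner_losses) - mean(bound)


bound = practical_best_bound(losses_p, losses_second)
print(bound)                                  # the Min row of table 3.1
print(mean(losses_p), mean(losses_second))    # 50.0 49.5
print(mean(bound))                            # 40.0
print(gap_to_bound(losses_p, bound))          # 10.0
print(gap_to_bound(losses_second, bound))     # 9.5
```

A gap near zero would be evidence of a ceiling effect. Here both gaps are roughly ten acres, so neither planner's mean is close to the estimated bound. How small a gap counts as "very close" is a judgment the sketch leaves open.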
A dramatic example of ceiling effects came to light when Robert Holte analyzed 14 datasets from a corpus that had become a mainstay of machine learning research. The corpus is maintained by the Machine Learning Group at the University of California, Irvine. All 14 datasets involved learning classification rules, which map from vectors of features to classes. Each item in a dataset includes a feature vector and a classification, although features are sometimes missing, and both features and classifications are sometimes incorrect. All datasets were taken from real classification problems, such as classifying mushrooms as poisonous or safe, and classifying chess endgame positions as wins for white or black. Holte also included two other datasets, not from the Irvine corpus, in his study.
Table 3.2: Average classification accuracies (percent) for two algorithms that learn classification rules, C4 and 1R*. Max is the higher of the two accuracies on each dataset.
Dataset | BC | CH | GL | G2 | HD | HE | HO | HY |
---|---|---|---|---|---|---|---|---|
C4 | 72 | 99.2 | 63.2 | 74.3 | 73.6 | 81.2 | 83.6 | 99.1 |
1R* | 72.5 | 69.2 | 56.4 | 77 | 78 | 85.1 | 81.6 | 97.2 |
Max | 72.5 | 99.2 | 63.2 | 77 | 78 | 85.1 | 83.6 | 99.1 |
Dataset | IR | LA | LY | MU | SE | SO | VO | VI | Mean |
---|---|---|---|---|---|---|---|---|---|
C4 | 93.8 | 77.2 | 77.5 | 100.0 | 97.7 | 97.5 | 95.6 | 89.4 | 85.9 |
1R* | 95.9 | 87.4 | 77.3 | 98.4 | 95 | 87 | 95.2 | 87.9 | 83.8 |
Max | 95.9 | 87.4 | 77.5 | 100.0 | 97.7 | 97.5 | 95.6 | 89.4 | 87.4 |
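The Max row and Mean column of table 3.2 can be recomputed in the same way as the Min row of table 3.1, with the maximum in place of the minimum because higher accuracy is better. The sketch below is an illustration only. The variable names are hypothetical and the accuracies are copied from the table. Because Max is taken over just these two algorithms, it is a lower bound on the best accuracy attainable on each dataset.

```python
# Accuracies (percent) copied from table 3.2, in the table's column order.
datasets = ["BC", "CH", "GL", "G2", "HD", "HE", "HO", "HY",
            "IR", "LA", "LY", "MU", "SE", "SO", "VO", "VI"]
c4 = [72, 99.2, 63.2, 74.3, 73.6, 81.2, 83.6, 99.1,
      93.8, 77.2, 77.5, 100.0, 97.7, 97.5, 95.6, 89.4]
one_r_star = [72.5, 69.2, 56.4, 77, 78, 85.1, 81.6, 97.2,
              95.9, 87.4, 77.3, 98.4, 95, 87, 95.2, 87.9]


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Per-dataset best of the two algorithms: the Max row of the table.
best = [max(a, b) for a, b in zip(c4, one_r_star)]

for label, row in [("C4", c4), ("1R*", one_r_star), ("Max", best)]:
    print(f"{label}: mean accuracy {mean(row):.1f}")

# Datasets on which the simpler 1R* matches or beats C4.
one_r_wins = [name for name, a, b in zip(datasets, c4, one_r_star) if b >= a]
print(one_r_wins)
```

The three printed means should agree with the Mean column of the table (85.9, 83.8, and 87.4).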