We can evaluate empirically whether, in practice, getting slightly
more or fewer execution traces would have significantly changed which
dependencies were detected in the execution traces gathered from
Phoenix. We do so by seeing how many of the dependencies would not
have been detected had the counts in row one of the contingency table
varied by a small amount. For example, if the contingency table in
Table 1 had [52,35,240,643] instead of
[52,33,240,643], then G = 40.42, which is not much different from the
value of G = 42.86 for Table 1.
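For concreteness, a G statistic for a 2x2 table of this kind can be
computed with SciPy's log-likelihood-ratio option. The sketch below is
ours: the row/column layout of the table is an assumption, and the value
it prints may differ somewhat from the figures above depending on which
correction (if any) the original analysis applied.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Assumed 2x2 layout (our assumption, not stated explicitly in the text):
#   [[precursor & failure,     precursor & no failure],
#    [no precursor & failure,  no precursor & no failure]]
table = np.array([[52, 33], [240, 643]])

# lambda_="log-likelihood" selects the G-test; correction=False disables
# the Yates continuity correction that SciPy applies to 2x2 tables.
g, p, dof, expected = chi2_contingency(table, correction=False,
                                       lambda_="log-likelihood")
print(f"G = {g:.2f}, p = {p:.3g}, dof = {dof}")
```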
To determine whether
the dependencies detected in the execution traces are vulnerable to
noise, we can do the following test: 1) construct the contingency
table for each dependency detected in the execution traces, 2) vary, one at a
time, the counts of row one, column one and row one, column two by
±2, and 3) run a G-test on each resulting contingency table, as
sketched below. By tweaking the contingency table cell values in this
manner, we check the sensitivity of G to noise in the data. Both
columns of the first row were varied because some of the first-column
counts were 1, which makes it impossible to test, by varying the first
column alone, whether a lower ratio of first column to second column
might have influenced the value of G more than a higher ratio. We
tweaked the counts by ±2 because many contingency tables contained
cell counts of less than 5, and varying by ±2 spans that range.
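The sweep itself is easy to script. The sketch below, with the
hypothetical helper name is_robust, assumes the same 2x2 layout, the
±2 perturbation described above, and a placeholder significance level
alpha, since the threshold used in the experiments is not restated in
this section.

```python
import numpy as np
from scipy.stats import chi2_contingency

def is_robust(table, delta=2, alpha=0.05):
    """Return True if the dependency stays significant when each cell in
    row one is varied, one at a time, by up to +/- delta.

    alpha=0.05 is a placeholder; substitute the significance level used
    in the original analysis.
    """
    base = np.asarray(table, dtype=float)
    for col in (0, 1):                       # row one, columns one and two
        for d in range(-delta, delta + 1):
            tweaked = base.copy()
            tweaked[0, col] += d
            if tweaked[0, col] < 0:          # skip impossible negative counts
                continue
            _, p, _, _ = chi2_contingency(tweaked, correction=False,
                                          lambda_="log-likelihood")
            if p >= alpha:                   # significance lost under this tweak
                return False
    return True

print(is_robust([[52, 33], [240, 643]]))
```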
Table 2: Dependencies remaining after tweaking contingency tables. The
table reports the number of dependencies remaining after tweaking
over the total number of dependencies found in the execution traces
from each of the four experiments.
Table 2 shows how many of the dependencies found in each
of the four sets of execution traces for Phoenix would remain if their
contingency tables were tweaked in this way. About 65% of the dependencies
remain after tweaking their contingency table values, meaning that the
counts in the first row of the contingency table can be changed by
±2 without dropping the significance of G below the chosen
level. So, 35% of the dependencies detected would disappear if a
few more or fewer patterns were included in the execution traces. Based
on this testing of execution traces from Phoenix, dependency detection
is sensitive to small differences in the content of the execution
traces. Most of the dependencies that were vulnerable to the tweaking
were based on execution traces that included few instances of the
precursor/failure pattern: 23 of the 44 dependencies that
disappeared (52%) were based on contingency tables in which one of the
counts in the first row was less than five.
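Reusing the is_robust sketch above, a short driver over a collection of
contingency tables produces exactly this kind of summary. The tables
below are hypothetical stand-ins chosen to illustrate the
sparse-first-row effect, not the counts from the Phoenix experiments.

```python
# Hypothetical stand-in tables, not the Phoenix data; is_robust is the
# sketch defined above.
tables = [
    [[52, 33], [240, 643]],   # the Table 1 example
    [[1, 4], [120, 500]],     # sparse first row (counts < 5)
    [[3, 2], [80, 300]],      # another sparse first row
]

remaining = sum(is_robust(t) for t in tables)
print(f"{remaining} of {len(tables)} dependencies remain after tweaking")
```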
The implication of the sensitivity of dependency detection to noise in the execution traces is that rare patterns are especially vulnerable to noise and so should be viewed skeptically. One must interpret the results of dependency detection with care: if ``sensitive'' dependencies are discarded, then rare events may go undetected; at the same time, one does not wish to chase chimeras. Interpreting dependencies therefore requires weighing false positives against missed detections. If we are trying to identify dependencies involving precursors or failures that occur rarely, then additional effort should be expended to gather enough execution traces to ensure that a detected dependency is not due to noise.