We can evaluate empirically whether, in practice, gathering slightly more or fewer execution traces would have significantly changed which dependencies were detected in the execution traces gathered from Phoenix. We do so by measuring how many of the dependencies would no longer be detected if the counts in row one of the contingency table varied by a small amount. For example, if the contingency table in Table 1 had contained [52,35,240,643] instead of [52,33,240,643], then G=40.42, which is not much different from the value of G=42.86 for Table 1. To determine whether the dependencies detected in the execution traces are vulnerable to noise, we run the following test: 1) construct the contingency table for each dependency detected in the execution traces, 2) vary, one at a time, the counts of row one, column one and row one, column two by ±2, and 3) run a G-test on each resulting contingency table. By tweaking the contingency table cell values in this manner, we check the sensitivity of G to noise in the data. We varied both columns of the first row because some of the first-column counts were 1; such counts cannot be decreased further, so varying the second column is the only way to test whether a lower ratio of the first column to the second column influences the value of G as much as a higher ratio does. We tweaked the counts by ±2 because many contingency tables contained cell counts of less than 5, and varying by ±2 spans that range.
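As an illustration, the following is a minimal sketch of the plain (uncorrected) G statistic applied to the counts from Table 1. The function name g_statistic and the table layout (row one holds the precursor/failure pattern counts) are our own conventions for this sketch; the exact figures of 42.86 and 40.42 reported above may reflect a variant or correction of the statistic not described here.

```python
import numpy as np

def g_statistic(table):
    """Log-likelihood ratio (G) statistic for a contingency table.

    G = 2 * sum over cells of O * ln(O / E), where E is the expected
    count under independence; zero cells contribute nothing to the sum.
    """
    observed = np.asarray(table, dtype=float)
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True)
                / observed.sum())
    mask = observed > 0
    return 2.0 * np.sum(observed[mask] * np.log(observed[mask] / expected[mask]))

# Counts from Table 1: row one is the precursor/failure pattern row.
original = [[52, 33], [240, 643]]
tweaked = [[52, 35], [240, 643]]  # row one, column two raised by 2

print(g_statistic(original))
print(g_statistic(tweaked))
```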
Table 2: Dependencies remaining after tweaking contingency tables. The table reports the number of dependencies remaining after tweaking over the total number of dependencies found in the execution traces from the four experiments.
Table 2 shows how many of the dependencies found in each of four sets of execution traces for Phoenix would remain if their contingency tables were tweaked in this way. About 65% of the dependencies remain after tweaking, meaning that the counts in the first row of their contingency tables can be changed by ±2 without dropping the significance of G below the chosen level. Conversely, 35% of the detected dependencies would disappear if a few more or fewer patterns were included in the execution traces. Based on this testing of execution traces from Phoenix, dependency detection is sensitive to small differences in the content of the execution traces. Most of the dependencies that were vulnerable to tweaking were based on execution traces that included few instances of the precursor/failure pattern: 23 of the 44 dependencies that disappeared (52%) had contingency tables in which one of the counts in the first row was less than five.
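The sketch below shows how a survivor count like the one in Table 2 could be computed, reusing g_statistic from the sketch above. The significance level ALPHA = 0.05 and the two example tables are assumptions chosen for illustration; the text does not state the exact level used.

```python
from scipy.stats import chi2

# Significance level is an assumption; df=1 for a 2x2 contingency table.
ALPHA = 0.05
CRITICAL = chi2.ppf(1.0 - ALPHA, df=1)

def survives_tweaking(table, delta=2):
    """True if G stays above the critical value when each cell of row one
    is varied, one at a time, by +delta and by -delta."""
    for col in (0, 1):
        for d in (-delta, delta):
            tweaked = [row[:] for row in table]
            tweaked[0][col] += d
            if tweaked[0][col] < 0:  # skip tweaks that produce negative counts
                continue
            if g_statistic(tweaked) < CRITICAL:
                return False
    return True

# Hypothetical contingency tables, one per detected dependency.
tables = [
    [[52, 33], [240, 643]],  # large first-row counts: robust to tweaking
    [[3, 1], [150, 850]],    # small first-row counts: fragile
]
surviving = [t for t in tables if survives_tweaking(t)]
print(f"{len(surviving)} of {len(tables)} dependencies survive tweaking")
```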
The implication of this sensitivity of dependency detection to noise in the execution traces is that rare patterns are especially vulnerable and so should be viewed skeptically. One must interpret the results of dependency detection with care: if ``sensitive'' dependencies are discarded, then rare events may remain undetected; at the same time, one does not wish to chase chimeras. Interpreting dependencies requires weighing false positives against misses. If we are trying to identify dependencies involving precursors or failures that occur rarely, then additional effort should be expended to gather enough execution traces to ensure that a detected dependency is not due to noise.