Dependency detection is syntactic and requires little knowledge of the planner or its environment; thus, it can be applied to any planner in any environment. To begin, we gather execution traces of the planner. To determine how the planner's actions might lead to failure, the execution traces include failures and the recovery actions that repaired them, as in the following short trace:
F's are failures (e.g., is the Insufficient Progress failure in Phoenix) and R's are recovery actions (e.g., is the Substitute Projection action in the Phoenix Planner's recovery action set). It appears from this short trace that the failure ip is always preceded by the recovery action sp. We call disproportionately high co-occurrences between failures and particular precursors dependencies. In this example, the precursor of ip is the action sp, but conceptually, it could be any combination of predecessors in the trace: the preceding failure, the combination of the failure and the recovery action that repaired it, or even longer combinations of previous actions and failures. Currently, the analysis looks at only singles (i.e., failures or recovery actions) and pairs (i.e., failures and recovery actions) as precursors.
With more data, we can test whether the observed relationship between sp and ip is statistically significant. We build a contingency table of four cells: one for each combination of precursor and failure and their negations. For example, the contingency table in Table 1 tests whether depends on the precursor .
Table 1: Contingency table for testing the pattern .
We test the significance of the observed relationship, , with a G-test on the contingency table. A G-test on this table is highly significant, G=42.86,p=.001, meaning that it is highly unlikely that the observed dependence is due to chance or noise.
We construct contingency tables for three types of immediate precursors: failures, recovery actions, and a combination of a failure and the recovery action that repaired it. We denote these cases , , and , respectively. The three types overlap. In particular, is a special case of both and (because they subsume all possible values of the missing member), so if the former dependency is significant, we do not know whether it is truly a dependency between a and the subsequent , or between irrespective of the intervening action, or between R with the initial failure playing no role. In practice, all three dependencies might be present to varying degrees.
We sort out the strengths of the dependencies by running a variant on the G-test called the Heterogeneity G-test [Sokal and Rohlf 81]. The intuition is that we compare the contributions of subsets to that of the superset; one can imagine looking at a Venn diagram (as in Figure 1) to gauge whether the failure ner, the recovery action sp or the combination seems to account for most of the area in the intersection with the subsequent failure ip. Both and overlap with part of the subsequent , but it is easy to see that a larger proportion of than overlaps with . Thus, is a more reliable precursor for .
Figure did not convert
Figure 1: Venn diagram representing the data for the precursors and the
following failure.
Similarly, but somewhat harder to see, the proportion of (the darker shaded area plus the cross hatched area) that overlaps with (just the darker shaded area) is about the same as the proportion of (the area in the heavy box) that overlaps with (the two shaded areas), suggesting that we can generalize the relationship without loss of information. To compute the overlap, we add the G values for each of the subsets together (e.g., for , we add G values for all possible R's) and compare the result to the G value for the superset; if the difference is significant, then the subsets account for more of the variance in the dependency and so cannot be generalized. For this example, the G values for the subsets add to 45.89; the difference between that and G for the superset is 3.035, which is not a significant difference at the .05 level.