Although the computational problems have been overcome, testing for and collecting more dependencies exacts a cost in statistical certainty and threatens to deluge the user with models. The algorithm requires serial testing of hypotheses: in the course of local search, contingency table analysis is run repeatedly on similar patterns over the same data. Consequently, the probability threshold of .05 is no longer an accurate measure of significance. We are currently investigating methods of adjusting the threshold downward to compensate for the serial testing inherent in this algorithm. Given the fairly shallow depth of the search (the number of tests run during a search) and the low p values of the most significant dependencies, it appears that adjusting the threshold will enhance the certainty of the results without unduly reducing the number of dependencies detected in the data.
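One standard way of adjusting the threshold downward is a Bonferroni-style correction, which divides the familywise significance level by the number of tests run. The sketch below illustrates the idea only; the function names and p values are hypothetical, and this is not necessarily the correction the algorithm will ultimately adopt.

```python
# Illustrative Bonferroni-style adjustment for serial testing.
# All names and numbers here are hypothetical, not from the paper.

def bonferroni_threshold(alpha, n_tests):
    """Divide the familywise significance level by the number of
    contingency-table tests run during a search."""
    return alpha / n_tests

def surviving(p_values, alpha=0.05, n_tests=None):
    """Return the p values that remain significant after adjustment."""
    if n_tests is None:
        n_tests = len(p_values)
    cutoff = bonferroni_threshold(alpha, n_tests)
    return [p for p in p_values if p < cutoff]

# A shallow search of 200 tests lowers the cutoff to .05 / 200 = .00025,
# yet a dependency with p = 1e-6 easily survives.
print(surviving([1e-6, 3e-4, 0.02, 0.04], alpha=0.05, n_tests=200))
```

Because the most significant dependencies have very low p values, they survive even this much stricter cutoff, which is why the correction should not unduly reduce the number of dependencies detected.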
From the point of view of someone interpreting dependencies, it is easy to be overwhelmed by the number found. Local search addresses this problem in two ways. First, only the most significant dependencies are likely to be found: although the algorithm performs multiple hill-climbs, subsequent trials will tend to rediscover the same pattern at the top of a large hill. Second, the search neighborhood in local search provides a natural clustering: the basins of attraction around locally optimal patterns. These basins ensure that only one of a set of similar dependencies is returned, and they can be exploited to cluster dependencies that contain the same subsequences.
Subsequences that are repeated across dependencies are probably themselves significant or may indicate cases in which several events exert a similar influence. For example, in the Phoenix data, some of the failures were variants on the same problem (e.g., two of them were failures in calculating paths), and some of the recovery actions were variants on the same method (e.g., two of them replanned, but from different points in the plan). We would like to cluster related dependencies to find commonalities and to reduce the burden of looking over a long list.
Clusters can be formed by finding locally optimal patterns and then backing out of the basin at individual positions to find other highly significant related patterns. The algorithm was augmented to ``back out'' of local maxima and gather patterns from the neighborhood that were also significant. If we examine patterns local to the maxima found in the test with precursors of length three and a window of length seven, we find that at each position in the pattern at least one and as many as six of the sixteen other patterns are significant. With local search, we can thus find clusters of patterns that are somewhat less significant but highly similar in composition.
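The augmented algorithm might be sketched as follows. The greedy ascent, single-substitution neighborhood, and scoring function below are assumptions for illustration; the paper's actual move set and contingency-table statistic are not reproduced here.

```python
# Hypothetical sketch of the ``back out'' step: after hill-climbing to a
# locally optimal pattern, revisit the neighborhood and collect patterns
# that are also significant. The scoring function is a toy stand-in for
# the contingency-table statistic; the alphabet is illustrative.

def neighbors(pattern, alphabet):
    """Yield every pattern differing from `pattern` at exactly one position."""
    for i in range(len(pattern)):
        for sym in alphabet:
            if sym != pattern[i]:
                yield pattern[:i] + (sym,) + pattern[i + 1:]

def hill_climb(start, score, alphabet):
    """Greedy ascent to a locally optimal pattern."""
    current = start
    while True:
        best = max(neighbors(current, alphabet), key=score)
        if score(best) <= score(current):
            return current
        current = best

def back_out(optimum, score, alphabet, cutoff):
    """Gather patterns in the basin around `optimum` that remain significant."""
    return [p for p in neighbors(optimum, alphabet) if score(p) >= cutoff]

# Toy demonstration: score counts matches against a hidden target pattern.
target = ('a', 'b', 'c')
alphabet = ('a', 'b', 'c', 'x')
score = lambda p: sum(x == y for x, y in zip(p, target))

optimum = hill_climb(('x', 'x', 'x'), score, alphabet)
cluster = back_out(optimum, score, alphabet, cutoff=2)
```

In this toy setting, every neighbor of the optimum differs at one position and so scores just below it; the `cluster` list is exactly the set of highly similar, slightly less significant patterns that the backing-out step is meant to gather.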
Local search dependency detection is data directed, finds highly significant dependencies, and provides a convenient method of clustering. As demonstrated on the Phoenix data, the local search method is effective at detecting long dependencies in data with considerable overlap in the elements of the dependencies: it finds a large number of the most significant dependencies in a manageable amount of computation time. However, not too surprisingly, when tested on synthetic data, it performs much less well when there is little relationship between subsequences and the long sequences that form dependencies.
This research was supported by a Colorado State University Diversity Career Enhancement grant and ARPA/Rome Laboratory contract F30602-93-C-0100. The US Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation hereon. I wish to thank Darrell Whitley for his advice on the application of local search, Aaron Fuegi for his programming, Steve Leathard for running tests, Tim Oates for the synthetic data generator, and Douglas Fisher and anonymous reviewers for their patient and careful reading and suggestions for improvement.