We discussed papers on RANDOOP, ARTOO, and KORAT. I would like you to compare the papers in terms of the strategy used to evaluate the proposed approaches. In particular, compare how the experiments were set up, what kinds of benchmarks were used, what kinds of faults were used (seeded vs. real), what metrics were used, how many runs were performed, and whether the authors compared their approach with other approaches. Feel free to include other comparison criteria as well.
Part A (6 points)
Part B (4 points)