The roles of evaluation in empirical artificial intelligence (AI) research are described, in an idealized cyclic model and in the context of three case studies. The case studies illustrate pitfalls in evaluation and the contributions of evaluation at all stages of the research cycle. Evaluation methods are contrasted with those of the behavioral sciences, and it is concluded that AI must define and refine its own methods. To this end, several experiment “schemas” and many specific evaluation criteria are described; recommendations are offered in the hope of encouraging the development and practice of evaluation methods in AI.