In many studies, the estimation of the apparent diffusion coefficient (ADC) of lesions in visceral organs in diffusion-weighted (DW) magnetic resonance images requires an accurate lesion-segmentation algorithm. To evaluate these lesion-segmentation algorithms, region-overlap measures are used currently. However, the end task from the DW images is accurate ADC estimation, and the region-overlap measures do not evaluate the segmentation algorithms on this task. Moreover, these measures rely on the existence of gold-standard segmentation of the lesion, which is typically unavailable. In this paper, we study the problem of task-based evaluation of segmentation algorithms in DW imaging in the absence of a gold standard. We first show that using manual segmentations instead of gold-standard segmentations for this task-based evaluation is unreliable. We then propose a method to compare the segmentation algorithms that does not require gold-standard or manual segmentation results. The no-gold-standard method estimates the bias and the variance of the error between the true ADC values and the ADC values estimated using the automated segmentation algorithm. The method can be used to rank the segmentation algorithms on the basis of both the ensemble mean square error and precision. We also propose consistency checks for this evaluation technique.
ASJC Scopus subject areas
- Radiological and Ultrasound Technology
- Radiology Nuclear Medicine and imaging