Deep neural networks have shown impressive performance for image-based
disease detection. Performance is commonly evaluated through clinical
validation on independent test sets to demonstrate clinically acceptable
accuracy. Reporting good performance metrics on test sets, however, is n