In a recent study published in the journal Nature Medicine, researchers tested the ability of generalist and specialist physicians to diagnose skin diseases across skin tones in a simulated teledermatology setting.
Deep learning-based approaches to image-based diagnosis can improve clinical decision-making, but their effectiveness remains uncertain because of systematic errors, particularly when evaluating underrepresented groups. The future of machine learning in medicine will likely involve collaborations between physicians and machines, with domain-specific interfaces built on machine learning models that support clinical judgment and help generate more accurate diagnoses. Physicians' expertise remains essential for recognizing when to override automated recommendations. Initial research on store-and-forward teledermatology suggests that deep learning systems can improve generalists' diagnostic accuracy, but uncertainty remains about how performance varies with clinician expertise and across underrepresented groups.
About the study
In the present study, researchers conducted a digital experiment with 389 board-certified dermatologists (BCDs) and 459 primary care physicians (PCPs) from 39 countries to evaluate the diagnostic accuracy of generalists and specialists in simulated teledermatology consultations.
The study presented 364 images spanning 46 dermatological disorders and asked participants to submit up to four differential diagnoses. Most of the images depicted eight relatively common skin diseases. The team recruited a large pool of participating physicians and designed the study to leverage insights from gamification strategies, such as feedback, rewards, competition, and varied rules. This allowed them to explore a replicable design space covering different skin tones, skin disorders, levels of medical training, physician-machine collaboration modes, accuracy of clinical decision support, and user interface designs.
The researchers measured diagnostic accuracy with and without AI assistance on images of light and dark skin tones, following algorithmic auditing techniques. The team focused on eight skin diseases selected by three criteria: (i) three board-certified dermatologists identified them as diseases for which accuracy disparities across patients' skin tones were most likely to be found; (ii) they are relatively common; and (iii) they appear frequently enough in dermatology textbooks and image atlases that the team could select at least five images of the two darkest skin types after a quality-control review by board-certified dermatologists.
To provide computer vision-based diagnostic predictions, the team trained a convolutional neural network to classify nine labels: the eight skin diseases of interest and one "other" category. The model, a fine-tuned VGG-16 architecture, was fit on 31,219 diverse clinical dermatology images drawn from the Fitzpatrick 17k dataset and supplemented with images from textbooks, dermatology atlases, and online search engines. The team then compared this deep learning system's (DLS) performance with physicians' performance in diagnosing the skin diseases.
Results

Generalists and specialists achieved diagnostic accuracies of 19% and 38%, respectively, and both were about four percentage points less accurate on images of dark skin than on images of light skin. Deep learning-based decision support improved physicians' diagnostic accuracy by more than 33% but widened the gap in PCPs' diagnostic accuracy across skin tones.
The maximum accuracies of general practitioners, primary care physicians, dermatology residents, and board-certified dermatologists were 18%, 19%, 36%, and 38%, respectively, across all images (excluding attention-control images), and 16%, 17%, 35%, and 37%, respectively, for photographs depicting the eight main skin diseases investigated. The most frequently submitted primary clinical diagnosis was correct for 33% of images among PCPs and 48% among BCDs.
In 77.0% of the photographs, at least one BCD included the reference label among their differential diagnoses, compared with 58% for PCPs. After viewing an accurate DLS prediction, at least one BCD included the reference label in their differential diagnoses for 98.0% of photographs. Across all photographs, participants diagnosed disorders on darker skin (estimated Fitzpatrick skin types 5 and 6) less accurately than those on lighter skin.
When examining physician categories separately, the maximum accuracies of board-certified dermatologists, dermatology residents, primary care physicians, and other physicians were lower by five, five, three, and five percentage points, respectively, for photographs of darker skin than of lighter skin. In a parallel analysis, their highest diagnostic accuracies decreased by three, five, four, and four percentage points, respectively, for photographs of darker versus lighter skin. BCDs were also 4.4 percentage points more likely to refer dark-skinned patients to a dermatologist for a second opinion.
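The percentage-point gaps above come from comparing a "top-k" diagnostic accuracy (a case counts as correct if the reference label appears among a clinician's up-to-four differential diagnoses) between skin-tone groups. The sketch below uses made-up data; only the metric definition follows the study.

```python
def top_k_accuracy(records, k=4):
    """Fraction of cases whose reference label is in the first k guesses."""
    hits = sum(r["label"] in r["differentials"][:k] for r in records)
    return hits / len(records)

# Hypothetical cases: each has a reference label, a clinician's ranked
# differential diagnoses, and an estimated Fitzpatrick skin-type group.
cases = [
    {"label": "psoriasis", "differentials": ["eczema", "psoriasis"], "fst": "5-6"},
    {"label": "melanoma",  "differentials": ["nevus"],               "fst": "5-6"},
    {"label": "psoriasis", "differentials": ["psoriasis"],           "fst": "1-2"},
    {"label": "urticaria", "differentials": ["urticaria", "eczema"], "fst": "1-2"},
]

light = [c for c in cases if c["fst"] == "1-2"]
dark = [c for c in cases if c["fst"] == "5-6"]

# The skin-tone gap is the difference in accuracy between the two groups.
gap = top_k_accuracy(light) - top_k_accuracy(dark)
print(f"light: {top_k_accuracy(light):.0%}, dark: {top_k_accuracy(dark):.0%}, gap: {gap:.0%}")
# light: 100%, dark: 50%, gap: 50%
```

The study's reported disparities of three to five percentage points are this same kind of between-group difference, computed over hundreds of images and physicians rather than four toy cases.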
Conclusions

The study findings showed that deep learning-based decision support can increase physicians' diagnostic accuracy in teledermatology settings. BCDs achieved a top-three diagnostic accuracy of 38%, compared with 19% for PCPs. These findings are consistent with previous research indicating that specialists outperform generalists in diagnosing skin diseases, though the accuracies were lower than in earlier studies. Both specialists and generalists were less accurate on images of dark skin than of light skin: BCDs and PCPs performed four percentage points better on photographs of light skin than of dark skin. DLS-based decision support improved primary diagnosis accuracy by 33% for BCDs and 69% for PCPs, increasing sensitivity for identifying specific skin disorders.