GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial

  • Kanjee, Z., Crowe, B. & Rodman, A. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA 330, 78–80 (2023).

  • Cabral, S. et al. Clinical reasoning of a generative artificial intelligence model compared with physicians. JAMA Intern. Med. 184, 581–583 (2024).

  • Tu, T. et al. Towards conversational diagnostic AI. Preprint at (2024).

  • McDuff, D. et al. Towards accurate differential diagnosis with large language models. Preprint at (2023).

  • Goh, E. et al. Large language model influence on diagnostic reasoning: a randomized clinical trial. JAMA Netw. Open 7, e2440969 (2024).

  • Zaboli, A., Brigo, F., Sibilio, S., Mian, M. & Turcato, G. Human intelligence versus Chat-GPT: who performs better in correctly classifying patients in triage? Am. J. Emerg. Med. 79, 44–47 (2024).

  • Truhn, D. et al. A pilot study on the efficacy of GPT-4 in providing orthopedic treatment recommendations from MRI reports. Sci. Rep. 13, 20159 (2023).

  • Cook, D. A., Sherbino, J. & Durning, S. J. Management reasoning beyond the diagnosis. JAMA 319, 2267–2268 (2018).

  • Ledley, R. S. & Lusted, L. B. Reasoning foundations of medical diagnosis: symbolic logic, probability, and value theory aid our understanding of how physicians reason. Science 130, 9–21 (1959).

  • Bordage, G. Prototypes and semantic qualifiers: from past to present. Med. Educ. 41, 1117–1121 (2007).

  • Bowen, J. L. Educational strategies to promote clinical diagnostic reasoning. N. Engl. J. Med. 355, 2217–2225 (2006).

  • Cook, D. A., Stephenson, C. R., Gruppen, L. D. & Durning, S. J. Management reasoning: empirical determination of key features and a conceptual model. Acad. Med. 98, 80–87 (2023).

  • Mercuri, M. et al. When guidelines don’t guide: the effect of patient context on management decisions based on clinical practice guidelines. Acad. Med. 90, 191–196 (2015).

  • Schmidt, H. G., Norman, G. R., Mamede, S. & Magzoub, M. The influence of context on diagnostic reasoning: a narrative synthesis of experimental findings. J. Eval. Clin. Pract. 30, 1091–1101 (2024).

  • Parsons, A. S., Wijesekera, T. P. & Rencic, J. J. The management script: a practical tool for teaching management reasoning. Acad. Med. 95, 1179–1185 (2020).

  • Reverberi, C. et al. Experimental evidence of effective human–AI collaboration in medical decision-making. Sci. Rep. 12, 14952 (2022).

  • Kempt, H. & Nagel, S. K. Responsibility, second opinions and peer-disagreement: ethical and epistemological challenges of using AI in clinical diagnostic contexts. J. Med. Ethics 48, 222–229 (2022).

  • Restrepo, D., Rodman, A. & Abdulnour, R.-E. Conversations on reasoning: large language models in diagnosis. J. Hosp. Med. 19, 731–735 (2024).

  • Friedman, C. P. et al. Enhancement of clinicians’ diagnostic reasoning by computer-based consultation: a multisite study of 2 systems. JAMA 282, 1851–1856 (1999); erratum 285, 2979 (2001).

  • Miller, R. A., Pople, H. E. Jr & Myers, J. D. Internist-1, an experimental computer-based diagnostic consultant for general internal medicine. N. Engl. J. Med. 307, 468–476 (1982).

  • Chen, Y. et al. SoulChat: improving LLMs’ empathy, listening, and comfort abilities through fine-tuning with multi-turn empathy conversations. Preprint at (2023).

  • Ayers, J. W. et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern. Med. 183, 589–596 (2023).

  • Tai-Seale, M. et al. AI-generated draft replies integrated into health records and physicians’ electronic communication. JAMA Netw. Open 7, e246565 (2024).

  • Chen, S. et al. The effect of using a large language model to respond to patient messages. Lancet Digit. Health 6, e379–e381 (2024).

  • Pfeffer, M. A., Shah, N. H., Sharp, C. & Lindmark, C. Nigam Shah and partners roll out beta version of Stanford medicine SHC and SoM Secure GPT. Stanford Medicine (2024).

  • Nori, H. et al. Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. Preprint at (2023).

  • Core IM. American College of Physicians www.acponline.org/cme-moc/internal-medicine-cme/internal-medicine-podcasts/core-im (2024).

  • Pell, G., Fuller, R., Homer, M. & Roberts, T. How to measure the quality of the OSCE: a review of metrics—AMEE guide no. 49. Med. Teach. 32, 802–811 (2010).

  • Khan, K. Z., Ramachandran, S., Gaunt, K. & Pushkar, P. The Objective Structured Clinical Examination (OSCE): AMEE Guide No. 81. Part I: an historical and theoretical perspective. Med. Teach. 35, e1437–e1446 (2013).

  • Cook, D. A., Durning, S. J., Stephenson, C. R., Gruppen, L. D. & Lineberry, M. Assessment of management reasoning: design considerations drawn from analysis of simulated outpatient encounters. Med. Teach. 1–15 (2024).

  • Singaraju, R. C., Durning, S. J., Battista, A. & Konopasky, A. Exploring procedure-based management reasoning: a case of tension pneumothorax. Diagnosis 9, 437–445 (2022).

  • Jones, J. & Hunter, D. Consensus methods for medical and health services research. BMJ 311, 376–380 (1995).

  • Meskó, B. Prompt engineering as an important emerging skill for medical professionals: tutorial. J. Med. Internet Res. 25, e50638 (2023).

  • Van Veen, D. et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat. Med. 30, 1134–1142 (2024).

  • Gallo, R. J., Savage, T. & Chen, J. H. Affiliation bias in peer review of abstracts. JAMA 331, 1234–1235 (2024).

  • Gallo, R. J. et al. Establishing best practices in large language model research: an application to repeat prompting. J. Am. Med. Inform. Assoc. 32, 386–390 (2025).

  • Goh, E. et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Figshare (2025).
