Full length article AI in obstetrics: Evaluating residents’ capabilities and interaction strategies with ChatGPT David Desseauve a,b,*, Raphael Lescar c, Benoit de la Fourniere c, Pierre-François Ceccaldi d,e, Mikhail Dziadzko f,g a Department of Women–Mother–Child, Gynaecology and Obstetrics Unit, Lausanne University Hospital, Lausanne, Switzerland b Department of Women–Mother–Child, Gynaecology and Obstetrics Unit, Grenoble Alpes, University Hospital, Grenoble, France c Department of Obstetrics and Gynaecology, Hôpital de la Croix-Rousse, Hospices civils de Lyon, Lyon, France d Department of Obstetrics, Gynaecology and Reproductive Medicine, Foch Hospital, Suresnes, France e Innovative Dental Materials and Interfaces Research Unit (UR 4462), Faculty of Health, University of Paris, Paris, France f Department of Anaesthesiology, Hôpital de la Croix-Rousse, Hospices civils de Lyon, Lyon, France g RESHAPE UMR 1290 INSERM, Université Lyon 1, Lyon, France A R T I C L E I N F O Keywords: Artificial Intelligence Obstetrics Medical Education Prompt Engineering A B S T R A C T In line with the digital transformation trend in medical training, students may resort to artificial intelligence (AI) for learning. This study assessed the interaction between obstetrics residents and ChatGPT during clinically oriented summative evaluations related to acute hepatic steatosis of pregnancy, and their self-reported compe- tencies in information technology (IT) and AI. The participants in this semi-qualitative observational study were 14 obstetrics residents from two university hospitals. Students’ queries were categorized into three distinct types: third-party enquiries; search-engine-style queries; and GPT-centric prompts. Responses were compared against a standardized answer produced by ChatGPT with a Delphi-developed expert prompt. Data analysis employed descriptive statistics and correlation analysis to explore the relationship between AI/IT skills and response accuracy. The study participants showed moderate IT proficiency but low AI proficiency. Interaction with ChatGPT regarding clinical signs of acute hepatic steatosis gravidarum revealed a preference for third-party questioning, resulting in only 21% accurate responses due to misinterpretation of medical acronyms. No correlation was found between AI response accuracy and the residents’ self-assessed IT or AI skills, with most expressing dissatisfaction with their AI training. This study underlines the discrepancy between perceived and actual AI proficiency, highlighted by clinically inaccurate yet plausible AI responses – a manifestation of the ’stochastic parrot’ phenomenon. These findings advocate for the inclusion of structured AI literacy programmes in medical education, focusing on prompt engineering. These academic skills are essential to exploit AI’s potential in obstetrics and gynaecol- ogy. The ultimate aim is to optimize patient care in AI-augmented health care, and prevent misleading and unsafe knowledge acquisition. Introduction In recent decades, digital technologies have transformed medical education. Human–computer interactions have become essential in medical training, including various activities such as information retrieval, knowledge assessment, and simulation [1]. The emergence of artificial intelligence (AI) holds the potential to improve medical edu- cation further, offering personalized and interactive learning experiences, enhancing diagnostic accuracy, and providing support for data-driven decision-making [1–3]. Applications of AI in obstetrics and gynaecology (OG) training have shown diverse results. Initially, AI achieved an initial success rate of 30 %, which was increased to 70 % when used to answer questions on various topics usually asked to first-year medical students. This increase was obtained after applying an iterative querying technique called ‘prompting’ [4,5]. The precision of AI-generated answers is contingent * Corresponding author at: Department of Women–Mother–Child, Lausanne University Hospital, 1011 Lausanne, Switzerland. E-mail address: david.desseauve@chuv.ch (D. Desseauve). Contents lists available at ScienceDirect European Journal of Obstetrics & Gynecology and Reproductive Biology journal homepage: www.journals.elsevier.com/european-journal-of-obstetrics-and-gynecology-and- reproductive-biology https://doi.org/10.1016/j.ejogrb.2024.09.008 Received 22 December 2023; Received in revised form 1 June 2024; Accepted 6 September 2024 European Journal of Obstetrics and Gynecology 302 (2024) 238–241 Available online 14 September 2024 0301-2115/© 2024 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ ). mailto:david.desseauve@chuv.ch www.sciencedirect.com/science/journal/03012115 https://www.journals.elsevier.com/european-journal-of-obstetrics-and-gynecology-and-reproductive-biology https://www.journals.elsevier.com/european-journal-of-obstetrics-and-gynecology-and-reproductive-biology https://doi.org/10.1016/j.ejogrb.2024.09.008 https://doi.org/10.1016/j.ejogrb.2024.09.008 http://crossmark.crossref.org/dialog/?doi=10.1016/j.ejogrb.2024.09.008&domain=pdf http://creativecommons.org/licenses/by/4.0/ on the methodology employed in questioning. The emergence of ‘prompt engineering’ as a skill for leveraging the capabilities of large language models (LLMs) such as ChatGPT has not yet been integrated systematically into medical training [6,7]. Although not adopted universally, several recommendations and techniques in medical prompt engineering have been published, highlighting the sig- nificance of cultivating this skill within students’ curricula [8]. This descriptive study was designed to understand how obstetrics and gynaecology residents interact spontaneously with AI in the context of a summative training session. The main objective was to study the accuracy of GPTs’ answers depending on the methods used by obstetrics and gynaecology residents to question the AI. The secondary objective was to examine the correlation between self-assessed IT proficiency, AI answers, AI skills and the participants’ postgraduate year (PGY). Methods Study design and setting This semi-qualitative observational study was conducted in a controlled academic environment to explore how obstetrics residents interact and seek answers from ChatGPT [9], a renowned LLM designed by OpenAI [10]. The sessions took place in two university hospital centres. Participants The study cohort comprised obstetrics residents, chosen from two hospitals to ensure diversity in training backgrounds. No ethical approval was required; written consent was obtained for voluntary participation. AI interaction Residents were asked to answer the question (in French): ‘What are the clinical and paraclinical signs of acute fatty liver of pregnancy (AFLP)?’. AFLP is a rare but severe liver condition that can occur during pregnancy, and can be life-threatening for both the mother and the baby. With prompt diagnosis and treatment, most women with AFLP recover fully after delivery. Every resident had access to their personal computer to facilitate interaction with ChatGPT 3.5. The time to obtain the answer was arbitrarily fixed to 15 min. Expert prompt and standard answer Using a Delphi method, the educational team (DD, MD, BdLF) created an expert prompt using ‘prompt engineering methods’. This process involves creating prompts by iterative refining and revising the initial queries to elicit accurate responses. For instance, a prompt might instruct: ‘Imagine you’re a PGY 3 medical student specializing in ob- stetrics. Generate three distinct queries for ChatGPT to obtain the following responses: [desired response in brackets]’. The authors collaboratively reviewed and edited the prompts until consensus was reached on the final version. This prompt was used to obtain the result, and it was submitted to three experts in the management of AFLP (GD, CH, PFC) from two medical schools for evaluation and validation (see online supplementary material). The validated result from the constructed prompt was used as a standard comparator for the students’ results obtained from AI. Data collection procedure Before the main task, the participants’ PGY, a self-assessment of their IT and AI proficiency (11-item scale, where 0 represents no proficiency and 10 represents maximal proficiency), and previous experience of AI use (yes/no) were collected. The queries used with ChatGPT were recorded and yielded answers. At the end of the study, participants’ satisfaction with AI interaction was recorded (11-item scale, where 0 represents no satisfaction and 10 represents maximal satisfaction). Analytical approach All queries were analysed and categorized semantically into the following types: • Third-party question: questions were structured as if asking a third person or expert. • Search-engine style: questions were formatted similarly to what one might input into a search engine. • GPT-centric prompt: questions were tailored explicitly to the known interaction style of GPT-like models [8]. All responses were compared with the standard comparator, and evaluated using a French grading system out of 20 points for their medical accuracy and relevance to the original clinical question. Based on a scale of 0–20 points, this grading system is commonly used in French educational institutions. Scores < 10 denote failure, with 0 rep- resenting complete failure. Scores > 10 indicate varying degrees of success, with a score of 20 signifying exceptional performance. The final score was converted to an accuracy percentage. Descriptive statistics and a non-parametric Spearman’s rho correla- tion analysis were used to evaluate the relationships between self- reported AI and IT skills and the accuracy of responses obtained from ChatGPT using students’ queries. This exploratory study did not use a sample size calculation. Data are presented as median [interquartile range (IQR)] or fre- quency (percentage), as appropriate. p < 0.05 was considered to indicate significance. Results Demographic and background information The study included 14 obstetrics residents from two university hos- pitals. Half of them (7/14) were at the end of their residency (more than eight semesters), while another 43 % (6/14) were in the middle of their residency (three to eight semesters). Only one resident was at the beginning of their residency. Only 36 % of the residents (5/14) had previous experience with LLMs such as GPT or BARD. The median levels of self-assessed IT proficiency and AI proficiency were 6 [IQR 5.75–7] and 2 [IQR 1–4.25], respectively. Eleven (93 %) students rated their IT proficiency higher than 5 out of 10, and only three (21 %) students gave such a rating for their AI skills. Interaction with ChatGPT The primary task for participants was to enquire about the clinical and paraclinical signs of acute hepatic steatosis gravidarum (commonly abbreviated as ‘SHAG’ in French). The majority (64 %, 9/14) chose the third-party question method, mainly copying the same question given by the examiner. Only one resident employed a prompt-like technique; the remaining 29 % (4/14) used a search-engine-style approach. From the entire cohort, only three (21 %) of the GPT answers cor- responded to the expected response, with accuracy of 30 % (one answer), 40 % (one answer) and 90 % (one answer). Predominantly, inaccuracies originated from a misinterpretation of the ‘SHAG’ acronym by AI, resulting in unrelated answers. Accuracy of responses and relationships with IT/AI skills No correlation was found between the accuracy of the AI response and self-assessed IT or AI skills, nor for the PGY level. Strong correlation D. Desseauve et al. European Journal of Obstetrics & Gynecology and Reproductive Biology 302 (2024) 238–241 239 was found between self-assessed IT and AI skills (Table 1). In an open question, 86 % (12/14) of residents expressed concerns about their AI training, stating that their current education needed to be equipped sufficiently for AI utilization. Only two (14 %) participants were satis- fied with their level of AI knowledge. Discussion This study revealed challenges in AI interaction among obstetrics and gynaecology residents, with low accuracy rates in ChatGPT- generated answers. No correlation was found between response accu- racy and residents’ self-assessed IT or AI skills. Only a minority of par- ticipants used the prompting method. One of the most significant advances in AI chatbots is the claimed ability to ‘understand’ human language conversations within context. However, the accuracy of an AI response, even with precise contextual information, can be outweighed by the ‘stochastic parrot’ [11]. Iterative prompting and clarification of questions by providing additional back- ground information and context may enhance the accuracy of responses. This was observed in this study with the efficacy of questioning when using a prompt-like approach. Additionally, the limited number of cor- rect responses to the expert-level question aligns with prior research demonstrating diminished performance of LLMs as the complexity of questions increases [12]. The difference in the results between responders’ self-assessed IT and AI competencies is significant. Although correlated, conventional digital knowledge does not equate to AI competency because the latter is highly dependent on specific skills such as prompting. This study demonstrated that many students lack understanding regarding AI interactions and their implications [15]. Based on these findings, a primary concern highlighted by this study regarding the integration of AI into medical practice or education is the risk of yielding inaccurate results, particularly in scenarios involving a combination of summative evaluation (assigning a score to students’ answers) and formative training (constructing new knowledge). This limit is also well known in using usual internet navigators with inte- grated web-search machines to answer questions in other medical fields [16–18]. Nevertheless, the performance of LLMs and the apparent quality of the answers may mask potential inaccuracies, possibly leading students or physicians to dramatic errors if they are applied directly to patient care without expert oversight. Applying evidence-based medi- cine mitigates these risks, particularly in the management of AFLP. Adhering to clinical guidelines and peer-reviewed research ensures that decisions are based on the best-available evidence, reducing the po- tential for errors from LLM-generated recommendations. One potential solution is to promote the incorporation of prompt engineering in medical education, in parallel with the exponential AI implementation in medical practice. Broad questions are likely to produce incorrect results. This un- derscores the importance of possessing AI prompt-building skills, and understanding how to construct medical or professional questions effectively to provide a structure for the anticipated answer. An alternative approach to developing effective prompt-engineering skills involves the reverse prompting technique, in which the LLM generates a prompt in a preconditioned manner based on an arbitrarily correct example of an answer. Subsequently, a medical student is invited to formulate a question for the LLM, aiming to elicit a response closely aligned with the initial answer, though not necessarily replicating the prompt originally generated by the LLM. Multiple iterations may be required to achieve the desired level of accuracy in prompt construction. Prompt engineering, associating the art of building precise textual prompts, is essential in interacting effectively with chatbots and generative AI tools to bring out desired outputs across various digital media formats [19]. The quality of these prompts is crucial, as AI models can improve their accuracy through iterative learning from user- provided data [20]. This emerging discipline is gradually evolving into a distinct skill set in the technology and corporate sectors. In medical training and education, understanding generative AI principles is critical to produce outcomes that support teaching, learning and assessment. In this way, Acar recently introduced a concept known as the Prob- lem, AI, Interaction, Reflection (PAIR) framework [21]. This framework offers a comprehensive roadmap for leveraging generative AI tools, mainly focusing on prompt development and its application. Widely recognized within academic circles, the PAIR framework serves as a cornerstone in prompt engineering, focusing on the critical role of problem formulation and understanding. The depth and precision of comprehension of this problem are pivotal, as they dictate the choice of prompts employed, the ensuing responses generated by the AI tool, and, ultimately, the effectiveness of the entire process. Such a condition re- quires human expertise. This underscores the importance of integrating structured AI utili- zation training modules into medical curricula, ensuring that the forthcoming generation of practitioners is both AI-aware and AI- competent [13]. As emphasized by the Beijing Convention on AI and Education [14], educational programmes are responsible for investing in and instilling these skills. This study has a few limitations. The modest sample size and the study’s confinement to two academic settings may have restricted the generalizability of the findings. However, it is unlikely that more advanced AI skills would be expected among students in other French medical universities because no LLM engineering skills are currently integrated into medical curricula. Furthermore, while this study assessed the immediate interactions with AI, no semantic analysis or incremental engineering prompt analysis (prompt-based incremental learning) was performed. The correlation analysis was performed on 14 pairs of responses. Although this number is modest and does not produce a critical side effect, the Spearman rank-order correlation analysis is valid for such a small number of observations [22]. A longitudinal analysis in further studies would provide insights into the learning curve of mastering AI tools. Conclusion This study underscores a significant finding: non-AI-trained obstetric gynaecology residents exhibit a heightened propensity to employ ‘third- party-like questions’ when interfacing with LLMs such as ChatGPT. This worrisome trend results in the dissemination of incorrect and potentially detrimental information within the context of medical education. It is evident that prompt engineering, a burgeoning facet of human- –computer interaction, demands immediate integration into medical curricula. The precise role of AI as a facilitator in medical training re- mains to be defined comprehensively, leaving an imperative void in the field that requires scholarly attention and resolution. Funding None. Table 1 Relationships between the accuracy of obtained artificial intelligence (AI) response, self-rated information technology (IT) and AI experience, and post- graduate year (PGY). Accuracy IT level AI level PGY Accuracy 1 0.1216; p = 0.68 0.2914; p = 0.31 0.5744; p = 0.10 IT level ​ 1 0.8557; p < 0.0001 0.1650; p = 0.57 AI level ​ ​ 1 0.4503; p = 0.11 PGY ​ ​ ​ 1 Numbers are Spearman’s rho and p-values. D. Desseauve et al. European Journal of Obstetrics & Gynecology and Reproductive Biology 302 (2024) 238–241 240 CRediT authorship contribution statement David Desseauve: Writing – review & editing, Writing – original draft, Validation, Supervision, Project administration, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Raphael Lescar: Writing – original draft, Investigation, Formal analysis, Data curation. Benoit de la Fourniere: Writing – review & editing, Data curation. Pierre François Ceccaldi: Writing – review & editing, Project administration, Methodology, Investigation, Formal analysis, Concep- tualization. Mikhail Dziadzko: Writing – review & editing, Supervision, Methodology, Formal analysis. Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgements The authors wish to thank the two department heads, Pr Cyril Huissoud (CH) and Pr Gil Dubernard (GD), for their support and guid- ance in enabling the successful completion of this study. Appendix A. Supplementary data Supplementary data to this article can be found online at https://doi. org/10.1016/j.ejogrb.2024.09.008. References [1] Dave M, Patel N. Artificial intelligence in healthcare and education. Br Dent J 2023;234:761–4. [2] Nagi F, Salih R, Alzubaidi M, et al. Applications of artificial intelligence (AI) in medical education: a scoping review. In: Mantas J, Gallos P, Zoulias E, editors. Studies in health technology and informatics. IOS Press; 2023. Available at: https://ebooks.iospress.nl/doi/10.3233/SHTI230581 (last accessed 17 October 2023). [3] Alowais SA, Alghamdi SS, Alsuhebany N, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ 2023;23:689. [4] Li SW, Kemp MW, Logan SJS, et al. ChatGPT outscored human candidates in a virtual objective structured clinical examination in obstetrics and gynecology. Am J Obstet Gynecol 2023;229(172). 172.e1–12. [5] Guerra GA, Hofmann H, Sobhani S, et al. GPT-4 artificial intelligence model outperforms ChatGPT, medical students, and neurosurgery residents on neurosurgery written board-like questions. World Neurosurg 2023;179:e160–5. [6] Wang L, Bi W, Zhao S, Ma Y, Lv L, Meng C, et al. Investigating the impact of prompt engineering on the performance of large language models for standardizing obstetric diagnosis text: comparative study. JMIR Form Res 2024;8:e53216. [7] O’Connor S, Peltonen LM, Topaz M, et al. Prompt engineering when using generative AI in nursing education. Nurse Educ Pract 2024;74:103825. [8] Meskó B. Prompt engineering as an important emerging skill for medical professionals: tutorial. J Med Internet Res 2023;25:e50638. [9] ChatGPT. Available at: https://chat.openai.com (last accessed 18 September 2023). [10] OpenAI. Available at: https://openai.com/ (last accessed 18 September 2023). [11] Bender EM, Gebru T, McMillan-Major A, Shmitchell S. On the dangers of stochastic parrots: can language models be too big?. In: Proceedings of the 2021 ACM conference on fairness, accountability, and transparency. ACM; 2021. p. 610–23. Available at: https://dl.acm.org/doi/10.1145/3442188.3445922 (last accessed 17 October 2023). [12] Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health 2023;2:e0000198. [13] Nune A, Iyengar Karthikeyan P, Manzo C, Barman B, Botchu R. Chat generative pre-trained transformer (ChatGPT): potential implications for rheumatology practice. Rheumatol Int 2023;43:1379–80. [14] Grünebaum A, Chervenak J, Pollet SL, Katz A, Chervenak FA. The exciting potential for ChatGPT in obstetrics and gynecology. Am J Obstet Gynecol 2023; 228:696–705. [15] Pinto Dos Santos D, Giese D, Brodehl S, et al. Medical students’ attitude towards artificial intelligence: a multicentre survey. Eur Radiol 2019;29:1640–6. [16] Kothari M, Moolani S. Reliability of ‘Google’ for obtaining medical information. Indian J Ophthalmol 2015;63:267–9. [17] Al-Bahrani A, Plusa S. The quality of patient-orientated internet information on colorectal cancer. Colorectal Dis 2004;6:323–6. [18] Bristowe K, Siassakos D, Hambly H, et al. Teamwork for clinical emergencies: interprofessional focus group analysis and triangulation with simulation. Qual Health Res 2012;22:1383–94. [19] Prompt engineering guide. Available at: https://www.promptingguide.ai/fr/techn iques/cot (last accessed 14 December 2023). [20] Strobelt H, Webson A, Sanh V, et al. Interactive and visual prompt engineering for ad-hoc task adaptation with large language models. IEEE Trans Vis Comput Graph 2023;29:1146–56. [21] Acar OA. Are your students ready for AI? Harvard Business Publishing; 2023. Available at: https://hbsp.harvard.edu/inspiring-minds/are-your-students-ready- for-ai (last accessed 14 December 2023). [22] Weaver KF, Morales V, Dunn SL, Godde K, Weaver PF. An introduction to statistical analysis in research: with applications in the biological and life sciences. Wiley; 2017. Available at: https://onlinelibrary.wiley.com/doi/book/10.1002/ 9781119454205 (last accessed 26 February 2024). D. Desseauve et al. European Journal of Obstetrics & Gynecology and Reproductive Biology 302 (2024) 238–241 241 https://doi.org/10.1016/j.ejogrb.2024.09.008 https://doi.org/10.1016/j.ejogrb.2024.09.008 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0005 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0005 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0010 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0010 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0010 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0010 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0010 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0015 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0015 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0020 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0020 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0020 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0025 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0025 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0025 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0030 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0030 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0030 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0035 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0035 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0040 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0040 https://chat.openai.com https://openai.com/ http://refhub.elsevier.com/S0301-2115(24)00499-8/h0055 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0055 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0055 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0055 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0055 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0065 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0065 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0065 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0070 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0070 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0070 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0075 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0075 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0075 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0080 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0080 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0085 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0085 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0090 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0090 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0095 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0095 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0095 https://www.promptingguide.ai/fr/techniques/cot https://www.promptingguide.ai/fr/techniques/cot http://refhub.elsevier.com/S0301-2115(24)00499-8/h0105 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0105 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0105 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0110 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0110 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0110 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0115 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0115 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0115 http://refhub.elsevier.com/S0301-2115(24)00499-8/h0115 AI in obstetrics: Evaluating residents’ capabilities and interaction strategies with ChatGPT Introduction Methods Study design and setting Participants AI interaction Expert prompt and standard answer Data collection procedure Analytical approach Results Demographic and background information Interaction with ChatGPT Accuracy of responses and relationships with IT/AI skills Discussion Conclusion Funding CRediT authorship contribution statement Declaration of competing interest Acknowledgements Appendix A Supplementary data References