Harnessing the Power of ChatGPT for Assessment Question Generation: Five Tips for Medical Educators

Inthrani Raja INDRAN*, Priya PARANTHAMAN, and Nurulhuda MUSTAFA

Department of Pharmacology,
Yong Loo Lin School of Medicine (YLLSoM)

*phciri@nus.edu.sg

 

Indran, I. R., Paranthaman, P., & Mustafa, N. (2023). Harnessing the power of ChatGPT for assessment question generation: Five tips for medical educators [Lightning talk]. In Higher Education Campus Conference (HECC) 2023, 7 December, National University of Singapore. https://blog.nus.edu.sg/hecc2023proceedings/harnessing-the-power-of-chatgpt-for-assessment-question-generation-five-tips-for-medical-educators/ 

SUB-THEME

AI and Education 

 

KEYWORDS

AI, ChatGPT, questions, medical assessment

 

CATEGORY

Lightning Talks 

 

INTRODUCTION

Developing diverse and high-quality assessment questions for the medical curriculum is a complex and time-intensive task, as such questions often require clinically relevant scenarios aligned with the learning outcomes (Al-Rukban, 2006; Palmer & Devitt, 2007). The emergence of artificial intelligence (AI)-driven large language models (LLMs) presents an unprecedented opportunity to explore how AI can be harnessed to optimise and automate these complex tasks for educators (OpenAI, 2023). It also gives students an opportunity to use LLMs to create practice questions and deepen their understanding of the concepts they wish to test.

 

AIMS & METHODS

This study aims to establish a dependable set of practical pointers that enable educators to tap the ability of LLMs such as ChatGPT to, first, enhance question generation in healthcare professions education, using multiple-choice questions (MCQs) as an illustrative example, and second, generate diverse clinical scenarios for teaching and learning. Lastly, we hope that our experiences will encourage more educators to explore and adopt AI tools such as ChatGPT with greater ease, especially those with limited prior experience.

 

To generate diverse, high-quality clinical scenario MCQs, we outlined core medical concepts and identified essential keywords to integrate into the instruction stem. The text inputs were iteratively refined until we developed instruction prompts that generated questions of the desired quality. Following question generation, the respective domain experts reviewed the questions for content accuracy and overall relevance, flagging any potential issues in the question stem. This process of soliciting feedback and implementing refinements enabled us to continuously improve the prompts and the quality of the questions generated. By prioritising expert review, we established a necessary validation process for the MCQs prior to their formal implementation.
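As an illustration of this outlining step, a hypothetical pharmacology example (the topic, concepts, and keywords below are illustrative and not drawn from our actual question bank) might look like:

Topic: beta-adrenergic antagonists
Core concepts: mechanism of action, cardioselectivity, contraindications in asthma
Instruction-stem keywords: single best answer, clinical vignette, five options, one correct answer with a brief explanation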

 

THE FIVE TIPS

We consolidated the following tips to effectively harness the power of ChatGPT for assessment question generation.

 

Tip 1: Define the Objective and Select the Appropriate Model

Determine the purpose of question generation and choose an AI model that suits your needs and access. ChatGPT 4.0 offers greater accuracy and concept integration than 3.5, but requires a subscription. Activate the beta features under “Settings” and use the “Browse with Bing” mode to retrieve information beyond the model’s training cut-off, and install plugins to improve AI performance.
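The tip above refers to the ChatGPT web interface. For educators who prefer scripting, the same model choice can also be made through the OpenAI API; the snippet below is only a minimal sketch under that assumption (the openai Python package, the model name, and the example prompt are illustrative and not part of the workflow reported here).

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# "gpt-4" offers stronger accuracy and concept integration than
# "gpt-3.5-turbo", mirroring the ChatGPT 4.0 vs 3.5 choice above.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a medical educator writing MCQs."},
        {"role": "user", "content": "Generate one single-best-answer MCQ on beta-blocker contraindications."},
    ],
)
print(response.choices[0].message.content)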

 

Tip 2: Optimise Prompt Design

When refining the instruction stem for question generation, there are several important considerations. First, be specific in your instructions: state the key concepts, question type, number of questions, and answer format, and clearly spell out any guidelines or rules you want the model to follow. Build the instruction stem around core concepts and keywords relevant to the discipline, and experiment with the vocabulary to optimise question quality.
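As an illustration, a prompt along these lines (hypothetical wording; adapt the concepts and rules to your own discipline) captures the elements above:

“Generate five single-best-answer MCQs on the pharmacology of anticoagulants for Year 2 medical students. Each question must include a clinical vignette stem, five options (A–E), exactly one correct answer, and a brief explanation for every option. Do not use ‘all of the above’ or ‘none of the above’.”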

 

Tip 3: Build Diverse Authentic Scenarios

Develop a range of relevant clinical vignettes to broaden the scope of scenarios that can be used to assess students.
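For instance, once a working prompt is in place, it can be varied to request different patient profiles and care settings (a hypothetical instruction, not our exact wording):

“Rewrite the vignette for an elderly patient with renal impairment presenting to a primary care clinic, keeping the pharmacological concept being tested unchanged.”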

 

Tip 4: Calibrate Assessment Difficulty

Incorporate the principles of Bloom’s Taxonomy when developing assessment questions to test different cognitive skills, ranging from basic knowledge recall to complex analysis, enhancing question diversity.
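In practice, the target cognitive level can be written directly into the prompt; a hypothetical instruction might read:

“Generate two MCQs at the ‘apply’ level and one at the ‘analyse’ level of Bloom’s Taxonomy on warfarin drug interactions, each requiring interpretation of a clinical scenario rather than recall of facts.”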

 

Tip 5: Work Around Limitations

Be mindful that ChatGPT is trained on a finite dataset and can generate factually inaccurate information. Despite its broad training, it does not possess the nuanced understanding of a medical expert, which can affect the quality of the questions it generates; human validation is therefore necessary to catch factual inaccuracies. In addition, AI data collection carries risks of misuse, privacy breaches, and bias amplification, which can lead to misguided outcomes.

 

CONCLUSION

AI-assisted question generation is an iterative process, and these tips offer healthcare professions educators practical guidance in automating the generation of good-quality assessment questions. Furthermore, students can leverage this technology for self-directed learning, creating and verifying their own practice questions and strengthening their understanding of medical concepts (Touissi et al., 2022). While this paper primarily demonstrates the use of ChatGPT in generating MCQs, we believe the approach can be extended to various other question types. It is also important to remember that AI augments, but does not replace, human expertise (Ali et al., 2023; Rahsepar et al., 2023). Domain experts remain essential to ensure quality, accuracy, and relevance.

 

REFERENCES 

OpenAI. (2023). ChatGPT [Large language model]. https://chat.openai.com/

Al-Rukban, M. O. (2006). Guidelines for the construction of multiple choice questions tests. J Family Community Med, 13(3), 125-33. https://www.ncbi.nlm.nih.gov/pubmed/23012132

Ali, R., Tang, O. Y., Connolly, I. D., Fridley, J. S., Shin, J. H., Zadnik Sullivan, P. L., Cielo, D., Oyelese, A. A., Doberstein, C. E., Telfeian, A. E., Gokaslan, Z. L., & Asaad, W. F. (2023). Performance of ChatGPT, GPT-4, and Google Bard on a Neurosurgery Oral Boards Preparation Question Bank. Neurosurgery. https://doi.org/10.1227/neu.0000000000002551

Palmer, E. J., & Devitt, P. G. (2007). Assessment of higher order cognitive skills in undergraduate education: modified essay or multiple choice questions? Research paper. BMC Med Educ, 7, 49. https://doi.org/10.1186/1472-6920-7-49

Rahsepar, A. A., Tavakoli, N., Kim, G. H. J., Hassani, C., Abtin, F., & Bedayat, A. (2023). How AI responds to common lung cancer questions: ChatGPT vs Google Bard. Radiology, 307(5), e230922. https://doi.org/10.1148/radiol.230922

Touissi, Y., Hjiej, G., Hajjioui, A., Ibrahimi, A., & Fourtassi, M. (2022). Does developing multiple-choice questions improve medical students’ learning? A systematic review. Med Educ Online, 27(1), 2005505. https://doi.org/10.1080/10872981.2021.2005505

 
