Harnessing the Potential of Generative AI in Medical Undergraduate Education Across Different Disciplines—Comparative Study on Performance of ChatGPT in Physiology and Biochemistry Modified Essay Questions

W. A. Nathasha Vihangi LUKE1*, LEE Seow Chong2, Kenneth BAN2, Amanda WONG1, CHEN Zhi Xiong1,3, LEE Shuh Shing3, Reshma TANEJA1, Dujeepa SAMARASEKARA3, and Celestial T. YAP1

1Department of Physiology, Yong Loo Lin School of Medicine (YLLSOM)
2Department of Biochemistry, YLLSOM
3Centre for Medical Education, YLLSOM

*nathasha@nus.edu.sg

 

Luke, W. A. N. V., Lee, S. C., Ban, K., Wong, A., Chen, Z. X., Lee, S. S., Taneja, R., Samarasekara, D., & Yap, C. T. (2023). Harnessing the potential of generative AI in medical undergraduate education across different disciplines—comparative study on performance of ChatGPT in physiology and biochemistry modified essay questions [Paper presentation]. In Higher Education Campus Conference (HECC) 2023, 7 December, National University of Singapore. https://blog.nus.edu.sg/hecc2023proceedings/harnessing-the-potential-of-generative-ai-in-medical-undergraduate-education-across-different-disciplines-comparative-study-on-performance-of-chatgpt-in-physiology-and-biochemistry-modified-es/ 
 

SUB-THEME

AI and Education

 

KEYWORDS

Generative AI, artificial intelligence, large language models, physiology, biochemistry

 

CATEGORY

Paper Presentations

 

INTRODUCTION & JUSTIFICATION

Advances in generative artificial intelligence (AI) have prompted extensive discussion of its implications across educational disciplines. ChatGPT's passing performance on the United States Medical Licensing Examination (USMLE) (Kung et al., 2023) and its strong results in other discipline-specific examinations (Subramani et al., 2023) demonstrated its potential to change medical education. The capabilities and limitations of this technology across disciplines should be identified to promote optimal use of these models in medical education. This study evaluated the performance of ChatGPT, a large language model (LLM) by OpenAI powered by GPT-3.5, on modified essay questions (MEQs) in physiology and biochemistry for medical undergraduates.

 

METHODOLOGY

MEQs extracted from physiology and biochemistry tutorials and case-based learning scenarios were entered into ChatGPT. Answers were generated for 44 MEQs in physiology and 43 MEQs in biochemistry. Each response was graded independently by two examiners, guided by a marking scheme. In addition, the examiners rated the answers on concordance, accuracy, language, organisation, and information, and provided qualitative comments. Descriptive statistics, including mean, standard deviation, and variance, were calculated for the average scores and for subgroups of questions classified according to Bloom’s taxonomy. Single-factor ANOVA was performed on the subgroups to test for statistically significant differences.
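The statistical steps described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names `describe` and `one_way_anova_f` are hypothetical, and the two score lists are invented purely for illustration. The sketch computes the descriptive statistics named in the methodology and the F statistic of a single-factor ANOVA from its definition (between-group mean square divided by within-group mean square).

```python
from statistics import mean, stdev, variance


def describe(scores):
    """Return n, mean, sample SD, and sample variance for a list of scores."""
    return {
        "n": len(scores),
        "mean": mean(scores),
        "sd": stdev(scores),
        "variance": variance(scores),
    }


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores (out of 100), NOT data from the study.
lower_order = [100, 90, 95, 85, 100]
higher_order = [60, 70, 55, 80, 65]

print(describe(lower_order))
print(describe(higher_order))
f_stat, df_b, df_w = one_way_anova_f([lower_order, higher_order])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Converting the F statistic to a p-value requires the F distribution, which the Python standard library does not provide; a statistics package such as SciPy (`scipy.stats.f_oneway`) could be used for that step.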

 

RESULTS

ChatGPT answers (n = 44) obtained a mean score of 74.7 (SD 25.96) in physiology. Of the 44 answers, 16 (36.4%) scored 90 marks or above out of 100, and 13 (29.5%) obtained a score of 100%. There was a statistically significant difference in mean scores between the higher-order and lower-order questions of Bloom’s taxonomy (p < 0.05). Qualitative comments commended ChatGPT’s strength in producing exemplary answers to most questions in physiology, particularly lower-order questions. Deficiencies were noted in applying physiological concepts in a clinical context.

 

The mean score for biochemistry was 59.3 (SD 26.9). Only 2 of the 43 answers (4.7%) obtained a score of 100%, while 7 (16.3%) scored 90 marks or above. There was no statistically significant difference in scores between the higher-order and lower-order questions of Bloom’s taxonomy. The examiners’ comments highlighted that the answers lacked relevant information and contained faulty explanations of concepts. Examiners also commented that the outputs demonstrated breadth, but not the depth expected.
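As a quick arithmetic check, the proportions reported for the two subjects can be recomputed from the raw counts given in the text. The short sketch below is illustrative only; the helper name `pct` and the dictionary layout are assumptions, and percentages are rounded to one decimal place.

```python
def pct(count, total):
    """Percentage of `count` out of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts taken from the results reported for each subject.
counts = {
    "physiology, scored 90 or above": (16, 44),
    "physiology, scored 100%": (13, 44),
    "biochemistry, scored 90 or above": (7, 43),
    "biochemistry, scored 100%": (2, 43),
}

for label, (count, total) in counts.items():
    print(f"{label}: {count}/{total} = {pct(count, total)}%")
```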

Figure 1. Distribution of scores.

 

CONCLUSIONS AND RECOMMENDATIONS

Overall, our study demonstrates the differential performance of ChatGPT across the two subjects. ChatGPT answered most physiology questions with a high degree of accuracy, particularly excelling in lower-order questions of Bloom’s taxonomy. ChatGPT's answers in biochemistry scored lower. Examiners commented that these answers showed lower levels of precision and specificity, and lacked depth in explanations.

 

The performance of language models largely depends on the availability of training data; hence the efficacy may vary across subject areas. The differential performance highlights the need for future iterations of LLMs to receive subject and domain-specific training to enhance performance.

 

This study further demonstrates the potential of generative AI technology in medical education. Educators should be aware of the abilities and limitations of generative AI in different disciplines and revise learning tools accordingly to ensure integrity. Efforts should be made to integrate this technology into learning pedagogies when possible.

 

The performance of ChatGPT in MEQs highlights the potential of generative AI as an educational tool for students. However, this study suggests that the current technology is not yet ready to be recommended as a sole resource and is better used as a supplementary tool alongside other learning resources. In addition, students should take the differential performance across subjects into consideration when determining the extent to which this technology should be incorporated into their learning.

 

REFERENCES

Kung, T. H., Cheatham, M., Medenilla, A., Sillos, C., De Leon, L., Elepaño, C., Madriaga, M., Aggabao, R., Diaz-Candido, G., Maningo, J., & Tseng, V. (2023). Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digital Health, 2(2), e0000198. https://doi.org/10.1371/journal.pdig.0000198

 Subramani, M., Jaleel, I., & Krishna Mohan, S. (2023). Evaluating the performance of ChatGPT in medical physiology university examination of phase I MBBS. Advances in Physiology Education, 47(2), 270–71. https://doi.org/10.1152/advan.00036.2023

 

Doing But Not Creating: A Theoretical Study of the Implications of ChatGPT on Paradigmatic Learning Processes

Koki MANDAI1*, Mark Jun Hao TAN1, Suman PADHI1, and Kuin Tian PANG1,2,3

1Yale-NUS College
2Bioprocessing Technology Institute, Agency for Science, Technology, and Research (A*STAR), Singapore
3School of Chemistry, Chemical Engineering, and Biotechnology, Nanyang Technological University (NTU), Singapore

*m.koki@u.yale-nus.edu.sg

 

Mandai, K., Tan, J. H. M., Padhi, S., & Pang, K. T. (2023). Doing but not creating: A theoretical study of the implications of ChatGPT on paradigmatic learning processes [Paper presentation]. In Higher Education Campus Conference (HECC) 2023, 7 December, National University of Singapore. https://blog.nus.edu.sg/hecc2023proceedings/doing-but-not-creating-a-theoretical-study-of-the-implications-of-chatgpt-on-paradigmatic-learning-processes/

SUB-THEME

AI and Education

 

KEYWORDS

AI, artificial intelligence, education, ChatGPT, learning, technology

 

CATEGORY

Paper Presentation 

 

CHATGPT AND LEARNING FRAMEWORKS

Introduction

Since the release of ChatGPT by OpenAI, multiple sectors have been affected, and educational institutions appear to be more deeply impacted than other fields (Dwivedi et al., 2023; Eke, 2023; Rudolph et al., 2023). Following the sub-theme of “AI and Education”, we conduct a systematic investigation into the educational uses of ChatGPT and its quality as a tool for learning, teaching, and assessing, mainly in higher education. The research draws on comprehensive literature reviews of the current and future educational landscape and of ChatGPT’s methodology and function, and applies major educational theories as the main basis for constructing the evaluative criteria. Findings will be presented via a paper presentation.

 

Theoretical Foundations and Knowledge Gaps

Current literature on the intersections of education and artificial intelligence (AI) consists of varied and isolated critiques of how AI impacts segments of the educational process. For instance, there is a large focus on the general benefits or harms in education (Baidoo-Anu & Ansah, 2023; Dwivedi et al., 2023; Mhlanga, 2023), rather than on the specific levels of learning that students and teachers encounter. Furthermore, there seems to be a lack of analysis of the fundamental change in, and reconsideration of, the meaning of education that may follow the introduction of AI. The situation can be described as a Manichean dichotomy: one side argues for the expected enhancements and improved efficiency in education (Ray, 2023; Rudolph et al., 2023), while the other argues for the risks of losing knowledge and creativity, and with them the basis of future development (Chomsky, 2023; Dwivedi et al., 2023; Krügel et al., 2022, 2023).

 

By referring to John Dewey’s reflective thought and action model for the micro-scale analysis (Dewey, 1986; Gutek, 2005; Miettinen, 2000) and a revision of Bloom’s taxonomy for the macro-scale analysis (Elsayed, 2023; Forehand, 2005; Kegan, 1977; Seddon, 1978), we consider the potential impact of ChatGPT over progressive levels of learning and the associated activities therein. These models were chosen mainly because their hierarchical frameworks allow for easier application in evaluation than other models; this does not imply that they are superior to others. The evaluative criteria we aim to construct will be comprehensive, and our research thus provides a possible base for future improvements. Moreover, we incorporate insights from perspectives beyond educational theory, such as policy and philosophy, drawing on the diverse backgrounds of our research team.

 

Purpose and Significance of the Present Study

This study sought to answer questions regarding the viability of ChatGPT as an educational tool, its proposed benefits and harms, and potential obstacles educators may face in its uptake, as well as relevant safeguards against those obstacles.

 

Furthermore, we suggest a possible base for a new theoretical framework in which ChatGPT is explicitly integrated with standard educational hierarchies, in order to provide better guidance to educators and students. This study aims to establish a baseline for policy considerations on ChatGPT as an educational tool that may either improve or degrade learning. As a result, ChatGPT could be adopted in educational institutions, with accompanying developmental policies to be considered and amended in governmental legislatures for wider educational use.

 

Potential Findings/Implications

The existing literature suggests that, in keeping with intuitions regarding higher-level learning, ChatGPT itself seems to be limited to "doing"; that is, it is only able to process lower to mid-level learning comprising repetitive actions such as remembering, understanding, applying, and analysing (Dwivedi et al., 2023; Elsayed, 2023). Some literature also positions ChatGPT as less directly useful in higher-level processes such as evaluation and the creation of new knowledge, and it can even be said to hinder them (Crawford et al., 2023; Rudolph et al., 2023). Even within the lower-level processes, there is strong concern that overreliance could dull learners' abilities (Halaweh, 2023; Ray, 2023). Yet under the lens of the educational theories applied in this paper so far, there seems to be a possibility that ChatGPT can assist higher-order skills such as creativity and related knowledge acquisition. As the net benefit of ChatGPT in education may depend on external factors that we have yet to take into account, such as the educational field, the personality of the user, and the environment, further research is required to determine its optimal usage in education. Still, this attempt may be one of the first steps towards constructing a set of evaluative criteria for the new era of education with AI.

 

REFERENCES

Baidoo-Anu, D. & Ansah, L. O. (2023). Education in the era of generative artificial intelligence (AI): Understanding the potential benefits of ChatGPT in promoting teaching and learning. SSRN. https://ssrn.com/abstract=4337484

Chomsky, N., et al. (2023). Noam Chomsky: The false promise of ChatGPT. The New York Times. www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html

Crawford, J., Cowling, M., & Allen, K. (2023). Leadership is needed for ethical ChatGPT: Character, assessment, and learning using artificial intelligence (AI). Journal of University Teaching & Learning Practice, 20(3). https://doi.org/10.53761/1.20.3.02

Dewey, J. (1986). Experience and education. The Educational Forum, 50(3), 241-52. https://doi.org/10.1080/00131728609335764

Dwivedi, Y. K. et al. (2023). “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. International Journal of Information Management, 71, 1-63. https://doi.org/10.1016/j.ijinfomgt.2023.102642

Eke, D. O. (2023). ChatGPT and the rise of generative AI: Threat to academic integrity? Journal of Responsible Technology, 13, 1-4. https://doi.org/10.1016/j.jrt.2023.100060

Elsayed, S. (2023). Towards mitigating ChatGPT’s negative impact on education: Optimizing question design through Bloom’s taxonomy. https://doi.org/10.48550/arXiv.2304.08176

Forehand, M. (2005). Bloom’s taxonomy: Original and revised. In M. Orey (Ed.), Emerging perspectives on learning, teaching, and technology. http://projects.coe.uga.edu/epltt/

Gutek, G. L. (2005). Jacques Maritain and John Dewey on education: A reconsideration. Educational Horizons, 83(4), 247–63. http://www.jstor.org/stable/42925953

Halaweh, M. (2023). ChatGPT in education: Strategies for responsible implementation. Contemporary Educational Technology, 15(2), ep421. https://doi.org/10.30935/cedtech/13036

Kegan, D. L. (1977). Using Bloom’s cognitive taxonomy for curriculum planning and evaluation in nontraditional educational settings. The Journal of Higher Education, 48(1), 63–77. https://doi.org/10.2307/1979174

Krügel, S., Ostermaier, A., & Uhl, M. (2022). Zombies in the loop? Humans trust untrustworthy AI-advisors for ethical decisions. Philosophy & Technology, 35, 17. https://doi.org/10.1007/s13347-022-00511-9

Krügel, S., Ostermaier, A., & Uhl, M. (2023). ChatGPT’s inconsistent moral advice influences users’ judgment. Scientific Reports, 13, 4569. https://doi.org/10.1038/s41598-023-31341-0

Mhlanga, D. (2023). Open AI in education, the responsible and ethical use of ChatGPT towards lifelong learning. SSRN. https://ssrn.com/abstract=4354422

Miettinen, R. (2000). The concept of experiential learning and John Dewey’s theory of reflective thought and action. International Journal of Lifelong Education, 19(1), 54-72. https://doi.org/10.1080/026013700293458

Ray, P. P. (2023). ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems, 3, 121-154. https://doi.org/10.1016/j.iotcps.2023.04.003

Rudolph, J., Tan, S., & Tan, S. (2023). ChatGPT: Bullshit spewer or the end of traditional assessments in higher education? Journal of Applied Learning & Teaching, 6(1), 1-22. https://journals.sfu.ca/jalt/index.php/jalt/article/view/689

Seddon, G. M. (1978). The properties of Bloom’s Taxonomy of educational objectives for the cognitive domain. Review of Educational Research, 48(2), 303–23. https://doi.org/10.2307/1170087

 

Is Artificial Intelligence Better Than Human-controlled Doctor for Virtual Reality Interprofessional Simulation Training?

Sok Ying LIAW1*, Jian Zhi TAN1, Khairul Dzakirin Bin Rusli1, Rabindra RATAN2, Wentao ZHOU1, Siriwan LIM1, Tang Ching LAU3, Betsy SEAH1, and Wei Ling CHUA1

1Alice Lee Centre for Nursing Studies
2Department of Media & Information, Michigan State University
3Department of Medicine, Yong Loo Lin School of Medicine

*nurliaw@nus.edu.sg

 

Liaw, S. Y., Tan, J. Z., Khairul Dzakirin Rusli, Ratan, R., Zhou, W., Lim, S., Lau, T. C., Seah, B., & Chua, W. L. (2023). Is artificial intelligence better than human-controlled doctor for virtual reality interprofessional simulation training? [Poster presentation]. In Higher Education Campus Conference (HECC) 2023, 7 December, National University of Singapore. https://blog.nus.edu.sg/hecc2023proceedings/is-artificial-intelligence-better-than-human-controlled-doctor-for-virtual-reality-interprofessional-simulation-training/
 

SUB-THEME

AI and Education 

 

KEYWORDS

Artificial intelligence, interprofessional education, sepsis care, interprofessional communication, virtual reality simulation

 

CATEGORY

Poster Presentation 

 

BACKGROUND

Multi-user virtual reality simulation (VRS) has been shown to be an effective learning strategy for preparing medical and nursing students for sepsis team training. However, its scalability is limited by its reliance on human-controlled avatars and by unequal cohort sizes between nursing and medical students. Given this imbalance (e.g., 300 medical students versus 1500 nursing students), it is unlikely that all nursing students will have the opportunity to form interprofessional teams with medical students for doctor-nurse team training. With advances in artificial intelligence (AI), an AI medical team player was developed to replace the human-controlled avatar as the virtual doctor, enabling more nursing students to engage in sepsis team training (Liaw et al., 2023).

 

AIM

To evaluate the effectiveness of an AI-powered doctor compared to the human-controlled doctor avatar in training nursing students for interprofessional communication and sepsis care.

 

METHODS

Sixty-four nursing students were recruited into a two-arm randomised controlled trial. Participants in the intervention group went through sepsis team training with an AI-powered doctor, while participants in the control group trained with doctor avatars controlled by medical students in virtual reality simulation. Pre-test and post-test questionnaires were administered to assess sepsis knowledge and self-efficacy in interprofessional communication. Post-test simulation-based assessments were conducted to assess both groups’ sepsis care and communication performance. The study was approved by the National University of Singapore institutional review board (ref no. NUS-IRB-2022-202).

 

KEY FINDINGS

Compared with the pre-test scores, both the intervention and control groups showed significant improvements in post-test communication knowledge (P = .001) and self-efficacy in interprofessional communication (P < .001). The intervention group demonstrated a significant improvement in post-test sepsis care knowledge (P < .001), whereas the control group did not (P = .16). Although there were no significant differences in sepsis care performance between the groups (P = .39), the intervention group (mean 9.06, SD 1.78) had significantly higher post-test sepsis knowledge scores than the control group (mean 7.75, SD 2.08). Similarly, there were no significant differences in interprofessional communication performance between the two groups (P = .21). However, the control group (mean 69.6, SD 14.4) reported a significantly higher level of self-efficacy in interprofessional communication than the intervention group (mean 60.1, SD 13.3).

 

LIMITATIONS

Despite evaluating participants’ performance through simulation-based assessment and observation using validated tools, our study did not measure the long-term retention of knowledge and level of self-efficacy in interprofessional communication. Therefore, future studies could evaluate outcomes over a longer period and measure the impact in the clinical setting.

 

SIGNIFICANCE OF THE STUDY

This study supports implementing AI-powered doctors in virtual reality simulation as a sustainable way to scale up sepsis team training. Given its practicality, AI-enabled virtual reality simulation is a promising strategy for training a large number of nursing students and clinical nurses across educational and healthcare institutions to deliver safe, quality patient care. This learning platform could also be extended to international institutions, creating opportunities for international collaboration. The findings also point to the need for future evaluation of blended learning approaches that combine AI-powered VRS with human-controlled VRS and face-to-face simulation-based interprofessional learning to optimise clinical performance in sepsis care and interprofessional communication.

 

REFERENCE

Liaw, S. Y., Tan, J. Z., Bin Rusli, K. D., Ratan, R., Zhou, W., Lim, S., Lau, T. C., Seah, B., & Chua, W. L. (2023). Artificial intelligence versus human-controlled doctor in virtual reality simulation for sepsis team training: Randomized controlled study. Journal of Medical Internet Research, 25, e47748. https://doi.org/10.2196/47748

 
