Medical Students Show How ChatGPT Can Assist Qualitative Research in Medical Education
New research from the University of Cincinnati College of Medicine explores how ChatGPT, a widely available artificial intelligence tool, can support qualitative research, particularly thematic analysis. What makes the study stand out is that it was led by medical students themselves and focuses not on theory alone but on practical, real-world academic work.
The study was published in the peer-reviewed journal Medical Science Educator and looks closely at whether ChatGPT can meaningfully assist researchers when analyzing open-ended survey data. Qualitative research is a core part of medical education research, yet it is often time-consuming and dependent on careful human interpretation. This project set out to examine how far AI can help without replacing human judgment.
Who Conducted the Study and Why It Matters
The research was led by Jonathan Bowden, a third-year medical student from Pickerington, Ohio, and Megha Mohanakrishnan, also a third-year medical student, from San Jose, California. Both researchers completed their undergraduate degrees in medical sciences at the University of Cincinnati before entering medical school.
They worked under the mentorship of Andrew Thompson, Ph.D., a professor-educator in the Department of Medical Education. During their first year of medical school, Bowden and Mohanakrishnan served as learning assistants for Thompson. That experience introduced them to an existing research project analyzing student responses to an emotionally significant part of medical training: cadaveric dissection.
The project originally involved examining survey responses from first-year medical students who were asked to share their thoughts and emotions about working with human cadavers as part of their coursework. This context provided an ideal dataset for qualitative analysis, since responses were deeply personal, reflective, and varied.
Understanding the Original Qualitative Analysis
Before introducing AI, the research team conducted a manual thematic analysis, which served as the gold standard for the study. This process involved carefully reading through each survey response and identifying recurring emotional and conceptual patterns.
Several consistent themes emerged. Students frequently expressed gratitude toward body donors and their families, along with appreciation and excitement for the opportunity to learn human anatomy in such a meaningful way. At the same time, many responses reflected nervousness, discomfort, and apprehension, especially from students encountering cadaveric dissection for the first time.
Once the team established these themes manually, they turned their attention to the main research question: could ChatGPT identify similar themes with reasonable accuracy?
Why ChatGPT Was Chosen
The researchers intentionally selected ChatGPT because it is free, widely accessible, and easy to use, especially for students and educators who may not have access to expensive research software or custom AI tools. The goal was not to test a specialized system, but rather to see what a commonly available AI could realistically contribute to academic research.
Their curiosity centered on efficiency. Qualitative research often requires hours of manual coding, especially with large datasets. If AI could assist with this process, even partially, it could save researchers significant time while maintaining analytical rigor.
The Three AI Prompting Methods Tested
To evaluate ChatGPT's performance, the team tested three different prompting methods, running each method three separate times to ensure consistency.
Method One involved providing ChatGPT with a list of predefined themes along with their definitions. The AI was instructed to code student responses using only this information.
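To make the idea concrete, a Method One-style prompt might look like the minimal Python sketch below. Everything in it is an assumption for illustration: the theme names and definitions are invented stand-ins for the study's codebook, the model name is a guess, and the script uses the OpenAI chat API rather than the free ChatGPT web interface the students actually worked with.

```python
# Method One sketch: zero-shot coding with predefined themes and definitions only.
# Assumes the `openai` Python package (>=1.0) and an OPENAI_API_KEY environment
# variable. Themes below are illustrative, not the study's actual codebook.
from openai import OpenAI

client = OpenAI()

THEMES = {
    "gratitude": "Thankfulness toward body donors and their families.",
    "excitement": "Enthusiasm about learning human anatomy through dissection.",
    "apprehension": "Nervousness or discomfort about working with cadavers.",
}

def code_response(survey_response: str) -> str:
    """Ask the model to label one survey response using only the given themes."""
    theme_list = "\n".join(f"- {name}: {defn}" for name, defn in THEMES.items())
    prompt = (
        "You are coding open-ended survey responses for a thematic analysis.\n"
        f"Use ONLY these themes:\n{theme_list}\n\n"
        "List every theme that applies to the response below, one per line.\n\n"
        f"Response: {survey_response}"
    )
    completion = client.chat.completions.create(
        model="gpt-4o",  # model choice is an assumption; the article does not specify one
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

print(code_response("I felt honored but also anxious on the first day in the lab."))
```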
Method Two expanded on the first approach by adding 25 example responses that had already been manually coded by the researchers. These examples included the assigned themes and brief explanations. ChatGPT was instructed to reference these examples while coding new responses.
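In prompting terms, Method Two is essentially few-shot prompting. Building on the sketch above, the already-coded examples could be appended to the prompt as shown below; the two examples are invented placeholders, not the study's 25 manually coded responses.

```python
# Method Two sketch: supply manually coded examples as few-shot references.
# The examples are invented for illustration, not taken from the study.
CODED_EXAMPLES = [
    {"response": "I am so grateful to the donor's family.",
     "themes": ["gratitude"],
     "why": "Explicit thanks directed at the donor's family."},
    {"response": "Honestly, I'm dreading the first incision.",
     "themes": ["apprehension"],
     "why": "Expresses dread about beginning dissection."},
]

def build_fewshot_block() -> str:
    """Format the coded examples so they can be pasted into the prompt."""
    lines = []
    for ex in CODED_EXAMPLES:
        lines.append(f'Response: "{ex["response"]}"')
        lines.append(f'Themes: {", ".join(ex["themes"])} (Reason: {ex["why"]})')
    return "\n".join(lines)

# The block would slot into the Method One prompt, e.g.:
#   "...Use ONLY these themes...\nReference these coded examples:\n"
#   + build_fewshot_block() + "\n\nResponse: " + survey_response
```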
Method Three was the most interactive and iterative. ChatGPT was asked to code each of the 25 example responses one by one. After each attempt, the researchers reviewed the output, pointed out missing or incorrect themes, and asked the AI to revise its theme definitions accordingly. Once this training-like process was complete, ChatGPT analyzed the remaining responses using the updated definitions.
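Method Three amounts to a human-in-the-loop refinement cycle. The article does not describe the mechanics, so the loop below is a rough reconstruction under stated assumptions: it reuses `client` and `THEMES` from the Method One sketch, represents the researchers' review step as a console prompt, and keeps the whole exchange in one running conversation.

```python
# Method Three sketch: iterative refinement of theme definitions, reusing
# `client` and `THEMES` from the Method One sketch. The researchers' review
# is a human step, represented here by a simple console prompt.

def researcher_feedback(response: str, ai_coding: str) -> str | None:
    """Placeholder for human review: compare the AI coding against the manual
    gold standard and return a correction, or None if the coding is acceptable."""
    note = input(f"Correction for {response!r} coded as {ai_coding!r} (blank if OK): ")
    return note or None

def refine_definitions(examples: list[str]) -> None:
    """Code each example one by one; after each attempt, pass researcher
    corrections back and ask the model to revise its theme definitions."""
    messages = [{
        "role": "user",
        "content": "We will code survey responses one at a time using these "
                   "themes, refining the definitions as we go:\n"
                   + "\n".join(f"- {name}: {defn}" for name, defn in THEMES.items()),
    }]
    for response in examples:
        messages.append({"role": "user", "content": f"Code this response: {response}"})
        reply = client.chat.completions.create(model="gpt-4o", messages=messages)
        coding = reply.choices[0].message.content
        messages.append({"role": "assistant", "content": coding})
        correction = researcher_feedback(response, coding)
        if correction:
            messages.append({"role": "user", "content":
                             f"{correction}\nPlease revise your theme definitions accordingly."})
            revision = client.chat.completions.create(model="gpt-4o", messages=messages)
            messages.append({"role": "assistant",
                             "content": revision.choices[0].message.content})
```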
Among the three approaches, Method Three produced the highest agreement with the human-coded results, demonstrating that increased engagement and feedback significantly improved the AI's performance.
What the Results Showed
Across all three methods, ChatGPT demonstrated more than 80 percent agreement with the manually coded themes. This level of agreement suggests that AI can reliably identify major patterns in qualitative data when properly guided.
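The article does not specify how agreement was computed, but one simple and common approach is per-response percent agreement over theme sets, sketched below with invented data. The actual paper may well use a different agreement statistic.

```python
# Illustrative percent-agreement check between human and AI theme assignments.
# Both the data and the exact metric are assumptions made for this sketch.

def percent_agreement(human: list[set[str]], ai: list[set[str]]) -> float:
    """Percentage of responses where the AI's theme set exactly matches the human coder's."""
    matches = sum(1 for h, a in zip(human, ai) if h == a)
    return 100.0 * matches / len(human)

human_codes = [{"gratitude"}, {"apprehension"}, {"gratitude", "excitement"}]
ai_codes    = [{"gratitude"}, {"apprehension"}, {"excitement"}]
print(f"{percent_agreement(human_codes, ai_codes):.1f}% agreement")  # prints 66.7% agreement
```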
However, the researchers were careful to note important limitations. While accuracy was high, precision varied depending on the method used. The most interactive method achieved better overall alignment with human coding, but sometimes applied themes more broadly than expected. This reinforced the idea that AI assistance must be carefully monitored, not blindly accepted.
The study made it clear that human oversight remains essential, especially when dealing with sensitive or emotionally complex data. ChatGPT was not presented as a replacement for researchers, but as a support tool that can enhance efficiency when used thoughtfully.
Recognition and Educational Impact
The project earned the student researchers a national award, highlighting its contribution to the growing conversation around AI in education and research. Beyond the technical findings, the experience also provided valuable professional development.
The study allowed the students to gain hands-on experience in research planning, execution, data analysis, and academic publication. It also offered insight into how emerging technologies may shape future medical education and scholarship.
Why This Research Matters for Medical Education
Qualitative research plays a critical role in understanding student experiences, professional identity formation, and emotional responses within medical training. Introducing AI tools like ChatGPT into this space raises important questions about efficiency, consistency, and ethics.
This study shows that AI can assist with repetitive analytical tasks, potentially freeing researchers to focus more on interpretation, discussion, and educational improvement. At the same time, it emphasizes that AI should complement, not replace, human expertise.
A Broader Look at AI in Qualitative Research
Outside of medical education, AI-assisted qualitative analysis is gaining attention across social sciences, psychology, and education research. Large language models are particularly well-suited for identifying patterns in text, but they depend heavily on clear instructions, strong examples, and ongoing human input.
As AI tools continue to evolve, studies like this one provide practical guidance on how they can be used responsibly. Rather than speculative promises, this research offers measured evidence of what AI can and cannot do in real academic settings.
Final Thoughts
This student-led study from the University of Cincinnati offers a grounded, realistic look at how ChatGPT can support qualitative research. It shows that with careful prompting and active human involvement, AI can become a useful assistant in academic analysis. Just as importantly, it reinforces that thoughtful design and ethical awareness are essential as AI becomes more integrated into education and research.
Research paper reference:
https://link.springer.com/article/10.1007/s40670-025-02587-2