Episode 61
Research Day: Ambient Scribes and Emotions
Educators in Medicine,
In this newsletter, we continue our journey through the fundamentals of AI, its applications in medicine, and its transformative role in faculty development and education. Let’s dive into learning.
I want to introduce you all to Mena Armosh, who took the lead on this project we did together. Mena is an undergraduate student at the University of South Florida. It was great to see him take this from idea to fruition at USF Health’s Research Day. Even my 13-month-old son enjoyed it. Check it out!
Can AI Scribes Handle Emotional Patients?
What our study tells us about the reliability of AI medical documentation — even when patients aren’t calm
The Problem with Real-World Speech
In reality, patients don’t always speak in a calm, clear, neutral tone. Someone with chest pain might sound scared. Someone describing a chronic issue might sound frustrated or sad.
Emotional speech can change how we talk — affecting our pitch, speed, and volume. Because of this, researchers wondered whether a patient’s emotional tone might affect how well AI scribes understand and document medical conversations.
The Study
To investigate this question, a small research study tested the limits of an AI scribe system.
Starting from standardized patient conversations representing four common medical complaints — shortness of breath, palpitations, chest pain, and abdominal pain — the researchers used software to generate audio recordings of these scenarios with different emotional tones. Each conversation was rendered in joyful, neutral, and sad versions while keeping the medical information exactly the same, producing 36 total audio recordings.
These recordings were then processed using an AI scribe platform called DoxGPT, which converts spoken conversations into clinical documentation. After the notes were generated, the researchers compared them to the original “correct” transcripts to see how accurate the AI was.
To measure performance, they looked at several factors:
F1 score
Word Error Rate (WER) — including omission, insertion, and substitution errors
Variation in Differential Diagnoses
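To make the second metric concrete, here is a minimal sketch of how Word Error Rate is conventionally computed — a word-level edit distance between the reference transcript and the AI-generated text, where deletions correspond to omissions. This is an illustration of the standard formula, not the study's actual scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count,
    computed via Levenshtein edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # i deletions (omission errors)
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)
    return dp[len(ref)][len(hyp)] / len(ref)

# Hypothetical example: one substitution in a six-word reference
print(word_error_rate("patient reports sharp chest pain today",
                      "patient reports stabbing chest pain today"))
```

A perfect transcript gives a WER of 0; the single substitution error reported in the study would contribute one edit over the length of that encounter's reference transcript.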
The Results
The results were promising.
Out of 36 encounters, the AI scribe produced perfect documentation in 35 cases, with only one small substitution error in a chest pain scenario. Ultimately, the system showed no meaningful difference in performance across emotional tones. Whether the patient sounded joyful, neutral, or sad, the AI generated nearly identical and highly accurate medical notes.
The study also looked at the list of possible diagnoses the AI suggested. In most scenarios, the leading diagnoses remained consistent across all emotional tones. There was, however, some minor variation in certain conditions — which may be an interesting area for future research.
So What Does This Mean?
These findings suggest that AI scribes appear to remain reliable even when patients speak with different emotions. This matters because emotions are a normal part of real patient conversations. Patients may be anxious, upset, or relieved, and the technology still needs to function accurately in those situations.
If AI scribes continue to perform well, they could help reduce the documentation burden that many doctors face today. That could mean doctors spending more time listening to patients and less time typing — which may improve both patient satisfaction and physician well-being.
While more research is still needed, this study offers an encouraging sign that AI tools may be able to support healthcare providers without being thrown off by the natural emotions that come with human conversations.
Ultimately, this could help bring the focus of medicine back where it belongs — on the patient.
💌 As always, thanks for reading. Get in touch and let me know your thoughts!
Thank you for joining us on this adventure. Stay tuned for more AI insights, best practices, and future editions of AI+MedEd.
For education and innovation,
Karim
Share this with someone - have them sign up here.


