The Quiet Lookup: How Residents Are Using AI for Bedside Clinical Reference (And What the Evidence Actually Says)
Posted in: Clinical Practice, Medical Education, Technology in EM
It starts with a moment most EM residents recognize immediately.
You are mid-resuscitation. The patient has a complex medication list, an atypical presentation, and an attending waiting for your differential. You need to know whether that antibiotic is renally dosed, whether the rhythm you are seeing can be caused by the tox exposure in front of you, or whether the combination of drugs on the med rec actually matters. You know roughly what you need. You just need to verify it — fast, accurately, and without breaking the clinical flow to page pharmacy or excavate UpToDate for three minutes.
This is the gap that a growing number of trainees are filling with AI-assisted reference lookups at the bedside. And it is worth having an honest, evidence-grounded conversation about how it is being done, where it helps, and where it does not.
What Residents Are Actually Doing
Informal surveys within residency programs and online EM forums suggest that trainees are using large language models (LLMs) for a fairly specific and pragmatic subset of tasks — not asking AI to diagnose their patients, but using it the way a prior generation used a pocket reference card or a quick Micromedex query.
The five most common use cases reported:
1. Drug dosing and renal/hepatic adjustment. Calculating gentamicin for a patient with a creatinine of 3.1, or checking whether metronidazole needs adjustment in cirrhosis. These are time-sensitive, the stakes are real, and the information is well-established enough that a well-trained LLM handles it reliably. This is the highest-frequency use case by a considerable margin (see the arithmetic sketch after this list).
2. Drug interaction checks at the point of care. Not replacing pharmacy review for admitted patients, but getting a rapid signal on whether the medication in hand is likely to interact with something on the patient's home list — particularly relevant in the fast-track or when pharmacy is managing three simultaneous consults.
3. Broadening a differential when the picture doesn't fit. A 2024 study published in Annals of Emergency Medicine noted that AI could "help clinicians make decisions by summarizing and presenting pertinent data regarding a given clinical question, possibly with an accompanying differential diagnosis or list of treatment options." The caveat, acknowledged by the same authors, is that ChatGPT showed inconsistency in atypical presentations. This aligns with the broader literature: LLMs perform well on common presentations and less well on rare or zebra diagnoses — which is, unfortunately, exactly where the stakes are highest.
4. Procedural reference — landmark anatomy, contraindications, equipment sizing. What size ET tube for a pediatric patient when your weight-based tape is not at your fingertips? Cricothyrotomy landmarks before a difficult airway? Quick procedural recaps are a legitimate and low-risk use case (the tube-sizing formula also appears in the sketch below).
5. Quick literature synthesis. Asking for a one-paragraph summary of the evidence on a specific clinical question — not to replace reading the primary literature, but to orient oneself before a conversation with a consultant or a family.
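To make the "stable and verifiable" point concrete: the renal-dosing math behind use case 1 reduces to a formula most of us learned as interns. Here is a minimal sketch of the Cockcroft-Gault estimate an AI lookup should be returning and that you can verify by hand. The function name and the patient's age and weight are illustrative, not from any product or the case above:

```python
def cockcroft_gault_crcl(age_years: float, weight_kg: float,
                         scr_mg_dl: float, female: bool) -> float:
    """Estimated creatinine clearance in mL/min (Cockcroft-Gault)."""
    crcl = ((140 - age_years) * weight_kg) / (72 * scr_mg_dl)
    return crcl * 0.85 if female else crcl

# Hypothetical patient for the gentamicin example above:
# a 65-year-old, 70 kg man with a serum creatinine of 3.1 mg/dL.
print(round(cockcroft_gault_crcl(65, 70, 3.1, female=False)))  # -> 24 (mL/min)
```

The output is a single, checkable number. That verifiability, more than any benchmark result, is what makes this use case defensible.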
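The same logic applies to use case 4. The age-based tube formulas are about as stable as clinical facts get; this sketch (again illustrative, and no substitute for a weight-based tape) is the entire calculation:

```python
def peds_ett_size_mm(age_years: float, cuffed: bool = True) -> float:
    """Age-based ETT internal diameter in mm: age/4 + 3.5 cuffed, age/4 + 4 uncuffed."""
    return age_years / 4 + (3.5 if cuffed else 4.0)

def ett_depth_at_lip_cm(size_mm: float) -> float:
    """Common rule of thumb: insertion depth at the lip is 3 x internal diameter."""
    return 3 * size_mm

size = peds_ett_size_mm(4)              # 4-year-old, cuffed -> 4.5 mm
print(size, ett_depth_at_lip_cm(size))  # 4.5 13.5
```

If an AI's answer disagrees with arithmetic this simple, that is your signal to slow down.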
What the Evidence Says
Let us be direct: the evidence base for LLMs as clinical decision support tools is early-stage and warrants real caution.
A 2025 scoping review in Academic Emergency Medicine found that while AI-based clinical decision support shows promise, "robust prospective trials remain limited." A contemporaneous paper in JMIR Medical Informatics was more pointed: LLMs "struggle with rare and atypical cases common in emergency medicine" and "cannot reliably indicate uncertainty in their recommendations" — which is a meaningful failure mode in a department built on managing diagnostic uncertainty.
The STAT News investigation of a commercially deployed sepsis prediction algorithm in the ED — which "routinely missed signs of sepsis" — is a useful cautionary anchor. Not all AI is the same, and not all clinical tasks carry the same consequences if the AI gets it wrong.
Where the evidence is more supportive: established factual recall. Drug dosing, pharmacokinetics, well-characterized drug interactions, and procedural reference are domains where LLMs have demonstrated reasonable accuracy against verified medical databases. These are also domains where the underlying "correct answer" is relatively stable and verifiable — meaning clinicians can sanity-check outputs more easily.
The practical heuristic that has emerged in residency education circles: the more established and verifiable the fact, the more appropriate the AI lookup. The more the answer depends on synthesizing a complex, evolving evidence base or interpreting an atypical clinical scenario, the more the clinician's judgment needs to remain in the driver's seat.
The Workflow Question
One underappreciated dimension of bedside AI use is the interface. An LLM that requires navigating a consumer chatbot — with suggested prompts, promotional banners, and a UI clearly designed for something other than clinical work — creates friction and, frankly, an optics problem. Clinicians at the bedside are navigating perceptions from patients, supervisors, and colleagues about what they are doing on their phone.
Some residents have started using tools designed specifically to look and feel like a working document or plain notepad — the kind of interface that does not read as "I am distracted" to a patient or attending. Tools like totallynot.ai are built on this premise: a minimal text interface that surfaces AI-assisted clinical lookups without the visual noise of a consumer AI product. It is a small thing, but interface design has real consequences for actual clinical adoption.
The Honest Caveats
This article would not pass FOAM editorial muster without stating these clearly:
- Always verify critical dosing decisions with pharmacy when time permits. AI-assisted lookup is triage-level reference, not a replacement for clinical pharmacist review.
- LLMs hallucinate. Not often on well-established pharmacology, but it happens, and it is harder to detect when you are moving fast. Develop the habit of noticing when something feels off.
- Bias in training data is real. Dosing recommendations and drug interaction data trained predominantly on certain populations may not generalize well. Apply the same critical lens you would to any clinical guideline.
- Document your reasoning, not your AI lookup. The medicolegal and professional standard is your clinical judgment — AI reference tools are exactly that, reference tools.
The Bottom Line
Residents are using AI at the bedside for point-of-care reference, and the evidence supports cautious, task-appropriate use for well-defined factual queries. Drug dosing, interaction checks, and procedural reference are reasonable use cases. Complex diagnostic reasoning and rare presentations are not, at least not without treating the output with significant skepticism.
The FOAM community has spent a decade building a culture of critical appraisal around the medical literature. Applying that same rigor to AI tools — rather than wholesale rejection or uncritical adoption — is exactly the right frame.
Use the tools. Know their limits. Verify what matters.
The author is an emergency medicine physician with an interest in clinical informatics and resident education. This post reflects independent clinical and educational opinion. The author uses totallynot.ai as a bedside reference tool — a plain-interface AI lookup designed for point-of-care use — and has no financial relationship with the company.
References
- Kareemi H, et al. Artificial intelligence-based clinical decision support in the emergency department: A scoping review. Acad Emerg Med. 2025.
- Rajpurkar P, et al. AI in Emergency Medicine: Balancing Opportunities and Challenges. JMIR Med Inform. 2025.
- The AI Future of Emergency Medicine. Ann Emerg Med. 2024.
- Goh E, et al. Large language model influence on diagnostic reasoning. JAMA Netw Open. 2024.
- Katz M, et al. GPT versus resident physicians. NEJM AI. 2024.