
Gordon Guyatt on EBM’s Evolution: Core GRADE, AI-Assisted Appraisal, and Patient-Centred Practice

2026-01-01

Author(s): Scott Douglas Jacobsen

Publication (Outlet/Website): The Good Men Project

Publication Date (yyyy/mm/dd): 2025/11/27

Professor Gordon Guyatt is a Canadian physician, health researcher, and Distinguished Professor at McMaster University, widely recognized as the pioneer of evidence-based medicine (EBM). He coined the term “evidence-based medicine” in 1991, fundamentally transforming how clinicians worldwide evaluate research and make patient care decisions. Guyatt has authored or co-authored thousands of influential papers and is among the most cited health scientists globally. He has also led the development of the GRADE framework for grading evidence and guidelines. His leadership, mentorship, and prolific contributions have profoundly shaped modern clinical epidemiology and guideline development, cementing his legacy in global health research.

Scott Douglas Jacobsen: Outside of epidemiology, was there a discipline in medicine that was an early adopter of EBM? A field that was relatively friendly to it — maybe had some challenges, but nothing foundational — so its acceptance and use of the EBM model was easier?

Prof. Gordon Guyatt: Within medicine, it was general internal medicine — or perhaps more broadly, general medicine — that was the quickest to adopt it. Surgery, much less so. Obstetrics and gynecology, less. Psychiatry, also less.

When you look outside of medicine but within the clinical health sciences — nursing, rehabilitation, and related fields — they have now fully integrated EBM principles into their training and practice. General internal medicine was first; other medical specialties were slower; nursing and the rehab sciences came on board not much later and have incorporated it well.

Now, I am trying to remember another field — I think you are recalling when we discussed resistance in certain areas. For example, there was some pushback in certain parts of oncology. They found it challenging to adapt, partly because we were also talking about GRADE at the time.

Jacobsen: Yes — that was the area I had in mind. There has been notable resistance in oncology and cardiology, correct?

Guyatt: Exactly. Cardiology has developed its own system, which has led to its own unique approaches. PJ Devereaux, for instance, works in cardiology and has been part of that context. Oncology has also been slower to adopt GRADE fully, although progress is being made.

Jacobsen: That is a good transition. The seven-part Core GRADE framework, which covers essentials, risk of bias, publication bias, indirectness, and translating evidence into recommendations, has now been published. What motivated the launch of the Core GRADE series at this time? What gap did you aim to fill?

Guyatt: I have been making the case for a few years now that, in various ways, the Clinical Evidence and Clinical Epidemiology worlds — which initially aimed to serve the audiences of systematic review authors and guideline developers, but also to help clinicians use medical literature — have become too complicated.

There is always a balance between methodological rigour and simplicity. There is always a trade-off, and one must try to find that balance. But we have lost that balance. As a result, instruments for assessing the risk of bias, for example, have become much too complicated.

GRADE itself has become overly complex and is no longer especially well organized, which makes it difficult for many users. First, we published a series of papers in The BMJ aimed at clinicians. That was a six-part series. It went well.

People still use it. Then, we initiated a series in the Journal of Clinical Epidemiology aimed at systematic reviewers and guideline developers. These individuals create evidence syntheses and then use them to develop guidelines.

The first fifteen papers or so covered the basics and were exceptionally well-positioned. There are now about fifty such papers — maybe between forty and fifty — in the Journal of Clinical Epidemiology. Additionally, numerous related papers have been written by individuals outside the GRADE Working Group.

Early on, we created the GRADE Working Group. I co-chaired it with Andy Oxman when we got started. Andy was the first chair; he and I worked closely in the beginning and collaborated on the first papers in the JCE series. Over the past fifteen years or so, I have been the co-chair of the GRADE Working Group.

However, another development over time has been that the GRADE Working Group has become increasingly bureaucratic. They keep producing new papers — some of which push the methodological frontier — but if you look at the whole body of work now, it isn’t easy to separate what the core principles are from the updates and refinements. Some updates are crucial, while others are so specialized that they are practically irrelevant for most users. It has become increasingly complex to keep track of everything — which is a significant problem.

I decided to address this. I concluded that what was needed was a series of papers that set out the essentials of GRADE — which we decided to call Core GRADE. I encountered quite a bit of resistance — or, more accurately, obstruction — from the leadership within the GRADE Working Group. They wanted me to stick with the increasingly bureaucratic processes.

So, I said, “Bye-bye.” I stepped down from the leadership of the GRADE Working Group. Then, with a group of colleagues — some of whom are still prominent members of the Working Group — I put together this new series of papers in The BMJ. This series outlines the essentials of GRADE, which we refer to as Core GRADE.

It is explicitly designed for paired comparisons — treatment A versus treatment B. It does not cover prognosis, diagnostic test accuracy, or network meta-analysis that connects A to B to C and so on. It is the fundamental GRADE approach for pairwise comparisons.

The goal is to offer a one-stop shop. If you read these seven papers, you have what you need. You do not have to wade through the now massive — and poorly organized — wider GRADE literature to find the basics.

We are happy with it. The initial reception seems reasonable. The final paper of the seven-part series was published just two or three weeks ago, so it is still early days to see how people will respond. However, the response has been positive so far.

Jacobsen: What about the uptake of Core GRADE in low-resource settings or among early-career researchers?

Guyatt: That is exactly who it is for. In low-resource settings, there are often fewer people with high-level methodological expertise. Likewise, early-career researchers might not yet have that expertise. These are the people who want to know: “What are the basics we need to do a good job?” This series is designed to provide that answer.

Jacobsen: On the articles about rating certainty of evidence — the risk of bias, indirectness, and so on — which do you view as more transformative for guideline developers, and why?

Guyatt: Transformative relative to what? 

Jacobsen: The risk of bias versus indirectness versus publication bias, and so on, having the most significant impact.

Guyatt: They are all important. I’m not sure if they are equally important, but there is not a huge gradient where one outweighs the other by an order of magnitude. The only one that is perhaps less impactful — because it is so difficult to sort out — is publication bias.

Publication bias happens when people conduct studies — usually with negative results — and then do not publish them. So we are left guessing: “What studies might be out there that we have not seen?” It is hard to know for sure. But apart from that, the other four reasons for rating down — the risk of bias, inconsistency, indirectness, and imprecision — are all significant.

Jacobsen: When we did one of our first interviews, you mentioned that some of PJ Devereaux’s work was going to be potentially impactful. I cannot recall all the specifics of his research program at the time. How has his work progressed over the last two years?

Guyatt: Oh, PJ is remarkable. He now has eight first-author papers in The New England Journal of Medicine — not many people in the world can claim that.

One of the most transformative aspects of his work is that we now systematically look for asymptomatic myocardial infarctions — heart attacks — in high-risk patients undergoing surgery. After surgery, patients are often sedated or on pain medication, so they may not show the classic symptoms of a heart attack. They could have one and never know it unless you look for it.

Thanks to PJ’s work, monitoring for these silent events in high-risk patients has become standard practice. That is a genuinely transformative change in perioperative care. There have also been significant advances in how we manage these patients and in the treatment we provide once such events are detected.

Jacobsen: There is a lot of hype about AI these days. Is AI being introduced into clinical epidemiology or your evidence frameworks at all?

Guyatt: Yes. AI will make a significant difference in certain areas, although not all of clinical epidemiology. It will, however, greatly aid in image interpretation and pathology. For example, AI can help radiologists read X-rays more accurately or assist pathologists in interpreting biopsy slides with greater consistency and accuracy.

Another potential use is in prognosis, where patient characteristics are used to estimate the likelihood of outcomes. AI may help there. However, it will not resolve the fundamental issues with assessing treatments — because non-randomized observational studies will always have biases, regardless of whether AI is used or not.

However, in processing the evidence, AI can have a significant impact. For instance, it can already do quite reasonable risk of bias assessments. One project I am currently involved with is establishing a framework that enables AI to produce high-quality GRADE assessments.

This still requires substantial human input. First, the investigator must specify the question very clearly — using the Patient-Intervention-Comparison-Outcome (PICO) structure. Then, other guiding questions must be provided to direct how the AI should make its judgments.
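The PICO structure Guyatt mentions can be made concrete. Below is a minimal sketch, in Python, of how a question might be specified for a hypothetical AI appraisal pipeline; the class, field names, and example question are purely illustrative and not part of any published GRADE tooling.

```python
from dataclasses import dataclass

@dataclass
class PICOQuestion:
    """A clinical question in Patient-Intervention-Comparison-Outcome form."""
    patient: str        # who: the population of interest
    intervention: str   # what is being tested
    comparison: str     # what it is tested against
    outcome: str        # the patient-important result being measured

    def as_prompt(self) -> str:
        # Render the structured question as a single sentence that an
        # AI system could be instructed with.
        return (f"In {self.patient}, does {self.intervention} compared with "
                f"{self.comparison} affect {self.outcome}?")

# Illustrative example (drug class chosen only for concreteness)
question = PICOQuestion(
    patient="adults with type 2 diabetes",
    intervention="an SGLT2 inhibitor",
    comparison="placebo",
    outcome="cardiovascular mortality",
)
print(question.as_prompt())
```

Specifying the question this explicitly is what lets a human later check that the AI's judgments stayed on track.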

Fortunately, in the end, you can prompt the AI to explain and justify all its decisions. It will produce a detailed justification for each decision point, allowing a human to verify that nothing has gone off track.

We are also examining how to present the information in a manner that ensures the AI makes the correct decisions — and defining the rules it should follow to do so. The Core GRADE series has been helpful in this process because it includes a whole set of algorithms and flowcharts: If A, then B; if B, then C, and so on. That is precisely the kind of logic you have to provide to large language models so they can handle these tasks properly. They need precise algorithms.

The algorithms we developed in Core GRADE have proven to be very useful. Ultimately, this will significantly streamline what is currently a time- and resource-intensive process.
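The kind of explicit if-then logic Guyatt describes can be sketched in a few lines. This is a hedged illustration of GRADE-style certainty rating, not the published Core GRADE algorithm: randomized trials start at high certainty (observational studies at low), and the rating moves down across the five domains, never below "very low".

```python
# Illustrative GRADE-style rating-down logic; names and values are a
# sketch, not the Core GRADE flowcharts themselves.
LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = ["risk of bias", "inconsistency", "indirectness",
           "imprecision", "publication bias"]

def rate_certainty(randomized: bool, concerns: dict) -> str:
    """Rate certainty of evidence for one outcome.

    `concerns` maps a domain name to 0 (no concern), 1 (serious),
    or 2 (very serious) levels to rate down.
    """
    level = 3 if randomized else 1   # RCTs start high; observational, low
    for domain in DOMAINS:
        level -= concerns.get(domain, 0)
    return LEVELS[max(level, 0)]     # floor at "very low"
```

For example, a body of randomized trials with serious risk of bias and serious imprecision would come out "low" under this sketch. Encoding each judgment as an explicit rule is precisely what makes the procedure usable by a large language model and auditable by a human.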

Jacobsen: Do you think the most significant benefit will be saving the time of doctors and researchers?

Guyatt: It will save researcher time in my area — I cannot speak to basic science or other fields. But for us, it will make processing the evidence much more efficient. There is no question about that.

It has not arrived in a significant way to support our daily workflows yet. Honestly, I am surprised it has not — given all the other so-called miracles AI is achieving. But it will happen, for sure.

In terms of saving doctors’ time, the most significant gains I see are in interpreting images — radiology, pathology, that kind of thing. Another area is mental health: it is almost certain that if you cannot access a psychologist, you will be able to use a virtual psychologist who does a pretty good job. That appears to be emerging as a real option, which is quite extraordinary.

For example, in Canada, psychiatrists are covered by public health insurance, but psychologists are not — and psychologists are expensive. In places where psychiatric care is not covered, it is similar. So, in the future, it is pretty predictable: if you do not have much money, you will use a virtual psychologist; if you have more resources, you will see a human.

Jacobsen: That seems to lower the threshold for access to many services by the sound of it. So, looking at the early responses to your recent lectures and articles, you get two broad types of reactions: one group asking fundamental questions — equivalent to “What is your name?” or “What was it like when you first started?”— and another group praising you for being the leader and originator of EBM. In other words, a sign of how long you have been around and the influence you have had.

But as part of your academic and research life, you have mentored many people. Are there particular researchers you have trained who have, in turn, made a significant impact on your research — or even changed the way you think about specific questions and pursue new lines of work?

Guyatt: Interesting question. I thought you were going to ask about people who have made significant contributions — that would have been an easy list to give. In terms of people who have shaped my thinking, there are fewer.

The most innovative person in that sense, whom I trained, is Victor Montori, who is now a professor at the Mayo Clinic. In terms of ideas — particularly about shared decision-making and the burden of care that we place on patients — he has changed my thinking.

Jacobsen: Is his concept of Minimally Disruptive Medicine an extension of EBM? So, in terms of the burden we place on patients, Minimally Disruptive Medicine is exactly what Victor is advancing, right?

Guyatt: It is the core idea behind Minimally Disruptive Medicine. It is a nuanced approach. Minimizing harm is, of course, a central principle in both psychology and medicine. However, this is about operationalizing that idea in a very patient-centred, real-life manner.

It is not uncommon for patients to have multiple chronic conditions — multimorbidity is very common. Many of the patients I see have hypertension, diabetes, hyperlipidemia, osteoporosis — sometimes all at once.

So, they end up on anywhere from one to five medications for each condition. Add it up, and you might see someone on 15 to 20 different medications.

Jacobsen: Fifteen to twenty! That surprises me.

Guyatt: It is not unusual at all. A list of 15 or more prescriptions is routine for many older patients. On top of that, the same patient is also told to stop smoking, change their diet, and exercise — which, for many people, are more complicated than taking pills.

And if they do not manage to do all this, they are sometimes blamed for not “taking care of themselves.” Victor’s point is that we, as clinicians, often do not consider the cumulative burden of all these demands and how they fit — or do not fit — into people’s everyday lives.

Jacobsen: So, what Victor is doing is combining the principle of minimizing risk with the idea of preserving quality of life — and doing so practically, not just theoretically. How do patients typically describe this experience — for example, if they are on 15 medications and they are not someone like Ray Kurzweil, who takes dozens by choice? What do they say about having to make all these pills part of their daily routine?

Guyatt: That is a good question. And you are making me realize that I do not often ask it directly. It is a fair point. It is remarkable what patients tolerate. If they are non-adherent, we might find out indirectly. Sometimes they show up and you realize they are not taking all their medications.

However, if it is truly impairing their quality of life, it does not always emerge spontaneously. I do not think I have ever directly asked someone, “How do you feel about having to take all these pills every day?” What often happens is that I will ask them to tell me what medications they are on, and sometimes they pull out a big plastic bag and start taking bottle after bottle out of it.

So, yes — that is the way it is. However, you make a good point: I should start asking patients more directly how they feel about it. 

Jacobsen: Maybe you should give them a simple, blunt questionnaire: “Does this make your life more dreary? Yes or no?” Is it five pills, or 10, 15, or 30? And somewhere, there has to be someone taking a couple dozen or more. Let’s switch gears to something a bit more research-focused. The 2025 network meta-analysis on diabetes management — could you walk us through some of the key findings?

Guyatt: Sure. Over the last fifteen years or so, a significant methodological innovation has occurred. Initially, meta-analysis enabled us to combine all randomized trials comparing treatment A versus B and obtain the most comprehensive estimates for outcomes such as mortality, heart attacks, strokes, and side effects.

What emerged about fifteen years ago, with many refinements since then, is network meta-analysis. It allows us to do this not just for A vs. B but for A vs. B vs. C vs. D vs. E, F, and G. So now, we can compare multiple options simultaneously and get a clearer sense of the relative merits of all alternatives.

In diabetes, this is very relevant because there are many classes of medications. With network meta-analysis, we can show, “Here is how class A stacks up against class B,” and even compare different drugs within the same class — A, B, C, D, and E.

For many years, the pattern was that we had lots of drugs that were good at lowering blood sugar, but they did very little — or almost nothing — to prevent strokes, heart attacks, or premature death. There was a disconnect: we controlled blood sugar, but patients still died or suffered cardiovascular events at the same rates.

In recent years, though, two major classes of medications have emerged that are not particularly great at lowering blood sugar but do reduce strokes, heart attacks, premature death, and kidney failure. That has been a considerable change — now we have drugs that impact patient-important outcomes, not just blood sugar readings.

Jacobsen: In these network meta-analyses, are there any methodological cautionary notes you think should be included when people interpret the findings?

Guyatt: Not if it is done properly. Network meta-analysis is more complicated than a pairwise meta-analysis. There are more assumptions, more parameters, and more things that can go wrong. Therefore, the risk of error is higher if it is not performed carefully. But if you do it right, then there is nothing inherent to network meta-analysis that makes it less trustworthy by nature.

Jacobsen: Thank you for the opportunity and your time, Gordon. 

