Clinical microbiologists are a critically important component of the healthcare team. We develop, validate, and perform the laboratory tests that guide patient management decisions related to infectious diseases. Naturally, then, the validation of the tests we use is very important. How do we usually go about validating tests in the clinical microbiology laboratory?
"Oh doctor, please tell us there is at least a 93% chance our child's diagnostic test result is accurate" - "The Doctor" by Luke Fildes. Source.
Generally, a test will first be evaluated with specimens that are positive or negative (as defined by a reference method) for the test target to establish its performance. It will then move on to a trial with patient specimens, where it is compared to a widely accepted and trusted assay already in clinical use – a “gold standard.” Sometimes a composite of tests is required to determine which patients truly are positive vs negative for the condition in question. After the group of patient specimens has been evaluated with both the new test and the standard, or comparator, tests, it is time to consult the statistics textbook. The sensitivity and specificity are easily calculated. In brief, the sensitivity measures the proportion of true positives the test captures, while the specificity measures the proportion of patients without the condition who correctly test negative. There are other common measures of test technical performance, including the negative predictive value (NPV) and positive predictive value (PPV). The NPV and PPV are influenced by the prevalence of the condition in question in the population being tested; they measure the reliability of a negative result predicting the absence of the condition (NPV) or a positive result predicting that the condition is present (PPV).
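These calculations are straightforward to sketch. The counts below are hypothetical (not from any study mentioned here), chosen only to show how sensitivity and specificity are fixed properties of the test, while PPV and NPV shift with prevalence:

```python
# Hypothetical 2x2 comparison of a new test against a reference standard.
TP, FN = 90, 10     # reference-positive specimens: detected / missed by the new test
TN, FP = 194, 6     # reference-negative specimens: correctly negative / falsely positive

sensitivity = TP / (TP + FN)   # proportion of true positives captured (0.90)
specificity = TN / (TN + FP)   # proportion of true negatives captured (0.97)

def ppv(sens, spec, prevalence):
    """Positive predictive value at a given disease prevalence."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    return tp / (tp + fp)

def npv(sens, spec, prevalence):
    """Negative predictive value at a given disease prevalence."""
    tn = spec * (1 - prevalence)
    fn = (1 - sens) * prevalence
    return tn / (tn + fn)

# Same test, different populations: predictive values move with prevalence.
for prev in (0.01, 0.10, 0.30):
    print(f"prevalence {prev:.0%}: "
          f"PPV {ppv(sensitivity, specificity, prev):.2f}, "
          f"NPV {npv(sensitivity, specificity, prev):.2f}")
```

With these assumed figures, the PPV climbs from roughly 0.23 at 1% prevalence to about 0.93 at 30% prevalence, even though the test itself has not changed, which is why PPV and NPV must always be interpreted in the context of the tested population.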
Such diagnostic accuracy cohort studies are fairly inexpensive and simple to perform. However, there are disadvantages to this approach. All of the measures of the new test's performance are evaluated relative to the performance of the comparator method. What if the comparator method isn't very good? If the new test is markedly more sensitive than the comparator method, it won't get extra credit in the sensitivity measure; instead, the true positives it alone detects are scored as false positives, and it gets a bad mark for specificity. Another element to consider is that studies indicate that basic statistical measures of technical performance are often not taken into consideration by health care providers, and are often misunderstood as well.
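The penalty for outperforming an imperfect reference can be made concrete with a small worked example. All numbers here are invented for illustration: a cohort with 100 truly infected and 900 uninfected patients, a reference method that misses 30 real infections, and a new test that catches 25 of those missed infections while producing 9 genuine false positives:

```python
# Hypothetical illustration: a truly better test, judged against an
# imperfect reference ("gold standard"), looks falsely non-specific.
true_pos, true_neg = 100, 900   # ground truth in the cohort (assumed)

ref_detected = 70               # imperfect reference misses 30 real infections
new_extra_real = 25             # real infections only the new test detects
new_false_pos = 9               # genuine false positives of the new test

# Relative to the reference, the 25 extra real detections are scored
# as "false positives", dragging down the apparent specificity.
ref_negatives = (true_pos - ref_detected) + true_neg   # 930 reference-negative specimens
apparent_fp = new_extra_real + new_false_pos           # 34 scored "false positives"
apparent_specificity = (ref_negatives - apparent_fp) / ref_negatives

true_specificity = (true_neg - new_false_pos) / true_neg

print(f"true specificity:     {true_specificity:.3f}")      # 0.990
print(f"apparent specificity: {apparent_specificity:.3f}")  # 0.963 - penalized for being right
```

Under these assumed numbers, a test with a true specificity of 0.990 is reported at roughly 0.963, and its real advantage in sensitivity is invisible, precisely because the comparator defines "truth" in the study.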
Avedis Donabedian, an early advocate of improving methods for evaluating the quality of healthcare, stated that “many advantages are gained by using outcome as the criterion of quality in medical care.” Patient outcomes may include direct measures such as survival, restoration of function, or less direct surrogates of morbidity such as length of stay for hospitalized patients or other condition-specific indicators of patient recovery. Clinical microbiology test evaluations typically are not designed, or powered, to determine the impact of diagnostic tests on patient outcomes.
There are different study designs that can help determine if there is an impact of the test in question on patient outcomes. Observational studies, in which subjects are not randomly assigned to study arms, are termed quasiexperimental. Anticipated confounders can be controlled for, but selection bias is still a serious threat to study validity. A common form of quasiexperimental study design used to evaluate diagnostic tests is one comparing groups of patients before and after the implementation of a new test. This approach can easily be compromised by other changes in patient or population health management, or by seasonal or other differences in infectious risk, that are likely to overlap the study period asymmetrically. Gordon Guyatt, a prominent advocate of evidence-based medicine, has stated that “establishing patient benefit often requires randomized controlled trials.” Randomized controlled trials (RCTs) are the standard for the evaluation of new therapeutic interventions. You may be familiar with RCTs evaluating new antimicrobial products or new applications of old ones, but RCTs are somewhat scarce in the world of diagnostic tests. In an RCT for a diagnostic test, patients are randomly assigned to be tested with either the comparator (“standard-of-care”) test or the new test, the results are reported, the clinical team responds to the result, and the patient is followed. RCTs have the advantages that potential confounders should be equally distributed between study arms and that selection bias is eliminated. Health outcome studies must be carefully designed, determining the appropriate sample size, target population, patient or population outcomes to examine for a potential impact, follow-up period, healthcare system financial impacts, and other study-specific considerations. They may be sponsored by the manufacturer who developed the test, or require funding from the NIH, CDC, or other national agencies.
Outcome studies aren’t always worth the additional time and expense; comparing one nucleic acid amplification test (NAAT) to another with similar technical performance, for example, would not justify one. When a substantive difference in test methodology exists between tests for a particular condition, however, an outcome study may help determine which test, if either, is preferable for routine use.
Organism- or disease-specific examples:
Pneumocystis jirovecii is a relatively common constituent of the lower respiratory tract among both patients and people who aren’t patients, although testing is generally performed for immunocompromised hosts. How “sensitive” should a test for P. jirovecii be? If there are only a few P. jirovecii cells in a respiratory specimen, do you want to know about that? Some laboratories test for P. jirovecii with a NAAT, while other labs use an immunofluorescent assay (IFA) to detect organisms by microscopy, with the latter requiring a higher burden of organisms to generate a positive result. Which method is more appropriate? Certainly the performance characteristics are different, but without a formal comparison that includes patient outcomes in the study design, preferences for one or the other method are financial or philosophical.
Clostridium difficile is a common cause of recurrent, inflammatory disagreements. Tens of thousands of people die of C. difficile infections each year in the US alone, and the associated healthcare expenditure is similarly burdensome, so this is clearly a high-stakes game. NAATs for C. difficile are more sensitive for detection of the presence of the organism than are antigen detection-based tests, but it isn’t clear if this superior sensitivity is actually advantageous, as the presence of the organism is sometimes simply colonization rather than contributory to disease. While there have been intriguing studies evaluating differences in patient outcomes over multi-year periods before and after changing from antigen detection methods to NAAT for C. difficile, this sort of confounded retrospective analysis is unlikely to yield clear answers. A clustered multi-institution RCT comparing NAAT to antigen detection (or hybrid antigen detection / NAAT algorithms) with outcomes followed for both patients and communities (or institutions - to monitor the impact of reduced transmission) could help provide answers where today we have debate.
Similarly, there has been a great deal of deliberation about all of the newer syndromic, highly multiplexed NAATs. These tests target groups of organisms that can cause similar symptoms (respiratory disease, diarrhea, bloodstream infection, meningitis) and can include more than 20 targets. They are also expensive, and many of the targets, if detected, do not suggest a clear path forward for patient care. If you detect rhinovirus in a patient with a respiratory illness, or astrovirus in a patient with diarrhea, what should you do? There is no specific treatment for many of the targets on these panels, and some can be constituents of the normal microbiota, particularly for the diarrhea syndromic panels. Additionally, the detected nucleic acid may be left over from a previous infection rather than being the cause of current symptoms. Evaluating the technical performance of such tests has some value, but it is the patient outcomes that really count. Such a panel for blood culture isolates (tested directly from positive bottles) was recently evaluated in an RCT against standard workup (which included MALDI-TOF for identification); use of the multiplex NAAT had a significant influence on antibiotic utilization, but the trial was unfortunately not powered to detect an impact on patient outcomes. A meta-analysis of more than 30 quasiexperimental studies comparing rapid molecular tests to conventional tests for patients with bloodstream infections did find a positive impact on patient outcomes, although on subgroup analysis the positive impact was only observed for studies in which an active antimicrobial stewardship program influenced provider decisions.
Other diagnostic test RCTs have helped establish the clinical impact of testing for HPV, for example, and two recent RCTs found that a popular Mycobacterium tuberculosis NAAT, compared to smear review and culture, had a significant impact on time to diagnosis and antimicrobial use, but no impact on morbidity or mortality. The same test may lead to considerable healthcare cost savings, however, in a different study setting.
We need more high-quality evidence for the tests that we offer, and the trend towards more outcome studies in our field in recent years is encouraging. Health outcomes studies will be more costly and time-consuming, and we may not like all of the answers, but if we want to have a positive influence on patient outcomes we need to let patient outcomes have more of an influence on some of our test evaluations. Do you know of a particular clinical microbiology testing target that you think would benefit from an outcome study? How about an important RCT in our field that should be mentioned, or emulated in further studies? Join the conversation by using the comment box below.
The above post reflects the thoughts of its author, Dr. Matthew Pettengill, and not the American Society for Microbiology.