Human Performance Studies & the Assessment of Telemedicine Programs
Elizabeth Krupinski, PhD
Presented at The American Telemedicine Association Conference
April 18-21, 1999
Salt Lake City, UT

Goal
The goal of this presentation is to acquaint the reader with a few basic methods of assessment for telemedicine applications. The emphasis is on assessment of the diagnostic process and how it compares to the traditional in-person patient visit. A handout of references is available and references are made to other posters for specific applications.

Rationale I
There are numerous aspects of a telemedicine program that need to be evaluated in order to determine its utility and eventual success. Some of the most important aspects are:
• The diagnostic process (e.g., how good is the clinician’s accuracy)
• The technical design (e.g., what is the best network design)
• The economics (e.g., is it cost-effective)

Rationale II
Perhaps of most concern to the patient and the health care provider is the diagnostic process. What aspects are different from the traditional in-person visit? Will these factors influence diagnostic accuracy or the quality of the medical advice given via telemedicine?

Methods
Four major areas of investigation relating to the assessment of the diagnostic process are discussed:
• Comparison of existing technologies with telemedicine technologies & diagnostic performance
• Workflow analysis
• User satisfaction analysis
• Communication analysis

A) Diagnostic Performance
Some medical technologies have been modified specifically for use in telemedicine applications (e.g., the electronic stethoscope). For some applications new technologies or those from other areas have been put to use in telemedicine (e.g., the digital camera for teledermatology). In either case, the technology needs to be assessed in terms of how it impacts on the clinician’s diagnostic decision.
Objective Methods
Objective measures of observer performance are preferred to assess diagnostic accuracy in telemedicine and to compare it with diagnostic accuracy in the traditional in-person setting. These investigations typically are done in an experimental setting and the results are generalized to the clinical setting. See “Assessment of diagnostic accuracy using a digital camera for teledermatology” on Tues. in Teledermatology - Part II as an example of an objective performance study.

The ROC Method
The most widely used measure of diagnostic performance is the Receiver Operating Characteristic (ROC) method. It is a criterion-free, parameter-free, distribution-independent measure of diagnostic test performance. It goes beyond simple accuracy, sensitivity and specificity since it accounts for both true-positive and false-positive decisions. The ROC curve provides a straightforward graphic representation of performance. The area under the curve (Az) ranges from 0.5 = chance performance to 1.0 = perfect performance. Readers report their diagnostic decision and their confidence in the decision. The confidence values are used to generate the ROC curves. The ROC paradigm requires a “gold standard” (e.g., the in-person visit) against which performance with two modalities or technologies is compared. The “gold standard” is the “right” answer.

The ROC Curve
[Figure: ROC curves plotting True-Positive Fraction (0 to 1.0) against False-Positive Fraction (0 to 1.0). Curve 1, Modality 1: Az = 0.65. Curve 2, Modality 2: Az = 0.85. Chance line: Az = 0.5.]

Correlation & Concordance
Quite often you just want to know if clinicians make the same diagnosis in-person versus via telemedicine. Contingency analysis can measure the degree of correlation between in-person and telemedicine diagnoses. It is a variant of correlation analysis that works on nominal data such as diagnostic categories rather than numbers. The kappa statistic is becoming a very popular statistic to measure performance.
It can be used to measure rates of agreement in diagnosis for telemedicine versus in-person visits. Kappa also takes into account the agreement that would be expected by chance, providing a more accurate measure of performance than raw agreement rates or contingency analysis do. Like the popular Chi-Square analysis, the kappa statistic is computed from a contingency table of the two sets of diagnoses. For both analyses there is no “gold standard”. You are only concerned with rates of agreement.

Subjective Methods
Subjective methods can be used to assess other factors that might impact on the diagnosis. For example, to judge the quality of a dermatologic image acquired with a digital camera versus seeing the patient in person:
• The photograph could be placed next to the patient and the dermatologist rates the photo as better, same or worse than in-person.
• Photos taken in a variety of lighting conditions could be placed together and the dermatologist must pick out the best one. This is known as a forced-choice design.
• Dermatologists could rate the digital photos as excellent, good, fair or poor in terms of various parameters such as overall quality, sharpness and color.
The percentage of ratings in each category can be analyzed independently or correlated with each other, with viewing time, and with the diagnostic decision.

B) Workflow Analysis
It may be useful to assess the impact of telemedicine on a clinic’s daily routine before and after implementation of telemedicine. Studies of this sort are generally observational, but they look at very specific aspects of workflow or the working environment, so they often involve objective measurements and can be analyzed with traditional statistical methods. See for example the poster: “Assessing case turn-around times in a university based telemedicine program”.

The Referring Site
Some things to examine at the referring site might be:
• Who does the patient spend time with (e.g., MD, PA, RN)?
• Does the health care provider spend more/less time with the patient if there is telemedicine?
• How much time does it take to prepare a case and who does the preparation?
• Would the patient actually make the trip for an in-person visit if telemedicine was not there?
• How would it impact on the patient’s routine if they had to travel for an in-person visit?

The Consulting Site
Some things to examine at the consulting site might be:
• Does the consultant’s workload increase with telemedicine?
• Is the consultant’s workday longer with telemedicine?
• Are only certain sub-specialties seeing telemedicine cases?
• Does the consultant require more or less patient history and other clinical information for a telemedicine case than an in-person visit?
• How far does the consultant have to go from their office/department to get to the telemedicine clinic?

General Workflow Issues
Some broader issues that can be examined as well are:
• How long does it take to generate a diagnostic report and get it back to the referring physician compared to an in-person specialty consult?
• How long does it take to get a telemedicine appointment compared to an in-person visit?
• How many people are involved in setting up and conducting a telemedicine session compared to an in-person visit?
• How do all of these issues impact on business and economic models?

C) Satisfaction Analyses
User satisfaction is a very important aspect of almost any endeavor and telemedicine is no exception. Practically anybody involved in telemedicine can be queried about their satisfaction, but the three groups typically looked at are the patient, the referring clinician and the consulting clinician. See for example the poster “Case volume, response times and user satisfaction with a university-based teleradiology system”.

Creating Surveys
Creating a good survey is often harder than one would imagine. Surveys should be:
• Easy to read, using simple language appropriate for the intended audience (e.g., at about the 8th grade level for the general patient population).
• Translated into other languages if appropriate.
• Short and to the point. Surveys longer than 1 page have less chance of being completed than a 1-page survey.
• Limited to the issues you are interested in. Do not use a shot-in-the-dark approach and ask too many questions hoping for something of interest to appear.

Survey Styles I
There are many ways to ask questions. The choice depends on what kind of information you want to get and in what detail.

Fill-in-the-blank. This is appropriate for information that changes a lot or has no set answer.
Patient name_____________________
Patient address____________________

Yes/No questions. This is very straightforward, but does not allow for gradations in answers.
Were you satisfied with the telemedicine session? • Yes • No
Could you see the telemedicine doctor clearly? • Yes • No

Survey Styles II
Likert scales. A Likert scale allows you to get a range of responses that may better reflect subtle differences in answers or opinions.
Were you satisfied with the telemedicine session? • Very satisfied • Satisfied • Somewhat satisfied • Somewhat unsatisfied • Unsatisfied • Very unsatisfied

Open-ended questions. Open-ended questions can yield much information, but can be difficult to analyze because there is generally no structure to the answers and people can interpret the questions differently.
What did you like about the telemedicine session?

Analyzing the Surveys
Survey return rates vary considerably. Return rates will be better if surveys are filled out immediately after a telemedicine session while the respondent is still at the telemedicine facility. Sampling methods may be useful if compliance is low. For example, survey every fifth patient and put the time into ensuring individual compliance. For most survey styles, the answers can be analyzed with traditional statistical methods.

D) Communication Analysis
A very important part of the patient-clinician interaction is communication.
Telemedicine changes the ways that communication occurs since the patient and clinician are no longer in the same place. Communication analysis is particularly important for real-time telemedicine sessions. It is also important for store-forward sessions because the patient is likely to have returned home by the time the consult takes place and the referring clinician receives any feedback from the consultant.

Methods I
If the patient agrees, real-time sessions can be observed live or can be video-taped for later analysis. Prior to the observations, you must decide what aspects of the interaction are important. For example:
• How often does the patient speak & for how long?
• How often does the consultant speak & for how long?
• How often does the referring clinician speak & for how long?
• How many interruptions occur?
• How much eye contact is made?
• How often do questions have to be repeated?

Methods II
The field of psychology has spent years analyzing non-verbal means of communication to try to understand how people feel without actually asking them. People may respond to questions positively because they want to please the examiner. Body language often reveals just the opposite. For example:
• How does the patient/clinician sit? Slouched? Straight?
• What are the arms doing? Crossed? Hanging to the side?
• What are the hands doing? Folded? Tapping a pencil?
• Where are the eyes? Toward the speaker? Roaming?
• Facial expression - scowling or smiling?

Discussion
Traditional methods of experimentation, observation, and analysis from the social sciences, medical image perception, communication theory, research design and statistics are available to anyone to conduct assessment investigations in telemedicine. A well-designed study or assessment protocol can answer many questions and provide ways to strengthen a program and improve the delivery of care and user satisfaction.

Discussion
The keys to any successful assessment protocol are:
• Decide beforehand what the important questions are.
• Design your study to answer those questions.
• Know your study population and what types of investigation or surveys they will tolerate.
• Be prepared to change the protocol if necessary. Be flexible.
• Utilize appropriate statistical methods whenever possible.
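
Worked example
As a rough illustration of two of the statistics described under A) Diagnostic Performance, the sketch below computes an area under the ROC curve from reader confidence ratings and a kappa value for agreement between in-person and telemedicine diagnoses. It is only a minimal sketch under stated assumptions: the function names and all of the data are hypothetical, the area is the nonparametric (Mann-Whitney) estimate rather than the fitted Az that ROC software usually reports, and the kappa shown is the unweighted Cohen's kappa for two sets of nominal diagnoses.

```python
from collections import Counter


def empirical_auc(ratings_disease, ratings_no_disease):
    """Nonparametric (Mann-Whitney) estimate of the area under the ROC curve.

    Each list holds reader confidence ratings (higher = more confident that
    disease is present) for cases the gold standard calls positive/negative.
    Note: this is NOT the binormal-fitted Az; it is a simpler stand-in.
    """
    if not ratings_disease or not ratings_no_disease:
        raise ValueError("need at least one positive and one negative case")
    score = 0.0
    for pos in ratings_disease:
        for neg in ratings_no_disease:
            if pos > neg:
                score += 1.0   # positive case correctly rated higher
            elif pos == neg:
                score += 0.5   # ties count as half
    return score / (len(ratings_disease) * len(ratings_no_disease))


def cohen_kappa(diagnoses_a, diagnoses_b):
    """Unweighted Cohen's kappa for two lists of nominal diagnoses."""
    if len(diagnoses_a) != len(diagnoses_b) or not diagnoses_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(diagnoses_a)
    # Observed proportion of cases with identical diagnoses.
    observed = sum(a == b for a, b in zip(diagnoses_a, diagnoses_b)) / n
    # Agreement expected by chance, from each setting's category frequencies.
    count_a, count_b = Counter(diagnoses_a), Counter(diagnoses_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


# Hypothetical data: confidence ratings on a 1-5 scale for one modality.
auc = empirical_auc([5, 4, 4, 3, 2], [1, 2, 3, 1, 2])

# Hypothetical data: diagnoses for the same ten patients in each setting.
in_person = ["eczema", "psoriasis", "eczema", "nevus", "psoriasis",
             "eczema", "nevus", "eczema", "psoriasis", "nevus"]
telemed = ["eczema", "psoriasis", "eczema", "nevus", "eczema",
           "eczema", "nevus", "psoriasis", "psoriasis", "nevus"]
kappa = cohen_kappa(in_person, telemed)

print(f"Empirical area under the ROC curve: {auc:.2f}")
print(f"Kappa, in-person vs. telemedicine: {kappa:.2f}")
```

With these made-up numbers the raw agreement is 8 of 10 cases, but kappa is lower because some of that agreement would be expected by chance alone. A real study would also need confidence intervals and a sample size chosen in advance.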