Computer-based test interpretation in psychological assessment

Computer-based test interpretation (CBTI) programs are software tools that have been commonly used to interpret psychological assessment data since the 1960s. CBTI programs are used in a range of assessment domains, such as clinical interviewing and problem rating, but are most frequently applied in personality and neuropsychological assessment. CBTI programs are either empirically based or clinically based. Empirically based programs, or actuarial assessment programs, use statistical analyses to interpret the data, while clinically based programs, or automated assessment programs, rely on rules derived from expert clinicians and published research. Although the test-retest reliability of CBTI programs is high, CBTI research has been criticized for failing to assess inter-rater and internal consistency reliability. The validity of CBTI programs in general has not been demonstrated, and reported validity varies between individual programs. CBTI programs save time, reduce human error, are cost-effective, and are more objective than clinician interpretation, but they are limited in that they are not always used by adequately trained evaluators and their output is not always integrated with other sources of data. As the use of technology in mental health care grows, CBTI software is likely to expand, and further development may alleviate some of the current concerns about the programs' methodology.

History

Computerized testing methods were first introduced over 60 years ago. The first program able to interpret computerized assessment data was developed in 1962 at the Mayo Clinic.^[1] The program was used to evaluate Minnesota Multiphasic Personality Inventory (MMPI)^[2] data from hospital patients and generated a list of 110 possible descriptive statements, each corresponding to particular scale elevations. This rudimentary computerized interpretation is not far off from the methods used today.^[3] In 1969, the first program able to generate narrative reports based on scale configurations was released.^[4] By 1985, it was estimated that as many as 1.5 million MMPI protocols had been interpreted by computer-based test interpretation (CBTI) programs.^[5] By 1987, as many as 72 separate suppliers offered over 300 computer-based assessment products, nearly half of which were developed for personality assessment.^[6] Since then, the popularity and accessibility of computer-based testing and CBTI programs have increased dramatically, a trend that is expected to continue as the use of technology in the mental health professions increases.^[7]

Present status

Currently, CBTI programs fall into one of two categories: actuarial assessment programs or automated assessment programs. Actuarial assessment programs are empirically based and rely on statistical or actuarial prediction (e.g., statistical analyses, linear regression equations, and Bayesian rules). Automated assessment programs consist of a series of if-then statements derived by expert clinicians^[8] and informed by published research and clinical experience.^[9] For the purposes of this article, both types will be referred to as computer-based test interpretations (CBTIs). CBTIs are used in a variety of psychological domains (e.g., clinical interviewing and problem rating), but are most common in personality and neuropsychological assessment.^[3] This article focuses on the use of CBTIs in personality assessment, most commonly with the MMPI and its revised editions.
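The difference between the two categories can be illustrated with a minimal Python sketch. It is not taken from any actual CBTI product: the scale names (`scale_a`, `scale_b`), the cutoff of 65, the statement texts, and the regression weights are all hypothetical placeholders chosen only to show the shape of each approach.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical input format: scale name -> standardized score.
Scores = Dict[str, float]


@dataclass
class Rule:
    """One if-then statement of an automated (clinically based) program."""
    condition: Callable[[Scores], bool]
    statement: str


# Illustrative rules only; real programs encode clinician-derived rules.
AUTOMATED_RULES: List[Rule] = [
    Rule(lambda s: s["scale_a"] >= 65,
         "Placeholder statement for an elevated scale_a."),
    Rule(lambda s: s["scale_a"] >= 65 and s["scale_b"] >= 65,
         "Placeholder statement for a joint scale_a/scale_b elevation."),
]


def automated_interpretation(scores: Scores) -> List[str]:
    """Return every statement whose if-condition holds for this protocol."""
    return [rule.statement for rule in AUTOMATED_RULES if rule.condition(scores)]


# Made-up coefficients; an actuarial program would estimate them from data.
ACTUARIAL_INTERCEPT = -3.5
ACTUARIAL_WEIGHTS: Dict[str, float] = {"scale_a": 0.04, "scale_b": 0.02}


def actuarial_prediction(scores: Scores) -> float:
    """Evaluate a linear regression equation on the scale scores."""
    weighted_sum = sum(w * scores[name] for name, w in ACTUARIAL_WEIGHTS.items())
    return ACTUARIAL_INTERCEPT + weighted_sum


if __name__ == "__main__":
    protocol = {"scale_a": 70.0, "scale_b": 50.0}
    print(automated_interpretation(protocol))
    print(actuarial_prediction(protocol))
```

In this sketch, the automated program outputs pre-written text selected by rules, whereas the actuarial program outputs a number from a fitted equation. The two approaches differ in where the rules or coefficients come from (expert judgment versus statistical estimation), not in how the software evaluates them.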

Reliability

The ability of CBTIs to eliminate human error is considered a benefit, and as a result the reliability of CBTIs is considered to be better than that of clinician interpretations.^[7] However, CBTIs have demonstrated poor reliability. Research on the equivalence of CBTIs and paper-and-pencil measures has produced equivocal findings (for reviews, see ^[9]^[10]). Further, CBTI research has been criticized for failing to assess inter-rater reliability (comparing the interpretation of one protocol by two different programs) and internal consistency reliability (comparing the reliability of different sections of the same interpretation).^[11] On the other hand, the test-retest reliability of CBTIs is considered perfect (i.e., the same protocol will repeatedly yield the same interpretation), provided the same program is used.^[7]

Validity

Research on the validity of CBTIs typically uses three types of studies: external criterion studies (comparing the CBTI report to an external criterion measure of the construct, such as a self-report or behavioral measure), consumer satisfaction studies (asking clients whether the reports represent them accurately), and comparisons with clinical conclusions (comparing CBTI reports to clinician interpretations). Comprehensive reviews of CBTI validity can be found elsewhere (e.g., ^[12]^[13]). In general, the validity of CBTIs has not been demonstrated,^[7]^[8]^[11]^[13]^[14]^[15] and the validity of individual CBTI systems has been found to vary.^[9] However, many validity studies are themselves flawed because of small samples,^[11] criterion contamination, the Barnum effect, inadequate input data to generate powerful statistical prediction rules,^[7] unreliability of measures, and the practice of generalizing across testing situations and populations without considering potential moderators.^[16]

Strengths and weaknesses

CBTI programs can be found for nearly every type of personality assessment available today. CBTI programs arguably have many benefits over traditional hand-scored assessments and clinician interpretations, which may contribute to their popularity. For example, CBTI programs save time and eliminate human responding and scoring errors.^[17] Further, CBTI programs are often more comprehensive and tend to be more reliable than clinician interpretation, are cost-effective, and are more objective, which may make clients more accepting of feedback.^[18] Despite these benefits, CBTIs have significant limitations. For example, CBTI reports may give an unwarranted impression of scientific precision, and reports may be too general to provide differential information.^[18] Additionally, CBTIs may promote excessively cavalier attitudes towards clinical assessment and interpretation, and because they are increasingly available to inadequately trained evaluators, the potential for misuse is high.^[10] Clinicians are cautioned to educate themselves before using CBTI programs, not to accept computer-generated reports blindly as true, and not to use CBTIs to circumvent their responsibility as clinicians to integrate multiple sources of data.^[18]

Future

As the healthcare system and society as a whole become increasingly reliant on technology, the availability and use of CBTI software are likely to expand as well. The potential of the internet for extending the use of CBTIs has been recognized, although the problems associated with this modality are not yet fully understood and will need to be addressed before internet-based CBTI use proliferates.^[3] In addition, computer-adaptive testing, although successfully applied in other assessment domains (i.e., ability and aptitude), remains a promising yet under-researched addition to personality assessment.^[3] Lastly, there is a call for more effective integration of clinical and computer-based prediction methods, beginning with a partnership between clinicians and researchers in the development of CBTI programs.^[9]^[10]

References

1. Rome, H. P., Swenson, W. M., Mataya, P., McCarthy, C. E., Pearson, J. S., Keating, F. R., & Hathaway, S. R. (1962). Symposium on automation techniques in personality assessment. Proceedings of the Staff Meetings of the Mayo Clinic, 37, 61-82.
2. Hathaway, S. R., & McKinley, J. C. (1942). The Minnesota Multiphasic Personality Inventory. Minneapolis: University of Minnesota Press.
3. Butcher, J. N., Perry, J. N., & Hahn, J. (2004). Computers in clinical assessment: Historical developments, present status, and future challenges. Journal of Clinical Psychology, 60(3), 331-345.
4. Fowler, R. D. (1969). Automated interpretation of personality test data. In J. N. Butcher (Ed.), MMPI: Research developments and clinical applications (pp. 105-126). New York: McGraw-Hill.
5. Fowler, R. D. (1985). Landmarks in computer-assisted psychological assessment. Journal of Consulting and Clinical Psychology, 53, 748-759.
6. Krug, S. E. (1987). Psychware sourcebook 1987-1988. Kansas City, MO: Test Corporation of America.
7. Garb, H. N. (2000). Computers will become increasingly important for psychological assessment: Not that there’s anything wrong with that! Psychological Assessment, 12, 31-39.
8. Garb, H. N. (1998). Studying the clinician: Judgment, research and psychological assessment. Washington, DC: American Psychological Association.
9. Lichtenberger, E. O. (2006). Computer utilization and clinical judgment in psychological assessment reports. Journal of Clinical Psychology, 62(1), 19-32.
10. Snyder, D. K. (2000). Computer-assisted judgment: Defining strengths and liabilities. Psychological Assessment, 12(1), 52-60.
11. Moreland, K. L. (1985). Validation of computer-based test interpretations: Problems and prospects. Journal of Consulting and Clinical Psychology, 53, 816-825.
12. Moreland, K. L. (1987). Computer-based test interpretations: Advice to the consumer. Applied Psychology: An International Review, 36, 385-399.
13. Butcher, J. N., Perry, J. N., & Atlis, M. M. (2000). Validity and utility of computer-based interpretation. Psychological Assessment, 12, 6-18.
14. Adams, K. M., & Heaton, R. K. (1985). Automated interpretation of neuropsychological test data. Journal of Consulting and Clinical Psychology, 53, 790-802.
15. Matarazzo, J. D. (1986). Computerized clinical psychological test interpretations: Unvalidated plus all mean and no sigma. American Psychologist, 41, 14-24.
16. Lanyon, R. I. (1987). The validity of computer-based personality assessment products: Recommendations for the future. Computers in Human Behavior, 3, 225-238.
17. Allard, G., Butler, J., Faust, D., & Shea, M. T. (1995). Errors in hand scoring objective personality tests: The case of the Personality Diagnostic Questionnaire-Revised (PDQ-R). Professional Psychology: Research and Practice, 26, 304-308.
18. Butcher, J. N. (2002). How to use computer-based reports. In J. N. Butcher (Ed.), Clinical personality assessment: Practical approaches (2nd ed.). New York: Oxford University Press.