(a) Background: Mental disorders impose substantial costs on individuals and on society. The effectiveness of psychotherapy in lessening patients' suffering is well documented, but not all patients benefit to the same degree. Machine learning (ML) may be useful for predicting therapy outcomes for individual patients (an important theoretical perspective) and for identifying relevant predictors of successful treatment (a useful practical perspective). ML could thus further our understanding of the mechanisms of effective psychotherapy. So far, however, there are only very few applications of ML in psychotherapy research, and these are typically either based on small samples or focused on specific diagnoses (e.g., schizophrenia).
(b) Objectives: We aimed to examine ML models' ability to predict the outcome of cognitive behavioral therapy (CBT) on an individual level using a sufficiently large and heterogeneous sample with a variety of diagnoses in an ecologically valid outpatient setting.
(c) Research question(s): We were particularly interested in identifying the ML algorithm best suited to predicting the outcome of psychotherapy in a naturalistic setting and the variables most important for that prediction.
(d) Method/Approach: We used data from N = 685 patients with diverse disorders from an outpatient center. Sociodemographic and clinical baseline data were available from routine assessments at the outset of CBT. We trained several ML models (decision trees, random forests, and gradient boosting machines [GBM]) to predict treatment success. Success was defined as a clinically significant change (CSC) in perceived symptom severity after treatment, measured with the Brief Symptom Checklist. To identify the most relevant predictor variables, we combined internal feature selection approaches with the Boruta algorithm.
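To make the outcome definition concrete, the following is a minimal sketch of how a CSC label could be derived for one patient. It assumes the widely used Jacobson and Truax criteria (a reliable pre-to-post improvement plus crossing a cutoff between the clinical and the normative distribution); the abstract does not state which CSC operationalization was used, and all function names and numeric values below are hypothetical illustrations, not values from the study or from the Brief Symptom Checklist manual.

```python
import math


def reliable_change_index(pre, post, sd_pre, reliability):
    """Jacobson-Truax RCI: score difference divided by the standard error of the difference."""
    standard_error = sd_pre * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0) * standard_error
    return (post - pre) / s_diff


def cutoff_c(mean_clin, sd_clin, mean_norm, sd_norm):
    """Criterion c: weighted midpoint between the clinical and the normative mean."""
    return (sd_clin * mean_norm + sd_norm * mean_clin) / (sd_clin + sd_norm)


def is_csc(pre, post, sd_pre, reliability, cutoff):
    """True if symptoms dropped reliably AND moved from above to below the cutoff.

    Lower scores mean fewer symptoms, so improvement is a negative RCI.
    Requiring the pre score to lie above the cutoff is an assumption of this sketch.
    """
    rci = reliable_change_index(pre, post, sd_pre, reliability)
    reliably_improved = rci < -1.96
    crossed_cutoff = pre >= cutoff and post < cutoff
    return reliably_improved and crossed_cutoff


# Hypothetical norm values, chosen only for illustration.
cutoff = cutoff_c(mean_clin=1.2, sd_clin=0.6, mean_norm=0.4, sd_norm=0.3)

# Patient A: large drop that also crosses the cutoff.
patient_a = is_csc(pre=1.4, post=0.5, sd_pre=0.6, reliability=0.9, cutoff=cutoff)
# Patient B: small drop that is neither reliable nor below the cutoff.
patient_b = is_csc(pre=1.4, post=1.1, sd_pre=0.6, reliability=0.9, cutoff=cutoff)
print(cutoff, patient_a, patient_b)
```

Under these assumptions, each patient receives a binary label (CSC achieved or not), which is the kind of target a classifier such as a GBM would be trained to predict from baseline data.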
(e) Results/Findings: All models significantly outperformed the no-information rate on previously unseen validation data. The best-performing model (GBM) achieved a balanced accuracy of 70%. Out of 405 variables, we identified the 16 most important predictors, which were sufficient to predict CSC with 67% balanced accuracy. In an exploratory analysis, these predictors could be summarized into six substantive factors: high baseline severity, paranoid thinking or ideation, prior mental disorders, functionality, somatization, and passive-aggressive traits.
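The two reference quantities reported here can be illustrated with a short sketch. It uses the standard definitions: balanced accuracy is the mean of sensitivity and specificity, and the no-information rate is the accuracy obtained by always predicting the majority class. The labels below are invented toy data, not study data, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = CSC, 0 = no CSC)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; both classes must occur in y_true."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def no_information_rate(y_true):
    """Share of the larger class, i.e. the accuracy of always guessing the majority."""
    n_pos = sum(y_true)
    return max(n_pos, len(y_true) - n_pos) / len(y_true)


# Toy validation set: 3 patients with CSC, 7 without (invented for illustration).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_model = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]  # hypothetical model predictions
y_majority = [0] * len(y_true)            # always predict "no CSC"

print("NIR:", no_information_rate(y_true))
print("majority baseline, accuracy:", accuracy(y_true, y_majority))
print("majority baseline, balanced accuracy:", balanced_accuracy(y_true, y_majority))
print("model, balanced accuracy:", balanced_accuracy(y_true, y_model))
```

With imbalanced classes, the majority baseline reaches the no-information rate in plain accuracy but only 50% in balanced accuracy, which is why balanced accuracy is an informative metric when CSC is not equally frequent in both outcome groups.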
(f) Conclusions and implications: Our study demonstrates that data typically available at the outset of therapy can predict whether an individual will benefit from the intervention to a clinically relevant degree. Some of these predictors were expected (e.g., level of functioning), whereas others need further validation (e.g., job impairment). The 16 most important predictors nearly replicated the predictive performance of the fully specified model. The predictive power of this comparatively small set of predictors, together with the fact that these predictors had already been identified in other studies, underlines the importance of data-driven approaches. From both a theoretical and a practical perspective, ML is an attractive extension to more established psychotherapy research methodology.