For the network models, we used the Mood and Feelings Questionnaire (MFQ). This 33-item questionnaire is well validated for the measurement of depression in adolescents. All but two items are scored in the same direction and ask adolescents to rate how often in the past two weeks they have felt or behaved a certain way. Item scores range from 0 to 3 (never, sometimes, most of the time, always). Two items were excluded because 1) they have previously been shown to have low factor loadings in every possible factor model, and 2) they are redundant, as their content is also represented in other items that are scored in the other direction. Total MFQ scores were calculated as the average score across items multiplied by 33. In the literature, MFQ items are often rated on a three-point scale (never, sometimes, most of the time) rather than on our four-point scale. For comparability, we provide sum scores based on our four-point scale (optimal for the study of covariance) as well as on the more conventional three-point scale, in which items rated as 'always' were recoded to 'most of the time'. We report the four-point sum scores unless otherwise specified. Baseline total MFQ scores ranged from 13 to 90 in the patient group (mean=52.55, sd=15.91) and from 0 to 29 in the control group (mean=11.61, sd=6.71).
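The scoring rule described above (mean item score multiplied by 33, with an optional recode of 'always' to 'most of the time') can be illustrated with a minimal sketch. The function name `mfq_total`, the use of `None` for unanswered items, and the choice to average over answered items only are assumptions made for illustration; they are not taken from the study's analysis code.

```python
N_ITEMS = 33  # scaling factor stated in the text


def mfq_total(item_scores, three_point=False):
    """Return the mean of the answered items multiplied by 33.

    item_scores: item ratings coded 0-3; None marks an unanswered item
                 (assumption: unanswered items are left out of the mean).
    three_point: if True, recode 3 ('always') to 2 ('most of the time').
    """
    answered = [s for s in item_scores if s is not None]
    if not answered:
        raise ValueError("no answered items")
    if three_point:
        # Collapse the top category to obtain the conventional three-point scale.
        answered = [min(s, 2) for s in answered]
    return sum(answered) / len(answered) * N_ITEMS


# Hypothetical respondent with one item per response category.
example = [0, 1, 2, 3]
four_point_total = mfq_total(example)
three_point_total = mfq_total(example, three_point=True)
```

Because the total is a scaled mean rather than a raw sum, it stays on the same 0 to 99 range (0 to 66 on the three-point scale) regardless of how many items enter the average.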

Depression/recovery status at baseline and follow-up was assessed using the K-SADS diagnostic interview. To meet criteria for depression, participants had to ... . At baseline, all but eleven patients met K-SADS criteria for depression (97.6%). After treatment, 76 patients (16.3%) continued to meet diagnostic criteria, while 239 patients (51.4%) were in remission; for the remaining 32.3%, data were missing and could not reliably be imputed. Because of the large amount of missing data and the relatively small number of participants with a diagnosis at follow-up, we also defined good/poor treatment response in terms of MFQ sum scores. Patients were median-split on the change in total MFQ score between baseline and post-treatment, expressed relative to the baseline sum score. The median change was -41.7%, indicating a 41.7% decrease in total depression score. Patients who improved by more than 41.7% were classified as good responders (n=233), and patients who improved less, remained stable, or deteriorated were classified as poor responders (n=232).
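As a rough illustration of the median split on relative change, the sketch below computes each patient's percentage change from baseline and labels those below the median as good responders. The function names and example scores are hypothetical. The text does not state how patients exactly at the median were assigned, so counting them as poor responders here is an assumption.

```python
from statistics import median


def percent_change(baseline, post):
    """Change in total score relative to baseline, in percent (negative = improvement)."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a baseline score of 0")
    return 100.0 * (post - baseline) / baseline


def classify_responders(baseline_scores, post_scores):
    """Median-split patients on relative change in total score.

    Returns the median change and one label per patient.
    Assumption: a patient exactly at the median is labeled 'poor'.
    """
    changes = [percent_change(b, p) for b, p in zip(baseline_scores, post_scores)]
    cutoff = median(changes)
    labels = ["good" if c < cutoff else "poor" for c in changes]
    return cutoff, labels


# Hypothetical scores for four patients (not study data).
cutoff, labels = classify_responders([50, 40, 60, 20], [10, 30, 60, 25])
```

Expressing change relative to baseline means that a patient who drops from 50 to 10 and one who drops from 25 to 5 are treated as having improved equally (both -80%), even though their absolute changes differ.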