3.1 Dataset Description
The Reddit posts and comments in this dataset come from 892 different
individuals. Of these, 137 were receiving treatment for depression, and
the remaining 755 served as a control group. The Reddit API limit appears
to be 1000 posts and 1000 comments per user as shown in Figure 1.
Additionally, both posts and comments are chronologically ordered, which
is crucial given the goal of early depression identification. One XML
file was created for each user and used to build the collection. Only
users who have publicly stated that they were diagnosed with depression
are labeled as depressed. Users who merely participate in subreddits
devoted to the topic are not labeled as depressed, since they may be
there to learn more about depression because someone close to them is
experiencing it. Each entry is identified by:
The dataset is split into 486 training subjects, 83 of whom are
positive, and 406 test subjects, 54 of whom are positive.
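The class imbalance implied by these counts can be checked with a short calculation. The following is a minimal sketch that uses only the subject counts reported above; the `Split` class and its field names are hypothetical helpers introduced for illustration and are not part of the dataset or its tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """Subject counts for one partition (hypothetical helper, not from the dataset)."""
    name: str
    subjects: int   # total users in the partition
    positive: int   # users labeled as depressed

    @property
    def control(self) -> int:
        # Control users are all subjects that are not labeled positive.
        return self.subjects - self.positive

    @property
    def positive_rate(self) -> float:
        # Fraction of the partition that is labeled positive.
        return self.positive / self.subjects


# Counts taken from the split described in the text.
train = Split("train", subjects=486, positive=83)
test = Split("test", subjects=406, positive=54)

total_subjects = train.subjects + test.subjects
total_positive = train.positive + test.positive

for split in (train, test):
    print(
        f"{split.name}: {split.subjects} subjects, "
        f"{split.positive} positive, {split.control} control, "
        f"{split.positive_rate:.1%} positive"
    )
print(f"total: {total_subjects} subjects, {total_positive} positive")
```

The two partitions add up to the 892 users and 137 positive cases of the full collection, and in both partitions the positive class is a clear minority, which is worth keeping in mind when choosing evaluation metrics.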