Figure 3: Recall comparison on in-matrix and out-of-matrix prediction tasks as we vary the number of recommended articles. For CTR, we set λv = 100. Error bars are too small to show. The maximum expected recall for random recommendation is about 6%. CF cannot do out-of-matrix prediction. CTR performs best.
Experimental settings. For matrix factorization for collaborative filtering (CF), we used grid search to find that K = 200, λu = λv = 0.01, a = 1, b = 0.01 give good performance on held-out recommendations. We use CF to denote this method. For collaborative topic regression (CTR), we set the parameters similarly to CF: K = 200, λu = 0.01, a = 1 and b = 0.01. In addition, the precision parameter λv balances how much the article’s latent vector vj diverges from the topic proportions θj. We vary λv ∈ {10, 100, 1000, 10000}, where a larger λv increases the penalty on vj diverging from θj. We also compare to the model that only uses LDA-like features, as discussed at the beginning of Section 3. This is equivalent to fixing the per-item latent vector vj = θj in the CTR model. This is a nearly content-only model—while the per-user vectors are fit to the ratings data, the document vectors θj are based only on the words of the document. We use LDA to denote this method. (Note that we use the resulting topics and proportions of LDA to initialize the CTR model.) The baseline is the random model, where a user sees M randomly recommended articles. We note that the expected recall of the random method from a pool of Mtot articles does not depend on the size of the user’s library: it is always M/Mtot.
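To see concretely how λv controls the divergence of vj from θj, note that under the prior vj ∼ N(θj, λv⁻¹ I_K) the MAP value of vj is a ridge-style blend of the ratings fit and θj. The following is a minimal numpy sketch with toy dimensions, random data, and uniform confidence weights (all hypothetical; this is not the paper’s coordinate-ascent code):

```python
import numpy as np

rng = np.random.default_rng(1)
K, n_users = 5, 8                                     # toy sizes, not the paper's K = 200

U = rng.normal(size=(K, n_users))                     # user latent vectors u_i as columns
r = rng.binomial(1, 0.3, size=n_users).astype(float)  # one item's binary ratings
theta = rng.dirichlet(np.ones(K))                     # item's topic proportions theta_j

def v_update(lam_v, c=1.0):
    # MAP solution for v_j with uniform confidence c:
    #   argmin_v  c * ||r - U^T v||^2 + lam_v * ||v - theta||^2
    A = c * (U @ U.T) + lam_v * np.eye(K)
    b = c * (U @ r) + lam_v * theta
    return np.linalg.solve(A, b)

# A larger lam_v pulls v_j toward theta_j, as described in the text.
d_small = np.linalg.norm(v_update(10.0) - theta)
d_large = np.linalg.norm(v_update(10000.0) - theta)
```

With λv = 10000 the solution is nearly equal to θj (approaching the LDA variant above, which fixes vj = θj exactly), while λv = 10 lets the ratings pull vj farther away.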
Comparisons. Figure 3 shows the overall performance for in-matrix and out-of-matrix prediction as we vary the number of returned articles M = 20, 40, · · · , 200. For CTR, we pick λv = 100; Figure 4 shows the performance as we vary λv for the CTR model, compared with CF and LDA, with M fixed at 100. Figures 3 and 4 show that matrix factorization works well for in-matrix prediction, but adding content with CTR improves performance. The improvement is greater when the number of returned documents M is larger. The reason is as follows. Popular articles are more likely to be recommended by both methods. However, when M becomes large, few user ratings are available to ensure that CF gives good recommendations; the contribution of the content becomes more important. Compared to both CF and CTR, LDA suffers for in-matrix prediction. It does not sufficiently account for the users’ information in forming its predicted ratings.
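The evaluation metric underlying these figures, recall at M, can be sketched as follows: for each user, rank all articles by predicted score, keep the top M, and report the fraction of the user’s liked articles that appear in that list. A minimal illustration (toy scores and a hypothetical function name; the random baseline’s expected recall M/Mtot holds regardless of how many articles the user likes):

```python
import numpy as np

def recall_at_m(scores, liked, M):
    """Fraction of a user's liked articles appearing in the top-M ranked list."""
    top = np.argsort(-scores)[:M]          # indices of the M highest-scoring articles
    hits = len(set(top) & set(liked))
    return hits / len(liked)

# Toy example: 5 articles, user likes articles 0, 2, and 3.
scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7])
r = recall_at_m(scores, liked=[0, 2, 3], M=3)   # top-3 is {0, 2, 4}, so recall = 2/3
```

For a random recommender, each liked article lands in the top M with probability M/Mtot, so the expected recall is M/Mtot; e.g., with Mtot = 5000 and M = 100 it is 2%, matching the flat baseline in Figure 3.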