Results

As a reminder, our assumption is that topics remain consistent across a book. Intuitively, if the topic distribution of our test document is very similar to one of the reference topic distributions, it is more likely that the test document actually comes from that reference book. We model this 'similarity' using the L2 distance (the L2 norm of the difference) between the test distribution and the reference distribution of each of the 10 books. To obtain a probability, which we call the predictive probability, we take the inverse of each distance and normalize these values so that the probabilities of the test document coming from each of the 10 reference books sum to 1.
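The inverse-distance normalization can be sketched as follows. This is a minimal illustration, not the code used to produce these results. It assumes each topic distribution is a length-K vector of topic proportions (for example, from a fitted topic model), and the function name `predictive_probabilities` is hypothetical. The handling of a zero distance, where the inverse would be infinite, is also an assumption added here and not part of the original description.

```python
import numpy as np


def predictive_probabilities(test_dist, reference_dists):
    """Normalized inverse L2 distance from a test document to each reference book.

    test_dist: shape (K,), topic proportions of the test document.
    reference_dists: shape (B, K), one topic distribution per reference book.
    Returns a shape (B,) array of predictive probabilities summing to 1.
    """
    test = np.asarray(test_dist, dtype=float)
    refs = np.asarray(reference_dists, dtype=float)

    # L2 distance between the test distribution and each reference distribution.
    distances = np.linalg.norm(refs - test, axis=1)

    # Assumption: an exact match (distance 0) would make 1/d infinite,
    # so all probability mass goes to the matching book(s) instead.
    exact = distances == 0
    if exact.any():
        return exact / exact.sum()

    # Inverse distance, then normalize so the probabilities sum to 1.
    inverse = 1.0 / distances
    return inverse / inverse.sum()


# Hypothetical example: 2 topics and 3 reference books (instead of 10) for brevity.
test_doc = [0.6, 0.4]
references = [[0.7, 0.3], [0.4, 0.6], [0.0, 1.0]]
probs = predictive_probabilities(test_doc, references)
print(np.round(probs, 3))  # [0.6 0.3 0.1]
```

Here the three distances are in the ratio 1 : 2 : 6, so the inverses are in the ratio 6 : 3 : 1 and normalize to 0.6, 0.3 and 0.1. The predicted book would then be the one with the highest predictive probability (`np.argmax(probs)`), which is simply the reference book with the smallest L2 distance.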

Prediction

Diagnostics

Visualization