Toronto Public Library Catalogue Subject Map

Below is a visualization of the subject metadata for many of the books in the Toronto Public Library's Catalogue. Starting from the Catalogue data (which was generously converted from XML to JSON by Alex Volkov), I extracted the records for English-language print books that contained subject metadata and attempted to cluster related subjects. Following an approach inspired by Nicolas Kruchten, the subject co-occurrence matrix was first reduced in dimensionality using a singular value decomposition (SVD), and the resulting high-dimensional subject vectors were then embedded in the low-dimensional visualization space using the Isomap technique. Finally, clusters of closely related subjects were identified and highlighted using the K-Means clustering algorithm. The SVD, Isomap, and K-Means implementations are from scikit-learn, and the visualization uses D3.js. Full source code is on GitHub.
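For readers who want a feel for the pipeline, here is a minimal sketch of the three steps using scikit-learn. It is not the project's actual code: the function name `subject_map`, the toy `records`, and all parameter values are made up for illustration, and running K-Means on the 2-D Isomap coordinates (rather than on the SVD vectors) is an assumption.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import Isomap


def subject_map(records, n_svd=4, n_neighbors=4, n_clusters=2, seed=0):
    """Hypothetical helper: records is a list of subject lists, one per book."""
    subjects = sorted({s for rec in records for s in rec})
    index = {s: i for i, s in enumerate(subjects)}

    # Book-by-subject incidence matrix (1 where a book carries a subject).
    rows, cols = [], []
    for r, rec in enumerate(records):
        for s in set(rec):
            rows.append(r)
            cols.append(index[s])
    incidence = csr_matrix(
        (np.ones(len(rows)), (rows, cols)),
        shape=(len(records), len(subjects)),
    )

    # Subject-by-subject co-occurrence counts; the diagonal holds how many
    # books carry each subject.
    cooc = incidence.T @ incidence

    # Step 1: reduce each subject's co-occurrence row with a truncated SVD.
    reduced = TruncatedSVD(n_components=n_svd, random_state=seed).fit_transform(cooc)

    # Step 2: embed the reduced subject vectors in 2-D with Isomap.
    coords = Isomap(n_neighbors=n_neighbors, n_components=2).fit_transform(reduced)

    # Step 3 (assumption): cluster the 2-D coordinates with K-Means.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(coords)
    return subjects, coords, labels


# Toy records, invented for this example only.
records = [
    ["Cooking", "Baking"],
    ["Cooking", "Nutrition"],
    ["Baking", "Nutrition"],
    ["Cooking", "Baking", "Bread"],
    ["Astronomy", "Physics"],
    ["Physics", "Mathematics"],
    ["Astronomy", "Mathematics"],
    ["Astronomy", "Physics", "Cosmology"],
    ["Nutrition", "Physics"],
]
subjects, coords, labels = subject_map(records)
for name, (x, y), label in zip(subjects, coords, labels):
    print(f"{name:12s} cluster={label} x={x:.2f} y={y:.2f}")
```

On real catalogue data the number of SVD components, Isomap neighbours, and clusters would all need tuning; the values above are only large enough to make the toy example run.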