A Simple Scheme for Book Classification Using Wikipedia
Because documents are being generated faster than librarians can catalog them, an accurate, automated scheme of subject classification is desirable. However, simplistic word-counting schemes miss many important concepts; librarians must enrich algorithms with background knowledge to escape basic problems such as polysemy and synonymy. I have developed a script that uses Wikipedia as context for analyzing the subjects of nonfiction books. Although the method is simple and was built quickly from freely available parts, it is partially successful, which suggests that the approach holds promise for future research.