Using Machine Learning and Natural Language Processing to Analyze Library Chat Reference Transcripts

Keywords: Machine Learning, Natural Language Processing, Chat Reference, Chat Transcript, Data Analysis


Artificial intelligence and machine learning have rapidly become standard tools across industries and businesses for gaining insight and making predictions. In recent years, the library community has begun exploring ways to improve library services by applying AI and machine learning techniques to library data. Chat reference in libraries generates a large amount of data in the form of transcripts. This study uses machine learning and natural language processing methods to analyze one academic library’s chat transcripts collected over a period of eight years. The resulting machine learning model classifies chat questions as either reference or nonreference questions. The goal is to use the model to predict the category of future questions so that incoming questions can be routed to the appropriate library departments or staff.
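As a rough illustration of the classification task described above, the following sketch trains a tiny bag-of-words multinomial Naive Bayes classifier that labels a chat question as reference or nonreference. This is not the study's model: the choice of algorithm, the toy training sentences, and the names `TRAIN`, `tokenize`, `train`, `predict`, and `MODEL` are all assumptions made for the example, and a real system would be trained on the library's own labeled transcripts.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy examples; the study's actual transcripts are not reproduced here.
TRAIN = [
    ("how do i find peer reviewed articles on climate change", "reference"),
    ("can you help me cite a journal article in apa style", "reference"),
    ("i need sources for my research paper on nursing ethics", "reference"),
    ("which database should i use for psychology articles", "reference"),
    ("what time does the library close tonight", "nonreference"),
    ("the printer on the second floor is out of paper", "nonreference"),
    ("how do i renew my library card", "nonreference"),
    ("can i book a study room for tomorrow", "nonreference"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count labels and per-label word frequencies for Naive Bayes."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train(TRAIN)
```

Calling `predict(MODEL, question)` on an incoming question returns one of the two labels, which a routing step could then use to direct the question to the appropriate department or staff member.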


How to Cite
Wang, Y. (2022). Using Machine Learning and Natural Language Processing to Analyze Library Chat Reference Transcripts. Information Technology and Libraries, 41(3).