Topic Modeling as a Tool for Analyzing Library Chat Transcripts

  • HyunSeung Koh, University of Northern Iowa
  • Mark Fienup, University of Northern Iowa


Library chat services are an increasingly important communication channel for connecting patrons to library resources and services. Analysis of chat transcripts could give librarians insights into improving services. Unfortunately, chat transcripts consist of unstructured text data, making it impractical for librarians to go beyond simple quantitative analysis (e.g., chat duration, message count, word frequencies) with existing tools. As a stepping-stone toward a more sophisticated chat transcript analysis tool, this study investigated the application of different types of topic modeling techniques to one academic library’s chat reference data collected from April 10, 2015, to May 31, 2019, with the goal of extracting the most accurate and easily interpretable topics. In this study, topic accuracy and interpretability (the quality of topic outcomes) were measured quantitatively with topic coherence metrics. They were also assessed qualitatively by the librarian author of this paper, based on a subjective judgment of whether topics aligned with frequently asked questions or easily inferable themes in academic library contexts. In the human qualitative evaluation, Probabilistic Latent Semantic Analysis (pLSA) produced more accurate and interpretable topics, a result that did not necessarily align with the quantitative evaluation under all three types of topic coherence metrics. Interestingly, the commonly used technique Latent Dirichlet Allocation (LDA) did not necessarily perform better than pLSA. Also, the semi-supervised techniques that use human-curated anchor words, Correlation Explanation (CorEx) and guided LDA (GuidedLDA), did not necessarily perform better than the unsupervised Dirichlet Multinomial Mixture (DMM) technique. Last, the study found that, across the different techniques, using the entire transcript (both sides of the interaction between the library patron and the librarian) produced higher-quality topics than using only the initial question asked by the library patron.
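
To make the idea of a topic coherence metric concrete, the following is a minimal sketch of one widely used measure, normalized pointwise mutual information (NPMI), computed from document-level word co-occurrence. The abstract does not name the three coherence metrics used, so the choice of NPMI, the function name npmi_coherence, and the toy transcripts are all assumptions made for illustration and are not the study's implementation.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all pairs of topic words.

    Hypothetical helper for illustration only. Probabilities are estimated
    as the fraction of documents that contain the word(s). Assumes at least
    one document and at least two topic words.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # the pair never co-occurs
        elif p12 == 1:
            scores.append(1.0)   # convention chosen here: pair occurs in every document
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy tokenized "transcripts" (invented for this example).
docs = [
    ["renew", "book", "due", "date"],
    ["renew", "book", "online"],
    ["database", "article", "access"],
    ["article", "database", "login"],
]

coherent = npmi_coherence(["renew", "book"], docs)    # words that appear together
mixed = npmi_coherence(["renew", "database"], docs)   # words that never appear together
print(coherent, mixed)
```

Scores range from -1 (topic words never co-occur) to 1 (topic words always co-occur), so a topic whose top words tend to appear in the same transcripts scores higher. A score like this captures only statistical co-occurrence, which is one reason a quantitative ranking can differ from a librarian's qualitative judgment.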


How to Cite
Koh, H., & Fienup, M. (2021). Topic Modeling as a Tool for Analyzing Library Chat Transcripts. Information Technology and Libraries, 40(3).