Negotiating a Text Mining License for Faculty Researchers

  • Leslie A. Williams
  • Lynne M Fox, Health Sciences Library, University of Colorado Anschutz Medical Campus
  • Christophe Roeder
  • Lawrence Hunter


This case study examines the strategies used to leverage the library’s existing journal licenses to obtain a large collection of full-text journal articles in Extensible Markup Language (XML) format; the right to text mine the collection; and the right to use the collection and the data mined from it for grant-funded research to develop biomedical natural language processing (BNLP) tools. Researchers first attempted to obtain content directly from PubMed Central (PMC), but this attempt failed because of limits on the use of content in PMC. Next, the researchers and their library liaison attempted to obtain content from contacts in the technical divisions of the publishing industry, which yielded only an incomplete research data set. Finally, the researchers, the library liaison, and the acquisitions librarian collaborated with the sales and technical staff of a major science, technology, engineering, and medical (STEM) publisher to create a method for obtaining XML content as an extension of the library’s typical acquisition process for electronic resources. Our experience led us to conclude that text mining rights for full-text articles in XML format should routinely be included in the negotiation of the library’s licenses.

Author Biography

Lynne M Fox, Health Sciences Library, University of Colorado Anschutz Medical Campus
Lynne Fox, MLS, MA, AHIP, is an assistant professor and Education Librarian at the Health Sciences Library, University of Colorado Anschutz Medical Campus. As Education Librarian, Lynne conducts and coordinates in-person and asynchronous training and creates online tutorials on a wide variety of information resources. Lynne has been a librarian tutor for the Rocky Mountain Evidence Based Medicine Workshop, an intensive weeklong workshop conducted by internationally known leaders in the field of EBHC. She has been a librarian for almost 30 years and was admitted as a distinguished member of the Academy of Health Information Professionals in 1999. In 2009, Lynne was awarded the Marla Graber Award for Excellence and Achievement by the Colorado Council of Medical Librarians (CCML). In 2010, she received the Bernice M. Hetzner Award for Excellence in Academic Health Science Librarianship from the Midcontinental Chapter of MLA. In addition to her work as a librarian, Lynne is a member of the School of Medicine's Hunter Lab, a group researching the use of ontologies and natural language processing in data mining of the biomedical literature. Lynne has a Master's in Library Science (1984) from the University of Michigan and an MA in History (1996) from the University of Northern Colorado. Lynne was a Council member for the City of Thornton, Colorado’s sixth-largest city, from 2009 to 2013. Her vita can be found at:


How to Cite
Williams, L. A., Fox, L. M., Roeder, C., & Hunter, L. (2014). Negotiating a Text Mining License for Faculty Researchers. Information Technology and Libraries, 33(3), 5-21.