Negotiating a Text Mining License for Faculty Researchers

Leslie A. Williams, Lynne M. Fox, Christophe Roeder, Lawrence Hunter


This case study examines strategies used to leverage the library’s existing journal licenses to obtain three things: a large collection of full-text journal articles in Extensible Markup Language (XML) format; the right to text mine the collection; and the right to use the collection and the data mined from it for grant-funded research to develop biomedical natural language processing (BNLP) tools. Researchers first attempted to obtain content directly from PubMed Central (PMC), but this attempt failed because of limits on the use of content in PMC. Next, researchers and their library liaison attempted to obtain content from contacts in the technical divisions of the publishing industry, which yielded only an incomplete research data set. Finally, researchers, the library liaison, and the acquisitions librarian collaborated with the sales and technical staff of a major science, technology, engineering, and medical (STEM) publisher to successfully create a method for obtaining XML content as an extension of the library’s typical acquisition process for electronic resources. Our experience led us to conclude that text-mining rights for full-text articles in XML format should routinely be included in the negotiation of the library’s licenses.
