Negotiating a Text Mining License for Faculty Researchers
This case study examines the strategies used to leverage the library's existing journal licenses to obtain three things: a large collection of full-text journal articles in Extensible Markup Language (XML) format; the right to text mine the collection; and the right to use the collection and the data mined from it in grant-funded research to develop biomedical natural language processing (BNLP) tools. Researchers first attempted to obtain content directly from PubMed Central (PMC). This attempt failed because of limits on the use of content in PMC. Next, the researchers and their library liaison attempted to obtain content from contacts in the technical divisions of the publishing industry. This yielded only an incomplete research data set. Finally, the researchers, the library liaison, and the acquisitions librarian collaborated with the sales and technical staff of a major science, technology, engineering, and medical (STEM) publisher to create a method for obtaining XML content as an extension of the library's typical acquisition process for electronic resources. Our experience led us to conclude that text mining rights for full-text articles in XML format should routinely be included when the library negotiates its licenses.