Reference Rot in the Repository: A Case Study of Electronic Theses and Dissertations (ETDs) in an Academic Library
This study examines ETDs deposited in an institutional repository during 2011-2015 to determine the degree to which the documents suffer from reference rot, that is, linkrot plus content drift. The authors converted and examined 664 doctoral dissertations in total, extracting 11,437 links; overall, 77% of links were active and 23% exhibited linkrot. A stratified random sample of 49 ETDs yielded 990 active links, which were then checked for content drift against mementos found in the Wayback Machine. Mementos were found for 77% of links, and approximately half of these, 492 of 990, exhibited content drift. The results serve not only to emphasize the need for broader awareness of this problem, but also to stimulate action on the preservation front.
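The link extraction and linkrot tally summarized above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the study used the "@gruber v2" regex and cURL, whereas the simplified pattern, the function names, and the rule that only a final 2xx status counts as active are assumptions made here for illustration.

```python
import re
from collections import Counter

# Simplified URL pattern for illustration only (an assumption); the study
# selected the more thorough "@gruber v2" regex for its extraction.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def extract_links(text):
    """Return URLs found in text, with trailing sentence punctuation removed."""
    return [match.rstrip(".,;:") for match in URL_PATTERN.findall(text)]


def classify_status(status):
    """Label a link 'active' for a final 2xx status, otherwise 'rotten'.

    The 2xx threshold is an assumed rule; None stands for a request that
    never returned a response (timeout, DNS failure, and so on).
    """
    if status is not None and 200 <= status < 300:
        return "active"
    return "rotten"


def linkrot_summary(statuses):
    """Given {url: final HTTP status or None}, return the share of each label."""
    counts = Counter(classify_status(s) for s in statuses.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in ("active", "rotten")}
```

In use, the status codes would come from a separate checking step (for example, cURL output); `linkrot_summary` then reports proportions comparable to the 77% active and 23% linkrot figures above.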
T. H. Teper and B. Kraemer, “Long-Term Retention of Electronic Theses and Dissertations,” College & Research Libraries 63, no. 1 (January 1, 2002), 64. doi:10.5860/crl.63.1.61.
The term “reference rot” was introduced by the Hiberlink team. “Hiberlink – About,” accessed March 31, 2016, http://hiberlink.org/about.html.
LOCKSS: Lots of Copies Keep Stuff Safe, accessed December 6, 2016, http://www.lockss.org/about/what-is-lockss/.
Mark Edward Phillips, Daniel Gelaw Alemneh, and Brenda Reyes Ayala, “Analysis of URL References in ETDs: A Case Study at the University of North Texas,” Library Management 35, no. 4/5 (June 3, 2014), 294. doi:10.1108/LM-08-2013-0073.
Wallace Koehler, “An Analysis of Web Page and Web Site Constancy and Permanence,” Journal of the American Society for Information Science 50, no. 2 (January 1, 1999): 162–80, doi:10.1002/(SICI)1097-4571(1999)50:2<162::AID-ASI7>3.0.CO;2-B.
---. “Web Page Change and Persistence—a Four-Year Longitudinal Study,” Journal of the American Society for Information Science & Technology 53, no. 2 (January 15, 2002): 162–71, doi:10.1002/asi.10018.
---. "A longitudinal study of Web pages continued: a consideration of document persistence." Information Research 9, no. 2 (2004): 9-2. http://www.informationr.net/ir/9-2/paper174.html.
Fatih Oguz and Wallace Koehler, “URL Decay at Year 20: A Research Note,” Journal of the Association for Information Science and Technology 67, no. 2 (February 1, 2016): 477–79, doi:10.1002/asi.23561.
Mary F. Casserly and James Bird, “Web Citation Availability: Analysis and Implications for Scholarship,” College and Research Libraries 64, no. 4 (July 2003): 300–317, http://crl.acrl.org/content/64/4/300.full.pdf.
Diomidis Spinellis, “The Decay and Failures of Web References,” Communications of the ACM 46, no. 1 (January 2003): 71–77, doi:10.1145/602421.602422.
Carmine Sellitto, “A Study of Missing Web-Cites in Scholarly Articles: Towards an Evaluation Framework,” Journal of Information Science 30, no. 6 (December 1, 2004): 484–95, doi:10.1177/0165551504047822.
Matthew E. Falagas, Efthymia A. Karveli, and Vassiliki I. Tritsaroli, “The Risk of Using the Internet as Reference Resource: A Comparative Study,” International Journal of Medical Informatics 77, no. 4 (April 2008): 280–86, doi:10.1016/j.ijmedinf.2007.07.001.
Cassie Wagner et al., “Disappearing Act: Decay of Uniform Resource Locators in Health Care Management Journals,” Journal of the Medical Library Association 97, no. 2 (April 2009): 122–30, doi:10.3163/1536-5050.97.2.009.
Robert Sanderson, Mark Phillips, and Herbert Van de Sompel, “Analyzing the Persistence of Referenced Web Resources with Memento,” arXiv:1105.3459 [Cs], May 17, 2011, http://arxiv.org/abs/1105.3459.
Jonathan Zittrain, Kendra Albert, and Lawrence Lessig, “Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations,” Legal Information Management 14, no. 2 (June 2014): 88–99, doi:10.1017/S1472669614000255.
“Hiberlink - About,” accessed March 31, 2016, http://hiberlink.org/about.html.
“Hiberlink - Our Research,” accessed March 31, 2016, http://hiberlink.org/research.html.
Martin Klein, Herbert Van de Sompel, Robert Sanderson, Harihar Shankar, Lyudmila Balakireva, Ke Zhou, and Richard Tobin, “Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot,” PLoS One 9, no. 12 (December 26, 2014), doi:10.1371/journal.pone.0115253.
Shawn M. Jones, Herbert Van de Sompel, Harihar Shankar, Martin Klein, Richard Tobin, and Claire Grover, “Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content,” PLOS ONE 11, no. 12 (December 2, 2016): e0167475, doi:10.1371/journal.pone.0167475.
Martin Halbert, Katherine Skinner, and Matt Schultz, “Preserving Electronic Theses and Dissertations: Findings of the Lifecycle Management for ETDs Project” (August 6, 2015), 2. http://educopia.org/presentations/preserving-electronic-theses-and-dissertations-findings-lifecycle-management-etds.
For a recent overview, see Sarah Potvin and Santi Thompson, “An Analysis of Evolving Metadata Influences, Standards, and Practices in Electronic Theses and Dissertations,” Library Resources & Technical Services 60, no. 2 (March 31, 2016): 99–114, doi:10.5860/lrts.60n2.99.
Joy M. Perrin, Heidi M. Winkler, and Le Yang, “Digital Preservation Challenges with an ETD Collection — A Case Study at Texas Tech University,” The Journal of Academic Librarianship 41, no. 1 (January 2015): 98–104, doi:10.1016/j.acalib.2014.11.002.
Sanderson, Phillips, and Van de Sompel, “Analyzing the Persistence of Referenced Web Resources with Memento.” http://arxiv.org/abs/1105.3459.
Phillips, Alemneh, and Ayala, "Analysis of URL references," doi:10.1108/LM-08-2013-0073.
Alfred S. Sife and Ronald Bernard, “Persistence and Decay of Web Citations Used in Theses and Dissertations Available at the Sokoine National Agricultural Library, Tanzania,” International Journal of Education and Development Using Information and Communication Technology 9, no. 2 (2013): 85–94. http://eric.ed.gov/?id=EJ1071354.
“ETD2014 — University of Leicester,” University of Leicester, accessed January 27, 2016, http://www2.le.ac.uk/library/downloads/etd2014.
EDINA, University of Edinburgh, “Reference Rot: Threat and Remedy,” http://www.slideshare.net/edinadocumentationofficer/reference-rot-and-linked-data-threat-and-remedy.
Peter Burnhill, Muriel Mewissen, and Richard Wincewicz, “Reference Rot in Scholarly Statement: Threat and Remedy,” Insights the UKSG Journal 28, no. 2 (July 7, 2015): 55–61, doi:10.1629/uksg.237.
Concordia University Graduate Programs, accessed April 7, 2016, http://www.concordia.ca/academics/graduate.html.
Klein et al., "Scholarly Context Not Found," doi:10.1371/journal.pone.0115253.
Ke Zhou, Richard Tobin, and Claire Grover, “Extraction and Analysis of Referenced Web Links in Large-Scale Scholarly Articles,” in Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL ’14 (Piscataway, NJ, USA: IEEE Press, 2014), 451–452, http://dl.acm.org/citation.cfm?id=2740769.2740863.
Give me text!, Open Knowledge International, accessed October 26, 2015-March 7, 2016, http://givemetext.okfnlabs.org/.
Phillips, Alemneh, and Ayala, "Analysis of URL references," doi:10.1108/LM-08-2013-0073.
“In Search of the Perfect URL Validation Regex,” accessed December 7, 2016, https://mathiasbynens.be/demo/url-regex. We selected "@gruber v2" for our extraction.
cURL v7.45.0, "command line tool and library for transferring data with URLs," accessed October 18, 2015, http://curl.haxx.se/.
We have used the term "memento" in lowercase to denote a snapshot souvenir page, to distinguish from an automated service utilizing the Memento protocol.
For a good overview of the types of problems, see Michael L. Nelson, Scott G. Ainsworth, Justin F. Brunelle, Mat Kelly, Hany SalahEldeen, and Michele Weigle, "Assessing the Quality of Web Archives," 1 vol., Computer Science Presentations, Book 8 (Old Dominion University, ODU Digital Commons, 2014). http://digitalcommons.odu.edu/computerscience_presentations/8.
Shawn M. Jones et al., “Scholarly Context Adrift,” doi:10.1371/journal.pone.0167475.
OpenDOAR search of Institutional Repositories with Theses at http://www.opendoar.org/find.php, accessed August 26, 2016.
Joachim Schöpfel. "Adding value to electronic theses and dissertations in institutional repositories." D-Lib Magazine 19, no. 3 (2013): 1. doi:10.1045/march2013-schopfel.
Strategic Digital Initiatives Working Group. Implementation of a Modern Digital Library at The Ohio State University. (Apr 2014). http://go.osu.edu/osul_di_whitepaper. (Published).
Tim Gollins. “Parsimonious Preservation: Preventing Pointless Processes! (The Small Simple Steps That Take Digital Preservation a Long Way Forward),” in Online Information Proceedings UK National Archives, 2009. http://www.nationalarchives.gov.uk/documents/information-management/parsimonious-preservation.pdf.
Margaret Hedstrom. "Digital preservation: a time bomb for digital libraries." Computers and the Humanities 31, no. 3 (1997): 189-202. doi:10.1023/A:1000676723815.
Zittrain, Albert, and Lessig, "Perma," doi:10.1017/S1472669614000255.
Herbert Van de Sompel, Michael L. Nelson, Robert Sanderson, Lyudmila L. Balakireva, Scott Ainsworth, and Harihar Shankar. “Memento: Time Travel for the Web,” arXiv:0911.1112 [Cs], November 5, 2009, http://arxiv.org/abs/0911.1112.