A 21st Century Technical Infrastructure for Digital Preservation
Digital preservation systems and practices are rooted in research and development efforts from the late 1990s and early 2000s, when the cultural heritage sector began tackling the challenges of digital preservation in isolation. Since then, the commercial sector has sought to solve similar challenges using different technical strategies, such as software-defined storage and function-as-a-service. Although commercial solutions are not necessarily created with long-term preservation in mind, they are well aligned with the digital preservation use case. The cultural heritage sector can benefit from adapting these modern approaches to increase sustainability and to leverage technological advancements already in wide use across Fortune 500 companies.
Copyright (c) 2021 Nathan Tallman
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.