A Systematic Approach Towards Web Preservation

  • Muzammil Khan, Preston University, Islamabad
  • Arif Ur Rahman, Faculty of Computer Science, Free University of Bozen-Bolzano

Abstract

The main purpose of this article is to divide the web preservation process into small, explicable stages and to design a step-by-step process that leads to a well-organized web archive. We studied a number of research articles on web preservation projects and web archives and, based on them, designed a systematic step-by-step approach to web preservation. The proposed process describes and combines the strengths of the different techniques observed during the study for preserving digital web content in a web archive. For each preservation step, we identify different approaches and possible implementation techniques that can be adopted in digital archiving. The potential value of the proposed model is to guide archivists, related personnel, and organizations in effectively preserving their intellectual digital content for future use. Moreover, the model can help to initiate a web preservation process and to create a well-organized web archive in which archived web content is managed efficiently. One section of the article briefly describes the implementation of the proposed approach in a digital news story preservation framework for archiving news published online by different sources.

Published
2019-03-18
How to Cite
Khan, M., & Rahman, A. U. (2019). A Systematic Approach Towards Web Preservation. Information Technology and Libraries, 38(1), 71-90. https://doi.org/10.6017/ital.v38i1.10181