TY - JOUR
AU - Han, Yan
AU - Wan, Xueheng
PY - 2018/03/19
Y2 - 2024/03/28
TI - Digitization of Text Documents Using PDF/A
JF - Information Technology and Libraries
JA - ITAL
VL - 37
IS - 1
SE - Communications
DO - 10.6017/ital.v37i1.9878
UR - https://ital.corejournals.org/index.php/ital/article/view/9878
SP - 52-64
AB - The purpose of this article is to demonstrate a practical use case of the PDF/A file format for digitization of textual documents, following the recommendation to use PDF/A as a preferred digitization file format. The authors show how to convert and combine all the TIFFs with associated metadata into a single PDF/A-2b file for a document. Using open source software with real-life examples, the authors show readers how to convert TIFF images, extract associated metadata and ICC profiles, and validate the resulting file against the newly released PDF/A validator. The generated PDF/A file is a self-contained and self-described container that accommodates all the data from digitization of textual materials, including page-level metadata and/or ICC profiles. Theoretical analysis and empirical examples show that the PDF/A file format has many advantages over the traditionally preferred file formats, TIFF and JPEG2000, for digitization of textual documents.
ER -