Digitization of Text Documents Using PDF/A

Yan Han, Xueheng Wan

Abstract


The purpose of this article is to demonstrate a practical use case of the PDF/A file format for the digitization of textual documents, following the recommendation to use PDF/A as a preferred digitization file format. The authors show how to convert and combine all of a document's TIFF images, together with their associated metadata, into a single PDF/A-2b file. Using open source software and real-life examples, they show readers how to convert TIFF images, extract the associated metadata and ICC profiles, and validate the result against the newly released PDF/A validator. The generated PDF/A file is a self-contained, self-describing container that accommodates all the data from the digitization of textual materials, including page-level metadata and/or ICC profiles. Theoretical analysis and empirical examples show that the PDF/A file format has many advantages over the traditionally preferred TIFF and JPEG 2000 formats for the digitization of textual documents.
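The metadata and ICC-profile extraction step can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' toolchain (the article relies on existing open source software). It assumes classic (non-BigTIFF) files, reads only a small subset of baseline tags chosen here for illustration, and returns the embedded ICC profile (TIFF tag 34675) as raw bytes so that it could later be reused, for example as the profile of a PDF/A output intent. The function names read_tiff_pages and _read_value are hypothetical.

```python
import struct
from fractions import Fraction

# Byte size of each TIFF field type handled here (1=BYTE, 2=ASCII, 3=SHORT,
# 4=LONG, 5=RATIONAL, 7=UNDEFINED); entries of other types are skipped.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 7: 1}

# Illustrative (hypothetical) subset of tags worth carrying into a PDF/A file.
TAG_NAMES = {
    256: "ImageWidth", 257: "ImageLength", 258: "BitsPerSample",
    262: "PhotometricInterpretation", 270: "ImageDescription",
    271: "Make", 272: "Model", 282: "XResolution", 283: "YResolution",
    296: "ResolutionUnit", 305: "Software", 306: "DateTime",
    315: "Artist", 34675: "ICCProfile",
}


def _read_value(data, endian, ftype, count, value_field):
    """Decode one IFD entry's value, following its offset when needed."""
    size = TYPE_SIZES[ftype] * count
    if size <= 4:
        # Values of 4 bytes or fewer sit inline, left-justified in the field.
        raw = value_field[:size]
    else:
        (offset,) = struct.unpack(endian + "I", value_field)
        raw = data[offset:offset + size]
    if ftype == 2:  # ASCII: NUL-terminated text
        return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    if ftype in (1, 7):  # BYTE/UNDEFINED: raw bytes, e.g. an ICC profile
        return bytes(raw)
    if ftype == 3:  # SHORT
        vals = struct.unpack(endian + "H" * count, raw)
    elif ftype == 4:  # LONG
        vals = struct.unpack(endian + "I" * count, raw)
    else:  # RATIONAL: unsigned 32-bit numerator/denominator pairs
        nums = struct.unpack(endian + "I" * (2 * count), raw)
        vals = tuple(Fraction(nums[i], nums[i + 1]) if nums[i + 1] else None
                     for i in range(0, len(nums), 2))
    return vals[0] if count == 1 else vals


def read_tiff_pages(data):
    """Return one dict of selected tags per IFD (image) in TIFF bytes."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel")
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack(endian + "H", data[2:4])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (ifd_offset,) = struct.unpack(endian + "I", data[4:8])
    pages, seen = [], set()
    # Walk the IFD chain; 'seen' guards against malformed circular chains.
    while ifd_offset and ifd_offset not in seen:
        seen.add(ifd_offset)
        (n,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
        page = {}
        for i in range(n):
            start = ifd_offset + 2 + 12 * i
            tag, ftype, count = struct.unpack(endian + "HHI", data[start:start + 8])
            if tag in TAG_NAMES and ftype in TYPE_SIZES:
                value_field = data[start + 8:start + 12]
                page[TAG_NAMES[tag]] = _read_value(data, endian, ftype, count, value_field)
        pages.append(page)
        next_pos = ifd_offset + 2 + 12 * n
        (ifd_offset,) = struct.unpack(endian + "I", data[next_pos:next_pos + 4])
    return pages
```

Calling read_tiff_pages on the bytes of a scanned page returns one dictionary per image; the "ICCProfile" value, when present, could be written out as a standalone .icc file or passed to whatever PDF/A conversion tool is in use. Reading the whole file into memory keeps the sketch short, at the cost of efficiency for very large scans.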

DOI: https://doi.org/10.6017/ital.v37i1.9878

License URL: http://creativecommons.org/licenses/by/3.0/

ISSN: 2163-5226
