Evaluating the Impact of the Long-S upon 18th-Century Encyclopedia Britannica Automatic Subject Metadata Generation Results

  • Sam Grabus, Drexel University


This research compares automatic subject metadata generation before and after the pre-1800s Long-S character is corrected to a standard <s>. The test environment consists of entries from the third edition of the Encyclopedia Britannica and the HIVE automatic subject indexing tool. A comparative study of the metadata generated before and after correction of the Long-S showed that, on average, 26.51 percent of the potentially relevant terms per entry are omitted from the results if the Long-S is not corrected. The results confirm that correcting the Long-S increases the availability of terms that can be used for creating quality metadata records. They also demonstrate a relationship between shorter entries and an increase in omitted terms when the Long-S is not corrected.
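The before-and-after comparison described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the study: the Long-S is assumed to appear in the text as the Unicode character U+017F, the helper names `correct_long_s` and `percent_omitted` are hypothetical, and the omission measure (terms found only after correction, as a share of all post-correction terms) is one plausible reading of the reported percentage, not necessarily the study's exact calculation. The term sets are made-up placeholders rather than HIVE output.

```python
LONG_S = "\u017f"  # assumed encoding of the Long-S character


def correct_long_s(text):
    """Replace every Long-S with a standard lowercase s."""
    return text.replace(LONG_S, "s")


def percent_omitted(terms_before, terms_after):
    """Percentage of post-correction terms missing from the pre-correction set.

    Assumed measure: terms that appear only after correction, divided by
    all terms generated after correction.
    """
    if not terms_after:
        return 0.0
    omitted = set(terms_after) - set(terms_before)
    return 100.0 * len(omitted) / len(set(terms_after))


# Hypothetical entry text containing the Long-S.
entry = "The \u017fcience of a\u017ftronomy"
corrected = correct_long_s(entry)

# Placeholder term sets standing in for indexing results on the
# uncorrected and corrected text (not real indexing output).
terms_uncorrected = {"Planets"}
terms_corrected = {"Planets", "Science", "Astronomy", "Stars"}

share = percent_omitted(terms_uncorrected, terms_corrected)
```

Averaging this per-entry share over all entries would give a single figure comparable in kind to the average reported in the abstract, under the assumptions stated above.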


How to Cite
Grabus, S. (2020). Evaluating the Impact of the Long-S upon 18th-Century Encyclopedia Britannica Automatic Subject Metadata Generation Results. Information Technology and Libraries, 39(3). https://doi.org/10.6017/ital.v39i3.12235