Crosswalking EAD: Collaboration in Archival Description

Different library departments must work together, both formally and informally, in implementing encoded archival description and in repackaging descriptive information about archival collections to other formats, particularly machine-readable cataloging. The authors, one a technical services librarian and the other a special collections archivist, describe their experiences collaborating in these processes at The Ohio State University. Although other institutions may differ in their organizational structure, the authors hope to provide technical guidance, as well as a model of collaboration between archivists and technical services personnel. Careful dialogue and planning are essential to transcend the traditional divide between archival and library descriptive practices and systems.

Finding aids are descriptions of archival collections condensed into a single document. These tend to be much more detailed, and more hierarchically organized, than descriptions of books or serials in library catalogs and have often posed challenges for integrating descriptive access for mixed collections. The hierarchical organization of finding aids begins with a broad description of the collection, including its title, information about its creator, its physical extent, its scope and content, the dates of its holdings, and policies to be followed in its use. Successive levels of the description include the collection's conceptual division into series, often followed by subdivisions into subseries and files, in some cases proceeding to descriptions of individual items. Encoded Archival Description (EAD) is a standard for marking up these multilevel documents, dividing their regularly occurring descriptive elements into data fields. Parts of EAD documents can be easily exchanged with machine-readable cataloging (MARC), the traditional carrier for descriptive information in libraries.
Processes for mapping EAD to MARC, as well as converting legacy MARC AMC (archival and manuscripts control) records to EAD, have been described by previous authors. 1 The authors of the present study have benefited from examining and testing other ways to reuse information and reduce duplication of staff effort. This article will build on those earlier works, describing the technical implementation of several types of mapping. The authors have found that collaborating across departments, and to some extent, across cultures, is essential in implementing EAD, and in repackaging its rich information. Collaboration in libraries has many definitions, and several articles in the literature have addressed collaboration across institutions in implementing EAD. 2 Collaboration between special collections and technical services personnel within an institution is also essential in developing such projects, but the literature offers few such models. It is hoped that this paper will contribute a useful model that may be applied at other institutions.

Explanation of EAD
EAD is an extensible markup language (XML) Document Type Definition (DTD) and the international standard for XML encoding of archival finding aids. EAD provides a means of structuring the language of finding aids, so that they may be processed for presentation on the Web, and so that their descriptive elements can be exchanged with other metadata systems. Regularly structured and defined tags surround elements of the archival description, in essence creating indexable and searchable fields within the text of the document. Created by the Society of American Archivists in 1994, EAD was revised in 2002. This revision provided structural changes that facilitated logical groupings of descriptive elements and accommodated the need to keep EAD compatible with the General International Standard Archival Description, or ISAD(G). 3
The tagged data found in an EAD document can be manipulated in several ways. With Extensible Stylesheet Language Transformations (XSLT) scripts, EAD can be transformed to HTML, suitable for display on the Web. Linking elements in EAD make it possible to bring together files associated with the collection description, such as digital images of objects in the collection. XSLT style sheets can also be used to generate alternate displays of a finding aid. For example, repeated instances of personal or corporate names may occur throughout a single finding aid. When they are tagged as EAD <persname> or <corpname> elements, they may be extracted through XSLT into an alphabetical name index. Further, each element may be assigned a normalizing attribute, so all variant forms of a particular name can be resolved into a single authoritative form, derived from controlled vocabularies such as the Library of Congress Name Authority File (LCNAF). An XSLT script can then be written that will search out every normalized name attribute, pulling all of the associated entries together. In short, XSLT processes EAD elements in whatever form the author of the style sheet prescribes, outputting thousands of records in seconds. EAD encoding, done once, provides the material for numerous useful presentations of the information.
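The name-index idea described above can be sketched outside XSLT as well. The following is a minimal Python illustration, not the style sheet used at OSUL: the sample finding aid, the names in it, and the function name build_name_index are all invented for the example. It relies only on the EAD normal attribute, falling back to the element text when no normalized form was encoded.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample markup; a real finding aid would be far larger.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <dsc>
      <c01 level="series">
        <did><unittitle>Letters from <persname normal="Doe, Jane, 1900-1980">Jane Doe</persname></unittitle></did>
      </c01>
      <c01 level="series">
        <did><unittitle>Sketches by <persname normal="Doe, Jane, 1900-1980">J. Doe</persname>
          for <corpname>Example Syndicate</corpname></unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def build_name_index(ead_xml):
    """Group every <persname>/<corpname> under its normalized heading.

    Returns a dict whose keys are headings in alphabetical order and whose
    values are the sorted variant forms found in the text.
    """
    root = ET.fromstring(ead_xml)
    index = defaultdict(set)
    for tag in ("persname", "corpname"):
        for element in root.iter(tag):
            # Collapse internal whitespace so line breaks do not create variants.
            as_written = " ".join("".join(element.itertext()).split())
            heading = element.get("normal") or as_written
            index[heading].add(as_written)
    return {heading: sorted(forms) for heading, forms in sorted(index.items())}


if __name__ == "__main__":
    for heading, forms in build_name_index(SAMPLE_EAD).items():
        print(heading, "<-", ", ".join(forms))
```

Because variant spellings are gathered under the normalized heading, the index stays stable even when encoders transcribe names exactly as they appear in the collection.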
Furthermore, tagged data can be exchanged with other metadata systems. Data may be mapped to any descriptive scheme. This process is facilitated in EAD authoring templates distributed with the EAD Cookbook, in which encoding analogs are built into the underlying code. Elements that describe the finding aid itself are mapped to Dublin Core, which captures metadata for digital objects. Elements that describe the archival collection, such as names, scope notes, biographical or historical notes, and material formats, are mapped to MARC. These correspondences among descriptive systems ensure a standardized system for creation of MARC records.

Institutional Background
The challenge at The Ohio State University Libraries (OSUL) was to establish a cross-departmental approach to the creation of EAD finding aids and the MARC records that describe them. OSUL includes a division of special collections and archives and a separate division for technical services, each reporting to an assistant director. It was necessary for the special collections and technical services divisions to work together in developing standard practices regarding EAD.
Special collections encompasses five repositories, including the Rare Books and Manuscripts Library, the Jerome Lawrence and Robert E. Lee Theatre Research Institute (TRI), the Hilandar Research Library of medieval Slavic manuscripts, and the Cartoon Research Library. The fifth repository is Archives, which includes the Robert Byrd Polar Research Program and the John Glenn archives, as well as materials documenting the history of OSU itself. Registers, inventories, and complete finding aids to the various collections exist on paper, in electronic word-processed documents, and in databases.
Special collections cataloging (SCCAT) is one of five departments within technical services at OSUL. A mid-1990s reorganization created a department mostly staffed with book catalogers, and the department's focus in past years has been on reducing book and serial backlogs. In addition, a cataloger from SCCAT has created collection-level bibliographic records from the finding aids produced by special collections staff. Recently, some archives have begun to mount HTML finding aids on their Web sites, and these have been linked to records in OSU's catalog through the 856 linking URL field.
Although finding aids themselves had not been a technical services responsibility in years past, OSUL administration looked to SCCAT to ensure compliance with EAD best practices, since this department works closely with all the special collections. In addition, as some collections' holdings had been described through word-processed documents, databases, spreadsheets, or any combination of these, SCCAT was very interested in making sure that information about collections was collected uniformly to ensure easy creation of a MARC record in OSUL's catalog.

EAD at OSUL
The first application of EAD at OSUL was at the TRI in 1994. A finding aid to the Twyla Tharp collection was encoded and added to the Dance Heritage Coalition group of finding aids on a New York Public Library server, which provided an environment for searching and displaying finding aids. Local delivery of EAD at OSUL began later, with the initiation of two significant projects.
In 2001, OSUL attempted to provide EAD content on a broader scale by outsourcing the conversion of several word-processed finding aids. Financial support was available to convert legacy finding aids, but as this project preceded the examination and adoption of EAD best practices at OSUL, staff were unable to provide substantive guidance to the vendor for content and display. Outsourcing was therefore problematic. It quickly became apparent that the degree to which finding aids were standardized prior to markup affected the results a vendor was able to deliver. In addition, the lack of guidance about how to mark up specific elements of finding aids, such as varying formats of container lists, made it very difficult to realize the benefits of outsourcing.
Meanwhile, another EAD project was underway at OSU's Cartoon Research Library. A two-year grant from the Getty Institute supported the processing, from July 2000 to June 2002, of a large collection of cartoon art. The terms of the grant required that the finding aid be encoded in EAD. After considering various means of authoring, editing, and delivering the finding aid, the project archivist chose the EAD Cookbook for NoteTab, an open-source application developed at the University of Illinois and offered for free to members of the archival profession. 4 A key feature of NoteTab is its support for creation of dialogue boxes that mimic the look of simple data-entry interfaces. The complicated tagging structure of EAD is supplied "behind the scenes." The simplified interface makes staff training significantly less complicated (figures 1 and 2). Basic EAD-compliant inputting templates distributed with the EAD Cookbook for NoteTab can be modified to fit local markup practices. In the case of the Cartoon Research Library project, this meant that inputting templates could be created to accommodate a finding aid organized by comic strip and newspaper titles, rather than as an ordinal box-and-folder list.

Cross-Departmental Collaboration
Once the two-year Getty-funded project was completed, internal funding was made available for further development of EAD at OSUL. The authors of this article (the head of SCCAT and the project archivist from the Cartoon Research Library) were designated to collaborate on implementing EAD across OSUL, with the archivist assigned to work half-time as OSUL's EAD specialist. The task was to create EAD markup routines for OSUL's various repositories, testing the routines on a sample group of legacy finding aids. In this effort, the authors collaborated with curators from the various repositories and members of the libraries' information technology (IT) division.
Experience gained from both the outsourcing project and the Getty-funded project made it clear that standardization would be essential. The manner in which legacy finding aids had been written greatly affected the ability to encode them. The process was generally most successful when curators had adhered to the archival principle of general-to-specific description, and referred to Hensen's Archives, Personal Papers, and Manuscripts to articulate areas of the finding aid that corresponded to MARC records. 5 Some collection descriptions, written more as inventories than as hierarchical finding aids, lacked such essential elements as scope-and-content notes and biographical notes. Often there was no information about restrictions on access and use, or a preferred form of citation. This may have been because the repositories' staff assumed the role of communicating these policies to researchers, rather than putting this information in finding aids.
The EAD specialist, after consulting the RLG Best Practices Guidelines for EAD and the Best Practices of the Online Archive of California, circulated a document to curators explaining what would be needed in order to make the finding aids ready for EAD markup. 6 Particular stress was placed on the fact that these finding aids would be presented on the Web to an audience unfamiliar with local holdings and institutional practices, so that the broader top-level information would be essential. In addition, this information would help to shape each finding aid into a truly hierarchical description. 7 Conforming to these standards would ensure not only that finding aids were good candidates for EAD markup, but that the necessary elements for useful MARC records would also be in place. It was hoped that collaboration would reduce the need to repeatedly follow up with curators and archivists in search of more descriptive information. It should be noted that some finding aids were simply too idiosyncratic to be restructured into regular data fields; extensive rewriting would have been necessary in order to prepare them for EAD compliance. These were rejected for this project. The finding aids ultimately chosen for the sample group were those with high research value that could, with a reasonable amount of effort, be restructured in accordance with EAD.
At the same time, some flexibility had to be provided. Archival collections can encompass a broad range of materials, including literary manuscripts, business records, academic papers, and visual materials. While any finding aid can be written to general standards, different material types lead to differences in arrangement and descriptive practices. The challenge, then, was to write EAD-compliant practices that were flexible enough to accommodate a variety of finding-aid types. While asking that various repositories adhere to the same encoding standards, it was necessary to respect their need to present their collection descriptions in ways that were faithful to the nature of the collections and meaningful to researchers.
The key to achieving this goal lay in the use of XSLT. As mentioned earlier, XSLT allows for different presentations of an encoded finding aid. For example, many finding aids take the form of a simple box-and-folder list, with brief descriptions of contents. XSLT for EAD is usually written to produce this type of output (figure 3). But XSLT's flexibility in selecting and displaying elements of archival description makes it possible to reorder the output to resemble a more typical literary-manuscript collection description (figure 4). The finding aids in figures 3 and 4, while they are marked up following the same encoding practices, differ significantly in appearance, each following the traditional practices associated with its collection type (personal papers and literary manuscripts, respectively). Flexibility in presentation of the finding aids balanced the demand for conformity in description and markup. 8 Once reengineering began, it was important for all parties to consult so that all necessary information was encoded in reengineered finding aids, even if presentation choices might differ later.
The interrelationship between EAD and MARC was of particular interest to the authors, as it reflects the expanding responsibilities of the SCCAT department (and of technical services operations across libraries). Librarians at OSUL are also interested in exploring applications for XML, and EAD provided a good case for testing software. In the past, the speed at which catalogers had been able to create records for archival collections had been limited by special collections curators' time in preparing and prioritizing finding aids, and by catalogers' time in interpreting the information in the finding aid and transferring it to the MARC format. As SCCAT faced growing requests to provide MARC records for archival collections, mapping EAD to MARC, along with increased staff training, promised a way to meet increased expectations without neglecting other cataloging responsibilities.
Another staffing issue was addressed by training student assistants to use the encoding templates. The interface needed to be simple enough in its presentation that student employees without any previous experience with XML or EAD could successfully use it. The authors were pleased to find that, with one to five hours of training (depending on the complexity of the encoding), student staff were able to successfully use the NoteTab templates. This meant that full-time staff and curators in the special collections departments were able to devote their time to activities other than encoding documents. The key to this success lay in careful planning of the brief training program beforehand, and a willingness to answer follow-up questions.
In addition to marking up legacy finding aids, and inspired by the prospect of repackaging descriptive information, other means of creating EAD records were explored. A project at TRI reinforced the point that standards-compliant legacy descriptive data are crucial to successful repackaging. The authors mapped fields of an existing collections database to corresponding EAD fields. A systems librarian in the IT division created a PHP script to generate output from the database, appropriately mapped and tagged, ready for incorporation into the XML document. This script was made available to staff at TRI, who could create output with a mouse click, meaning that additions to the database can be easily integrated into the EAD finding aid without further intervention from IT personnel. The close correspondence between the database fields and appropriate elements of standard archival description proved valuable in mapping the data, even though years had passed since the curator created the database. As with text finding aids, the curator's adherence to archival descriptive standards in creating the database made the mapping to EAD possible.
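The database-to-EAD step can be illustrated with a short sketch. The project itself used a PHP script against the TRI database; the version below substitutes Python with an in-memory SQLite table purely for illustration. The table name items, its columns, the sample rows, and the FIELD_MAP dictionary are all hypothetical stand-ins for whatever mapping a repository agrees on.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping from database columns to EAD <did> children.
FIELD_MAP = {
    "box": ("container", {"type": "box"}),
    "folder": ("container", {"type": "folder"}),
    "title": ("unittitle", {}),
    "dates": ("unitdate", {}),
}


def row_to_component(row):
    """Wrap one database row in a file-level EAD component."""
    component = ET.Element("c02", level="file")
    did = ET.SubElement(component, "did")
    for column, (tag, attrs) in FIELD_MAP.items():
        value = row[column]
        if value not in (None, ""):  # skip empty fields rather than emit empty tags
            ET.SubElement(did, tag, attrs).text = str(value)
    # ElementTree escapes characters such as "&" when serializing.
    return ET.tostring(component, encoding="unicode")


def demo():
    """Build a tiny invented database and return its rows as EAD fragments."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows access to values by column name
    conn.execute("CREATE TABLE items (box INTEGER, folder INTEGER, title TEXT, dates TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?, ?)",
        [(1, 2, "Scripts & notes", "1950"), (1, 1, "Programs", None)],
    )
    rows = conn.execute("SELECT box, folder, title, dates FROM items ORDER BY box, folder")
    return [row_to_component(row) for row in rows]


if __name__ == "__main__":
    print("\n".join(demo()))
```

Keeping the column-to-element mapping in one dictionary means that curators and catalogers can review the crosswalk without reading the rest of the script.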
Once a workflow for creation of new finding aids was in place, a "reverse mapping" with the content of hundreds of MARC records was explored. A special collections curator wanted published material from a larger collection incorporated as a series into a finding aid. These objects, loose manuscripts of plays bound into volumes titled according to the radio or television series that had featured the plays, had already been cataloged using MARC. It was possible to pull all the necessary information from the catalog, selecting the records using the catalog management interface, and output four selected fields from each record as plain text (figure 5). The default output labels for each element reflect the MARC fields from which they were taken. Both the title and the year of each item were outputted with the label "TITLE" because the dates in the catalog records were in the MARC 245 subfield f, according to manuscript cataloging practice. "ALT TITLE," representing the name of each volume, was derived from the 730 field (uniform title for radio or television series).
The data extracted from MARC was then mapped to EAD elements. A Perl script, developed by a student assistant, was used to mark up the records as tagged EAD, based on the authors' mapping. The bound-volume titles were mapped to subseries-level components in the hierarchical description, and the play titles, dates, and local call numbers to item-level components directly below them in the hierarchy. The Perl script picked up the dates of the earliest and latest plays in each volume, outputting them as beginning and ending dates for the appropriate subseries (see figure 6). (The presence of the TITLE label for both title and date in the text output did not hamper the Perl scripting process, since the script was designed to select elements based on their positions in the records, rather than on their labels.) This kind of repackaging of information makes a lot of sense at OSUL because the numerous collections may include various types of material in many formats and may already be cataloged or described in some way. In addition to taking advantage of existing bibliographic information, this project contributed a greater degree of knowledge about both EAD and MARC across the organization.
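A rough sketch of this reverse mapping follows, written in Python rather than the Perl used in the project. The layout of the catalog output is an assumption made for the example: four lines per record in a fixed order (title, year, volume title, call number), a ten-character label column, and a blank line between records. The sample titles and call numbers are invented. As in the original script, values are selected by position, so the repeated TITLE label causes no ambiguity.

```python
import re
import xml.etree.ElementTree as ET

LABEL_WIDTH = 10  # assumed width of the label column in the text output

# Invented records in the assumed layout.
SAMPLE_OUTPUT = """\
TITLE     The Example Play
TITLE     1951
ALT TITLE Example Radio Theater
CALL #    SPEC.EX.1

TITLE     Another Example
TITLE     1949
ALT TITLE Example Radio Theater
CALL #    SPEC.EX.2
"""


def parse_records(text):
    """Read each record by line position, ignoring the printed labels."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        title, year, volume, call_no = (
            line[LABEL_WIDTH:].strip() for line in block.splitlines()
        )
        records.append(
            {"title": title, "year": int(year), "volume": volume, "call_no": call_no}
        )
    return records


def build_subseries(records):
    """Group plays by bound volume and wrap each group as an EAD subseries."""
    volumes = {}
    for record in records:
        volumes.setdefault(record["volume"], []).append(record)

    subseries = []
    for volume, items in volumes.items():
        years = [item["year"] for item in items]
        sub = ET.Element("c02", level="subseries")
        did = ET.SubElement(sub, "did")
        ET.SubElement(did, "unittitle").text = volume
        # Earliest and latest play dates become the subseries date span.
        ET.SubElement(did, "unitdate").text = f"{min(years)}-{max(years)}"
        for item in items:
            comp = ET.SubElement(sub, "c03", level="item")
            item_did = ET.SubElement(comp, "did")
            ET.SubElement(item_did, "unitid").text = item["call_no"]
            ET.SubElement(item_did, "unittitle").text = item["title"]
            ET.SubElement(item_did, "unitdate").text = str(item["year"])
        subseries.append(sub)
    return subseries


if __name__ == "__main__":
    for sub in build_subseries(parse_records(SAMPLE_OUTPUT)):
        print(ET.tostring(sub, encoding="unicode"))
```

Position-based parsing is brittle if the catalog export ever changes field order, so a production script would likely validate each record before tagging it.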
Finally, the authors tested the actual creation of bibliographic records for processed collections. Currently, SCCAT catalogers select data elements, one finding aid at a time, from HTML or Word documents for insertion or summary into MARC records. EAD can generate MARC records with minimal human intervention. Since MARC encoding analogs were provided in the EAD Cookbook templates, the authors were eager to take advantage of this content mapping to generate marked-up, tagged collection records.
Staff at many institutions have experimented with mapping data across descriptive schemes such as EAD, MARC21, MARC XML, and other standards. However, the authors were unable to find an XSLT file that provided an appropriate EAD-to-MARC transformation. Among the crosswalking utilities available on the Web, it was found that many XSLT programs were available for transforming MARC21 to various document types, including MARC XML, HTML, Dublin Core, Metadata Object Description Schema (MODS), Metadata Authority Description Schema (MADS), Open Archives Initiative MARC (OAI-MARC), and EAD, but not for transforming EAD to MARC21. A promising suite of products, MARCEdit 4.5, is offered at the Oregon State University Web site, and includes EAD-to-MARC XSLT. However, this program is one part of the larger suite, which produces files encoded as MARC XML, a schema not in use at OSUL and for which there was no local systems support. An XSLT style sheet was needed that could be used in OSUL's local setup, a simple batch file (included in the EAD Cookbook for NoteTab) that runs Saxon 6.5.3. At the time of this writing, it appears that several institutions are developing their own XSLT for EAD-to-MARC transformation, outputting results that are usable in their local systems. Perhaps by the time this article is published, an authoritative version will be included with the EAD Cookbook, or otherwise widely distributed. In the meantime, the authors developed their own version of an XSLT style sheet that accomplished the desired transformation.
Mapping of the data is straightforward for some fields but not so for others. A single EAD element could conceptually be placed in more than one MARC field. For example, creators who also represent subjects of a collection may be considered, in a MARC environment, as worthy of both 1xx and 6xx fields. While the default mappings in EAD assign creators to 1xx fields, catalogers who understand how their catalog indexes different types of information can make a strong case for what might otherwise be seen as needless duplication. It does no good to accept the default mappings on principle if people are misled by the results of their author- or subject-limited searches. Since mapping elements from a finding aid to multiple MARC fields would require replicating index terms in the finding aid (possibly further confusing users by listing names more than once), a one-size-fits-all transformation style sheet is not possible, but a basic, universal framework can be constructed and edited as needed. 9
The OSUL style sheet outputs a plain-text version of a MARC21 record with appropriate alphanumeric tags for fields, subfields, and indicators. The authors designed the style sheet for use across departments at OSUL, although others may edit it as needed for local use. 10 The output text can easily be copied into OCLC's Connexion client-cataloging interface (although other interfaces should work as well). Using constant data sets that automatically apply leader and local-control information, a record can be easily generated, reviewed, and uploaded to OCLC (see figure 7). (Since OSUL is not an RLIN member, this process was not tested using that database.) The occasional formatting irregularity still requires human intervention, but the process promises to be much simpler overall. Another advantage to the local development of XSLT is that local genre terms and local indexes may be included. In the test case, for example, the professional term "cartoonist" was drawn from the EAD field <occupation> and mapped to a 656 index term in a local index.
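The shape of such a crosswalk can be sketched in a few lines. The authors' tool was an XSLT style sheet run through Saxon; the Python below is only an illustration of the same idea and is not their code. It reads the encodinganalog attributes already present in the markup, optionally repeats a creator (1xx) as a subject (6xx) according to a local policy table, and prints plain-text field lines. The sample record, the INDICATORS values (with "#" standing for a blank), and the DUPLICATE_AS table are placeholder assumptions that a cataloger would need to review.

```python
import xml.etree.ElementTree as ET

# Invented sample; encodinganalog values follow the tag or tag$subfield pattern.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did>
      <origination><persname encodinganalog="100">Doe, Jane, 1900-1980</persname></origination>
      <unittitle encodinganalog="245$a">Jane Doe papers</unittitle>
      <physdesc encodinganalog="300">12 boxes</physdesc>
    </did>
    <controlaccess>
      <occupation encodinganalog="656">Cartoonists</occupation>
    </controlaccess>
  </archdesc>
</ead>
"""

# Placeholder indicator values; "#" marks a blank. Not a cataloging rule set.
INDICATORS = {"100": "1#", "245": "10", "600": "10"}

# Local policy (assumed): also index the creator as a subject.
DUPLICATE_AS = {"100": "600"}


def ead_to_marc_text(ead_xml, duplicate_as=DUPLICATE_AS):
    """Render every element carrying an encodinganalog as a MARC-style text line."""
    root = ET.fromstring(ead_xml)
    fields = []
    for element in root.iter():
        analog = element.get("encodinganalog")
        if not analog:
            continue
        tag, _, subfield = analog.partition("$")
        subfield = subfield or "a"  # default to subfield a when none is given
        text = " ".join("".join(element.itertext()).split())
        fields.append((tag, subfield, text))
        if tag in duplicate_as:
            fields.append((duplicate_as[tag], subfield, text))
    fields.sort(key=lambda field: field[0])  # present fields in tag order
    return "\n".join(
        f"{tag} {INDICATORS.get(tag, '##')} ${subfield} {text}"
        for tag, subfield, text in fields
    )


if __name__ == "__main__":
    print(ead_to_marc_text(SAMPLE_EAD))
```

Passing an empty dictionary as duplicate_as restores the default one-to-one mapping, which keeps the local duplication policy a visible, editable choice rather than something buried in the transformation.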
This process has also provided clearer guidelines for collective description. Curators, understandably intent on describing the valuable resources in their care, are often impatient with the field-length limits imposed by the catalog, and they may not be happy with the information catalogers have chosen to include based on those limits. The EAD process has shown how clearer and more concise scope-and-content or biographical-historical notes are more easily transferred into a cataloging record with less chance of important information being left out.

Future Directions
This collaboration has only whetted the authors' appetites for more experimentation and discovery. Currently, OSUL is participating in the development of a statewide system for searching and delivery of finding aids, working with its OhioLINK consortium. The authors' belief in the need to involve both catalogers and archivists has already borne fruit as this project takes off, spearheaded by a mixed group of staff from several institutions across the state.
In addition, this project has provided much-needed knowledge about XML to staff at OSUL. The experience shows that expertise in EAD might help the libraries as a whole evaluate and use other types of XML, as well as XML software that may be useful in Web development and content management. A recently hired metadata librarian has begun building on this experience to test other forms of XML, such as the Metadata Encoding and Transmission Standard (METS).
OSUL special collections has adopted a policy that every collection whose finding aid is available online (in any format) will be represented by a record in the library catalog. This serves as a de facto prioritization, since newly processed collections and high-profile collections rise to the top of the cataloging workload here.
Like many institutions, OSUL is pondering the future role of its catalog in providing access to materials, but this policy has served it well so far. At the very least, the bibliographic record provides an inventory- and audit-control mechanism that is important in the context of the larger library system. At the most, it integrates controlled vocabularies and subject headings across formats, allowing people to discover primary-source material when searching the catalog. The link between the MARC record and the complete finding aid (in the 856 field) also reassures curators that their more complete descriptive information will be available to researchers, despite the field-length limits of OSUL's catalog. Therefore, simplifying the process of cataloging has been a high priority.
The authors' experiments have, of necessity, been affected by the OSUL catalog, as well as the technical requirements of XML authoring software. OSU's library system from Innovative Interfaces has a great deal of flexibility in isolating individual records and fields within those records for output. However, it is sometimes a multistep process and not always as straightforward as the authors would like.
As will surprise no one working in a library environment, standards continue to change, requiring slight changes in the workflow at OSUL as well. Along with ongoing changes in catalog software and recent revisions to the EAD standard, the descriptive rules forming the basis of archival finding aids are currently being reevaluated and a new standard has just been published. 11 As the new guidelines are studied, ways may be found to improve practices and further facilitate standardization.

Conclusion
The authors have found that their work with EAD is strengthened by the differing perspectives and skills brought to the process. In addition, their work is more widely understood across the large and complex organization that is OSUL than it would be with either of them working in isolation. Working together has provided the opportunity to set standards, bridge differences in descriptive schemes, and build a base from which it is possible to work toward increasingly sophisticated delivery of information resources.
Institutional context and IT support will greatly affect the results of any collaboration. A library culture in which archivists, systems personnel, and technical services staff are able to discuss related issues is also necessary for making sound decisions. Although the different specialties will always have different technical knowledge, a surprisingly high level of comprehension can be achieved.
The future looks bright, as the authors anticipate taking advantage of XML standards and the built-in (and prescient) sophistication of MARC to exchange information about OSUL's rich research collections.

Figure 2. NoteTab Interface Showing Input Dialog Box

Figure 3. Finding Aid with Box and Folder List

Figure 5. Text Output of Records from Catalog

Figure 6. Catalog Output Wrapped with Tags from Perl Script

Figure 7. Derived Record in Connexion Interface