The Minnesota Union List of Serials

This paper describes development of a MARC serials format union catalog of serials called the Minnesota Union List of Serials (MULS). The Preliminary Edition, published August 1972, contains over 37,000 main entries in 1,566 text pages produced through photocomposition in News Gothic typefont using the full MARC character set. The total number of entries is over 59,000, including cross-references. Conceptualization and scope of the system as well as its design, data conversion, computer and programming support, photocomposition, costs, and problems are discussed.


INTRODUCTION
This paper has been prepared to inform the profession of the development of the Minnesota Union List of Serials (MULS), the data base of which differs significantly from those of previously reported union lists. As one can see from Figure 1, a MULS Preliminary Edition sample page, MULS is a full bibliographic union serials catalog. It uses the MARC serials format as its formal structure. The Preliminary Edition contained 37,289 University of Minnesota serial titles. The file now includes holdings of the Minneapolis Public Library, eight private colleges, and ten Minnesota state agency libraries, including the Minnesota Historical Society. Augmentation of the data is continuing so that all Minnesota academic, large public, and selected special libraries will be included in the coming two years.
Several years ago, the University of Minnesota began investigating the development of its own unified serials catalog as a first stage in the development of an automated serials management system. At that time many libraries in the state had developed their own serials lists, and regional consortia had created lists of their members' holdings. These resources, coupled with networking through the MINITEX (Minnesota Inter-Library Teletype Exchange) program, made possible the initial development of MULS. The MINITEX program links seventy-two libraries via teletype to the University of Minnesota for rapid interchange of library materials. State-supported academic institutions, public libraries, and private colleges in the local metropolitan area also participate in this program.
In spring 1971, it became apparent that a union list had become a necessity if the expanding MINITEX program and the university were to [...]
[Figure 1. Sample page from the MULS Preliminary Edition.]
[...] and some library automation activities among these libraries, with the largest automation staff and activity at the University of Minnesota. The parallel developments of networking and systems design at the university made possible the proposal to the MINITEX Program Advisory Board for funds to develop the system and publish the first union list. In summer 1971 this program received approval and work was begun in mid-August. On September 1, 1972, the Preliminary Edition of MULS was published and distributed to participating university and MINITEX network members. Following is a report of this work, its results, and its problems.

PROGRAM SCOPE
Obviously, to create a system capable of eventually including library holdings state-wide and to convert such data requires definition of an initial and future scope. The initial scope was defined as: [...]
At the moment of this writing we have the initial scope completed, are just completing a, b, and c, and have planned work on d and e for 1973.
In view of this scope the initial MULS magnetic tape system was based on the MARC format to permit:
• publication of a photocomposed or line-printer-method full union list;
• publication of regional combination or individual library lists using an IBM 1403 line printer equipped with the ALA graphic print train;
• storage of complete and verified information on each serial as known, together with the source of the cataloging data;
• extraction of the data via individual libraries to assist those wishing to develop automated serials management systems including check-in, claiming, binding, etc.;
• conversion of the file to other storage media such as disk;
• fulfillment of the smallest to the largest libraries' needs for bibliographic detail; and
• extension to a fully automated resource sharing system which would further improve the benefits of library cooperation.
With this picture of the program scope, the design factors, data conversion, computer system, programs, photocomposition, costs, and problems will be described below.

SYSTEM DESIGN
The easiest way to look at the MULS design is to gain an understanding of the MULS MARC Record content as shown in Table 1. This record is the basic unit which is entered, including all associated cross-references or added entries to be made. It in turn generates each of these secondary entries in the file. In this brief description we will assume the reader is familiar with the MARC serials record as described in Serials: A MARC Format: Preliminary Edition and its Addendum No. 1 (1, 2). There are some differences between the MULS format and the LC MARC format, most importantly the addition of a sort field (Tag 249) and the subfield arrangement for holding fields (Tag 850). Other variations have been indicated in Table 1, which uses the same organization as that contained in the LC format description referred to above.
[Table 1. MULS MARC record content (excerpt).]

Indicators
In general we have not followed LC in the use of indicators. One exception is the use of the filing indicator for the 100 and 200 series tags, which we implemented before seeing that this feature was provided in Addendum No. 1 to the LC format. Therefore, the indicators except as above are both blank.

Subfield codes
Except for the holdings statements (TAG 850) we have generally followed LC philosophy. For TAG 850 we now precede the $a subfield with a $z subfield, suppressed on printing, which contains the 4-digit number identifying each specific holding library, which is also found at the end of the 008 field. We have followed LC numbering for the above data elements, and have substituted blanks on the tape record for those elements omitted. We have also expanded the 008 field to include a variable number of 4-character elements which contain the index number of each holdings location listed in the $z subfield of TAG 850.

[Figure 2. Page from the MULS master-file listing.]

Figure 2 shows a page from a master-file listing. Note entry no. 2074000. This listing is formatted with the sequence number of the record appearing on the first line, followed by the bibliographic level and the remaining leader information. Next the record directory entries are found for fields 008-950 as applicable. On the next line are the 008 fixed length data elements, with the last four digits being the holdings location index number, which is the same as the suppressed $z subfield in the 850 field. Then the variable fields are listed in numeric sequence. Note the subfields as indicated by $z, $b, etc. The number to the left of each $a is the MARC tag number. Another departure from MARC is to store the call number as a subfield of the holdings statement since it may vary among participating libraries.
To contrast how the information is stored and how it appears when published, the same record is shown in the left column of Figure 1. Also, the next record shown is generated from an added entry TAG 730 in this parent record. We have prepared a detailed coding manual which is followed by our coders; this document presents various examples of conditions and details the full system structural requirements.
These changes in the format were made to simplify wherever possible, to provide for conditions which the original LC format did not cover, and to preserve the MARC structure with full text. With the exception of subject headings, all bibliographic text is stored. Other MARC tags may be added to the system at any time.
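The relationship described above between the suppressed $z subfield of TAG 850 and the location index numbers appended to the 008 field can be illustrated with a short sketch. This is not the MULS implementation (the project's programs were written in COBOL); the class names, attribute names, and the way the call number is joined to the holdings text are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Holding:
    """One TAG 850 holdings statement (hypothetical layout)."""
    location: str          # $z: 4-digit holding-library number, suppressed on printing
    statement: str         # text of the holdings statement
    call_number: str = ""  # kept with the holding because it varies by library


@dataclass
class SerialRecord:
    """A simplified stand-in for one MULS MARC record."""
    sort_key: str          # TAG 249 sort field
    fixed_data: str        # LC-defined portion of the 008 field
    holdings: list = field(default_factory=list)

    def field_008(self) -> str:
        """Return the 008 field extended with one 4-character location
        index for each $z subfield found in the TAG 850 holdings."""
        for h in self.holdings:
            if not (len(h.location) == 4 and h.location.isdigit()):
                raise ValueError(f"bad location number: {h.location!r}")
        return self.fixed_data + "".join(h.location for h in self.holdings)

    def printed_holdings(self) -> list:
        """Holdings as they might be printed: the $z value is left out."""
        return [f"{h.call_number} {h.statement}".strip() for h in self.holdings]
```

Deriving the 008 extension from the holdings, rather than keying it separately, keeps the two copies of each location number in agreement; whether MULS enforced this at input or at update time is not stated in the text.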
The initial system was tape-based, as our computer system at that time did not have uncommitted disk drives. Also, we needed to gain some detailed knowledge of the file and record characteristics to most effectively design the disk-based system. This knowledge could be gained easily after some basic data were stored in the system. Since programmer time was our most precious commodity, this phased approach was used to: (1) achieve enough support on the tape system to permit publication of the Preliminary Edition of MULS while gathering file and data characteristics; and (2) bring into operation a disk-based system with completely automatic added-entry correction and generation, coupled with very flexible correction procedures.

DATA CONVERSION
Various methods of data conversion were investigated. Two requirements seemed obvious in our system: compilation of data on a code sheet and efficient, accurate keyboarding. Further, since the MARC character set was being used, any potential device had to accommodate this character set with a minimum of keying. Compilation of data on a code sheet was necessary because multiple files in multiple locations would be checked to gather all of the information. Keyboarding had to be efficient as it was initially estimated that some 25 million characters would be entered before we were ready to publish the union list.
The IBM Model V Record Only Magnetic Tape Selectric typewriter (MT/ST) was chosen as offering the best approach for high volume, short duration use. Three machines, each equipped with the special MARC element and key buttons, were leased. Typists easily corrected their discovered errors on these units. Each typist followed detailed typing instructions and, after mastering the coding manual practices and procedures, was a trained coder.
During July/August 1971 all training aids were prepared, forms designed, and staff recruited. The initial staff complement received their training during the last two weeks of August. During September the data gathering staff was brought to full strength and consisted of:
• Project Director
• Editors (librarians-library assistants)
• Senior Clerk-Typists
• Clerks (students)
Full-time equivalents are used as staff were in many cases part time or temporarily lent to the project. During the period August 1971-June 15, 1972, which comprised the total data preparation time for the Preliminary Edition, five librarians and thirty-five students actually were trained and participated in the project.
It took about six weeks to bring most of the staff to an acceptable performance level. Some students found the work too complex or detailed and voluntarily left the project. One clerk-typist did not gain sufficient proficiency to pass out of a trainee status and was terminated at the end of her probation period. Thereafter, with a staff of this size, performance problems were minimal.
The data to be included in the Preliminary Edition comprised the university's:
• currently received, centrally recorded serials (20,000 titles);
• inactive Periodical Division titles (8,000 titles);
• coordinate campus locations of the university (4,000 titles);
• complete departmental library titles excluding the Bio-Medical Library (6,000 titles).
The Bio-Medical Library was excluded because its existing mechanized serials system would be used to produce a separate serials list, issued as volume 3 of MULS to the university and the MINITEX participating libraries. This separate publication was necessary due to the short time in which the initial data were to be collected. However, the Bio-Medical Library is now also being included in the body of the MULS data base.
These four categories of serials necessitated quite different approaches dependent upon the available check-in files, shelflists, or catalogs. For example, to capture data on the currently received, centrally recorded titles we photocopied the Kardex drawers from the serial check-in file maintained in our headquarters library. These running titles were checked against the official card catalog in the library. If the title was found, the bibliographic information was transcribed, together with all Kardex and catalog locations. If not, the Kardex data were copied onto a code sheet for subsequent verification together with its listed location. About 5 percent of the time the photocopied sheet was illegible. These entries had to be transcribed from the check-in file, verified, and then passed on to the next step. When bibliographic data had been assembled on the code sheets they were edited in groups, each group accompanied by its photocopied sheet. Corrections were entered by editors, the catalog or check-in file was rechecked as necessary, and then the sheets were sorted by holding location.
Next all holdings information was procured from the remote location to make sure it was the most reliable information. Finally, the sheets were returned to be rechecked and typed. "Mopping up" occurred at each holding location to encode inactive titles and uncataloged serials. When a title could not be verified, the piece itself was used to develop the main entry, added entries, and other pertinent cataloging information.
Similar procedures were used on the inactive Periodical Division shelflist. Departmental library locations involved the use of shelf-locator visible indexes and shelflists, coupled with check-in files and branch catalogs. Coordinate campus locations outside the Twin Cities metropolitan area required the checking of title/holdings listings provided by these campus libraries. Many entry problems resulted, because variant cataloging approaches were used in many of these libraries.
Typing and subsequent input were done as coding sheets became ready for keyboarding and were therefore in random order. Over 40,000 individual records were typed, each averaging about 480 characters (an approximate [...] the complete file was proofread from the thirteen-volume master-file listing, another 5 million keystrokes were required to delete, to reenter, and to correct entries and associated cross-references. Our final keyboarding stroke count was exceedingly close to our original estimate of 25 million characters.
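The keystroke figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers in the text; because the record count is given as "over 40,000" and the average as "about 480", the result is a rough lower bound rather than the project's actual stroke count, and the variable names are illustrative.

```python
# Figures taken from the text; all are approximations.
records_typed = 40_000             # "over 40,000 individual records"
avg_chars_per_record = 480         # "about 480 characters"
correction_keystrokes = 5_000_000  # proofreading corrections
original_estimate = 25_000_000     # initial planning estimate

initial_keystrokes = records_typed * avg_chars_per_record
total_keystrokes = initial_keystrokes + correction_keystrokes

# How far the reconstructed total falls short of the estimate, in percent.
shortfall_pct = 100 * (original_estimate - total_keystrokes) / original_estimate

print(f"initial entry: {initial_keystrokes:,}")
print(f"with corrections: {total_keystrokes:,}")
print(f"below estimate by {shortfall_pct:.1f}%")
```

On these rounded inputs the total comes to about 24.2 million keystrokes, within a few percent of the 25 million estimated, which is consistent with the statement that the final count was very close to the estimate.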
The proofreading portion of the data conversion took twice as long as originally anticipated, causing a delay of two months in photocomposition scheduling. Proofreading was completed on June 15, 1972, and on the following Monday the photocomposition vendor received the final output tape. Due to some format changes and continued systems problems the photocomposition output was not received until July 21. Printing and binding followed and on August 28 the Preliminary Edition, consisting of 1,566 text pages in two class A bound volumes, was ready for distribution.

COMPUTER SYSTEM
Two computer systems were used in MULS production. One system was used to convert MT/ST cassette tapes and involved initially an IBM 2495 cassette converter coupled to an IBM 360/20 system. This configuration was replaced by off-line tape conversion using a Data Action Tape Pooler and the same computer for code conversion and record blocking. Two-hour to one-day service was provided by this service center, located in a local insurance company.
The raw data tape resulting from the above process then required processing on the second computer system, an IBM 360/50 at the University of Minnesota. All programs are written for the COBOL F compiler and operate under OS/MFT using 1600 cpi magnetic tape. Two 80K core partitions are required for the updating and printing programs. The ALA graphic print train is used to print the file and control listings. Figure 2 was printed with this character set.

PROGRAMS
MULS programs for the present tape system were conceived as two sets: (1) conversion, file creation, and updating; and (2) printing functions. The first set performs the following functions:
• identification and checking of fields for validity, tagging, and structure from the raw input tape;
• creation of MARC-type main entries;
• creation of secondary entries generated from the added entry (TAG 730) and cross-reference (TAG 950) fields;
• creation of correction and deletion entries;
• sorting of main entries and the generated secondary entries in alphabetical sequence;
• sorting of correction and deletion entries in sequence number order;
• addition of new records;
However, any change in a 100, 200, 730, or 950 tag requires deletion of the complete record with its secondary entries, and reentry of the record in its changed form. This is because a two-pass update would be required in the tape system to automatically correct secondary entries as well as to generate them.
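The generation of secondary entries, and the reason a changed heading forces deletion and reentry, can be sketched as follows. This is an illustration only, not the COBOL programs described above: the dictionary keys, the function names, and the use of plain string comparison in place of the project's filing rules are all assumptions.

```python
def _filing_order(entry):
    # Plain string comparison stands in for the real filing rules.
    return (entry["key"], entry["parent"])


def build_entries(records):
    """Return main entries plus generated secondary entries in filing order.

    Each record is a dict with hypothetical keys:
      "seq"   - sequence number of the record
      "sort"  - filing form of the main entry
      "added" - added-entry headings (TAG 730)
      "xrefs" - cross-reference headings (TAG 950)
    """
    entries = []
    for rec in records:
        seq = rec["seq"]
        entries.append({"key": rec["sort"], "kind": "main", "parent": seq})
        for heading in rec.get("added", []):
            entries.append({"key": heading, "kind": "added", "parent": seq})
        for heading in rec.get("xrefs", []):
            entries.append({"key": heading, "kind": "xref", "parent": seq})
    return sorted(entries, key=_filing_order)


def delete_record(entries, seq):
    """Remove a main entry together with every secondary entry it generated."""
    return [e for e in entries if e["parent"] != seq]


def replace_record(entries, changed):
    """Apply a heading change as deletion followed by reentry."""
    kept = delete_record(entries, changed["seq"])
    return sorted(kept + build_entries([changed]), key=_filing_order)
```

Because secondary entries file at a different place from their parent, a changed heading touches positions scattered through the sorted file. On a sequential tape that is why correcting them in place would need a second pass, whereas delete-and-reenter fits the ordinary single-pass update.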
The second set of programs performs the following functions:
• printing of a formatted work list selectively by location or combination of locations, with each diacritical printed preceding the character to which it applies; and
• printing of a conventional union list format which closely duplicates the design of the photocomposed page in Figure 1. Selectivity by location or groups of locations is present and all diacritical characters are overprinted as in the photocomposed list.

PHOTOCOMPOSITION
The Preliminary Edition of MULS, as shown in Figure 1, was photocomposed by a Twin Cities firm using a Harris Fototronic CRT composition system and an IBM 370/145 computer system. We chose the lowest bidder, which was fortunately a local firm. The bid required the vendor to program from our MARC format master file tape an input tape for the photocomposer which would produce the specified format, using the MARC character set in a font to be chosen from sample text pages. The vendor's bid included programming, composing, and procurement of several of the characters used by MARC which were not in his current font repertoire. A test tape was provided to the vendor for his developmental use, together with documentation on the MARC MULS system.
We were not pleased with the initial result of our specified format. Several factors contributed:
• the vendor had not followed some of the suggestions;
• the vendor had made some unspecified changes;
• the program had injected some data errors and other unacceptable conditions; and
• the library, in its total lack of experience with this variable density form of display, had no idea of the real effect of its proposed format in getting efficient character density coupled with attractiveness.
Each of the design problems was looked at in order to adjust character size, column length or width, continuation line placement, display form (bold, regular, oblique), and relative data element placement. Four iterations were required to finally produce the format shown in Figure 1. As a result, our photocomposition and printing costs were half what they would have been had the original format been used. Style and readability also improved dramatically.
The choice of type font was made by comparing sample pages in both serif and sans serif styles, including Times Roman and other well-known fonts. Various library staff members were asked to vote on their preferred font. News Gothic was an overwhelming favorite among both public and technical services oriented librarians.
The photocomposition vendor had produced many catalogs and books using other special alphabets and characters, but had not previously done any catalog from a MARC format tape. This made possible a high degree of expertise on their part in handling our special character requirements, but added some developmental problems because of lack of MARC format experience. Except for superscripts, subscripts, and the underline, all MARC characters have been needed to display the text.
Our advice to those considering catalog photocomposition is to request bids, as the price of this service has continued to drop. The page price will be dependent upon the services performed. In our case the vendor handled all composition programming. One can estimate that at a minimum 40 to 50 percent of the page charge would be involved in this service. Also, the size of the job will cause a variance in the price a vendor will quote: the larger the number of pages, the cheaper the cost per page. On a very large application it may be to the library's advantage, if resources permit, to train its own programmer to program the composing device. However, we feel that our needs were best served by contracting for this support, as our programming staff was limited and did not have any prior composing-machine experience.

COSTS
The expenditure to produce a computer-based serials catalog will vary dependent upon salary and equipment rates and the conditions found in the library system. In the case of MULS, condition of the files used ranged from disastrous to excellent, yet with only fragmentary information in each file. Moreover, entry forms varied greatly among the many check-in, shelflist, and catalog files. Therefore, data collection was much more expensive than it would have been had we keyboarded directly from one existing file of data. To present some idea of costs for others planning similar activities, we have developed some average costing information from our expenditures.
Each main entry in MULS costs $2.81 on average, figuring all known actual charges or subsidized costs. This main entry cost includes all associated secondary entries, of which about one is generated per 1.5 main entries. This $2.81 breaks down to approximately $1.00 for design, programming, and administrative costs; $1.40 for data conversion; and $.41 for photocomposition, final printing, and binding.
Let us look at some specific items which figure into this average cost per record to give the reader some idea of what is reasonable to expect in a project of this sort. A good example is conversion of MT/ST cassette tapes to computer compatible magnetic tape, including code conversion and blocking of the records. Our conversion cost varied from $.50 to $2.00 per cassette. This variance was caused by a change from on-line to off-line conversion and the problem of handling cassette tapes which did not have the proper stop code at their end. Our actual billed average throughout the whole project was $.73 per cassette. If no tapes had been prepared omitting stop codes and if total off-line conversion had been used, our average would have been $.50 per cassette. A typical cassette tape averaged seventy-five new MARC entries, so this was a very economical charge for this method.
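The cost figures above can be tied together with a short calculation. The sketch below works in cents to avoid rounding surprises and uses only numbers quoted in the text; the per-entry tape conversion cost it derives is implied by those numbers rather than reported by the project, and the variable names are illustrative.

```python
# Per-main-entry breakdown quoted in the text, in cents.
design_cents = 100       # design, programming, administration
conversion_cents = 140   # data conversion
production_cents = 41    # photocomposition, printing, binding
per_entry_cents = design_cents + conversion_cents + production_cents

# Cassette-to-tape conversion.
billed_cassette_cents = 73   # actual billed average per cassette
best_cassette_cents = 50     # all off-line, no missing stop codes
entries_per_cassette = 75    # typical new MARC entries per cassette

tape_cents_per_entry = billed_cassette_cents / entries_per_cassette
best_cents_per_entry = best_cassette_cents / entries_per_cassette

# Share of the data conversion cost taken by cassette conversion.
tape_share_pct = 100 * tape_cents_per_entry / conversion_cents

print(f"cost per main entry: ${per_entry_cents / 100:.2f}")
print(f"tape conversion per entry: {tape_cents_per_entry:.2f} cents")
print(f"share of data conversion cost: {tape_share_pct:.1f}%")
```

At about one cent per entry, cassette conversion is under 1 percent of the $1.40 data conversion cost, which supports the description of it as a very economical charge.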
Another specific cost to examine is computer time. On our IBM 360/50 system, time is billed as time on/off the system and not according to some calculation of CPU/channel/storage/peripheral device usage. Normally an internal university rate is a great deal cheaper than a commercial rate for the same equipment. However, the billing method used in our system has probably increased our costs for computer time over the CPU time method of billing, since the user is at the mercy of contending with other jobs on the system at the same time; i.e., waiting for his processing turn. This has had a noticeable effect in our case; run times to update the file have varied from four to six hours machine on/off time almost independent of the number of transactions being processed.
Photocomposition page rates over the last few years have been dropping as competition in this area has flourished. Two years ago it was common to receive quotes of $6.00 per page or even higher. Most prices we received were under this figure; but at the time our contract was signed, our successful bidder, who also was our lowest bidder, quoted $2.60 per page. This included full programming support to convert our MARC format tape for creation of the photocomposer input tape. Today rates much lower than this can be found. Moreover, rates under $1.00 per page can be obtained if the customer is able to create his own input or driver tape for the photocomposition device, making this method considerably more attractive for even low volume per-page printing. In the case of MULS, one photocomposed page equals ten double column computer printed pages without photoreduction. Photoreduction can cut computer output pages about one-third, yet obviously not to the limit achieved through the photocomposition method. Therefore, considerable printing costs can be saved dependent upon the number of copies of each page printed.
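A rough calculation shows what these rates and ratios imply for the 1,566-page Preliminary Edition. The totals below are derived from the quoted per-page figures and are not amounts reported by the project. The sketch also assumes that "cut computer output pages about one-third" means a reduction of one third in page count; if it instead means a reduction to one third, the last figure would be 5,220.

```python
text_pages = 1_566             # photocomposed pages in the Preliminary Edition

contract_rate_cents = 260      # $2.60 per page, programming included
earlier_rate_cents = 600       # "common" quote two years earlier

contract_total_cents = text_pages * contract_rate_cents
earlier_total_cents = text_pages * earlier_rate_cents

# One photocomposed page equals ten double-column line-printer pages.
printer_pages = text_pages * 10

# Assumption: photoreduction removes about one third of those pages.
reduced_printer_pages = printer_pages * 2 // 3

print(f"composition at contract rate: ${contract_total_cents / 100:,.2f}")
print(f"composition at $6.00 per page: ${earlier_total_cents / 100:,.2f}")
print(f"line-printer pages: {printer_pages:,}")
print(f"after photoreduction: {reduced_printer_pages:,}")
```

Even with photoreduction, the line-printer version would run to several times the page count of the photocomposed edition, so the saving in printing grows with every copy produced.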

PROBLEMS
The problems encountered during this project and in its present daily operation have been, for the most part, those commonly found in any large scale project. The large volume of data, less than ideal computer environment, condition of the original data, and large staff required to produce this effort all magnify many problems which seem unimportant in a small or short term project. In general these problems fall into the following categories: (1) data handling and bibliographic; (2) communications; (3) estimating; and (4) hardware or computer related problems.
Data Handling and Bibliographic. Those who create and use research library catalogs can appreciate the formidable physical problem in any data conversion activity. A half century or more of cataloging variations must be brought together; mistakes in the original data, differences in format of cards, and spelling or usage inconsistencies must be weeded out. Couple this situation with a new staff, large in number but containing few professionals. The result could be disastrous if proper decision-making and problem identification did not occur. Not knowing the magnitude of these problems we decided on almost verbatim transcription of records but spelling out all abbreviated words in any filing field.
When our first file listing appeared (some 40,000 main entries plus 30,000 secondary entries), we saw that the filing arrangement was very poor due mainly to spelling variants, failure to consistently follow instructions to spell out abbreviated terms (which somehow escaped editing), and different entry forms for the same body. Transcription of data from the original source was very accurate, but because of these problems in the original data our proofreading resulted in some change occurring in about 10,000 of these 70,000 records. The use of punctuation marks in main entries varied so much that some corporate entries were filing in five or six separate groups in the list, each separated perhaps by several pages.
The great shocker was the arrangement under the United States, as some coders had copied exactly from the card without spelling out U.S. and inserting a period and space. About a dozen entries had failed to be caught by the editors and appeared as one block. Then, to compound the problem, others spelled out United States but forgot to insert a period after it. Moreover, very early in the project the typists incorrectly inserted 2 spaces after the period. In all, there were six forms to the U.S. entries alone, with only one being correct.
This lesson taught us that no matter how well instructions and examples are prepared, misunderstanding can result; and, of course, editors and others will not catch all possible errors. However, these major errors were eliminated before publication. With the large volume of data and limited funds our conversion process was quite streamlined, with most of the error checking occurring after the data were on tape and displayed in their proper relation to other records. Few keyboarding errors occurred which were not caught at typing. The predominant errors resided in the nature of the original data, or in the lack of some piece of information from three or four different files which may have been checked in building the full record.
Communications. In any large project effective communication is necessary to improve quality of work and progress toward completion of the scheduled task. Frequently scheduled meetings of the staff were used to inform all project members of decisions, receive their suggestions and criticisms, and develop coordinated work assignments among the teams of each editor/librarian. All typing personnel were trained as coders and were periodically relieved of typing to code. This gave them an insight into detecting problems for referral to the professional staff, renewed their knowledge of proper format, and provided more variety in their work. All project members were capable of performing tasks of coding, control list checking, and proofreading. The most capable clerical staff also assisted the editors in editorial work. It was felt that our use of the team approach, unified training, frequent staff meetings, and very detailed written documentation served to channel communication with a resultant minimization of these problems, once the first few months of the project had passed.
Estimating. In most data conversion work accurate estimating is required on many matters. Some estimates we made were very accurate, such as basic time and staff to complete initial coding, typing time and staff, and supplies needed. However, other estimates were not very accurate. For example, the time to edit and correct the file once basic data collection was completed was double our original estimate and required more typing than anticipated. This caused the publication schedule to be delayed two months. Difficulties at the computer center and at the photocomposition vendor caused another two months' delay, even though it is doubtful that our photocomposition firm would have been ready had we met our original estimate. Our original target was publication not later than two months after the basic data collection period of six months, i.e., in eight months. However, on a project of this size, and with the addition of about 7,000 more titles than we had originally estimated, we did not feel that the fifty-four weeks actually required were excessive.
Computer time was also difficult to estimate because of the time on/off the system. Dependent upon the nature of the other jobs on the computer, this time varied greatly, for updating runs were almost independent of the number of transactions. There is always room for improvement in estimating, and, obviously, we have learned many things from this experience to use in further work.
Hardware/Computer Center. Our largest problem was obtaining firm computer scheduling commitments on our campus IBM 360/50 computer, which serves the business functions of the university. All other campus computing facilities use Control Data equipment which is six-bit character, word oriented. With the extended character set requirement and the availability of the IBM 360/50, which we were already using for other work with the ALA graphic print train, it was natural for us to choose this system. Current facilities are now satisfactory to permit our tape batch system operation and the development of our new disk-based batch system. Tape pooling operations for the MT/ST have caused some problems due to equipment changes at our vendor. We have now switched to a new conversion source as our former vendor upgraded his data entry system to key-to-disk. The three MT/ST typewriters we leased performed quite reliably, but one machine seemed to have more down time than the others. Now that our typing load is down, we have cancelled two Model Vs and will maintain two machines. We are now choosing a new system for key input to cassette tape. On the new equipment we will do our proofreading and initial correction off-line, resulting in a further cost saving. This was not possible previously as our typing load required two-shift operation on all machines during the Preliminary Edition preparation time.

CONCLUSION
A great amount of effort has been expended to achieve a unified serials data base to serve Minnesota's libraries. It is our hope that this system can continue to be developed in as flexible a way as possible so that future needs can be supported through the system. Only the imagination of those involved in networking limits the future needs that can be identified and met through access to this data base. Of course, we would hope that one day our data could benefit the development of other similar programs in other states and, perhaps more importantly, help in achieving a true national serials data base.

ACKNOWLEDGMENT
Many staff members at the university and other institutions contributed their invaluable counsel as we have proceeded on the development of the system and the data base. The MULS project staff particularly receives our deep gratitude for its yeoman effort. Special commendation is due Mr. Don Norris for systems design and principal programming support. Mr. Carl Sandberg, who wrote all printer output programs, also contributed invaluable assistance to the project. The MINITEX program and University Library administration receive our appreciation for placing their confidence in the Systems Division. MULS and its support system are truly a product resulting from the coordinated concern and interest of the aforementioned individuals and groups.

Journal of Library Automation Vol. 6/3 September 1973
[Continuation of the list of functions performed by the first set of programs, from PROGRAMS:]
• deletion of an old record;
• addition of a new variable field, including holdings statements;
• substitution of data in a variable field;
• deletion of a variable field;
• production of a transaction file reflecting changes to the data base; and
• generation of a new master tape, which can include resequencing the entire file and/or producing a work list of the file.