Records Retrieved by Personal Author Using Derived Search Keys

Previous papers in this series and experience at the Ohio College Library Center have established that truncated derived search keys are efficient for retrieval of entries by name-title and title from large on-line computerized files of catalog records.4 Experiments reported in the earlier papers were "... based on the assumption that each key had a probable use equal to all other keys."5 However, Guthrie and Slifko have shown that random selection of entries, rather than keys, yields results closer to actual experience but with a higher number of entries per reply.6 For example, on retrieving from a file of 857,725 records using a 4,5 key (four characters of main entry, five characters of title), they found that when the basis of the search was random keys there was one entry per reply 81.3 percent of the time, but when the basis was random records there was one entry per reply 55.7 percent of the time. This paper presents the results of experimentation with search keys to be used in constructing an author index to a large file of on-line catalog records. An interactive environment is assumed, with the interrogator employing a remote terminal. A companion paper describes the findings of an investigation into retrieval efficiency of search keys derived from corporate author names.7
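The difference between the two bases of search can be illustrated with a small sketch. It is not the procedure used by Guthrie and Slifko; the function name and the sample keys are hypothetical, and the sketch only assumes that "random keys" weights every distinct key equally while "random records" weights every record equally.

```python
from collections import Counter


def single_entry_rates(keys):
    """Return (percent by random key, percent by random record) of
    replies that contain exactly one entry.

    `keys` holds one derived key per record in the file (illustrative data).
    """
    counts = Counter(keys)                      # entries per distinct key
    singles = sum(1 for c in counts.values() if c == 1)
    by_key = 100.0 * singles / len(counts)      # each distinct key equally likely
    by_record = 100.0 * singles / len(keys)     # each record equally likely
    return by_key, by_record


# Hypothetical file of six records: one key is shared by three records.
sample = ["SMIT,JO", "SMIT,JO", "SMIT,JO", "JONE,MA", "LAND,AL", "RAST,KU"]
by_key, by_record = single_entry_rates(sample)
print(by_key, by_record)
```

Because heavily shared keys account for many records but count only once as keys, the random-record basis always gives a single-entry percentage no higher than the random-key basis, which matches the direction of the figures quoted above.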

or fewer names when a variety of search key combinations were employed, ranging from three to six characters from the surname, zero to three characters from the first name, and with or without the middle initial. Table 1 is an extraction from Figure 1 and contains the number of names retrieved at a level of 90 percent likelihood for the various search keys employed. Figure 2 has the same structure as Figure 1 but contains the degree of distinctness as percentages: (no. of distinct keys / no. of entries) x 100 percent. Table 2 records distinctness arranged by number of characters per key. Figure 3 is a graphical representation of the degrees of distinctness of the various keys. In this figure, different types of lines connect points representing key structures that contain an equal number of characters.
The bottom line in Table 1 may be read as saying that 90 percent of the time a 4,2,1 key will retrieve five or fewer names from a file of 167,745 personal name keys. The bottom line of Table 2 states that from the same file the 4,2,1 key yields a single name 64.1 percent of the time.
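A minimal sketch of how a key such as 4,2,1 might be derived and its degree of distinctness computed is given here. It follows the rule stated in this paper that a short segment is left-justified and padded with blanks, and that a blank stands for a missing middle initial. The function names, the comma separator, the upper-casing, and the dropping of non-alphabetic characters are assumptions made for illustration only.

```python
def derive_key(surname, first="", middle="", n_sur=4, n_first=2, n_mid=1):
    """Build a truncated search key from the parts of a personal name."""
    def segment(text, width):
        # Assumed normalization: keep letters only, upper-case them.
        letters = "".join(ch for ch in text.upper() if ch.isalpha())
        # Truncate, then left-justify and pad with blanks to the fixed width.
        return letters[:width].ljust(width)

    parts = ((surname, n_sur), (first, n_first), (middle, n_mid))
    return ",".join(segment(text, width) for text, width in parts)


def distinctness(keys):
    """Degree of distinctness: (no. of distinct keys / no. of entries) x 100."""
    return 100.0 * len(set(keys)) / len(keys)


# Hypothetical names, not taken from the experimental file.
names = [
    ("Smith", "John", "A"),
    ("Smith", "Joan", "A"),
    ("Smith", "John", "B"),
    ("Li", "Bo", ""),
]
keys_421 = [derive_key(*name) for name in names]
keys_420 = [derive_key(*name, n_mid=0) for name in names]
print(keys_421, distinctness(keys_421))
print(keys_420, distinctness(keys_420))
```

In this toy file the middle initial separates two of the three Smiths, so the 4,2,1 key is more distinct than the 4,2,0 key.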

DISCUSSION
This experiment has shown the degree of distinctness (that is to say, the number of distinct keys divided by the total number of entries from which all keys were derived) to be a useful tool in determining what key structures may be efficiently used. As seen by comparing Figure 1 with Figure 2 and Table 1 with Table 2, there is a high degree of correlation between distinctness and the likelihood of retrieving a certain number of names 90, 99, or 99.5 percent of the time. Thus, the investigator can eliminate many undesirable key structures on the merits of distinctness alone and concentrate his remaining resources on studying the other structures in detail. When the 8,7,1 key was tested, it yielded a uniqueness percentage of 68.8, which represents the upper limit of uniqueness in this experiment. From Table 2 it is apparent that the bottom three keys yield a percentage of uniqueness near the upper limit.
Table 2 shows a distinct jump in percentage of uniqueness between the n,0 and n,1 key structures. Another sharp increase occurs between the n,m and n,m,1 structures. Each section of the key is derived from a Markov string, and it appears from the discontinuities between sections that the parts of personal names are not highly correlated.
As pointed out in previous papers, a key structure that possesses a relatively high degree of distinctness also yields a small percentage of replies containing many entries. For the name-only search key, this effect could be reduced by performing the retrieval in two steps when necessary. First, the full names for each author whose name matches the entered search key would be displayed; names appearing with more than one work would be displayed only once. Next, the retriever would choose the name desired and request all of the titles associated with it. However, some title displays could be excessive: William Shakespeare's name appears with more than 500 works. A paper currently in preparation at OCLC describes an algorithm whose interactive use resolves this type of search problem.8

CONCLUSION
This investigation has yielded findings showing that there are several truncated search keys derived from personal names that are sufficiently specific to perform efficiently as an author index to a file of 161,745 personal names, thereby providing an on-line index that will make it possible for a terminal user to obtain a listing of all titles by a given author in an on-line catalog.
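The two-step retrieval proposed in the discussion can be sketched as follows. This is an illustration only, not the OCLC implementation or the algorithm of the paper in preparation; the class, its method names, and the sample entries are hypothetical, and the file is assumed to be held in memory.

```python
from collections import defaultdict


class AuthorIndex:
    """Toy author index: search key -> full name -> list of titles."""

    def __init__(self):
        self._by_key = defaultdict(lambda: defaultdict(list))

    def add(self, key, full_name, title):
        self._by_key[key][full_name].append(title)

    def names_for(self, key):
        # Step 1: display each matching full name once, however many
        # works are filed under it.
        return sorted(self._by_key.get(key, {}))

    def titles_for(self, key, full_name):
        # Step 2: after the user chooses a name, list its titles.
        return list(self._by_key.get(key, {}).get(full_name, []))


index = AuthorIndex()
index.add("SHAK,WI, ", "Shakespeare, William", "Hamlet")
index.add("SHAK,WI, ", "Shakespeare, William", "Macbeth")
index.add("SHAK,WI, ", "Shakleton, Wilfred", "A Hypothetical Title")

print(index.names_for("SHAK,WI, "))
print(index.titles_for("SHAK,WI, ", "Shakespeare, William"))
```

The first step keeps the reply short even when one author has many works; the second step can still produce a long display for a prolific author, which is the problem the discussion leaves to the later paper.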

Fig. 3. Degree of Distinctness. Lines Connect Points Whose Key Structures Have an Equal Number of Characters.

Fig. 1. Number of Names Retrieved 90, 99, and 99.5 Percent of the Time for Different Key Structures (horizontal axis: number of characters extracted from the surname, 3 to 6).

acters than the key segment to be derived, the segment was left-justified and padded out with blanks. If there was no middle name or middle initial, a blank was used.

RESULTS
Figure 1 presents the findings at three levels of likelihood for retrieving n

Table 1. Number of Names Retrieved With 90 Percent Likelihood

Table 2. Distinctness by Number of Characters Per Key