High-Performance Annotation Tagging over Solr Full-text Indexes

Authors

  • Paolo Manghi, National Research Council Institute of Information Science and Technologies
  • Michele Artini, National Research Council Institute of Information Science and Technologies
  • Alessia Bardi, National Research Council Institute of Information Science and Technologies
  • Claudio Atzori, National Research Council Institute of Information Science and Technologies
  • Sandro La Bruzzo, National Research Council Institute of Information Science and Technologies
  • Marko Mikulicic, National Research Council Institute of Information Science and Technologies

DOI:

https://doi.org/10.6017/ital.v33i3.4633

Abstract

In this work, we focus on the problem of “annotation tagging” over Information Spaces of objects stored in a full-text index. In such a scenario, “data curator” users assign tags to objects for classification purposes, while generic end users perceive tags as searchable and browsable object properties. To carry out their activities, data curators need “annotation tagging tools” that allow them to “bulk” tag or untag large sets of objects in temporary work sessions, where they can experiment “virtually” and in “real time” with the effects of their actions before making the changes visible to end users. Implementing these tools over full-text indexes is a challenge, since bulk object updates in this context are far from real-time and in critical cases may slow down index performance. We devised TagTick, a tool that offers data curators a fully functional annotation tagging environment over the full-text index Apache Solr, regarded as a “de facto standard” in this area. TagTick consists of a TagTick Virtualizer module, which extends the Solr APIs to support real-time, virtual, bulk-tagging operations, and a TagTick User Interface module, which offers end-user functionalities for annotation tagging. The tool scales optimally with the number and size of bulk tag operations, without compromising index performance.

Author Biography

Paolo Manghi, National Research Council Institute of Information Science and Technologies

Researcher at ISTI-CNR

Published

2014-09-25

How to Cite

Manghi, P., Artini, M., Bardi, A., Atzori, C., La Bruzzo, S., & Mikulicic, M. (2014). High-Performance Annotation Tagging over Solr Full-text Indexes. Information Technology and Libraries, 33(3), 22–44. https://doi.org/10.6017/ital.v33i3.4633

Section

Articles