High-Performance Annotation Tagging over Solr Full-text Indexes

Paolo Manghi, Michele Artini, Alessia Bardi, Claudio Atzori, Sandro La Bruzzo, Marko Mikulicic


In this work, we focus on the problem of “annotation tagging” over Information Spaces of objects stored in a full-text index. In this scenario, “data curator” users assign tags to objects for the purpose of classification, while generic end-users perceive tags as searchable and browsable object properties. To carry out their activities, data curators need “annotation tagging tools” that allow them to “bulk” tag or untag large sets of objects in temporary work sessions, where they can experiment “virtually” and in “real-time” with the effects of their actions before making the changes visible to end-users. Implementing such tools over full-text indexes is a challenge, since bulk object updates in this context are far from real-time and in critical cases may slow down index performance. We devised TagTick, a tool that offers data curators a fully functional annotation tagging environment over the full-text index Apache Solr, regarded as a “de facto standard” in this area. TagTick consists of a TagTick Virtualizer module, which extends the Solr APIs to support real-time, virtual, bulk-tagging operations, and a TagTick User Interface module, which offers end-user functionalities for annotation tagging. The tool scales optimally with the number and size of bulk tag operations, without compromising index performance.
