research unit 1

This site is powered by Aigaion - A PHP/Web based management system for shared and annotated bibliographies. For more information visit


Type of publication:Inproceedings
Entered by:chita
TitleImproving Collection Selection with Overlap Awareness
Bibtex cite IDRACTI-RU1-2005-14
Booktitle the 28th International ACM SIGIR Conference
Year published 2005
Month August
Collection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each collection, and subsequently the collections are ranked accordingly. Our thesis is that this simple approach is insufficient for several applications in which the collections typically overlap. This is the case, for example, for the collections built by autonomous peers crawling the web. We argue for the extension of existing quality measures using estimators of mutual overlap among collections and present experiments in which this combination outperforms CORI, a popular approach based on quality estimation. We outline our prototype implementation of a P2P web search engine, coined MINERVA1, that allows handling large amounts of data in a distributed and self-organizing manner. We conduct experiments which show that taking overlap into account during collection selection can drastically decrease the number of collections that have to be contacted in order to reach a satisfactory level of recall, which is a great step toward the feasibility of distributed web search.
Bender, Matthias
Michel, Sebastian
Triantafillou, Peter
Weikum, Gerhard
Zimmer, Christian
sigir05.pdf (main file)
Publication ID179