research unit 1

This site is powered by Aigaion - A PHP/Web based management system for shared and annotated bibliographies. For more information visit


Type of publication:Inproceedings
Entered by:chita
TitleGlobal Document Frequency Estimation in Peer-to-Peer Web Search
Bibtex cite IDRACTI-RU1-2006-15
Booktitle 9th International Workshop on the Web and Databases (WebDB 2006)
Year published 2006
Month June
Pages 62-67
Note in conjunction with ACM SIGMOD
Information retrieval (IR) in peer-to-peer (P2P) networks, where the corpus is spread across many loosely coupled peers, has recently gained importance. In contrast to IR systems on a centralized server or server farm, P2P IR faces the additional challenge of either being oblivious to global corpus statistics or having to compute the global measures from local statistics at the individual peers in an efficient, distributed manner. One specific measure of interest is the global document frequency for different terms, which would be very beneficial as term-specific weights in the scoring and ranking of merged search results that have been obtained from different peers. This paper presents an efficient solution for the problem of estimating global document frequencies in a large-scale P2P network with very high dynamics where peers can join and leave the network on short notice. In particular, the developed method takes into account the fact that the lo- cal document collections of autonomous peers may arbitrar- ily overlap, so that global counting needs to be duplicate- insensitive. The method is based on hash sketches as a technique for compact data synopses. Experimental stud- ies demonstrate the estimator?s accuracy, scalability, and ability to cope with high dynamics. Moreover, the benefit for ranking P2P search results is shown by experiments with real-world Web data and queries.
Michel, Sebastian
Bender, Matthias
Triantafillou, Peter
Weikum, Gerhard
webdb2006.pdf (main file)
Publication ID64