Abstract | Peer-to-Peer (P2P) search requires intelligent decisions for query routing: selecting the best peers to which a given query, initiated at some peer, should be forwarded for retrieving additional search results. These decisions are based on statistical summaries for each peer, which are usually organized on a per-keyword basis and managed in a distributed directory of routing indices. Such architectures disregard the possible correlations among keywords. Together with the coarse granularity of per-peer summaries, which are mandated for scalability, this limitation may lead to poor search result quality.
This paper develops and evaluates two solutions to this problem, sk-STAT based on single-key statistics only, and mk-STAT based on additional multi-key statistics. For both cases, hash sketch synopses are used to compactly represent a peer's data items and are efficiently disseminated in the P2P network to form a decentralized directory. Experimental studies with Gnutella and Web data demonstrate the viability and the trade-offs of the approaches. |