Abstract | This paper addresses the efficient processing of
top-k queries in wide-area distributed data
repositories where the index lists for the attribute
values (or text terms) of a query are distributed
across a number of data peers and the
computational costs include network latency,
bandwidth consumption, and local peer work.
We present KLEE, a novel algorithmic
framework for distributed top-k queries,
designed for high performance and flexibility.
KLEE makes a strong case for approximate top-k
algorithms over widely distributed data sources.
It shows how great gains in efficiency can be
enjoyed at low result-quality penalties. Further,
KLEE affords the query-initiating peer the
flexibility to trade-off result quality and expected
performance and to trade-off the number of
communication phases engaged during query
execution versus network bandwidth
performance. We have implemented KLEE and
related algorithms and conducted a
comprehensive performance evaluation. Our
evaluation employed real-world and synthetic
large, web-data collections, and query
benchmarks. Our experimental results show that
KLEE can achieve major performance gains in
terms of network bandwidth, query response
times, and much lighter peer loads, all with small
errors in result precision and other result-quality
measures |