research unit 1

This site is powered by Aigaion - A PHP/Web based management system for shared and annotated bibliographies. For more information visit


Type of publication:Inproceedings
Entered by:chita
TitleSelf-Join Size Estimation in P2P Data Systems
Bibtex cite IDRACTI-RU1-2008-1
Booktitle 24th International Conference on Data Engineering (ICDE 2008)
Year published 2008
Month April
Pages 764-773
Organization IEEE
Location Cancún, México
In this work we tackle the open problem of self-join size (SJS) estimation in a large-scale distributed data system, where tuples of a relation are distributed over data nodes which comprise an overlay network. Our contributions include adaptations of five well-known SJS estimation centralized techniques (coined sequential, cross-sampling, adaptive, bifocal, and sample-count) to the network environment and a novel technique which is based on the use of the Gini coefficient. We develop analyses showing how Gini estimations can lead to estimations of the underlying Zipfian or power-law value distributions. We further contribute distributed sampling algorithms that can estimate accurately and efficiently the Gini coefficient. Finally, we provide detailed experimental evidence testifying for the claimed increased accuracy, precision, and efficiency of the proposed SJS estimation method, compared to the other methods. The proposed approach is the only one to ensure high efficiency, precision, and accuracy regardless of the skew of the underlying data.
Pitoura, Theoni
Triantafillou, Peter
04497485.pdf (main file)
Publication ID48