Abstract: Counting in general, and estimating the cardinality of (multi-) sets in particular, is highly desirable for a large variety of applications, representing a foundational block for the efficient deployment and access of emerging internet-scale information systems. Examples of such applications range from optimizing query access plans in internet-scale databases, to evaluating the significance (rank/score) of various data items in information retrieval applications. The key constraints that any acceptable solution must satisfy are: (i) efficiency: the number of nodes that need be contacted for counting purposes must be small in order to enjoy small latency and bandwidth requirements; (ii) scalability, seemingly contradicting the efficiency goal: arbitrarily large numbers of nodes nay need to add elements to a (multi-) set, which dictates the need for a highly distributed solution, avoiding server-based scalability, bottleneck, and availability problems; (iii) access and storage load balancing: counting and related overhead chores should be distributed fairly to the nodes of the network; (iv) accuracy: tunable, robust (in the presence of dynamics and failures) and highly accurate cardinality estimation; (v) simplicity and ease of integration: special, solution-specific indexing structures should be avoided. In this paper, first we contribute a highly-distributed, scalable, efficient, and accurate (multi-) set cardinality estimator. Subsequently, we show how to use our solution to build and maintain histograms, which have been a basic building block for query optimization for centralized databases, facilitating their porting into the realm of internet-scale data networks.
Abstract: We consider the problem of searching for a piece of
information in a fully interconnected computer network
(also called a complete network or clique) by exploiting
advice about its location from the network nodes. Each
node contains a database that ?knows? what kind of
documents or information are stored in other nodes
(e.g., a node could be a Web server that answers queries
about documents stored on the Web). The databases in
each node, when queried, provide a pointer that leads to
the node that contains the information. However, this
information is up-to-date (or correct) with some
bounded probability. While, in principle, one may always
locate the information by simply visiting the network
nodes in some prescribed ordering, this requires a time
complexity in the order of the number of nodes of the
network. In this paper, we provide algorithms for locating
an information node in the complete communication
network, which take advantage of advice given from
network nodes. The nodes may either give correct advice,
by pointing directly to the information node, or give
wrong advice, by pointing elsewhere. On the lowerbounds?
side, we show that no fixed-memory (i.e., with
memory independent of the network size) deterministic
algorithm may locate the information node in a constant
(independent of the network size) expected number of
steps. Moreover, if p (1/n) is the probability that a
node of an n-node clique gives correct advice, we show
that no algorithm may locate the information node in an
expected number of steps less than 1/p o(1). To study
how the expected number of steps is affected by the
amount of memory allowed to the algorithms, we give a
memoryless randomized algorithm with expected number
of steps 4/p o(1/p) o(1) and a 1-bit randomized
algorithm requiring on the average at most 2/p o(1)
steps. In addition, in the memoryless case, we also
prove a 4/p lower bound for the expected number of
steps in the case where the nodes giving faulty advice
may decide on the content of this advice in any possible
way and not merely at random (adversarial fault model).
Finally, for the case where faulty nodes behave randomly,
we give an optimal, unlimited memory deterministic
algorithm with expected number of steps bounded
from above by 1/p o(1/p) 1.
Abstract: We consider the problem of searching for a piece of information in a fully interconnected computer network or clique by exploiting
advice about its location from the network nodes Each node contains a
database that knows what kind of documents or information are stored
in other nodes e.g. a node could be a Web server that answers queries
about documents stored on the Web. The databases in each node when
queried provide a pointer that leads to the node that contains the information. However this information is up to date or correct with some
bounded probability. While in principle one may always locate the information by simply visiting the network nodes in some prescribed ordering
this requires a time complexity in the order of the number of nodes of the
network. In this paper we provide algorithms for locating an information node in the complete communication network that take advantage
of advice given from network nodes The nodes may either give correct
advice by pointing directly to the information node or give wrong advice
by pointing elsewhere We show that on the averageif the probability that a node provides correct advice is asymptotically larger than
where is the number of the computer nodes then the average time complexity for locating the information node is asymptotically or depending on the available memory.The probability may in general be a function of the number of network nodes . On the lower bounds
side we prove that noxed memory deterministic algorithm can locate
the information node in nite expected number of steps. We also prove
a lower bound of
for the expected number of steps of any algorithm
that locates the information node in the complete network.