Abstract: We argue the case for a new paradigm for architecting structured P2P overlay networks, coined AESOP. AESOP consists of 3 layers: (i) an architecture, PLANES, that ensures significant performance speedups, assuming knowledge of altruistic peers; (ii) an accounting/auditing layer, AltSeAl, that identifies and validates altruistic peers; and (iii) SeAledPLANES, a layer that facilitates the coordination/collaboration of the previous two components. We briefly present these components along with experimental and analytical data of the promised significant performance gains and the related overhead. In light of these very encouraging results, we put this three-layer architecture paradigm forth as the way to structure the P2P overlay networks of the future.
Abstract: We study the fundamental naming and counting problems in networks that are anonymous, unknown, and possibly dynamic. Network dynamicity is modeled by the 1-interval connectivity model [KLO10]. We first prove that on static networks with broadcast counting is impossible to solve without a leader and that naming is impossible to solve even with a leader and even if nodes know n. These impossibilities carry over to dynamic networks as well. With a leader we solve counting in linear time. Then we focus on dynamic networks with broadcast. We show that if nodes know an upper bound on the maximum degree that will ever appear then they can obtain an upper bound on n. Finally, we replace broadcast with one-to-each, in which a node may send a different message to each of its neighbors. This variation is then proved to be computationally equivalent to a full-knowledge model with unique names.
Abstract: In this work, we study the propagation of influence and computation in dynamic networks that are possibly disconnected at every instant. We focus on a synchronous message passing communication model with broadcast and bidirectional links. To allow for bounded end-to-end communication we propose a set of minimal temporal connectivity conditions that bound from the above the time it takes for information to make progress in the network. We show that even in dynamic networks that are disconnected at every instant information may spread as fast as in networks that are connected at every instant. Further, we investigate termination criteria when the nodes know some upper bound on each of the temporal connectivity conditions. We exploit our termination criteria to provide efficient protocols (optimal in some cases) that solve the fundamental counting and all-to-all token dissemination (or gossip) problems. Finally, we show that any protocol that is correct in instantaneous connectivity networks can be adapted to work in temporally connected networks.
Abstract: In this work, we study the propagation of influence and computation in dynamic distributed computing systems that are possibly disconnected at every instant. We focus on a synchronous message-passing communication model with broadcast and bidirectional links. Our network dynamicity assumption is a worst-case dynamicity controlled by an adversary scheduler, which has received much attention recently. We replace the usual (in worst-case dynamic networks) assumption that the network is connected at every instant by minimal temporal connectivity conditions. Our conditions only require that another causal influence occurs within every time window of some given length. Based on this basic idea, we define several novel metrics for capturing the speed of information spreading in a dynamic network. We present several results that correlate these metrics. Moreover, we investigate termination criteria in networks in which an upper bound on any of these metrics is known. We exploit our termination criteria to provide efficient (and optimal in some cases) protocols that solve the fundamental counting and all-to-all token dissemination (or gossip) problems.
Abstract: The problem of communication among mobile nodes is one of the most fundamental problems in ad hoc mobile networks and is at the core of many algorithms, such as for counting the number of nodes, electing a leader, data processing etc. For an exposition of several important problems in ad hoc mobile networks. The work of Chatzigiannakis, Nikoletseas and Spirakis focuses on wireless mobile networks that are subject to highly dynamic structural changes created by mobility, channel fluctuations and device failures. These changes affect topological connectivity, occur with high frequency and may not be predictable in advance. Therefore, the environment where the nodes move (in three-dimensional space with possible obstacles) as well as the motion that the nodes perform are \textit{input} to any distributed algorithm.
Abstract: In this chapter, our focus is on computational network analysis from a theoretical point of view. In particular, we study the \emph{propagation of influence and computation in dynamic distributed computing systems}. We focus on a \emph{synchronous message passing} communication model with bidirectional links. Our network dynamicity assumption is a \emph{worst-case dynamicity} controlled by an adversary scheduler, which has received much attention recently. We first study the fundamental \emph{naming} and \emph{counting} problems (and some variations) in
networks that are \emph{anonymous}, \emph{unknown}, and possibly dynamic. Network dynamicity is modeled here by the \emph{1-interval connectivity model}, in which communication is synchronous and a (worst-case) adversary
chooses the edges of every round subject to the condition that each instance is connected. We then replace this quite strong assumption by minimal \emph{temporal connectivity} conditions. These conditions only require that \emph{another causal influence occurs within every time-window of some given length}. Based on this basic idea we define several novel metrics for capturing the speed of information spreading in a dynamic network. We present several results that correlate these metrics. Moreover, we investigate \emph{termination criteria} in networks in which an upper bound on any of these metrics is known. We exploit these termination criteria to provide efficient (and optimal in some cases) protocols that solve the fundamental \emph{counting} and \emph{all-to-all token dissemination} (or \emph{gossip}) problems. Finally, we propose another model of worst-case temporal connectivity, called \emph{local
communication windows}, that assumes a fixed underlying communication network and restricts the adversary to allow communication between local neighborhoods in every time-window of some fixed length. We prove some basic properties and provide a protocol for counting in this model.
Abstract: This paper addresses the problem of counting the size of a network where (i) processes have the same identifiers (anonymous nodes) and (ii) the et-
work topology constantly changes (dynamic network). Changes are riven by a powerful adversary that can look at internal process states and add and remove edges in order to contrast the convergence of the algorithm to the correct count. The paper proposes two leader-based counting algorithms. Such algorithms are based on a technique that mimics an energy-transfer between network nodes. The first algorithm assumes that the adversary cannot generate either disconnected network graphs or network graphs where nodes have degree greater than D. In such algorithm, the leader can count the size of the network and detect the counting termination in a finite time (i.e., conscious counting algorithm). The second algorithm assumes that the adversary only keeps the network graph connected at any time and we prove that the leader can still converge to a correct count in a finite number of rounds, but it is not conscious when this convergence happens.
Abstract: Counting in general, and estimating the cardinality of (multi-) sets in particular, is highly desirable for a large variety of applications, representing a foundational block for the efficient deployment and access of emerging internet-scale information systems. Examples of such applications range from optimizing query access plans in internet-scale databases, to evaluating the significance (rank/score) of various data items in information retrieval applications. The key constraints that any acceptable solution must satisfy are: (i) efficiency: the number of nodes that need be contacted for counting purposes must be small in order to enjoy small latency and bandwidth requirements; (ii) scalability, seemingly contradicting the efficiency goal: arbitrarily large numbers of nodes nay need to add elements to a (multi-) set, which dictates the need for a highly distributed solution, avoiding server-based scalability, bottleneck, and availability problems; (iii) access and storage load balancing: counting and related overhead chores should be distributed fairly to the nodes of the network; (iv) accuracy: tunable, robust (in the presence of dynamics and failures) and highly accurate cardinality estimation; (v) simplicity and ease of integration: special, solution-specific indexing structures should be avoided. In this paper, first we contribute a highly-distributed, scalable, efficient, and accurate (multi-) set cardinality estimator. Subsequently, we show how to use our solution to build and maintain histograms, which have been a basic building block for query optimization for centralized databases, facilitating their porting into the realm of internet-scale data networks.
Abstract: We consider a synchronous distributed system with n processes that communicate through a dynamic network guaranteeing 1-interval connectivity i.e., the network topology graph might change at each interval while keeping the graph connected at any time. The processes belonging to the distributed system are identified through a set of labels L = {l1 , l2 . . . , lk } (with 1 ≤ k < n). In this challenging system model, the paper addresses the following problem: ”counting the number of processes with the same label”. We provide a distributed algorithm that is able solve the problem based on the notion of energy transfer. Each process owns a fixed energy charge, and tries to discharge itself exchanging, at each round, at most half of its charge with neighbors. The paper also discusses when such counting is possible in the presence of failures. Counting processes with the same label in dynamic networks with homonyms is of great importance because it is as difficult as computing generic aggregating functions.
Abstract: Counting items in a distributed system, and estimating the cardinality of multisets in particular,
is important for a large variety of applications and a fundamental building block for emerging Internet-scale information systems. Examples of such applications range from optimizing query access plans in peer-to-peer data sharing, to computing the significance (rank/score) of data items in distributed information retrieval. The general formal problem addressed in this article is computing the network-wide distinct number of items with some property (e.g., distinct files with file name
containing “spiderman”) where each node in the network holds an arbitrary subset, possibly overlapping the subsets of other nodes. The key requirements that a viable approach must satisfy are:
(1) scalability towards very large network size, (2) efficiency regarding messaging overhead, (3) load
balance of storage and access, (4) accuracy of the cardinality estimation, and (5) simplicity and easy
integration in applications. This article contributes the DHS (Distributed Hash Sketches) method
for this problem setting: a distributed, scalable, efficient, and accurate multiset cardinality estimator.
DHSis based on hash sketches for probabilistic counting, but distributes the bits of each counter
across network nodes in a judicious manner based on principles of Distributed Hash Tables, paying
careful attention to fast access and aggregation as well as update costs. The article discusses various
design choices, exhibiting tunable trade-offs between estimation accuracy, hop-count efficiency, and
load distribution fairness. We further contribute a full-fledged, publicly available, open-source implementation of all our methods, and a comprehensive experimental evaluation for various settings.
Abstract: Online and Realtime counting and estimating the cardinality of sets is highly desirable for a large variety of applications, representing a foundational block for the efficient deployment and access of emerging internet scale information systems. In this work we implement three well known duplicate
insensitive counting algorithms and evaluate their performance in a testbed of resource-limited commercial off-the-shelf hardware devices. We focus on devices that can be used in wireless mobile and sensor applications and evaluate the memory complexity, time complexity and absolute error of the algorithms under different realistic scenaria. Our findings indicate the suitability of each algorithm depending on the application characteristics.
Abstract: Information retrieval (IR) in peer-to-peer (P2P) networks,
where the corpus is spread across many loosely coupled
peers, has recently gained importance. In contrast to IR
systems on a centralized server or server farm, P2P IR faces
the additional challenge of either being oblivious to global
corpus statistics or having to compute the global measures
from local statistics at the individual peers in an efficient,
distributed manner. One specific measure of interest is the
global document frequency for different terms, which would
be very beneficial as term-specific weights in the scoring and
ranking of merged search results that have been obtained
from different peers.
This paper presents an efficient solution for the problem
of estimating global document frequencies in a large-scale
P2P network with very high dynamics where peers can join
and leave the network on short notice. In particular, the
developed method takes into account the fact that the lo-
cal document collections of autonomous peers may arbitrar-
ily overlap, so that global counting needs to be duplicate-
insensitive. The method is based on hash sketches as a
technique for compact data synopses. Experimental stud-
ies demonstrate the estimator?s accuracy, scalability, and
ability to cope with high dynamics. Moreover, the benefit
for ranking P2P search results is shown by experiments with
real-world Web data and queries.
Abstract: Information retrieval (IR) in peer-to-peer (P2P) networks,
where the corpus is spread across many loosely coupled
peers, has recently gained importance. In contrast to IR
systems on a centralized server or server farm, P2P IR faces
the additional challenge of either being oblivious to global
corpus statistics or having to compute the global measures
from local statistics at the individual peers in an efficient,
distributed manner. One specific measure of interest is the
global document frequency for different terms, which would
be very beneficial as term-specific weights in the scoring and
ranking of merged search results that have been obtained
from different peers.
This paper presents an efficient solution for the problem
of estimating global document frequencies in a large-scale
P2P network with very high dynamics where peers can join
and leave the network on short notice. In particular, the
developed method takes into account the fact that the lo-
cal document collections of autonomous peers may arbitrar-
ily overlap, so that global counting needs to be duplicate-
insensitive. The method is based on hash sketches as a
technique for compact data synopses. Experimental stud-
ies demonstrate the estimator?s accuracy, scalability, and
ability to cope with high dynamics. Moreover, the benefit
for ranking P2P search results is shown by experiments with
real-world Web data and queries.
Abstract: In this work, we study the fundamental naming and counting problems (and some variations) in networks that are anonymous, unknown, and possibly dynamic. In counting, nodes must determine the size of the network n and in naming they must end up with unique identities. By anonymous we mean that all nodes begin from identical states
apart possibly from a unique leader node and by unknown that nodes
have no a priori knowledge of the network (apart from some minimal
knowledge when necessary) including ignorance of n. Network dynamicity is modeled by the 1-interval connectivity model [KLO10], in which communication is synchronous and a (worst-case) adversary chooses the edges of every round subject to the condition that each instance is connected. We first focus on static networks with broadcast where we prove that, without a leader, counting is impossible to solve and that naming is impossible to solve even with a leader and even if nodes know n. These impossibilities carry over to dynamic networks as well. We also show that a unique leader suffices in order to solve counting in linear time.
Then we focus on dynamic networks with broadcast. We conjecture that
dynamicity renders nontrivial computation impossible. In view of this,
we let the nodes know an upper bound on the maximum degree that will
ever appear and show that in this case the nodes can obtain an upper
bound on n. Finally, we replace broadcast with one-to-each, in which a
node may send a different message to each of its neighbors. Interestingly,
this natural variation is proved to be computationally equivalent to a
full-knowledge model, in which unique names exist and the size of the
network is known.
Abstract: An ad-hoc mobile network is a collection of mobile hosts, with
wireless communication capabilities, forming a temporary network
without the aid of any established fixed infrastructure.
In such networks, topological connectivity is subject to frequent,
unpredictable change. Our work focuses on networks with high
rate of such changes to connectivity. For such dynamic changing
networks we propose protocols which exploit the co-ordinated
(by the protocol) motion of a small part of the network.
We show that such protocols can be designed to work
correctly and efficiently even in the case of arbitrary (but not
malicious) movements of the hosts not affected by the protocol.
We also propose a methodology for the analysis of the expected
behaviour of protocols for such networks, based on the assumption that mobile hosts (whose motion is not guided by
the protocol) conduct concurrent random walks in their
motion space.
Our work examines some fundamental problems such as pair-wise
communication, election of a leader and counting, and proposes
distributed algorithms for each of them. We provide their
proofs of correctness, and also give rigorous analysis by
combinatorial tools and also via experiments.
Abstract: We consider path protection in the routing and
wavelength assignment (RWA) problem for impairment
constrained WDM optical networks. The proposed multicost
RWA algorithms select the primary and the backup lightpaths by
accounting for physical layer impairments. The backup lightpath
may either be activated (1+1 protection) or it may be reserved and
not activated, with activation taking place when/if needed (1:1
protection). In case of 1:1 protection the period of time where the
quality of its transmission (QoT) is valid, despite the possible
establishment of future connections, should be preserved, so as to
be used in case the primary lightpath fails. We show that, by using
the multicost approach for solving the RWA with protection
problem, great benefits can be achieved both in terms of the
connection blocking rate and in terms of the validity period of the
backup lightpath. Moreover the multicost approach, by providing
a set of candidate lightpaths for each source destination pair,
instead of a single one, offers ease and flexibility in selecting the
primary and the backup lightpaths.
Abstract: We propose QoS-aware scheduling algorithms for Grid Networks that are capable of optimally or near-optimally
assigning computation and communication tasks to grid resources. The routing and scheduling algorithms to be
presented take as input the resource utilization profiles and the task characteristics and QoS requirements, and
co-allocate resources while accounting for the dependencies between communication and computation tasks.
Keywords: communication and computation utilization profiles, multicost routing and scheduling, grid
computing.
Abstract: We present SeAl1, a novel data/resource and data-access management infrastructure designed for the purpose of addressing a key problem in P2P data sharing networks, namely the problem of wide-scale selfish peer behavior. Selfish behavior has been manifested and well documented and it is widely accepted that unless this is dealt with, the scalability, efficiency, and the usefulness of P2P sharing networks will be diminished. SeAl essentially consists of a monitoring/accounting subsystem, an auditing/verification subsystem, and incentive mechanisms. The monitoring subsystem facilitates the classification of peers into selfish/altruistic. The auditing/verification layer provides a shield against perjurer/slandering and colluding peers that may try to cheat the monitoring subsystem. The incentives mechanisms efectively utilize these layers so to increase the computational/networking and data resources that are available to the community. Our extensive performance results show that SeAl performs its tasks swiftly, while the overhead introduced by our accounting and auditing mechanisms in terms of response time, network, and storage overheads are very small.
Abstract: We present SeAl, a novel data/resource and data-access management infrastructure designed for the purpose of addressing a key problem in P2P data sharing networks, namely the problem of wide-scale selfish peer behavior. Selfish behavior has been manifested and well documented and it is widely accepted that unless this is dealt with, the scalability, efficiency, and the usefulness of P2P sharing networks will be diminished. SeAl essentially consists of a monitoring/accounting subsystem, an auditing/verification subsystem, and incentive mechanisms. The monitoring subsystem facilitates the classification of peers into selfish/altruistic. The auditing/verification layer provides a shield against perjurer/slandering and colluding peers that may try to cheat the monitoring subsystem. The incentives mechanisms effectively utilize these layers so to increase the computational/networking and data resources that are available to the community. Our extensive performance results show that SeAl performs its tasks swiftly, while the overhead introduced by our accounting and auditing mechanisms in terms of response time, network, and storage overheads are very small.
Abstract: In this work, we consider a \emph{solution of automata} similar to \emph{Population Protocols} and \emph{Network Constructors}. The automata (also called \emph{nodes}) move passively in a well-mixed solution without being capable of controlling their movement. However, the nodes can \emph{cooperate} by interacting in pairs. Every such interaction may result in an update of the local states of the nodes. Additionally, the nodes may also choose to connect to each other in order to start forming some required structure. We may think of such nodes as the \emph{smallest possible programmable pieces of matter}, like tiny nanorobots or programmable molecules. The model that we introduce here is a more applied version of Network Constructors, imposing \emph{physical} (or \emph{geometrical}) \emph{constraints} on the connections that the nodes are allowed to form. Each node can connect to other nodes only via a very limited number of \emph{local ports}, which implies that at any given time it has only a \emph{bounded number of neighbors}. Connections are always made at \emph{unit distance} and are \emph{perpendicular to connections of neighboring ports}. Though such a model cannot form abstract networks like Network Constructors, it is still capable of forming very practical \emph{2D or 3D shapes}. We provide direct constructors for some basic shape construction problems, like \emph{spanning line}, \emph{spanning square}, and \emph{self-replication}. We then develop \emph{new techniques} for determining the computational and constructive capabilities of our model. One of the main novelties of our approach, concerns our attempt to overcome the inability of such systems to detect termination. In particular, we exploit the assumptions that the system is well-mixed and has a unique leader, in order to \emph{give terminating protocols that are correct with high probability}. This allows us to develop terminating subroutines that can be \emph{sequentially composed} to form larger \emph{modular protocols} (which has not been the case in the relevant literature). One of our main results is a \emph{terminating protocol counting the size $n$ of the system} with high probability. We then use this protocol as a subroutine in order to develop our \emph{universal constructors}, establishing that \emph{it is possible for the nodes to become self-organized with high probability into arbitrarily complex shapes while still detecting termination of the construction}.
Abstract: Numerous research efforts have produced a large number of algorithms and mechanisms for web proxy caches. In order to build powerful web proxies and understand their performance, one must be able to appreciate the impact and significance of earlier contributions and how they can be integrated. To do this we employ a cache replacement algorithm, 'CSP', which integrates key knowledge from previous work. CSP utilizes the communication Cost to fetch web objects, the objects' Sizes, their Popularities, an auxiliary cache and a cache admission control algorithm. We study the impact of these components with respect to hit ratio, latency, and bandwidth requirements.
Numerous research efforts have produced a large number of algorithms and mechanisms for web proxy caches. In order to build powerful web proxies and understand their performance, one must be able to appreciate the impact and significance of earlier contributions and how they can be integrated To do this we employ a cache replacement algorithm, 'CSP, which integrates key knowledge from previous work. CSP utilizes the communication Cost to fetch web objects, the objects' Sizes, their Popularifies, an auxiliary cache and a cache admission control algorithm. We study the impact of these components with respect to hit ratio, latency, and bandwidth requirements. Our results show that there are clear performance gains when utilizing the communication cost, the popularity of objects, and the auxiliary cache. In contrast, the size of objects and the admission controller have a negligible performance impact. Our major conclusions going against those in related work are that (i) LRU is preferable to CSP for important parameter values, (ii) accounting for the objects' sizes does not improve latency and/or bandwidth requirements, and (iii) the collaboration of nearby proxies is not very beneficial. Based on these results, we chart the problem solution space, identifying which algorithm is preferable and under which conditions. Finally, we develop a dynamic replacement algorithm that continuously utilizes the best algorithm as the problem-parameter values (e.g., the access distributions) change with time.