Abstract: The management of Grid resources requires scheduling of both computation and communication tasks at various levels. In this study, we consider the two constituent sub-problems of Grid scheduling, namely: (i) the scheduling of computation tasks to processing resources and (ii) the routing and scheduling of the data movement in a Grid network. Regarding computation tasks, we examine two typical online task scheduling algorithms that employ advance reservations and perform full network simulation experiments to measure their performance when implemented in a centralized or distributed manner. Similarly, for communication tasks, we compare two routing and data scheduling algorithms that are implemented in a centralized or a distributed manner. We examine the effect network propagation delay has on the performance of these algorithms. Our simulation results indicate that a distributed architecture with an exhaustive resource utilization update strategy yields better average end-to-end delay performance than a centralized architecture.

Abstract: Load balancing/sharing is a policy which exploits the communication facility between the servers of a distributed system, by using the exchanging of status information and jobs between any two servers of the system, in order to improve the performance of the whole system. In this work, we propose a new adaptive distributed hierarchical scheme, the Virtual Tree Algorithm (VTA), which creates a virtual binary tree structure over the actual network topology. It uses the Difference-Initiated (DI) technique ([11, 1]) for load balancing/sharing, which needs remote information for the transfer policy, and no additional information for the location policy. We demonstrate here that the introduced virtual construction can keep the exchanged messages to a number favourable to those of the previously known efficient algorithms. To show the above statement and evaluate the performance of our policy, we make use of both analytical and simulation results. By using the simulation model that we developed, we compared our results with one of the most representative and new adaptive, symmetrical, distributed, and efficient algorithms, the Variable Threshold (V THR) algorithm

Abstract: As a result of recent significant technological advances, a new computing and communication environment, Mobile Ad Hoc Networks (MANET), is about to enter the mainstream. A multitude of critical aspects, including mobility, severe limitations and limited reliability, create a new set of crucial issues and trade-offs that must be carefully taken into account in the design of robust and efficient algorithms for these environments. The communication among mobile hosts is one among the many issues that need to be resolved efficiently before MANET becomes a commodity.
In this paper, we propose to discuss the communication problem in MANET as well as present some characteristic techniques for the design, the analysis and the performance evaluation of distributed communication protocols for mobile ad hoc networks. More specifically, we propose to review two different design techniques. While the first type of protocols tries to create and maintain routing paths among the hosts, the second set of protocols uses a randomly moving subset of the hosts that acts as an intermediate pool for receiving and delivering messages. We discuss the main design choices for each approach, along with performance analysis of selected protocols.

Abstract: The problem of communication among mobile nodes is one of the most fundamental problems in ad hoc mobile networks and is at the core of many algorithms, such as for counting the number of nodes, electing a leader, data processing etc. For an exposition of several important problems in ad hoc mobile networks. The work of Chatzigiannakis, Nikoletseas and Spirakis focuses on wireless mobile networks that are subject to highly dynamic structural changes created by mobility, channel fluctuations and device failures. These changes affect topological connectivity, occur with high frequency and may not be predictable in advance. Therefore, the environment where the nodes move (in three-dimensional space with possible obstacles) as well as the motion that the nodes perform are \textit{input} to any distributed algorithm.

Abstract: DAP (DistributedAlgorithms Platform) is a generic and homogeneous simulation environment aiming at the implementation, simulation, and testing of distributedalgorithms for wired and wireless networks. In this work, we present its architecture, the most important design decisions, and discuss its distinct features and functionalities. DAP allows the algorithm designer to implement a distributed protocol by creating his own customized environment, and programming in a standard programming language in a style very similar to that of a real-world application. DAP provides a graphical user interface that allows the designer to monitor and control the execution of simulations, visualize algorithms, as well as gather statistics and other information for their experimental analysis and testing.

Abstract: Wireless sensor networks are comprised of a vast number of devices, situated in an area of interest that self organize in a structureless network, in order to monitor/record/measure an environmental variable or phenomenon and subsequently to disseminate the data to the control center.
Here we present research focused on the development, simulation and evaluation of energy efficient algorithms, our basic goal is to minimize the energy consumption. Despite technology advances, the problem of energy use optimization remains valid since current and emerging hardware solutions fail to solve it.
We aim to reduce communication cost, by introducing novel techniques that facilitate the development of new algorithms. We investigated techniques of distributed adaptation of the operations of a protocol by using information available locally on every node, thus through local choices we improve overall performance. We propose techniques for collecting and exploiting limited local knowledge of the network conditions. In an energy efficient manner, we collect additional information which is used to achieve improvements such as forming energy efficient, low latency and fault tolerant paths to route data. We investigate techniques for managing mobility in networks where movement is a characteristic of the control center as well as the sensors. We examine methods for traversing and covering the network field based on probabilistic movement that uses local criteria to favor certain areas.
The algorithms we develop based on these techniques operate a) at low level managing devices, b) on the routing layer and c) network wide, achieving macroscopic behavior through local interactions. The algorithms are applied in network cases that differ in density, node distribution, available energy and also in fundamentally different models, such as under faults, with incremental node deployment and mobile nodes. In all these settings our techniques achieve significant gains, thus distinguishing their value as tools of algorithmic design.

Abstract: When one engineers distributedalgorithms, some special characteristics
arise that are different from conventional (sequential or parallel)
computing paradigms. These characteristics include: the need for either a
scalable real network environment or a platform supporting a simulated
distributed environment; the need to incorporate asynchrony, where arbitrarya
synchrony is hard, if not impossible, to implement; and the generation
of “difficult” input instances which is a particular challenge. In this
work, we identifys ome of the methodological issues required to address
the above characteristics in distributed algorithm engineering and illustrate
certain approaches to tackle them via case studies. Our discussion
begins byad dressing the need of a simulation environment and how asynchronyis
incorporated when experimenting with distributedalgorithms.
We then proceed bys uggesting two methods for generating difficult input
instances for distributed experiments, namelya game-theoretic one and another
based on simulations of adversarial arguments or lower bound proofs.
We give examples of the experimental analysis of a pursuit-evasion protocol
and of a shared memorypro blem in order to demonstrate these ideas.
We then address a particularlyi nteresting case of conducting experiments
with algorithms for mobile computing and tackle the important issue of
motion of processes in this context. We discuss the two-tier principle as
well as a concurrent random walks approach on an explicit representation
of motions in ad-hoc mobile networks, which allow at least for averagecase
analysis and measurements and may give worst-case inputs in some
cases. Finally, we discuss a useful interplay between theory and practice
that arise in modeling attack propagation in networks.

Abstract: Wireless sensor networks are comprised of a vast number of ultra-small fully autonomous computing, communication and sensing devices, with very restricted energy and computing capabilities, which co-operate to accomplish a large sensing task. Such networks can be very useful in practice in applications that require fine-grain monitoring of physical environment subjected to critical conditions (such as inaccessible terrains or disaster places). Very large numbers of sensor devices can be deployed in areas of interest and use self-organization and collaborative methods to form deeply networked environments. Features including the huge number of sensor devices involved, the severe power, computational and memory limitations, their dense deployment and frequent failures, pose new design and implementation aspects. The efficient and robust realization of such large, highly-dynamic, complex, non-conventional environments is a challenging algorithmic and technological task. In this work we consider certain important aspects of the design, deployment and operation of distributedalgorithms for data propagation in wireless sensor networks and discuss some characteristic protocols, along with an evaluation of their performance.

Abstract: An ad hoc mobile network is a collection of mobile hosts, with wireless communication capabilities, forming a temporary network without the aid of any established fixed infrastructure. In such networks, topological connectivity is subject to frequent, unpredictable change. Our work focuses on networks with high rate of such changes to connectivity. For such dynamically changing networks we propose protocols which exploit the co-ordinated (by the protocol) motion of a small part of the network. We show that such protocols can be designed to work correctly and efficiently even in the case of arbitrary (but not malicious) movements of the hosts not affected by the protocol. We also propose a methodology for the analysis of the expected behavior of protocols for such networks, based on the assumption that mobile hosts (those whose motion is not guided by the protocol) conduct concurrent random walks in their motion space. In particular, our work examines the fundamental problem of communication and proposes distributedalgorithms for it. We provide rigorous proofs of their correctness, and also give performance analyses by combinatorial tools. Finally, we have evaluated these protocols by experimental means.

Abstract: An ad hoc mobile network is a collection of mobile hosts, with wireless communication capabilities, forming a temporary network without the aid of any established fixed infrastructure. In such networks, topological connectivity is subject to frequent, unpredictable change. Our work focuses on networks with high rate of such changes to connectivity. For such dynamically changing networks we propose protocols which exploit the co-ordinated (by the protocol) motion of a small part of the network. We show that such protocols can be designed to work correctly and efficiently even in the case of arbitrary (but not malicious) movements of the hosts not affected by the protocol. We also propose a methodology for the analysis of the expected behavior of protocols for such networks, based on the assumption that mobile hosts (those whose motion is not guided by the protocol) conduct concurrent random walks in their motion space. In particular, our work examines the fundamental problem of communication and proposes distributedalgorithms for it. We provide rigorous proofs of their correctness, and also give performance analyses by combinatorial tools. Finally, we have evaluated these protocols by experimental means.

Abstract: Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.

Abstract: We address the issue of measuring distribution fairness in Internet-scale networks. This problem has several interesting instances encountered in different applications, ranging from assessing the distribution of load between network nodes for load balancing purposes, to measuring node utilization for optimal resource exploitation, and to guiding autonomous decisions of nodes in networks built with market-based economic principles. Although some metrics have been proposed, particularly for assessing load balancing algorithms, they fall short. We first study the appropriateness of various known and previously proposed statistical metrics for measuring distribution fairness. We put forward a number of required characteristics for appropriate metrics. We propose and comparatively study the appropriateness of the Gini coefficient (G) for this task. Our study reveals as most appropriate the metrics of G, the fairness index (FI), and the coefficient of variation (CV) in this order. Second, we develop six distributed sampling algorithms to estimate metrics online efficiently, accurately, and scalably. One of these algorithms (2-PRWS) is based on two effective optimizations of a basic algorithm, and the other two (the sequential sampling algorithm, LBS-HL, and the clustered sampling one, EBSS) are novel, developed especially to estimate G. Third, we show how these metrics, and especially G, can be readily utilized online by higher-level algorithms, which can now know when to best intervene to correct unfair distributions (in particular, load imbalances). We conclude with a comprehensive experimentation which comparatively evaluates both the various proposed estimation algorithms and the three most appropriate metrics (G, CV, andFI). Specifically, the evaluation quantifies the efficiency (in terms of number of the messages and a latency indicator), precision, and accuracy achieved by the proposed algorithms when estimating the competing fairness metrics. The central conclusion is that the proposed metric, G, can be estimated with a small number of messages and latency, regardless of the skew of the underlying distribution.

Abstract: Two important performance parameters of distributed, rate-based flow control algorithms are their locality and convergence complexity. The former is characterized by the amount of global knowledge that is available to their scheduling mechanisms, while the latter is defined as the number of update operations performed on rates of individual sessions until max-min fairness is reached. Optimistic algorithms allow any session to intermediately receive a rate larger than its max-min fair rate; bottleneck algorithms finalize the rate of a session only if it is restricted by a certain, highly congested link of the network. In this work, we present a comprehensive collection of lower and upper bounds on convergence complexity, under varying degrees of locality, for optimistic, bottleneck, rate-based flow control algorithms. Say that an algorithm is oblivious if its scheduling mechanism uses no information of either the session rates or the network topology. We present a novel, combinatorial construction of a capacitated network, which we use to establish a fundamental lower bound of dn 4 + n 2 on the convergence complexity of any oblivious algorithm, where n is the number of sessions laid out on a network, and d, the session dependency, is a measure of topological dependencies among sessions. Moreover, we devise a novel simulation proof to establish that, perhaps surprisingly, the lower bound of dn 4 + n 2 on convergence complexity still holds for any partially oblivious algorithm, in which the scheduling mechanism is allowed to use information about session rates, but is otherwise unaware of network topology. On the positive side, we prove that the lower bounds for oblivious and partially oblivious algorithms are both tight. We do so by presenting optimal oblivious algorithms, which converge after dn 2 + n 2 update operations are performed in the worst case. To complete the picture, we show that linear convergence complexity can indeed be achieved if information about both session rates and network topology is available to schedulers. We present a counterexample, nonoblivious algorithm, which converges within an optimal number of n update operations. Our results imply a surprising convergence complexity collapse of oblivious and partially oblivious algorithms, and a convergence complexity separation between (partially) oblivious and nonoblivious algorithms for optimistic, bottleneck rate-based flow control.

Abstract: Andrews et al. [Automatic method for hiding latency in high bandwidth networks, in: Proceedings of the ACM Symposium on Theory of Computing, 1996, pp. 257–265; Improved methods for hiding latency in high bandwidth networks, in: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, 1996, pp. 52–61] introduced a number of techniques for automatically hiding latency when performing simulations of networks with unit delay links on networks with arbitrary unequal delay links. In their work, they assume that processors of the host network are identical in computational power to those of the guest network being simulated. They further assume that the links of the host are able to pipeline messages, i.e., they are able to deliver P packets in time O(P+d) where d is the delay on the link.
In this paper we examine the effect of eliminating one or both of these assumptions. In particular, we provide an efficient simulation of a linear array of homogeneous processors connected by unit-delay links on a linear array of heterogeneous processors connected by links with arbitrary delay. We show that the slowdown achieved by our simulation is optimal. We then consider the case of simulating cliques by cliques; i.e., a clique of heterogeneous processors with arbitrary delay links is used to simulate a clique of homogeneous processors with unit delay links. We reduce the slowdown from the obvious bound of the maximum delay link to the average of the link delays. In the case of the linear array we consider both links with and without pipelining. For the clique simulation the links are not assumed to support pipelining.
The main motivation of our results (as was the case with Andrews et al.) is to mitigate the degradation of performance when executing parallel programs designed for different architectures on a network of workstations (NOW). In such a setting it is unlikely that the links provided by the NOW will support pipelining and it is quite probable the processors will be heterogeneous. Combining our result on clique simulation with well-known techniques for simulating shared memory PRAMs on distributed memory machines provides an effective automatic compilation of a PRAM algorithm on a NOW.

Abstract: We present key aspects (hardware, software, topology, networking) of SenseWall, an experimental sensor network test-bed we have created for the implementation and engineering of distributed sensor network algorithms. We then describe how SenseWall has been in particular used to implement two recent state of the art algorithms for energy balanced sensor data propagation. We elaborate on the issues and challenges created by the restrictions and particularities of the experimental test-bed and how we dealt with them. We also carry out a detailed performance evaluation comparing the energy balance protocols to two baseline protocols that include only either single hop or direct data transmissions.

Abstract: We study the problem of fair resource allocation in a simple cooperative multi-agent setting where we have k agents and a set of n objects to be allocated to those agents. Each object is associated with a weight represented by a positive integer or real number. We would like to allocate all objects to the agents so that each object is allocated to only one agent and the weight is distributed fairly. We adopt the fairness index popularized by the networking community as our measure of fairness, and study centralized algorithms for fair resource allocation. Based on the relationship between our problem and number partitioning, we devise a greedy algorithm for fair resource allocation that runs in polynomial time but is not guaranteed to find the optimal solution, and a complete anytime algorithm that finds the optimal solution but runs in exponential time. Then we study the phase transition behavior of the complete algorithm. Finally, we demonstrate that the greedy algorithm actually performs very well and returns almost perfectly fair allocations.

Abstract: In this paper we present the design of a simulator platform called FUSE (Fast Universal Simulator Engine). The term Universal means that the Engine can be adapted easily to different domains and be used for varying simulation needs, although our main target is simulation of distributedalgorithms in distributed computing environments. The Engine is Fast in the sense that the simulation overhead is minimal and very large systems can be simulated. We discuss the architecture and the design decisions that form the basis of these features. We also describe the functionality that is provided to its users (e.g., monitoring, statistics, etc.).

Abstract: In this work we study the implementation of multicost rout-
ing in a distributed way in wireless mobile ad hoc networks.
In contrast to traditional single-cost routing, where each
path is characterized by a scalar, in multicost routing a
vector of cost parameters is assigned to each network link,
from which the cost vectors of candidate paths are calcu-
lated. These parameters are combined in various optimiza-
tion functions, corresponding to diﬀerent routing algorithms,
for selecting the optimal path. Up until now the performance
of multicost and multi-constrained routing in wireless ad hoc
networks has been evaluated either at a theoretical level or
by assuming that nodes are static and have full knowledge
of the network topology and nodes� state. In the present
paper we assess the performance of multicost routing based
on energy-related parameters in mobile ad hoc networks by
embedding its logic in the Dynamic Source Routing (DSR)
algorithm, which is a well-known fully distributed routing
algorithm. We use simulations to compare the performance
of the multicost-DSR algorithm to that of the original DSR
algorithm and examine their behavior under various node
mobility scenarios. The results conﬁrm that the multicost-
DSR algorithm improves the performance of the network in
comparison to the original DSR algorithm in terms of energy eﬃciency. The multicost-DSR algorithm enhances the
performance of the network not only by reducing energy
consumption overall in the network, but also by spreading
energy consumption more uniformly across the network, pro
longing the network lifetime and reducing the packet drop
probability. Furthermore the delay suﬀered by the packets
reaching their destination for the case of the multicost-DSR
algorithm is shown to be lower than in the case of the orig
inal DSR algorithm.

Abstract: With this work we aim to make a three-fold contribution.
We first address the issue of supporting efficiently queries
over string-attributes involving prefix, suffix, containment,
and equality operators in large-scale data networks. Our
first design decision is to employ distributed hash tables
(DHTs) for the data network?s topology, harnessing their
desirable properties. Our next design decision is to derive
DHT-independent solutions, treating DHT as a black box.
Second, we exploit this infrastructure to develop efficient
content based publish/subscribe systems. The main con-
tribution here are algorithms for the efficient processing of
queries (subscriptions) and events (publications). Specifi-
cally, we show that our subscription processing algorithms
require O(logN) messages for a N-node network, and our
event processing algorithms require O(l ? logN) messages
(with l being the average string length).
Third, we develop algorithms for optimizing the proces-
sing of multi-dimensional events, involving several string at-
tributes. Further to our analysis, we provide simulation-
based experiments showing promising performance results
in terms of number of messages, required bandwidth, load
balancing, and response times.

Abstract: This paper addresses the efficient processing of
top-k queries in wide-area distributed data
repositories where the index lists for the attribute
values (or text terms) of a query are distributed
across a number of data peers and the
computational costs include network latency,
bandwidth consumption, and local peer work.
We present KLEE, a novel algorithmic
framework for distributed top-k queries,
designed for high performance and flexibility.
KLEE makes a strong case for approximate top-k
algorithms over widely distributed data sources.
It shows how great gains in efficiency can be
enjoyed at low result-quality penalties. Further,
KLEE affords the query-initiating peer the
flexibility to trade-off result quality and expected
performance and to trade-off the number of
communication phases engaged during query
execution versus network bandwidth
performance. We have implemented KLEE and
related algorithms and conducted a
comprehensive performance evaluation. Our
evaluation employed real-world and synthetic
large, web-data collections, and query
benchmarks. Our experimental results show that
KLEE can achieve major performance gains in
terms of network bandwidth, query response
times, and much lighter peer loads, all with small
errors in result precision and other result-quality
measures

Abstract: We address the issue of measuring storage, or query load distribution fairness in peer-to-peer data management systems. Existing metrics may look promising from the point of view of specific peers, while in reality being far from optimal from a global perspective. Thus, first we define the requirements and study the appropriateness of various statistical metrics for measuring load distribution fairness towards these requirements. The metric proposed as most appropriate is the Gini coefficient (G). Second, we develop novel distributed sampling algorithms to compute G on-line, with high precision, efficiently, and scalably. Third, we show how G can readily be utilized on-line by higher-level algorithms which can now know when to best intervene to correct load imbalances. Our analysis and experiments testify for the efficiency and accuracy of these algorithms, permitting the online use of a rich and reliable metric, conveying a global perspective of the distribution.

Abstract: The promises inherent in users coming together to form data
sharing network communities, bring to the foreground new problems formulated
over such dynamic, ever growing, computing, storage, and networking
infrastructures. A key open challenge is to harness these highly
distributed resources toward the development of an ultra scalable, efficient
search engine. From a technical viewpoint, any acceptable solution
must fully exploit all available resources dictating the removal of any
centralized points of control, which can also readily lead to performance
bottlenecks and reliability/availability problems. Equally importantly,
however, a highly distributed solution can also facilitate pluralism in informing
users about internet content, which is crucial in order to preclude
the formation of information-resource monopolies and the biased visibility
of content from economically-powerful sources. To meet these challenges,
the work described here puts forward MINERVA{\^a}{\"i}¿½{\"i}¿½, a novel search
engine architecture, designed for scalability and efficiency. MINERVA{\^a}{\"i}¿½{\"i}¿½
encompasses a suite of novel algorithms, including algorithms for creating
data networks of interest, placing data on network nodes, load balancing,
top-k algorithms for retrieving data at query time, and replication algorithms
for expediting top-k query processing. We have implemented the
proposed architecture and we report on our extensive experiments with
real-world, web-crawled, and synthetic data and queries, showcasing the
scalability and efficiency traits of MINERVA{\^a}{\"i}¿½{\"i}¿½.

Abstract: In this paper we propose an energy-aware broadcast algorithm for wireless networks. Our algorithm is based on the multicost approach and selects the set of nodes that by transmitting implement broadcasting in an optimally energy-efficient way. The energy-related parameters taken into account are the node transmission power and the node residual energy. The algorithm{\^a}€™s complexity however is non-polynomial, and therefore, we propose a relaxation producing a near-optimal solution in polynomial time. We also consider a distributed information exchange scheme that can be coupled with the proposed algorithms and examine the overhead introduced by this integration. Using simulations we show that the proposed algorithms outperform other solutions in the literature in terms of energy efficiency. Moreover, it is shown that the near-optimal algorithm obtains most of the performance benefits of the optimal algorithm at a smaller computational overhead.

Abstract: An ad-hoc mobile network is a collection of mobile hosts, with
wireless communication capabilities, forming a temporary network
without the aid of any established fixed infrastructure.
In such networks, topological connectivity is subject to frequent,
unpredictable change. Our work focuses on networks with high
rate of such changes to connectivity. For such dynamic changing
networks we propose protocols which exploit the co-ordinated
(by the protocol) motion of a small part of the network.
We show that such protocols can be designed to work
correctly and efficiently even in the case of arbitrary (but not
malicious) movements of the hosts not affected by the protocol.
We also propose a methodology for the analysis of the expected
behaviour of protocols for such networks, based on the assumption that mobile hosts (whose motion is not guided by
the protocol) conduct concurrent random walks in their
motion space.
Our work examines some fundamental problems such as pair-wise
communication, election of a leader and counting, and proposes
distributedalgorithms for each of them. We provide their
proofs of correctness, and also give rigorous analysis by
combinatorial tools and also via experiments.

Abstract: This paper studies the data gathering problem in wireless networks, where data generated at the nodes has to be collected at a single sink. We investigate the relationship between routing optimality and fair resource management. In particular, we prove that for energy balanced data propagation, Pareto optimal routing and flow maximization are equivalent, and also prove that flow maximization is equivalent to maximizing the network lifetime. We algebraically characterize the network structures in which energy balanced data flows are maximal. Moreover, we algebraically characterize communication links which are not used by an optimal flow. This leads to the characterization of minimal network structures supporting the maximal flows.
We note that energy balance, although implying global optimality, is a local property that can be computed efficiently and in a distributed manner. We suggest online distributedalgorithms for energy balance in different optimal network structures and numerically show their stability in particular setting. We remark that although the results obtained in this paper have a direct consequence in energy saving for wireless networks they do not limit themselves to this type of networks neither to energy as a resource. As a matter of fact, the results are much more general and can be used for any type of network and different type of resources.

Abstract: This paper studies the data gathering problem in wireless networks, where data generated at the nodes has to be collected at a single sink. We investigate the relationship between routing optimality and fair resource management. In particular, we prove that for energy-balanced data propagation, Pareto optimal routing and flow maximization are equivalent, and also prove that flow maximization is equivalent to maximizing the network lifetime. We algebraically characterize the network structures in which energy-balanced data flows are maximal. Moreover, we algebraically characterize communication links which are not used by an optimal flow. This leads to the characterization of minimal network structures supporting the maximal flows.
We note that energy-balance, although implying global optimality, is a local property that can be computed efficiently and in a distributed manner. We suggest online distributedalgorithms for energy-balance in different optimal network structures and numerically show their stability in particular setting. We remark that although the results obtained in this paper have a direct consequence in energy saving for wireless networks they do not limit themselves to this type of networks neither to energy as a resource. As a matter of fact, the results are much more general and can be used for any type of network and different types of resources.

Abstract: In this work we tackle the open problem of self-join size (SJS) estimation in a large-scale distributed data system, where tuples of a relation are distributed over data nodes which comprise an overlay network. Our contributions include adaptations of five well-known SJS estimation centralized techniques (coined sequential, cross-sampling, adaptive, bifocal, and sample-count) to the network environment and a novel technique which is based on the use of the Gini coefficient. We develop analyses showing how Gini estimations can lead to estimations of the underlying Zipfian or power-law value distributions. We further contribute distributed sampling algorithms that can estimate accurately and efficiently the Gini coefficient. Finally, we provide detailed experimental evidence testifying for the claimed increased accuracy, precision, and efficiency of the proposed SJS estimation method, compared to the other methods. The proposed approach is the only one to ensure high efficiency, precision, and accuracy regardless of the skew of the underlying data.

Abstract: In this work, we study protocols (i.e. distributedalgorithms) so that populations of distributed processes can construct networks. In order to highlight the basic principles of distributed network construction we keep the model minimal in all respects. In particular, we assume finite-state processes that all begin from the same initial state and all execute the same protocol (i.e. the system is homogeneous). Moreover, we assume pairwise interactions between the processes that are scheduled by an adversary. The only constraint on the adversary scheduler is that it must be fair, intuitively meaning that it must assign to every reachable configuration of the system a non-zero probability to occur. In order to allow processes to construct networks, we let them activate and deactivate their pairwise connections. When two processes interact, the protocol takes as input the states of the processes and the state of their connection and updates all of them. In particular, in every interaction, the protocol may activate an inactive connection, deactivate an active one, or leave the state of a connection unchanged. Initially all connections are inactive and the goal is for the processes, after interacting and activating/deactivating connections for a while, to end up with a desired stable network (i.e. one that does not change any more). We give protocols (optimal in some cases) and lower bounds for several basic network construction problems such as spanning line, spanning ring, spanning star, and regular network. We provide proofs of correctness for all of our protocols and analyze the expected time to convergence of most of them under a uniform random scheduler that selects the next pair of interacting processes uniformly at random from all such pairs. Finally, we prove several universality results by presenting generic protocols that are capable of simulating a Turing Machine (TM) and exploiting it in order to construct a large class of networks. Our universality protocols use a subset of the population (waste) in order to distributedly construct there a TM able to decide a graph class in some given space. Then, the protocols repeatedly construct in the rest of the population (useful space) a graph equiprobably drawn from all possible graphs. The TM works on this and accepts if the presented graph is in the class. We additionally show how to partition the population into k supernodes, each being a line of log k nodes, for the largest such k. This amount of local memory is sufficient for the supernodes to obtain unique names and exploit their names and their memory to realize nontrivial constructions. Delicate composition and reinitialization issues have to be solved for these general constructions to work.

Abstract: Efficient query processing in traditional database
management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statisticsmanagement still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured
overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge.
To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation
of these tools for distributed statistical synopses,
and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability.

Abstract: We consider the important problem of energy balanced data propagation in wireless sensor networks and we extend and generalize
previous works by allowing adaptive energy assignment. We consider the data gathering problem where data are generated by the sensors and
must be routed toward a unique sink. Sensors route data by either sending the data directly to the sink or in a multi-hop fashion by delivering
the data to a neighbouring sensor. Direct and neighbouring transmissions require different levels of energy consumption. Basically, the protocols balance the energy consumption among the sensors by computing the adequate ratios of direct and neighbouring transmissions. An abstract model of energy dissipation as a random walk is proposed, along with rigorous performance analysis techniques. Two efficient distributedalgorithms are presented and analysed, by both rigorous means and simulation.
The first one is easy to implement and fast to execute. The protocol assumes that sensors know a-priori the rate of data they generate.
The sink collects and processes all these information in order to compute the relevant value of the protocol parameter. This value is transmitted
to the sensors which individually compute their optimal ratios of direct and neighbouring transmissions. The second protocol avoids the necessary a-priori knowledge of the data rate generated by sensors by inferring the relevant information from the observation of the data paths.
Furthermore, this algorithm is based on stochastic estimation methods and is adaptive to environmental changes.

Abstract: In emerging pervasive scenarios, data is collected by sensing devices in streams that occur at several distributed points of observation. The size of the data typically far exceeds the storage and computational capabilities of the tiny devices that have to collect and process them. A general and challenging task is to allow (some of) the nodes of a pervasive network to collectively perform monitoring of a neighbourhood of interest by issuing continuous aggregate queries on the streams observed in its vicinity. This class of algorithms is fully decentralized and diffusive in nature: collecting all the data at a few central nodes of the network is unfeasible in networks of low capability devices or in the presence of massive data sets. Two main problems arise in this scenario: (i) the intrinsic complexity of maintaining statistics over a data stream whose size greatly exceeds the capabilities of the device that performs the computation; (ii) composing the partial outcomes computed at different points of observation into an accurate, global statistic over a neighbourhood of interest, which entails coping with several problems, last but not least the receipt of duplicate information along multiple paths of diffusion.
Streaming techniques have emerged as powerful tools to achieve the general goals described above, in the first place because they assume a computational model in which computational and storage resources are assumed to be far exceeded by the amount of data on which computation occurs. In this contribution, we review the main streaming techniques and provide a classification of the computational problems and the applications they effectively address, with an emphasis on decentralized scenarios, which are of particular interest in pervasive networks

Abstract: We consider a security problem on a distributed network.
We assume a network whose nodes are vulnerable to infection
by threats (e.g. viruses), the attackers. A system security
software, the defender, is available in the system. However,
due to the network¢s size, economic and performance reasons,
it is capable to provide safety, i.e. clean nodes from
the possible presence of attackers, only to a limited part of
it. The objective of the defender is to place itself in such a
way as to maximize the number of attackers caught, while
each attacker aims not to be caught.
In [7], a basic case of this problem was modeled as a
non-cooperative game, called the Edge model. There, the
defender could protect a single link of the network. Here,
we consider a more general case of the problem where the
defender is able to scan and protect a set of k links of the
network, which we call the Tuple model. It is natural to expect
that this increased power of the defender should result
in a better quality of protection for the network. Ideally,
this would be achieved at little expense on the existence and
complexity of Nash equilibria (profiles where no entity can
improve its local objective unilaterally by switching placements
on the network).
In this paper we study pure and mixed Nash equilibria
in the model. In particular, we propose algorithms for computing
such equilibria in polynomial time and we provide a
polynomial-time transformation of a special class of Nash
equilibria, called matching equilibria, between the Edge
model and the Tuple model, and vice versa. Finally, we
establish that the increased power of the defender results in
higher-quality protection of the network.

Abstract: In this paper we describe a new simulation platform for complex wireless sensor networks that operate a collection of distributedalgorithms and network protocols. Simulating such systems is complicated because of the need to coordinate different network layers and debug protocol stacks, often with very different interfaces, options, and fidelities. Our platform (which we call WSNGE) is a flexible and extensible environment that provides a highly scalable simulator with unique characteristics. It focuses on user friendliness, providing every function in both scriptable and visual way, allowing the researcher to define simulations and view results in an easy to use graphical environment. Unlike other solutions, WSNGE does not distinguish between different scenario types, allowing multiple different protocols to run at the same time. It enables rich online interaction with running simulations, allowing parameters, topologies or the whole scenario to be altered at any point in time.