Abstract: We here present the Forward Planning Situated Protocol (FPSP), for scalable, energy efficient and fault tolerant data propagation in situated wireless sensor networks. To deal with the increased complexity of such deeply networked sensor systems, instead of emphasizing on a particular aspect of the services provided, i.e. either for low-energy periodic, or low-latency event-driven, or high-success query-based sensing, FPSP uses two novel mechanisms that allow the network operator to adjust the performance of the protocol in terms of energy, latency and success rate on a per-task basis. We emphasize on distributedness, direct or indirect interactions among relatively simple agents, flexibility and robustness.
The protocol operates by employing a series of plan & forward phases through which devices self-organize into forwarding groups that propagate data over discovered paths. FPSP performs a limited number of long range, high power data transmissions to collect information regarding the neighboring devices. The acquired information, allows to plan a (parameterizable long by {\"e}) sequence of short range, low power transmissions between nearby particles, based on certain optimization criteria. All particles that decide to respond (based on local criteria) to these long range transmissions enter the forwarding phase during which information is propagated via the acquired plan. Clearly, the duration of the forwarding phases is characterized by the parameter {\"e}, the transmission medium and the processing speed of the devices. In fact the parameter {\"e} provides a mechanism to adjust the protocol performance in terms of the latency--energy trade-off. By reducing {\"e} the latency is reduced at the cost of spending extra energy, while by increasing {\"e}, the energy dissipation is reduced but the latency is increased.
To control the success rate--energy trade-off, particles react locally on environment and context changes by using a set of rules that are based on response thresholds that relate individual-level plasticity with network-level resiliency, motivated by the nature-inspired method for dividing labor, a metaphor of social insect behavior for solving problems [1]. Each particle has an individual response threshold {\`E} that is related to the "local" density (as observed by the particle, [2]); particles engage in propagation of events when the level of the task-associated stimuli exceeds their thresholds. Let s be the intensity of a stimulus associated with a particular sensing task, set by the human authorities. We adopt the response function T{\`e}(s) = snover sn + {\`e}n, the probability of performing the task as a function of s, where n > 1 determines the steepness of the threshold. Thus, when {\`e} is small (i.e. the network is sparse) then the response probability increases; when s increases (i.e. for critical sensing tasks) the response probability increases as well.
This role-based approach where a selective number of devices do the high cost planning and the rest of the network operates in a low cost state leads to systems that have increased energy efficiency and high fault-tolerance since these long range planning phases allow to bypass obstacles (where no sensors are available) or faulty sensors (that have been disabled due to power failure or other natural events).
Abstract: Wireless Sensor Networks are complex systems consisting of a number of relatively simple autonomous sensing devices spread on a geographical area. The peculiarity of these devices lies on the constraints they face in relation to their energy reserves and their computational, storage and communication capabilities. The utility of these sensors is to measure certain environmental conditions and to detect critical events in relation to these measurements. Those events thereupon have to be reported to a specific central station namely the “sink”. This data propagation generally has the form of a hop-by-hop transmission. In this framework we work on distributed data propagation protocols which are taking into account the energy reserves of the sensors. In particular following the work of Chatzigiannakis et al. on the Probabilistic Forwarding Protocol (PFR) we present the distributed probabilistic protocol EFPFR, which favors transmission from the less depleted sensors in addition to favor transmissions close to the “optimal line”. This protocol is simple and relies only on local information for propagation decisions. Its main goal is to limit the total amount of energy dissipated per event and therefore to extend the network’s operation duration.
Abstract: In this work, we study the propagation of influence and computation in dynamic distributed computing systems that are possibly disconnected at every instant. We focus on a synchronous message-passing communication model with broadcast and bidirectional links. Our network dynamicity assumption is a worst-case dynamicity controlled by an adversary scheduler, which has received much attention recently. We replace the usual (in worst-case dynamic networks) assumption that the network is connected at every instant by minimal temporal connectivity conditions. Our conditions only require that another causal influence occurs within every time window of some given length. Based on this basic idea, we define several novel metrics for capturing the speed of information spreading in a dynamic network. We present several results that correlate these metrics. Moreover, we investigate termination criteria in networks in which an upper bound on any of these metrics is known. We exploit our termination criteria to provide efficient (and optimal in some cases) protocols that solve the fundamental counting and all-to-all token dissemination (or gossip) problems.
Abstract: In this chapter, our focus is on computational network analysis from a theoretical point of view. In particular, we study the \emph{propagation of influence and computation in dynamic distributed computing systems}. We focus on a \emph{synchronous message passing} communication model with bidirectional links. Our network dynamicity assumption is a \emph{worst-case dynamicity} controlled by an adversary scheduler, which has received much attention recently. We first study the fundamental \emph{naming} and \emph{counting} problems (and some variations) in
networks that are \emph{anonymous}, \emph{unknown}, and possibly dynamic. Network dynamicity is modeled here by the \emph{1-interval connectivity model}, in which communication is synchronous and a (worst-case) adversary
chooses the edges of every round subject to the condition that each instance is connected. We then replace this quite strong assumption by minimal \emph{temporal connectivity} conditions. These conditions only require that \emph{another causal influence occurs within every time-window of some given length}. Based on this basic idea we define several novel metrics for capturing the speed of information spreading in a dynamic network. We present several results that correlate these metrics. Moreover, we investigate \emph{termination criteria} in networks in which an upper bound on any of these metrics is known. We exploit these termination criteria to provide efficient (and optimal in some cases) protocols that solve the fundamental \emph{counting} and \emph{all-to-all token dissemination} (or \emph{gossip}) problems. Finally, we propose another model of worst-case temporal connectivity, called \emph{local
communication windows}, that assumes a fixed underlying communication network and restricts the adversary to allow communication between local neighborhoods in every time-window of some fixed length. We prove some basic properties and provide a protocol for counting in this model.
Abstract: Counting in general, and estimating the cardinality of (multi-) sets in particular, is highly desirable for a large variety of applications, representing a foundational block for the efficient deployment and access of emerging internet-scale informationsystems. Examples of such applications range from optimizing query access plans in internet-scale databases, to evaluating the significance (rank/score) of various data items in information retrieval applications. The key constraints that any acceptable solution must satisfy are: (i) efficiency: the number of nodes that need be contacted for counting purposes must be small in order to enjoy small latency and bandwidth requirements; (ii) scalability, seemingly contradicting the efficiency goal: arbitrarily large numbers of nodes nay need to add elements to a (multi-) set, which dictates the need for a highly distributed solution, avoiding server-based scalability, bottleneck, and availability problems; (iii) access and storage load balancing: counting and related overhead chores should be distributed fairly to the nodes of the network; (iv) accuracy: tunable, robust (in the presence of dynamics and failures) and highly accurate cardinality estimation; (v) simplicity and ease of integration: special, solution-specific indexing structures should be avoided. In this paper, first we contribute a highly-distributed, scalable, efficient, and accurate (multi-) set cardinality estimator. Subsequently, we show how to use our solution to build and maintain histograms, which have been a basic building block for query optimization for centralized databases, facilitating their porting into the realm of internet-scale data networks.
Abstract: Counting items in a distributed system, and estimating the cardinality of multisets in particular,
is important for a large variety of applications and a fundamental building block for emerging Internet-scale informationsystems. Examples of such applications range from optimizing query access plans in peer-to-peer data sharing, to computing the significance (rank/score) of data items in distributedinformation retrieval. The general formal problem addressed in this article is computing the network-wide distinct number of items with some property (e.g., distinct files with file name
containing “spiderman”) where each node in the network holds an arbitrary subset, possibly overlapping the subsets of other nodes. The key requirements that a viable approach must satisfy are:
(1) scalability towards very large network size, (2) efficiency regarding messaging overhead, (3) load
balance of storage and access, (4) accuracy of the cardinality estimation, and (5) simplicity and easy
integration in applications. This article contributes the DHS (Distributed Hash Sketches) method
for this problem setting: a distributed, scalable, efficient, and accurate multiset cardinality estimator.
DHSis based on hash sketches for probabilistic counting, but distributes the bits of each counter
across network nodes in a judicious manner based on principles of Distributed Hash Tables, paying
careful attention to fast access and aggregation as well as update costs. The article discusses various
design choices, exhibiting tunable trade-offs between estimation accuracy, hop-count efficiency, and
load distribution fairness. We further contribute a full-fledged, publicly available, open-source implementation of all our methods, and a comprehensive experimental evaluation for various settings.
Abstract: Distributed algorithm designers often assume that system processes execute the same predefined software. Alternatively, when they do not assume that, designers turn to non-cooperative games and seek an outcome that corresponds to a rough consensus when no coordination is allowed. We argue that both assumptions are inapplicable in many real distributedsystems, e.g., the Internet, and propose designing self-stabilizing and Byzantine fault-tolerant distributed game authorities. Once established, the game authority can secure the execution of any complete information game. As a result, we reduce costs that are due to the processes˘ freedom of choice. Namely, we reduce the price of malice.
Abstract: Information retrieval (IR) in peer-to-peer (P2P) networks,
where the corpus is spread across many loosely coupled
peers, has recently gained importance. In contrast to IR
systems on a centralized server or server farm, P2P IR faces
the additional challenge of either being oblivious to global
corpus statistics or having to compute the global measures
from local statistics at the individual peers in an efficient,
distributed manner. One specific measure of interest is the
global document frequency for different terms, which would
be very beneficial as term-specific weights in the scoring and
ranking of merged search results that have been obtained
from different peers.
This paper presents an efficient solution for the problem
of estimating global document frequencies in a large-scale
P2P network with very high dynamics where peers can join
and leave the network on short notice. In particular, the
developed method takes into account the fact that the lo-
cal document collections of autonomous peers may arbitrar-
ily overlap, so that global counting needs to be duplicate-
insensitive. The method is based on hash sketches as a
technique for compact data synopses. Experimental stud-
ies demonstrate the estimator?s accuracy, scalability, and
ability to cope with high dynamics. Moreover, the benefit
for ranking P2P search results is shown by experiments with
real-world Web data and queries.
Abstract: Information retrieval (IR) in peer-to-peer (P2P) networks,
where the corpus is spread across many loosely coupled
peers, has recently gained importance. In contrast to IR
systems on a centralized server or server farm, P2P IR faces
the additional challenge of either being oblivious to global
corpus statistics or having to compute the global measures
from local statistics at the individual peers in an efficient,
distributed manner. One specific measure of interest is the
global document frequency for different terms, which would
be very beneficial as term-specific weights in the scoring and
ranking of merged search results that have been obtained
from different peers.
This paper presents an efficient solution for the problem
of estimating global document frequencies in a large-scale
P2P network with very high dynamics where peers can join
and leave the network on short notice. In particular, the
developed method takes into account the fact that the lo-
cal document collections of autonomous peers may arbitrar-
ily overlap, so that global counting needs to be duplicate-
insensitive. The method is based on hash sketches as a
technique for compact data synopses. Experimental stud-
ies demonstrate the estimator?s accuracy, scalability, and
ability to cope with high dynamics. Moreover, the benefit
for ranking P2P search results is shown by experiments with
real-world Web data and queries.
Abstract: Let n atomic players be routing their unsplitable flow on mresources.
When each player has the option to drop her current resource and select a better
one, and this option is exercised sequentially and unilaterally, then a Nash Equilibrium
(NE) will be eventually reached. Acting sequentially, however, is unrealistic
in large systems. But, allowing concurrency, with an arbitrary number of
players updating their resources at each time point, leads to an oscillation away
from NE, due to big groups of players moving simultaneously and due to nonsmooth
resource cost functions. In this work, we validate experimentally simple
concurrent protocols that are realistic, distributed and myopic yet are scalable, require
only information local at each resource and, still, are experimentally shown
to quickly reach a NE for a range of arbitrary cost functions.
Abstract: Managing corporate Information Technology (IT) environment becomes increasingly complex as server logic architecture becomes distributed and the number of manageable entities increases. At the same time, the open source community has not yet produced a reliable systems and network management solution, even though there are open source initiatives specializing in individual fields of remote management. This paper presents OpenRSM, an integrated remote management system created by integrating individual open source initiatives and augmenting them to support additional functionality so that a lightweight integrated systems and network management solution is produced.
Abstract: Distributed algorithm designers often assume that system processes execute the same predefined software. Alternatively, when they do not assume that, designers turn to non-cooperative games and seek an outcome that corresponds to a rough consensus when no coordination is allowed. We argue that both assumptions are inapplicable in many real distributedsystems, e.g., the Internet, and propose designing self-stabilizing and Byzantine fault-tolerant distributed game authorities. Once established, the game authority can secure the execution of any complete information game. As a result, we reduce costs that are due to the processes˘ freedom of choice. Namely, we reduce the price of malice.
Abstract: In this Phd thesis,, we try to use formal logic and threshold phenomena that asymptotically emerge with certainty in order to build new trust models and to evaluate the existing one. The departure point of our work is that dynamic, global computing systems are not amenable to a static viewpoint of the trust concept, no matter how this concept is formalized. We believe that trust should be a statistical, asymptotic concept to be studied in the limit as the system's components grow according to some growth rate. Thus, our main goal is to define trust as an emerging system property that ``appears'' or "disappears" when a set of properties hold, asymptotically with probability$ 0$ or $1$ correspondingly . Here we try to combine first and second order logic in order to analyze the trust measures of specific network models. Moreover we can use formal logic in order to determine whether generic reliability trust models provide a method for deriving trust between peers/entities as the network's components grow. Our approach can be used in a wide range of applications, such as monitoring the behavior of peers, providing a measure of trust between them, assessing the level of reliability of peers in a network. Wireless sensor networks are comprised of a vast number of ultra-small autonomous computing, communication and sensing devices, with restricted energy and computing capabilities, that co-operate to accomplish a large sensing task. Sensor networks can be very useful in practice. Such systems should at least guarantee the confidentiality and integrity of the information reported to the controlling authorities regarding the realization of environmental events. Therefore, key establishment is critical for the protection in wireless sensor networks and the prevention of adversaries from attacking the network. Finally in this dissertation we also propose three distributed group key establishment protocols suitable for such energy constrained networks. This dissertation is composed of two parts. Part I develops the theory of the first and second order logic of graphs - their definition, and the analysis of their properties that are expressible in the {\em first order language} of graphs. In part II we introduce some new distributed group key establishment protocols suitable for sensor networks. Several key establishment schemes are derived and their performance is demonstrated.
Abstract: Efficient query processing in traditional database
management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributedsystems, on the other hand, statisticsmanagement still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured
overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge.
To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation
of these tools for distributed statistical synopses,
and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability.