Abstract: Clustering is a crucial network design approach for enabling large-scale wireless sensor network (WSN) deployments. A large variety of clustering approaches has been presented, focusing on different performance metrics. Such protocols usually aim at minimizing communication overhead, evenly distributing roles among the participating nodes, and controlling the network topology. Simulations of such protocols are performed using theoretical models that are based on unrealistic assumptions such as the unit disk graph communication model, ideal wireless communication channels and perfect energy consumption estimations. With these assumptions taken for granted, theoretical models claim various performance milestones that cannot be achieved in realistic conditions. In this paper, we design a new clustering protocol that adapts to changes in the environment and to the needs and goals of the user applications. We address the issues that hinder its performance under real environmental conditions and provide a deployable protocol. The implementation, integration and experimentation of this new protocol and its optimizations were performed using the WISEBED framework. We apply our protocol in multiple indoor wireless sensor testbeds with multiple experimental scenarios to showcase scalability and trade-offs between network properties and configurable protocol parameters. By analyzing the real-world experimental output, we present results that depict a more realistic view of the clustering problem with regard to adapting to environmental conditions and the quality of topology control. Our study clearly demonstrates the applicability of our approach and the benefits it offers to both the research & development communities.
Abstract: Raising awareness among young people and changing their behaviour and habits concerning energy usage is key to achieving sustained energy saving. Additionally, young people are very sensitive to environmental protection, so raising awareness among children is much easier than with any other group of citizens. This work examines ways to create an innovative Information & Communication Technologies (ICT) ecosystem (including web-based, mobile, social and sensing elements) tailored specifically for school environments, taking into account both the users (faculty, staff, students, parents) and school buildings, thus motivating and supporting young citizens' behavioural change to achieve greater energy efficiency. A mixture of open-source IoT hardware and proprietary platforms at the infrastructure level is currently being utilized for monitoring a fleet of 18 educational buildings across 3 countries, comprising over 700 IoT monitoring points. We present the system's high-level architecture, as well as several aspects of its implementation related to the application domain of educational building monitoring and energy efficiency. The system is developed based on open-source technologies and services in order to make it capable of providing open IT infrastructure and support for different commercial hardware/sensor vendors as well as open-source solutions. The system presented can be used to develop and offer new app-based solutions that can be used either for educational purposes or for managing the energy efficiency of the building. The system is replicable and adaptable to settings that may be different than the scenarios envisioned here (e.g., targeting different climate zones) and to different IT infrastructures, and can be easily extended to accommodate integration with other systems. The overall performance of the system is evaluated in a real-world environment in terms of scalability, responsiveness and simplicity.
Abstract: Clustering is an important research topic for wireless sensor
networks (WSNs). A large variety of approaches has been
presented focusing on different performance metrics. Even
though all of them have many practical applications, an extremely
limited number of software implementations is available
to the research community. Furthermore, these very few
techniques are implemented for specific WSN systems or are
integrated in complex applications. Thus it is very difficult
to comparatively study their performance and almost impossible
to reuse them in future applications under a different
scope. In this work we study a large body of well established
algorithms. We identify their main building blocks
and propose a component-based architecture for developing
clustering algorithms that (a) promotes exchangeability of
algorithms thus enabling the fast prototyping of new approaches,
(b) allows cross-layer implementations to realize
complex applications, (c) offers a common platform to comparatively
study the performance of different approaches,
(d) is hardware and OS independent. We implement 5 well
known algorithms and discuss how to implement 11 more.
We conduct an extended simulation study to demonstrate
the faithfulness of our implementations when compared to
the original implementations. Our simulations are at very
large scale thus also demonstrating the scalability of the
original algorithms beyond their original presentations. We
also conduct experiments to assess their practicality in real
WSNs. We demonstrate how the implemented clustering
algorithms can be combined with routing and group key establishment
algorithms to construct WSN applications. Our
study clearly demonstrates the applicability of our approach
and the benefits it offers to both research & development
communities.
Abstract: Counting in general, and estimating the cardinality of (multi-) sets in particular, is highly desirable for a large variety of applications, representing a foundational block for the efficient deployment and access of emerging internet-scale information systems. Examples of such applications range from optimizing query access plans in internet-scale databases, to evaluating the significance (rank/score) of various data items in information retrieval applications. The key constraints that any acceptable solution must satisfy are: (i) efficiency: the number of nodes that need to be contacted for counting purposes must be small in order to enjoy small latency and bandwidth requirements; (ii) scalability, seemingly contradicting the efficiency goal: arbitrarily large numbers of nodes may need to add elements to a (multi-) set, which dictates the need for a highly distributed solution, avoiding server-based scalability, bottleneck, and availability problems; (iii) access and storage load balancing: counting and related overhead chores should be distributed fairly to the nodes of the network; (iv) accuracy: tunable, robust (in the presence of dynamics and failures) and highly accurate cardinality estimation; (v) simplicity and ease of integration: special, solution-specific indexing structures should be avoided. In this paper, we first contribute a highly distributed, scalable, efficient, and accurate (multi-) set cardinality estimator. Subsequently, we show how to use our solution to build and maintain histograms, which have been a basic building block for query optimization for centralized databases, facilitating their porting into the realm of internet-scale data networks.
Abstract: Evaluating target tracking protocols for wireless sensor networks that can localize multiple mobile devices can be a very challenging task. Such protocols usually aim at minimizing communication overhead and data processing for the participating nodes, as well as delivering adequate tracking information of the mobile targets in a timely manner. Simulations of such protocols are performed using theoretical models that are based on unrealistic assumptions such as the unit disk graph communication model, ideal network localization and perfect distance estimations. With these assumptions taken for granted, theoretical models claim various performance milestones that cannot be achieved in realistic conditions. In this paper we design a new localization protocol, where mobile assets can be tracked passively via software agents. We address the issues that hinder its performance under real environmental conditions and provide a deployable protocol. The implementation, integration and experimentation of this new protocol and its optimizations were performed using the WISEBED framework. We apply our protocol in multiple indoor wireless sensor testbeds with multiple experimental scenarios to showcase scalability and trade-offs between network properties and configurable protocol parameters. By analyzing the real-world experimental output, we present results that depict a more realistic view of the target tracking problem with regard to power consumption and the quality of tracking information. Finally, we also conduct some very focused simulations to assess the scalability of our protocol in very large networks with multiple mobile assets.
Abstract: Peer-to-Peer (P2P) search requires intelligent decisions for query routing: selecting the best peers to which a given query, initiated at some peer, should be forwarded for retrieving additional search results. These decisions are based on statistical summaries for each peer, which are usually organized on a per-keyword basis and managed in a distributed directory of routing indices. Such architectures disregard the possible correlations among keywords. Together with the coarse granularity of per-peer summaries, which are mandated for scalability, this limitation may lead to poor search result quality.
This paper develops and evaluates two solutions to this problem, sk-STAT based on single-key statistics only, and mk-STAT based on additional multi-key statistics. For both cases, hash sketch synopses are used to compactly represent a peer's data items and are efficiently disseminated in the P2P network to form a decentralized directory. Experimental studies with Gnutella and Web data demonstrate the viability and the trade-offs of the approaches.
Abstract: Counting items in a distributed system, and estimating the cardinality of multisets in particular,
is important for a large variety of applications and a fundamental building block for emerging Internet-scale information systems. Examples of such applications range from optimizing query access plans in peer-to-peer data sharing, to computing the significance (rank/score) of data items in distributed information retrieval. The general formal problem addressed in this article is computing the network-wide distinct number of items with some property (e.g., distinct files with file name
containing “spiderman”) where each node in the network holds an arbitrary subset, possibly overlapping the subsets of other nodes. The key requirements that a viable approach must satisfy are:
(1) scalability towards very large network size, (2) efficiency regarding messaging overhead, (3) load
balance of storage and access, (4) accuracy of the cardinality estimation, and (5) simplicity and easy
integration in applications. This article contributes the DHS (Distributed Hash Sketches) method
for this problem setting: a distributed, scalable, efficient, and accurate multiset cardinality estimator.
DHS is based on hash sketches for probabilistic counting, but distributes the bits of each counter
across network nodes in a judicious manner based on principles of Distributed Hash Tables, paying
careful attention to fast access and aggregation as well as update costs. The article discusses various
design choices, exhibiting tunable trade-offs between estimation accuracy, hop-count efficiency, and
load distribution fairness. We further contribute a full-fledged, publicly available, open-source implementation of all our methods, and a comprehensive experimental evaluation for various settings.
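
To make the underlying idea concrete, the following is a minimal, centralized sketch of hash-sketch (Flajolet-Martin style) probabilistic counting. It is not the DHS method itself: DHS spreads the sketch bits over DHT nodes, which is not modeled here. The class name `HashSketch`, the helper names, the use of SHA-256 as the hash function and the choice of 64 bitmaps are all assumptions made for illustration.

```python
import hashlib

PHI = 0.77351  # correction constant from Flajolet-Martin probabilistic counting


def hash64(item: str) -> int:
    """Map an item to a pseudo-uniform 64-bit integer (SHA-256 is an assumption)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def rho(x: int) -> int:
    """Index of the least-significant 1 bit of x (63 if x is 0)."""
    return (x & -x).bit_length() - 1 if x else 63


class HashSketch:
    """Hypothetical local hash sketch with stochastic averaging over m bitmaps."""

    def __init__(self, num_bitmaps: int = 64):
        self.m = num_bitmaps
        self.bitmaps = [0] * num_bitmaps

    def add(self, item: str) -> None:
        h = hash64(item)
        index = h % self.m                          # which bitmap the item falls into
        self.bitmaps[index] |= 1 << rho(h // self.m)  # set one bit; duplicates set the same bit

    def merge(self, other: "HashSketch") -> "HashSketch":
        """Union of two sketches: bitwise OR of corresponding bitmaps."""
        merged = HashSketch(self.m)
        merged.bitmaps = [a | b for a, b in zip(self.bitmaps, other.bitmaps)]
        return merged

    def estimate(self) -> float:
        """Estimate the number of distinct items from the lowest unset bit of each bitmap."""
        total = 0
        for bitmap in self.bitmaps:
            r = 0
            while bitmap & (1 << r):
                r += 1
            total += r
        return self.m / PHI * 2 ** (total / self.m)
```

Because merging is a bitwise OR, an item held by several nodes contributes the same bit everywhere, so overlapping subsets do not inflate the estimate. For example, two sketches built over overlapping sets of file names can be merged and queried with `a.merge(b).estimate()`.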
Abstract: We study the important problem of tracking moving
targets in wireless sensor networks. We try to overcome the
limitations of standard state of the art tracking methods based on
continuous location tracking, i.e. the high energy dissipation and
communication overhead imposed by the active participation of
sensors in the tracking process and the low scalability, especially
in sparse networks. Instead, our approach uses sensors in a
passive way: they only record and judiciously spread information
about observed target presence in their vicinity; this information
is then used by the (powerful) tracking agent to locate the target
by just following the traces left at sensors. Our protocol is greedy,
local, distributed, energy efficient and very successful, in the
sense that (as shown by extensive simulations) the tracking agent
manages to quickly locate and follow the target; also, we achieve
good trade-offs between the energy dissipation and latency.
Abstract: Information retrieval (IR) in peer-to-peer (P2P) networks,
where the corpus is spread across many loosely coupled
peers, has recently gained importance. In contrast to IR
systems on a centralized server or server farm, P2P IR faces
the additional challenge of either being oblivious to global
corpus statistics or having to compute the global measures
from local statistics at the individual peers in an efficient,
distributed manner. One specific measure of interest is the
global document frequency for different terms, which would
be very beneficial as term-specific weights in the scoring and
ranking of merged search results that have been obtained
from different peers.
This paper presents an efficient solution for the problem
of estimating global document frequencies in a large-scale
P2P network with very high dynamics where peers can join
and leave the network on short notice. In particular, the
developed method takes into account the fact that the local
document collections of autonomous peers may arbitrarily
overlap, so that global counting needs to be duplicate-insensitive.
The method is based on hash sketches as a
technique for compact data synopses. Experimental studies
demonstrate the estimator's accuracy, scalability, and
ability to cope with high dynamics. Moreover, the benefit
for ranking P2P search results is shown by experiments with
real-world Web data and queries.
Abstract: The simplex method has been successfully used in solving linear programming problems for many years. Parallel approaches have also been studied extensively due to the intensive computations required, especially for the solution of large linear problems (LPs). In this paper we present a highly scalable parallel implementation framework of the standard full tableau simplex method on a highly parallel (distributed memory) environment. Specifically, we have designed and implemented a suitable column distribution scheme as well as a row distribution scheme, and we have thoroughly tested our implementations on a considerably powerful distributed platform (Linux cluster with Myrinet interface). We then compare our approaches (a) with each other for varying problem sizes (number of rows and columns) and (b) with other recent corresponding efforts in the literature. In most cases, the column distribution scheme performs noticeably better than the row distribution scheme. Moreover, both schemes (even the row distribution scheme over large-scale problems) lead to particularly high speedup and efficiency values, which are considerably better in all cases than the ones achieved in other similar research efforts and implementations. Finally, we further evaluate our basic parallelization scheme over very large LPs in order to validate more reliably the high efficiency and scalability achieved.
Abstract: We extend here the Population Protocol model of Angluin et al. [2004] in order to model more powerful networks of very small resource-limited artefacts (agents) that are possibly mobile. Communication can happen only between pairs of artefacts. A communication graph (or digraph) denotes the permissible pairwise interactions. The main feature of our extended model is to allow edges of the communication graph, G, to have states that belong to a constant size set. We also allow edges to have read-only costs, whose values also belong to a constant size set. We then allow the protocol rules for pairwise interactions to modify the corresponding edge state. Thus, our protocol specifications are still independent of the population size and do not use agent ids, i.e. they preserve scalability, uniformity and anonymity. Our Mediated Population Protocols (MPP) can stably compute graph properties of the communication graph. We show this for the properties of maximal matchings (in undirected communication graphs), and also for finding the transitive closure of directed graphs and for finding all edges of small cost. We demonstrate that our mediated protocols are stronger than the classical population protocols by presenting a mediated protocol that stably computes the product of two positive integers when G is the complete graph. This is not a semilinear predicate. To show this fact, we state and prove a general theorem about the composition of two stably computing mediated population protocols. We also show that all predicates stably computable in our model are (non-uniformly) in the class NSPACE(m), where m is the number of edges of the communication graph. We also define Randomized MPP and show that any Peano predicate accepted by an MPP can be verified in deterministic polynomial time.
Abstract: The promises inherent in users coming together to form data
sharing network communities bring to the foreground new problems formulated
over such dynamic, ever growing, computing, storage, and networking
infrastructures. A key open challenge is to harness these highly
distributed resources toward the development of an ultra scalable, efficient
search engine. From a technical viewpoint, any acceptable solution
must fully exploit all available resources dictating the removal of any
centralized points of control, which can also readily lead to performance
bottlenecks and reliability/availability problems. Equally importantly,
however, a highly distributed solution can also facilitate pluralism in informing
users about internet content, which is crucial in order to preclude
the formation of information-resource monopolies and the biased visibility
of content from economically-powerful sources. To meet these challenges,
the work described here puts forward MINERVA∞, a novel search
engine architecture, designed for scalability and efficiency. MINERVA∞
encompasses a suite of novel algorithms, including algorithms for creating
data networks of interest, placing data on network nodes, load balancing,
top-k algorithms for retrieving data at query time, and replication algorithms
for expediting top-k query processing. We have implemented the
proposed architecture and we report on our extensive experiments with
real-world, web-crawled, and synthetic data and queries, showcasing the
scalability and efficiency traits of MINERVA∞.
Abstract: In this work we extend the population protocol model of Angluin et al., in
order to model more powerful networks of very small resource-limited
artefacts (agents) that may follow some unpredictable passive
movement. These agents communicate in pairs according to the commands of
an adversary scheduler. A directed (or undirected) communication graph
encodes the following information: each edge (u,υ) denotes that during the
computation it is possible for an interaction between u and υ to happen in
which u is the initiator and υ the responder. The new characteristic of
the proposed mediated population protocol model is the existence of a
passive communication provider that we call mediator. The mediator is a
simple database with communication capabilities. Its main purpose is to
maintain the permissible interactions in communication classes, whose
number is constant and independent of the population size. For this reason
we assume that each agent has a unique identifier for whose existence the
agent itself is not informed and thus cannot store it in its working
memory. When two agents are about to interact they send their ids to the
mediator. The mediator searches for that ordered pair in its database and
if it exists in some communication class it sends back to the agents the
state corresponding to that class. If this interaction is not permitted to
the agents, or, in other words, if this specific pair does not exist in
the database, the agents are informed to abort the interaction. Note that
in this manner for the first time we obtain some control on the safety of
the network and moreover the mediator provides us at any time with the
network topology. Equivalently, we can model the mediator by communication
links that are capable of keeping states from an edge state set of constant
cardinality. This alternative way of thinking of the new model has many
advantages concerning the formal modeling and the design of protocols,
since it enables us to abstract away the implementation details of the
mediator. Moreover, we extend further the new model by allowing the edges
to keep read-only costs, whose values also belong to a constant size
set. We then allow the protocol rules for pairwise interactions to modify
the corresponding edge state by also taking into account the costs. Thus,
our protocol descriptions are still independent of the population size and
do not use agent ids, i.e. they preserve scalability, uniformity and
anonymity. The proposed Mediated Population Protocols (MPP) can stably
compute graph properties of the communication graph. We show this for the
properties of maximal matchings (in undirected communication graphs), also
for finding the transitive closure of directed graphs and for finding all
edges of small cost. We demonstrate that our mediated protocols are
stronger than the classical population protocols. First of all we notice
an obvious fact: the classical model is a special case of the new model,
that is, the new model can compute at least the same things as the
classical one. We then present a mediated protocol that stably computes
the product of two nonnegative integers in the case where G is complete,
directed and connected. Such predicates are not semilinear and it
has been proven that classical population protocols in complete graphs can
compute precisely the semilinear predicates, thus in this manner we show
that there is at least one predicate that our model computes and which the
classical model cannot compute. To show this fact, we state and prove a
general Theorem about the composition of two mediated population
protocols, where the first one has stabilizing inputs. We also show that
all predicates stably computable in our model are (non-uniformly) in the
class NSPACE(m), where m is the number of edges of the communication
graph. Finally, we define Randomized MPP and show that any Peano
predicate accepted by a Randomized MPP can be verified in deterministic
polynomial time.
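
As an illustration of how edge states let a protocol mark a maximal matching, the following is a small simulation sketch. It assumes a single transition rule (two unmatched agents meeting over an unmarked edge both become matched and mark the edge) and replaces the adversary scheduler with a uniformly random one, which is fair with probability 1. The function name `maximal_matching_mpp`, the state encoding and the step budget are assumptions for illustration, not the paper's notation.

```python
import random


def maximal_matching_mpp(edges, steps=10000, seed=0):
    """Simulate a one-rule mediated protocol on a simple undirected graph.

    edges: list of (u, v) pairs with u != v (hypothetical input format).
    Agent states: 0 = unmatched, 1 = matched. Edge states: 0 = unmarked, 1 = in matching.
    """
    rng = random.Random(seed)
    agent_state = {}
    for u, v in edges:
        agent_state[u] = 0
        agent_state[v] = 0
    edge_state = {edge: 0 for edge in edges}

    for _ in range(steps):
        # The random scheduler picks the next interacting pair.
        edge = rng.choice(edges)
        u, v = edge
        # Only rule: (unmatched, unmatched, unmarked) -> (matched, matched, marked).
        if agent_state[u] == 0 and agent_state[v] == 0 and edge_state[edge] == 0:
            agent_state[u] = 1
            agent_state[v] = 1
            edge_state[edge] = 1
        # Every other interaction leaves all states unchanged.

    return agent_state, edge_state
```

Note that the rule uses only a constant number of agent and edge states and never inspects agent identifiers; the node labels exist only so the simulator can index its dictionaries. Once every edge has been scheduled at least once, no edge has two unmatched endpoints, so the marked edges form a maximal matching.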
Abstract: We present SeAl, a novel data/resource and data-access management infrastructure designed for the purpose of addressing a key problem in P2P data sharing networks, namely the problem of wide-scale selfish peer behavior. Selfish behavior has been manifested and well documented, and it is widely accepted that unless it is dealt with, the scalability, efficiency, and usefulness of P2P sharing networks will be diminished. SeAl essentially consists of a monitoring/accounting subsystem, an auditing/verification subsystem, and incentive mechanisms. The monitoring subsystem facilitates the classification of peers into selfish/altruistic. The auditing/verification layer provides a shield against perjurer/slandering and colluding peers that may try to cheat the monitoring subsystem. The incentive mechanisms effectively utilize these layers so as to increase the computational/networking and data resources that are available to the community. Our extensive performance results show that SeAl performs its tasks swiftly, while the overhead introduced by our accounting and auditing mechanisms in terms of response time, network, and storage is very small.
Abstract: Efficient query processing in traditional database
management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured
overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge.
To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation
of these tools for distributed statistical synopses,
and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability.
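
For readers unfamiliar with the histogram types named above, the following sketch contrasts Equi-Width and Equi-Depth bucketing on a single, local list of values. It is purely illustrative and centralized; the decentralized construction over a structured overlay that the paper contributes is not modeled. The function names and the `(low, high, count)` bucket format are assumptions.

```python
def equi_width(values, num_buckets):
    """Split [min, max] into equally wide ranges and count the values in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1.0  # guard against all-equal input
    counts = [0] * num_buckets
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        index = min(int((v - lo) / width), num_buckets - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


def equi_depth(values, num_buckets):
    """Choose bucket boundaries so that each bucket holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    buckets = []
    for i in range(num_buckets):
        chunk = ordered[i * n // num_buckets:(i + 1) * n // num_buckets]
        if chunk:
            buckets.append((chunk[0], chunk[-1], len(chunk)))
    return buckets
```

On skewed data the difference is visible immediately: equi-width buckets have uniform ranges but very uneven counts, while equi-depth buckets have even counts but uneven ranges, which is why the latter tend to give tighter selectivity estimates for skewed attributes.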