Abstract: In this work we present the basic concepts in the architecture of a peer-to-peer environment for monitoring multiple wireless sensor networks, called ShareSense. ShareSense, which is currently under development, uses JXTA as a peer-to-peer substrate. We demonstrate its basic functionalities using a simple application scenario that utilizes multiple disparate wireless sensor networks. This scenario involves monitoring such networks using a separate management environment and a custom application GUI, as well as using Google Earth as an additional user interface.
Abstract: In this work we present the architecture and implementation of WebDust, a software platform for managing multiple, heterogeneous (in terms of both software and hardware), geographically disparate sensor networks. We describe in detail the main concepts behind its design, and basic aspects of its implementation, including the services provided to end-users and developers. WebDust uses a peer-to-peer substrate, based on JXTA, in order to unify multiple sensor networks installed in various geographic areas. We aim at providing a software framework that will permit developers to deal with the new and critical aspects that networks of sensors and tiny devices bring into global computing, and to provide a coherent set of high-level services, design rules and technical recommendations, in order to be able to develop the envisioned applications of global sensor networks. Furthermore, we give an overview of a deployed distributed testbed consisting of a total of 56 nodes, describing in more detail two specific testbed sites and the integration with our platform of the related software and hardware technologies used for their operation. Finally, we describe the design and implementation of an interface option provided to end-users, based on the popular Google Earth application.
Abstract: The “small world” phenomenon, i.e., the fact that the global social network is strongly connected in the sense that every two persons are inter-related through a small chain of friends, has attracted research attention and has been strongly related to the results of the experiments of social psychologist Stanley Milgram; properties of social networks and relevant problems also emerge in peer-to-peer systems, and their study can shed light on important modern network design properties.
In this paper, we experimentally study greedy routing algorithms, i.e., algorithms that route information using “long-range” connections that function as shortcuts connecting “distant” network nodes. In particular, we have implemented greedy routing algorithms and techniques from the recent literature on networks of line and grid topology, using parallelization to increase efficiency. To the best of our knowledge, no similar attempt has been made so far.
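As an illustration of the kind of algorithm studied here (not the paper's parallelized implementation), the following minimal Python sketch builds a line network in which each node gets one Kleinberg-style long-range contact, chosen with probability proportional to 1/distance, and then routes greedily by always forwarding to the neighbor closest to the target. All names and parameters are illustrative.

```python
import random

def build_line(n):
    """Each node i gets local links to i-1, i+1 plus one long-range
    contact chosen with probability proportional to 1/distance."""
    links = {}
    for i in range(n):
        nbrs = {j for j in (i - 1, i + 1) if 0 <= j < n}
        candidates = [j for j in range(n) if j != i]
        weights = [1.0 / abs(i - j) for j in candidates]
        nbrs.add(random.choices(candidates, weights=weights)[0])
        links[i] = nbrs
    return links

def greedy_route(links, src, dst):
    """Forward to the neighbor closest to the target; returns hop count.
    The local +/-1 links guarantee progress, so routing terminates."""
    hops, cur = 0, src
    while cur != dst:
        cur = min(links[cur], key=lambda j: abs(j - dst))
        hops += 1
    return hops

# links = build_line(1000); greedy_route(links, 0, 999)
```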
Abstract: Wireless sensor networks are a recently introduced category of ad hoc computer networks, composed of nodes of small size and limited computing and energy resources. Such nodes are capable of measuring physical properties such as temperature and humidity, of communicating wirelessly with each other, and in some cases of interacting with their surrounding environment (through the use of electromechanical parts).
As these networks have become widely available (in terms of cost and commercial hardware availability), their fields of application and philosophy of use are constantly evolving. There are numerous examples of their applications, ranging from monitoring the biodiversity of a specific outdoor area to structural health monitoring of bridges, and networks ranging from a few tens of nodes to thousands of nodes.
In this PhD thesis we investigated the following basic research lines related to wireless sensor networks:
a) their simulation,
b) the development of data propagation protocols suited to such networks and their evaluation through simulation,
c) the modelling of ``hostile'' circumstances (obstacles) during their operation and evaluation of their impact through simulation,
d) the development of a sensor network management application.
Regarding simulation, we initially placed an emphasis on issues such as the effective simulation of networks of several thousands of nodes, and in that respect we developed a network simulator (simDust), which is extensible through the addition of new data propagation protocols and visualization capabilities. This simulator was used to evaluate the performance of a number of characteristic data propagation protocols for wireless sensor networks. Furthermore, we developed a new protocol (VRTP) and evaluated its performance against other similar protocols. Our studies show that the new protocol, which uses dynamic changes of the transmission range of the network nodes, performs better in certain cases than other related protocols, especially in networks containing obstacles and in the case of non-homogeneous placement of nodes.
Moreover, we emphasized the addition of ``realistic'' conditions, which have an adversarial effect on protocol operation, to the simulation of such protocols. Our goal was to introduce a model for obstacles that adds little computational overhead to a simulator, and to study the effect of the inclusion of such a model on data propagation protocols that use geographic information (absolute or relative); such protocols are relatively sensitive to dynamic topology changes and network conditions. Through our experiments, we show that the inclusion of obstacles during simulation can have a significant effect on these protocols.
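The thesis's own obstacle model is not detailed in this abstract; as a hedged illustration of how a low-overhead obstacle model can be grafted onto a simulator, the sketch below represents obstacles as line segments and blocks any link whose line of sight crosses one. All names are hypothetical.

```python
def ccw(a, b, c):
    """Twice the signed area of triangle abc; the sign gives orientation."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_intersect(p1, p2, q1, q2):
    """Proper segment intersection via orientation signs
    (collinear edge cases omitted for brevity)."""
    d1, d2 = ccw(q1, q2, p1), ccw(q1, q2, p2)
    d3, d4 = ccw(p1, p2, q1), ccw(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0

def can_communicate(u, v, radius, obstacles):
    """Nodes u, v can talk iff they are within range and no obstacle
    segment blocks the straight line between them: an O(#obstacles)
    check per candidate link."""
    dx, dy = u[0] - v[0], u[1] - v[1]
    if dx * dx + dy * dy > radius * radius:
        return False
    return not any(segments_intersect(u, v, a, b) for a, b in obstacles)
```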
Finally, regarding applications, we initially proposed an architecture (WebDust/ShareSense) for the management of such networks, providing basic capabilities for managing them and developing applications on top of it. Features that set it apart are the capability of managing multiple heterogeneous sensor networks, its openness, and the use of a peer-to-peer architecture for the interconnection of multiple sensor networks. A large part of the proposed architecture was implemented, while the overall architecture was extended to also include additional visualization capabilities.
Abstract: Peer-to-Peer (P2P) search requires intelligent decisions for query routing: selecting the best peers to which a given query, initiated at some peer, should be forwarded for retrieving additional search results. These decisions are based on statistical summaries for each peer, which are usually organized on a per-keyword basis and managed in a distributed directory of routing indices. Such architectures disregard the possible correlations among keywords. Together with the coarse granularity of per-peer summaries, which are mandated for scalability, this limitation may lead to poor search result quality.
This paper develops and evaluates two solutions to this problem, sk-STAT based on single-key statistics only, and mk-STAT based on additional multi-key statistics. For both cases, hash sketch synopses are used to compactly represent a peer's data items and are efficiently disseminated in the P2P network to form a decentralized directory. Experimental studies with Gnutella and Web data demonstrate the viability and the trade-offs of the approaches.
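To make the routing decision concrete: with single-key (per-keyword) statistics only, a router can, for instance, score each peer for a conjunctive query by its rarest query keyword, since correlations between keywords are unknown. The sketch below illustrates this correlation-blind scoring; the directory layout and names are assumptions for illustration, not sk-STAT/mk-STAT themselves.

```python
def select_peers(directory, query_keywords, k=3):
    """directory: peer -> {keyword: estimated matching-document count}.
    With single-key statistics, a peer cannot be assumed to hold more
    matches for a conjunctive query than its rarest keyword suggests,
    so we score by the minimum per-keyword estimate."""
    scores = {
        peer: min(stats.get(kw, 0) for kw in query_keywords)
        for peer, stats in directory.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# select_peers({"p1": {"jazz": 50, "vinyl": 2},
#               "p2": {"jazz": 10, "vinyl": 9}},
#              ["jazz", "vinyl"])  # -> ["p2", "p1"]
```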
Abstract: Counting items in a distributed system, and estimating the cardinality of multisets in particular, is important for a large variety of applications and a fundamental building block for emerging Internet-scale information systems. Examples of such applications range from optimizing query access plans in peer-to-peer data sharing, to computing the significance (rank/score) of data items in distributed information retrieval. The general formal problem addressed in this article is computing the network-wide distinct number of items with some property (e.g., distinct files with file name containing “spiderman”) where each node in the network holds an arbitrary subset, possibly overlapping the subsets of other nodes. The key requirements that a viable approach must satisfy are: (1) scalability towards very large network size, (2) efficiency regarding messaging overhead, (3) load balance of storage and access, (4) accuracy of the cardinality estimation, and (5) simplicity and easy integration in applications. This article contributes the DHS (Distributed Hash Sketches) method for this problem setting: a distributed, scalable, efficient, and accurate multiset cardinality estimator. DHS is based on hash sketches for probabilistic counting, but distributes the bits of each counter across network nodes in a judicious manner based on principles of Distributed Hash Tables, paying careful attention to fast access and aggregation as well as update costs. The article discusses various design choices, exhibiting tunable trade-offs between estimation accuracy, hop-count efficiency, and load distribution fairness. We further contribute a full-fledged, publicly available, open-source implementation of all our methods, and a comprehensive experimental evaluation for various settings.
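For readers unfamiliar with hash sketches, the sketch below shows the underlying Flajolet-Martin idea on a single node: each item deterministically sets one bit, so duplicates are absorbed, and the position of the lowest unset bit yields a distinct-count estimate. DHS itself distributes these counter bits over DHT nodes, which is not shown here; the hash choice and constant are the textbook ones, not taken from the article.

```python
import hashlib

def lsb_rank(h):
    """Position of the least-significant set bit (the rho function)."""
    return (h & -h).bit_length() - 1

def sketch(items):
    """Flajolet-Martin hash sketch: set bit rho(hash(x)) for each item.
    Duplicates set the same bit, so the sketch is duplicate-insensitive."""
    bits = 0
    for x in items:
        h = int.from_bytes(hashlib.sha1(str(x).encode()).digest()[:4], "big")
        if h:
            bits |= 1 << lsb_rank(h)
    return bits

def estimate(bits):
    """n ~ 2^R / 0.77351, where R is the position of the lowest zero bit.
    A single sketch has high variance; practical systems average many."""
    r = 0
    while bits & (1 << r):
        r += 1
    return (2 ** r) / 0.77351
```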
Abstract: Information retrieval (IR) in peer-to-peer (P2P) networks, where the corpus is spread across many loosely coupled peers, has recently gained importance. In contrast to IR systems on a centralized server or server farm, P2P IR faces the additional challenge of either being oblivious to global corpus statistics or having to compute the global measures from local statistics at the individual peers in an efficient, distributed manner. One specific measure of interest is the global document frequency for different terms, which would be very beneficial as term-specific weights in the scoring and ranking of merged search results that have been obtained from different peers.
This paper presents an efficient solution for the problem of estimating global document frequencies in a large-scale P2P network with very high dynamics where peers can join and leave the network on short notice. In particular, the developed method takes into account the fact that the local document collections of autonomous peers may arbitrarily overlap, so that global counting needs to be duplicate-insensitive. The method is based on hash sketches as a technique for compact data synopses. Experimental studies demonstrate the estimator's accuracy, scalability, and ability to cope with high dynamics. Moreover, the benefit for ranking P2P search results is shown by experiments with real-world Web data and queries.
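The duplicate-insensitivity relied on here comes from a simple property of hash sketches: a document held by several peers sets the same bit on each of them, so per-peer bit vectors can be combined with bitwise OR and each distinct document is counted once. A minimal sketch, assuming each peer has already computed a Flajolet-Martin bit vector for its local collection (as in the earlier example):

```python
from functools import reduce

def merged_estimate(peer_sketches):
    """OR-combine per-peer hash-sketch bit vectors (duplicate-insensitive),
    then apply the usual Flajolet-Martin estimate to the merged sketch."""
    merged = reduce(lambda a, b: a | b, peer_sketches, 0)
    r = 0
    while merged & (1 << r):  # position of the lowest zero bit
        r += 1
    return (2 ** r) / 0.77351
```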
Abstract: Wireless networks are nowadays widely used and constantly evolving; we are constantly surrounded by numerous such networks. A modern idea is to exploit this variety and aggregate all available connections towards improving Internet connectivity. In this paper we present our design of a networking system that enables participating entities to connect with each other and share their broadband connections. Every connected entity gains broadband access from the ones that have and share a connection, independently of how many connections are available and how fast they are. Our goal is to improve the utilization of the broadband connections and to provide fault tolerance across them. To this end, the system is built on top of a peer-to-peer network, where the participating entities share their connections according to various scheduling algorithms.
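The scheduling algorithms are left unspecified in this abstract; as one plausible instance, the sketch below spreads requests over the shared connections with a weighted round-robin proportional to nominal bandwidth. Connection names and weights are made up for illustration.

```python
import itertools

def weighted_round_robin(connections):
    """connections: name -> integer weight (e.g. nominal Mbit/s).
    Yields connection names in proportion to their weights."""
    expanded = [name for name, w in connections.items() for _ in range(w)]
    return itertools.cycle(expanded)

# scheduler = weighted_round_robin({"dsl_a": 2, "fiber_b": 10})
# next(scheduler)  # connection to use for the next request or chunk
```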
Abstract: We address the issue of measuring storage or query load distribution fairness in peer-to-peer data management systems. Existing metrics may look promising from the point of view of specific peers, while in reality being far from optimal from a global perspective. Thus, we first define the requirements and study the appropriateness of various statistical metrics for measuring load distribution fairness against these requirements. The metric proposed as most appropriate is the Gini coefficient (G). Second, we develop novel distributed sampling algorithms to compute G on-line, with high precision, efficiently, and scalably. Third, we show how G can readily be utilized on-line by higher-level algorithms, which can now know when to best intervene to correct load imbalances. Our analysis and experiments attest to the efficiency and accuracy of these algorithms, permitting the on-line use of a rich and reliable metric that conveys a global perspective of the distribution.
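For reference, the Gini coefficient of a load vector has the closed form G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n over the ascending sorted values x_1..x_n: 0 means perfectly even load, values near 1 mean one peer carries almost everything. The sketch below is this centralized textbook computation, not the paper's distributed on-line sampling algorithms.

```python
def gini(loads):
    """Gini coefficient of a load vector: 0 = perfectly even,
    -> 1 = one peer carries everything."""
    xs = sorted(loads)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

# gini([10, 10, 10, 10])  -> 0.0 (perfectly balanced)
# gini([0, 0, 0, 40])     -> 0.75 (one peer holds all load)
```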
Abstract: We present the NanoPeers architecture paradigm, a peer-to-peer network of lightweight devices lacking all or most of the capabilities of their computer-world counterparts. We identify the problems arising when we apply current routing and searching methods to this nano-world, and present some initial solutions, using a case study of a sensor network instance: Smart Dust. Furthermore, we propose the P2P Worlds framework as a hybrid P2P architecture paradigm, consisting of cooperating layers of P2P networks populated by computing entities with escalating capabilities. Our position is that (i) experience gained through research and experimentation in the field of P2P computing can be indispensable when moving down the ladder of computing capabilities, and that (ii) the proposed framework can be the basis of numerous real-world applications, opening up several challenging research problems.
Abstract: Web services are becoming an important enabler of the Semantic Web. Besides the need for a rich description mechanism, Web Service information should be made available in an accessible way for machine processing. In this paper, we propose a new P2P-based approach for Web Services discovery. Peers that store Web Services information, such as data item descriptions, are efficiently located using a scalable and robust data indexing structure for peer-to-peer data networks, NIPPERS. We present a theoretical analysis which shows that the communication cost of the query and update operations scales double-logarithmically with the number of NIPPERS nodes. Furthermore, we show that the network is robust with respect to failures, fulfilling the quality requirements of Web Services.
Abstract: The content-based publish/subscribe (pub/sub) paradigm for system design is becoming increasingly popular, offering unique benefits for a large number of data-intensive applications. Coupled with peer-to-peer technology, it can serve as a central building block for developing data-dissemination applications deployed over a large-scale network infrastructure. A key open problem towards the creation of large-scale content-based pub/sub infrastructures relates to efficiently and accurately matching subscriptions with substring predicates to incoming events. This work addresses this issue.
Abstract: The content-based publish/subscribe (pub/sub) paradigm for system design is becoming increasingly popular, offering unique benefits for a large number of data-intensive applications. Coupled with peer-to-peer technology, it can serve as a central building block for such applications deployed over a large-scale network infrastructure. A key problem toward the creation of large-scale content-based pub/sub infrastructures relates to dealing efficiently with continuous queries (subscriptions) with rich predicates on string attributes; in particular, efficiently and accurately matching substring queries to incoming events is an open problem. In this work we study this problem. We provide and analyze novel algorithms for processing subscriptions with substring predicates and events in a variety of environments. We provide experimental data demonstrating the relative performance behavior of the proposed algorithms, using as key metrics the network bandwidth requirements, number of messages, load balancing, as well as requirements for extra routing state (and related maintenance) and design flexibility.
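A common centralized building block for substring matching of this kind (hedged: the paper's distributed algorithms are not reproduced here) is an n-gram index: a subscription's pattern can only occur in an event if its grams occur there too, so grams give a cheap candidate filter before an exact check. A minimal sketch, with gram length N=3 as an arbitrary choice:

```python
from collections import defaultdict

N = 3  # gram length; an assumption, not taken from the paper

def grams(s):
    """All length-N substrings of s."""
    return {s[i:i + N] for i in range(len(s) - N + 1)}

class SubstringIndex:
    def __init__(self):
        self.by_gram = defaultdict(set)  # gram -> subscription ids
        self.patterns = {}               # subscription id -> pattern
        self.short = set()               # patterns too short to have a gram

    def subscribe(self, sub_id, pattern):
        self.patterns[sub_id] = pattern
        gs = grams(pattern)
        if not gs:
            self.short.add(sub_id)
        for g in gs:
            self.by_gram[g].add(sub_id)

    def match(self, event):
        # any subscription sharing a gram with the event is a candidate;
        # the final substring check makes matching exact
        candidates = set(self.short)
        for g in grams(event):
            candidates |= self.by_gram.get(g, set())
        return [s for s in candidates if self.patterns[s] in event]
```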
Abstract: We consider the conflicting problems of ensuring data-access load balancing and efficiently processing range queries on peer-to-peer data networks maintained over Distributed Hash Tables (DHTs). Placing consecutive data values in neighboring peers is frequently used in DHTs since it accelerates range query processing. However, such a placement is highly susceptible to load imbalances, which are preferably handled by replicating data (since replication also introduces fault tolerance benefits). In this paper, we present HotRoD, a DHT-based architecture that deals effectively with this combined problem through the use of a novel locality-preserving hash function, and a tunable data replication mechanism which allows trading off replication costs for fair load distribution. Our detailed experimental study shows strong gains in both range query processing efficiency and data-access load balancing, with low replication overhead. To our knowledge, this is the first work that concurrently addresses the two conflicting problems using data replication.
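To illustrate the placement idea only (HotRoD's actual hash function and its replication of hot ranges are defined in the paper), a locality-preserving hash maps attribute values order-preservingly onto the DHT identifier ring, so a range query visits a contiguous arc of peers. A minimal sketch with assumed ring parameters:

```python
RING_BITS = 32
RING_SIZE = 1 << RING_BITS

def lp_hash(value, lo, hi):
    """Order-preserving map of value in [lo, hi] onto the identifier ring:
    consecutive values land on nearby positions, so a range query only
    touches a contiguous arc of peers (at the cost of load skew,
    which HotRoD counters by replicating hot arcs)."""
    return int((value - lo) / (hi - lo) * (RING_SIZE - 1))

def range_query_arc(lo_q, hi_q, lo, hi):
    """Identifier arc to visit for a range query [lo_q, hi_q]."""
    return lp_hash(lo_q, lo, hi), lp_hash(hi_q, lo, hi)
```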
Abstract: Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge.
To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability.
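For concreteness, the two basic histogram shapes named above look as follows when computed locally; the paper's contribution is computing such counts and boundaries in a decentralized way over the structured overlay, which this sketch does not attempt.

```python
def equi_width_histogram(values, lo, hi, num_buckets):
    """Equi-width: split [lo, hi] into equal-width buckets and count."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        counts[min(int((v - lo) / width), num_buckets - 1)] += 1
    return counts

def equi_depth_boundaries(values, num_buckets):
    """Equi-depth: boundaries chosen so each bucket holds roughly the
    same number of values."""
    xs = sorted(values)
    step = len(xs) / num_buckets
    return [xs[min(int(i * step), len(xs) - 1)]
            for i in range(1, num_buckets)]
```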
Abstract: The content-based publish/subscribe (pub/sub) paradigm for system design is becoming increasingly popular, offering unique benefits for a large number of data-intensive applications. Coupled with peer-to-peer technology, it can serve as a central building block for such applications deployed over a large-scale network infrastructure. A key problem toward the creation of large-scale content-based pub/sub infrastructures relates to dealing efficiently with continuous queries (subscriptions) with rich predicates on string attributes; in this work we study the problem of efficiently and accurately matching substring queries to incoming events.
Abstract: In this work we study how to process complex queries in DHT-based Peer-to-Peer (P2P) data networks. Queries are made over tuples and relations and are expressed in a query language, such as SQL. We describe existing research approaches for query processing in P2P systems, we suggest improvements and enhancements, and propose a unifying framework that consists of a modified DHT architecture, data placement and search algorithms, and provides efficient support for processing a variety of query types, including queries with one or more attributes, queries with selection operators (involving equality and range queries), and queries with join operators. To our knowledge, this is the first work that puts forth a framework providing support for all these query types.
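To ground the data-placement idea: if each tuple is inserted under one DHT key per indexed attribute, a single-attribute equality query reduces to one lookup, while range and join queries need the further machinery (order-preserving placement, distributed joins) developed in the framework. A toy sketch with an in-process dict standing in for the DHT; all names are illustrative.

```python
from collections import defaultdict

class ToyDHT:
    """In-process stand-in for a DHT; a real system hashes keys to peers."""
    def __init__(self):
        self.store = defaultdict(list)
    def put(self, key, value):
        self.store[key].append(value)
    def get(self, key):
        return self.store[key]

def index_tuple(dht, relation, tup, indexed_attrs):
    """Insert the tuple once per indexed attribute."""
    for attr in indexed_attrs:
        dht.put((relation, attr, tup[attr]), tup)

def equality_query(dht, relation, attr, value):
    """A single-attribute equality query becomes one DHT lookup."""
    return dht.get((relation, attr, value))

# dht = ToyDHT()
# index_tuple(dht, "users", {"id": 7, "city": "Patras"}, ["id", "city"])
# equality_query(dht, "users", "city", "Patras")
```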
Abstract: Peer-to-peer sharing systems are becoming increasingly popular and constitute an exciting new class of innovative, Internet-based data management systems. In these systems, users contribute their own resources (processing units and storage devices) and content (i.e., documents) to the P2P community. We focus on the management of content and resources in such systems. Our goal is to harness all available resources in the P2P network so that users can access all available content efficiently. Efficiency is viewed both from (i) the point of view of the system, in that we strive to ensure fair load distribution among all peer nodes, and (ii) the point of view of the users, in that we strive to ensure low user-request response times.
We propose a novel architecture for this new class of applications, which differs drastically from what is currently found in existing products or proposed in academia. We contribute and study novel solutions that achieve our goals, while at the same time addressing the formidable challenges posed by the autonomy of peers, their heterogeneous processing and storage capacities, their different content contributions, the huge system scale, and the highly dynamic system environment.
Abstract: The peer-to-peer computing paradigm is an intriguing alternative to Google-style search engines for querying and ranking Web content. In a network with many thousands or millions of peers, the storage and access load requirements per peer are much lighter than for a centralized Google-like server farm; thus more powerful techniques from information retrieval, statistical learning, computational linguistics, and ontological reasoning can be employed on each peer's local search engine for boosting the quality of search results. In addition, peers can dynamically collaborate on advanced and particularly difficult queries. Moreover, a peer-to-peer setting is ideally suited to capture local user behavior, like query logs and click streams, and to disseminate and aggregate this information in the network, at the discretion of the corresponding user, in order to incorporate richer cognitive models.
This paper gives an overview of ongoing work in the EU Integrated Project DELIS that aims to develop foundations for a peer-to-peer search engine with Google-or-better scale, functionality, and quality, which will operate in a completely decentralized and self-organizing manner. The paper presents the architecture of such a system and the Minerva prototype testbed, and it discusses various core pieces of the approach: efficient execution of top-k ranking queries, strategies for query routing when a search request needs to be forwarded to other peers, maintaining a self-organizing semantic overlay network, and exploiting and coping with user and community behavior.
Abstract: We present the basic concepts behind the design and implementation of WebDust, a peer-to-peer platform for organizing, monitoring and controlling wireless sensor networks, along with a discussion of its application to an actual testbed. Our software architecture provides a range of services that allow developers to create customized applications with relatively low implementation overhead. WebDust aims to allow heterogeneous components to operate in the same sensor network, and to give the ability to manage and control large numbers of such networks, possibly on a global scale. We also give insight into several applications that can be implemented using our platform, and a description of our current testbed.