Abstract: In this work we present the basic concepts in the architecture of a peer-to-peer environment for monitoring multiple wireless sensor networks, called ShareSense. ShareSense, which is currently under development, uses JXTA as a peer-to-peer substrate. We demonstrate its basic functionalities using a simple application scenario that utilizes multiple disparate wireless sensor networks. This scenario involves monitoring such networks using a separate management environment and a custom application GUI, as well as using Google Earth as an additional user interface.
Abstract: In this work we present the architecture and implementation of WebDust, a software platform for managing multiple, heterogeneous (in terms of both software and hardware), geographically disparate sensor networks. We describe in detail the main concepts behind its design, and basic aspects of its implementation, including the services provided to end-users and developers. WebDust uses a peer-to-peer substrate, based on JXTA, in order to unify multiple sensor networks installed in various geographic areas. We aim at providing a software framework that will permit developers to deal with the new and critical aspects that networks of sensors and tiny devices bring into global computing, and to provide a coherent set of high-level services, design rules and technical recommendations, in order to be able to develop the envisioned applications of global sensor networks. Furthermore, we give an overview of a deployed distributed testbed consisting of a total of 56 nodes, describing in more detail two specific testbed sites and the integration with our platform of the related software and hardware technologies used for their operation. Finally, we describe the design and implementation of an interface option provided to end-users, based on the popular Google Earth application.
Abstract: The “small world” phenomenon, i.e., the fact that the global social network is strongly connected in the sense that every two persons are inter-related through a small chain of friends, has attracted research attention and has been strongly related to the results of the experiments of social psychologist Stanley Milgram; properties of social networks and relevant problems also emerge in peer-to-peer systems, and their study can shed light on important modern network design properties.
In this paper, we experimentally study greedy routing algorithms, i.e., algorithms that route information using “long-range” connections that function as shortcuts connecting “distant” network nodes. In particular, we have implemented greedy routing algorithms and techniques from the recent literature on networks of line and grid topology, using parallelization to increase efficiency. To the best of our knowledge, no similar attempt has been made so far.
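As an illustration of the kind of algorithm studied here (not the paper's parallelized implementation), the following minimal Python sketch builds a line network in which each node gets one Kleinberg-style long-range contact, chosen with probability proportional to 1/distance, and then routes greedily by always forwarding to the neighbor closest to the target. All names and parameters are illustrative.

```python
import random

def build_line(n):
    """Each node i gets local links to i-1, i+1 plus one long-range
    contact chosen with probability proportional to 1/distance."""
    links = {}
    for i in range(n):
        nbrs = {j for j in (i - 1, i + 1) if 0 <= j < n}
        candidates = [j for j in range(n) if j != i]
        weights = [1.0 / abs(i - j) for j in candidates]
        nbrs.add(random.choices(candidates, weights=weights)[0])
        links[i] = nbrs
    return links

def greedy_route(links, src, dst):
    """Forward to the neighbor closest to the target; returns hop count.
    The local +/-1 links guarantee progress, so routing terminates."""
    hops, cur = 0, src
    while cur != dst:
        cur = min(links[cur], key=lambda j: abs(j - dst))
        hops += 1
    return hops

# links = build_line(1000); greedy_route(links, 0, 999)
```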
Abstract: Wireless sensor networks are a recently introduced category of ad hoc computer networks, composed of nodes of small size and limited computing and energy resources. Such nodes are capable of measuring physical properties such as temperature and humidity, of communicating wirelessly with each other, and in some cases of interacting with their surrounding environment (through the use of electromechanical parts).
As these networks have become widely available (in terms of cost and commercial hardware availability), their fields of application and philosophy of use are constantly evolving. There are numerous examples of their applications, ranging from monitoring the biodiversity of a specific outdoor area to structural health monitoring of bridges, and networks ranging from a few tens of nodes to thousands of nodes.
In this PhD thesis we investigated the following basic research lines related to wireless sensor networks:
a) their simulation,
b) the development of data propagation protocols suited to such networks and their evaluation through simulation,
c) the modelling of ``hostile'' circumstances (obstacles) during their operation and evaluation of their impact through simulation,
d) the development of a sensor network management application.
Regarding simulation, we initially placed an emphasis on issues such as the effective simulation of networks of several thousands of nodes, and in that respect we developed a network simulator (simDust), which is extensible through the addition of new data propagation protocols and visualization capabilities. This simulator was used to evaluate the performance of a number of characteristic data propagation protocols for wireless sensor networks. Furthermore, we developed a new protocol (VRTP) and evaluated its performance against other similar protocols. Our studies show that the new protocol, which uses dynamic changes of the transmission range of the network nodes, performs better in certain cases than other related protocols, especially in networks containing obstacles and in the case of non-homogeneous placement of nodes.
Moreover, we emphasized the addition of ``realistic'' conditions, which have an adversarial effect on protocol operation, to the simulation of such protocols. Our goal was to introduce a model for obstacles that adds little computational overhead to a simulator, and to study the effect of the inclusion of such a model on data propagation protocols that use geographic information (absolute or relative); such protocols are relatively sensitive to dynamic topology changes and network conditions. Through our experiments, we show that the inclusion of obstacles during simulation can have a significant effect on these protocols.
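The thesis's own obstacle model is not detailed in this abstract; as a hedged illustration of how a low-overhead obstacle model can be grafted onto a simulator, the sketch below represents obstacles as line segments and blocks any link whose line of sight crosses one. All names are hypothetical.

```python
def ccw(a, b, c):
    """Twice the signed area of triangle abc; the sign gives orientation."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_intersect(p1, p2, q1, q2):
    """Proper segment intersection via orientation signs
    (collinear edge cases omitted for brevity)."""
    d1, d2 = ccw(q1, q2, p1), ccw(q1, q2, p2)
    d3, d4 = ccw(p1, p2, q1), ccw(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0

def can_communicate(u, v, radius, obstacles):
    """Nodes u, v can talk iff they are within range and no obstacle
    segment blocks the straight line between them: an O(#obstacles)
    check per candidate link."""
    dx, dy = u[0] - v[0], u[1] - v[1]
    if dx * dx + dy * dy > radius * radius:
        return False
    return not any(segments_intersect(u, v, a, b) for a, b in obstacles)
```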
Finally, regarding applications, we initially proposed an architecture (WebDust/ShareSense) for the management of such networks, providing basic capabilities for managing them and developing applications on top of it. Features that set it apart are the capability of managing multiple heterogeneous sensor networks, its openness, and the use of a peer-to-peer architecture for the interconnection of multiple sensor networks. A large part of the proposed architecture was implemented, while the overall architecture was extended to also include additional visualization capabilities.
Abstract: Peer-to-Peer (P2P) search requires intelligent decisions for query routing: selecting the best peers to which a given query, initiated at some peer, should be forwarded for retrieving additional search results. These decisions are based on statistical summaries for each peer, which are usually organized on a per-keyword basis and managed in a distributed directory of routing indices. Such architectures disregard the possible correlations among keywords. Together with the coarse granularity of per-peer summaries, which are mandated for scalability, this limitation may lead to poor search result quality.
This paper develops and evaluates two solutions to this problem, sk-STAT based on single-key statistics only, and mk-STAT based on additional multi-key statistics. For both cases, hash sketch synopses are used to compactly represent a peer's data items and are efficiently disseminated in the P2P network to form a decentralized directory. Experimental studies with Gnutella and Web data demonstrate the viability and the trade-offs of the approaches.
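To make the routing decision concrete: with single-key (per-keyword) statistics only, a router can, for instance, score each peer for a conjunctive query by its rarest query keyword, since correlations between keywords are unknown. The sketch below illustrates this correlation-blind scoring; the directory layout and names are assumptions for illustration, not sk-STAT/mk-STAT themselves.

```python
def select_peers(directory, query_keywords, k=3):
    """directory: peer -> {keyword: estimated matching-document count}.
    With single-key statistics, a peer cannot be assumed to hold more
    matches for a conjunctive query than its rarest keyword suggests,
    so we score by the minimum per-keyword estimate."""
    scores = {
        peer: min(stats.get(kw, 0) for kw in query_keywords)
        for peer, stats in directory.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# select_peers({"p1": {"jazz": 50, "vinyl": 2},
#               "p2": {"jazz": 10, "vinyl": 9}},
#              ["jazz", "vinyl"])  # -> ["p2", "p1"]
```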
Abstract: Counting items in a distributed system, and estimating the cardinality of multisets in particular, is important for a large variety of applications and a fundamental building block for emerging Internet-scale information systems. Examples of such applications range from optimizing query access plans in peer-to-peer data sharing, to computing the significance (rank/score) of data items in distributed information retrieval. The general formal problem addressed in this article is computing the network-wide distinct number of items with some property (e.g., distinct files with file name containing “spiderman”) where each node in the network holds an arbitrary subset, possibly overlapping the subsets of other nodes. The key requirements that a viable approach must satisfy are: (1) scalability towards very large network size, (2) efficiency regarding messaging overhead, (3) load balance of storage and access, (4) accuracy of the cardinality estimation, and (5) simplicity and easy integration in applications. This article contributes the DHS (Distributed Hash Sketches) method for this problem setting: a distributed, scalable, efficient, and accurate multiset cardinality estimator. DHS is based on hash sketches for probabilistic counting, but distributes the bits of each counter across network nodes in a judicious manner based on principles of Distributed Hash Tables, paying careful attention to fast access and aggregation as well as update costs. The article discusses various design choices, exhibiting tunable trade-offs between estimation accuracy, hop-count efficiency, and load distribution fairness. We further contribute a full-fledged, publicly available, open-source implementation of all our methods, and a comprehensive experimental evaluation for various settings.
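For readers unfamiliar with hash sketches, the sketch below shows the underlying Flajolet-Martin idea on a single node: each item deterministically sets one bit, so duplicates are absorbed, and the position of the lowest unset bit yields a distinct-count estimate. DHS itself distributes these counter bits over DHT nodes, which is not shown here; the hash choice and constant are the textbook ones, not taken from the article.

```python
import hashlib

def lsb_rank(h):
    """Position of the least-significant set bit (the rho function)."""
    return (h & -h).bit_length() - 1

def sketch(items):
    """Flajolet-Martin hash sketch: set bit rho(hash(x)) for each item.
    Duplicates set the same bit, so the sketch is duplicate-insensitive."""
    bits = 0
    for x in items:
        h = int.from_bytes(hashlib.sha1(str(x).encode()).digest()[:4], "big")
        if h:
            bits |= 1 << lsb_rank(h)
    return bits

def estimate(bits):
    """n ~ 2^R / 0.77351, where R is the position of the lowest zero bit.
    A single sketch has high variance; practical systems average many."""
    r = 0
    while bits & (1 << r):
        r += 1
    return (2 ** r) / 0.77351
```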
Abstract: Information retrieval (IR) in peer-to-peer (P2P) networks, where the corpus is spread across many loosely coupled peers, has recently gained importance. In contrast to IR systems on a centralized server or server farm, P2P IR faces the additional challenge of either being oblivious to global corpus statistics or having to compute the global measures from local statistics at the individual peers in an efficient, distributed manner. One specific measure of interest is the global document frequency for different terms, which would be very beneficial as term-specific weights in the scoring and ranking of merged search results that have been obtained from different peers.
This paper presents an efficient solution for the problem of estimating global document frequencies in a large-scale P2P network with very high dynamics where peers can join and leave the network on short notice. In particular, the developed method takes into account the fact that the local document collections of autonomous peers may arbitrarily overlap, so that global counting needs to be duplicate-insensitive. The method is based on hash sketches as a technique for compact data synopses. Experimental studies demonstrate the estimator's accuracy, scalability, and ability to cope with high dynamics. Moreover, the benefit for ranking P2P search results is shown by experiments with real-world Web data and queries.
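The duplicate-insensitivity relied on here comes from a simple property of hash sketches: a document held by several peers sets the same bit on each of them, so per-peer bit vectors can be combined with bitwise OR and each distinct document is counted once. A minimal sketch, assuming each peer has already computed a Flajolet-Martin bit vector for its local collection (as in the earlier example):

```python
from functools import reduce

def merged_estimate(peer_sketches):
    """OR-combine per-peer hash-sketch bit vectors (duplicate-insensitive),
    then apply the usual Flajolet-Martin estimate to the merged sketch."""
    merged = reduce(lambda a, b: a | b, peer_sketches, 0)
    r = 0
    while merged & (1 << r):  # position of the lowest zero bit
        r += 1
    return (2 ** r) / 0.77351
```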
Abstract: Wireless networks are nowadays widely used and constantly evolving; we are constantly surrounded by numerous such networks. A modern idea is to exploit this variety and aggregate all available connections towards improving Internet connectivity. In this paper we present our design of a networking system that enables participating entities to connect with each other and share their broadband connections. Every connected entity gains broadband access from the ones that have and share a connection, independently of how many connections are available and how fast they are. Our goal is to improve the utilization of the broadband connections and to provide fault tolerance across them. To this end, the system is built on top of a peer-to-peer network, where the participating entities share their connections according to various scheduling algorithms.
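The scheduling algorithms are left unspecified in this abstract; as one plausible instance, the sketch below spreads requests over the shared connections with a weighted round-robin proportional to nominal bandwidth. Connection names and weights are made up for illustration.

```python
import itertools

def weighted_round_robin(connections):
    """connections: name -> integer weight (e.g. nominal Mbit/s).
    Yields connection names in proportion to their weights."""
    expanded = [name for name, w in connections.items() for _ in range(w)]
    return itertools.cycle(expanded)

# scheduler = weighted_round_robin({"dsl_a": 2, "fiber_b": 10})
# next(scheduler)  # connection to use for the next request or chunk
```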
Abstract: We address the issue of measuring storage or query load distribution fairness in peer-to-peer data management systems. Existing metrics may look promising from the point of view of specific peers, while in reality being far from optimal from a global perspective. Thus, we first define the requirements and study the appropriateness of various statistical metrics for measuring load distribution fairness against these requirements. The metric proposed as most appropriate is the Gini coefficient (G). Second, we develop novel distributed sampling algorithms to compute G on-line, with high precision, efficiently, and scalably. Third, we show how G can readily be utilized on-line by higher-level algorithms, which can now know when to best intervene to correct load imbalances. Our analysis and experiments attest to the efficiency and accuracy of these algorithms, permitting the on-line use of a rich and reliable metric that conveys a global perspective of the distribution.
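For reference, the Gini coefficient of a load vector has the closed form G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n over the ascending sorted values x_1..x_n: 0 means perfectly even load, values near 1 mean one peer carries almost everything. The sketch below is this centralized textbook computation, not the paper's distributed on-line sampling algorithms.

```python
def gini(loads):
    """Gini coefficient of a load vector: 0 = perfectly even,
    -> 1 = one peer carries everything."""
    xs = sorted(loads)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

# gini([10, 10, 10, 10])  -> 0.0 (perfectly balanced)
# gini([0, 0, 0, 40])     -> 0.75 (one peer holds all load)
```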
Abstract: We present the NanoPeers architecture paradigm, a peer-to-peer network of lightweight devices lacking all or most of the capabilities of their computer-world counterparts. We identify the problems arising when we apply current routing and searching methods to this nano-world, and present some initial solutions, using a case study of a sensor network instance: Smart Dust. Furthermore, we propose the P2P Worlds framework as a hybrid P2P architecture paradigm, consisting of cooperating layers of P2P networks populated by computing entities with escalating capabilities. Our position is that (i) experience gained through research and experimentation in the field of P2P computing can be indispensable when moving down the ladder of computing capabilities, and that (ii) the proposed framework can be the basis of numerous real-world applications, opening up several challenging research problems.
Abstract: Web services are becoming an important enabler of the Semantic Web. Besides the need for a rich description mechanism, Web Service information should be made available in an accessible way for machine processing. In this paper, we propose a new P2P-based approach for Web Services discovery. Peers that store Web Services information, such as data item descriptions, are efficiently located using a scalable and robust data indexing structure for peer-to-peer data networks, NIPPERS. We present a theoretical analysis which shows that the communication cost of the query and update operations scales double-logarithmically with the number of NIPPERS nodes. Furthermore, we show that the network is robust with respect to failures, fulfilling the quality requirements of Web Services.
Abstract: The content-based publish/subscribe (pub/sub) paradigm for system design is becoming increasingly popular, offering unique benefits for a large number of data-intensive applications. Coupled with peer-to-peer technology, it can serve as a central building block for developing data-dissemination applications deployed over a large-scale network infrastructure. A key open problem towards the creation of large-scale content-based pub/sub infrastructures relates to efficiently and accurately matching subscriptions with substring predicates to incoming events. This work addresses this issue.
Abstract: The content-based publish/subscribe (pub/sub) paradigm for system design is becoming increasingly popular, offering unique benefits for a large number of data-intensive applications. Coupled with peer-to-peer technology, it can serve as a central building block for such applications deployed over a large-scale network infrastructure. A key problem toward the creation of large-scale content-based pub/sub infrastructures relates to dealing efficiently with continuous queries (subscriptions) with rich predicates on string attributes; in particular, efficiently and accurately matching substring queries to incoming events is an open problem. In this work we study this problem. We provide and analyze novel algorithms for processing subscriptions with substring predicates and events in a variety of environments. We provide experimental data demonstrating the relative performance behavior of the proposed algorithms, using as key metrics the network bandwidth requirements, number of messages, load balancing, as well as requirements for extra routing state (and related maintenance) and design flexibility.
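A common centralized building block for substring matching of this kind (hedged: the paper's distributed algorithms are not reproduced here) is an n-gram index: a subscription's pattern can only occur in an event if its grams occur there too, so grams give a cheap candidate filter before an exact check. A minimal sketch, with gram length N=3 as an arbitrary choice:

```python
from collections import defaultdict

N = 3  # gram length; an assumption, not taken from the paper

def grams(s):
    """All length-N substrings of s."""
    return {s[i:i + N] for i in range(len(s) - N + 1)}

class SubstringIndex:
    def __init__(self):
        self.by_gram = defaultdict(set)  # gram -> subscription ids
        self.patterns = {}               # subscription id -> pattern
        self.short = set()               # patterns too short to have a gram

    def subscribe(self, sub_id, pattern):
        self.patterns[sub_id] = pattern
        gs = grams(pattern)
        if not gs:
            self.short.add(sub_id)
        for g in gs:
            self.by_gram[g].add(sub_id)

    def match(self, event):
        # any subscription sharing a gram with the event is a candidate;
        # the final substring check makes matching exact
        candidates = set(self.short)
        for g in grams(event):
            candidates |= self.by_gram.get(g, set())
        return [s for s in candidates if self.patterns[s] in event]
```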
Abstract: We consider the conflicting problems of ensuring data-access load balancing and efficiently processing range queries on peer-to-peer data networks maintained over Distributed Hash Tables (DHTs). Placing consecutive data values in neighboring peers is frequently used in DHTs since it accelerates range query processing. However, such a placement is highly susceptible to load imbalances, which are preferably handled by replicating data (since replication also introduces fault tolerance benefits). In this paper, we present HotRoD, a DHT-based architecture that deals effectively with this combined problem through the use of a novel locality-preserving hash function, and a tunable data replication mechanism which allows trading off replication costs for fair load distribution. Our detailed experimental study shows strong gains in both range query processing efficiency and data-access load balancing, with low replication overhead. To our knowledge, this is the first work that concurrently addresses the two conflicting problems using data replication.
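To illustrate the placement idea only (HotRoD's actual hash function and its replication of hot ranges are defined in the paper), a locality-preserving hash maps attribute values order-preservingly onto the DHT identifier ring, so a range query visits a contiguous arc of peers. A minimal sketch with assumed ring parameters:

```python
RING_BITS = 32
RING_SIZE = 1 << RING_BITS

def lp_hash(value, lo, hi):
    """Order-preserving map of value in [lo, hi] onto the identifier ring:
    consecutive values land on nearby positions, so a range query only
    touches a contiguous arc of peers (at the cost of load skew,
    which HotRoD counters by replicating hot arcs)."""
    return int((value - lo) / (hi - lo) * (RING_SIZE - 1))

def range_query_arc(lo_q, hi_q, lo, hi):
    """Identifier arc to visit for a range query [lo_q, hi_q]."""
    return lp_hash(lo_q, lo, hi), lp_hash(hi_q, lo, hi)
```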
Abstract: Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge.
To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability.
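For concreteness, the two basic histogram shapes named above look as follows when computed locally; the paper's contribution is computing such counts and boundaries in a decentralized way over the structured overlay, which this sketch does not attempt.

```python
def equi_width_histogram(values, lo, hi, num_buckets):
    """Equi-width: split [lo, hi] into equal-width buckets and count."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        counts[min(int((v - lo) / width), num_buckets - 1)] += 1
    return counts

def equi_depth_boundaries(values, num_buckets):
    """Equi-depth: boundaries chosen so each bucket holds roughly the
    same number of values."""
    xs = sorted(values)
    step = len(xs) / num_buckets
    return [xs[min(int(i * step), len(xs) - 1)]
            for i in range(1, num_buckets)]
```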
Abstract: The content-based publish/subscribe (pub/sub) paradigm for system design is becoming increasingly popular, offering unique benefits for a large number of data-intensive applications. Coupled with peer-to-peer technology, it can serve as a central building block for such applications deployed over a large-scale network infrastructure. A key problem toward the creation of large-scale content-based pub/sub infrastructures relates to dealing efficiently with continuous queries (subscriptions) with rich predicates on string attributes; in this work we study the problem of efficiently and accurately matching substring queries to incoming events.
Abstract: In this work we study how to process complex queries in DHT-based Peer-to-Peer (P2P) data networks. Queries are made over tuples and relations and are expressed in a query language, such as SQL. We describe existing research approaches for query processing in P2P systems, we suggest improvements and enhancements, and propose a unifying framework that consists of a modified DHT architecture, data placement and search algorithms, and provides efficient support for processing a variety of query types, including queries with one or more attributes, queries with selection operators (involving equality and range queries), and queries with join operators. To our knowledge, this is the first work that puts forth a framework providing support for all these query types.
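To ground the data-placement idea: if each tuple is inserted under one DHT key per indexed attribute, a single-attribute equality query reduces to one lookup, while range and join queries need the further machinery (order-preserving placement, distributed joins) developed in the framework. A toy sketch with an in-process dict standing in for the DHT; all names are illustrative.

```python
from collections import defaultdict

class ToyDHT:
    """In-process stand-in for a DHT; a real system hashes keys to peers."""
    def __init__(self):
        self.store = defaultdict(list)
    def put(self, key, value):
        self.store[key].append(value)
    def get(self, key):
        return self.store[key]

def index_tuple(dht, relation, tup, indexed_attrs):
    """Insert the tuple once per indexed attribute."""
    for attr in indexed_attrs:
        dht.put((relation, attr, tup[attr]), tup)

def equality_query(dht, relation, attr, value):
    """A single-attribute equality query becomes one DHT lookup."""
    return dht.get((relation, attr, value))

# dht = ToyDHT()
# index_tuple(dht, "users", {"id": 7, "city": "Patras"}, ["id", "city"])
# equality_query(dht, "users", "city", "Patras")
```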
Abstract: Peer-to-peer sharing systems are becoming increasingly popular and constitute an exciting new class of innovative, Internet-based data management systems. In these systems, users contribute their own resources (processing units and storage devices) and content (i.e., documents) to the P2P community. We focus on the management of content and resources in such systems. Our goal is to harness all available resources in the P2P network so that users can access all available content efficiently. Efficiency is viewed both from (i) the point of view of the system, in that we strive to ensure fair load distribution among all peer nodes, and (ii) the point of view of the users, in that we strive to ensure low user-request response times.
We propose a novel architecture for this new class of applications, which differs drastically from what is currently found in existing products or proposed in academia. We contribute and study novel solutions that achieve our goals, while at the same time addressing the formidable challenges posed by the autonomy of peers, their heterogeneous processing and storage capacities, their different content contributions, the huge system scale, and the highly dynamic system environment.
Abstract: The peer-to-peer computing paradigm is an intriguing alternative to Google-style search engines for querying and ranking Web content. In a network with many thousands or millions of peers, the storage and access load requirements per peer are much lighter than for a centralized Google-like server farm; thus more powerful techniques from information retrieval, statistical learning, computational linguistics, and ontological reasoning can be employed on each peer's local search engine for boosting the quality of search results. In addition, peers can dynamically collaborate on advanced and particularly difficult queries. Moreover, a peer-to-peer setting is ideally suited to capture local user behavior, like query logs and click streams, and to disseminate and aggregate this information in the network, at the discretion of the corresponding user, in order to incorporate richer cognitive models.
This paper gives an overview of ongoing work in the EU Integrated Project DELIS that aims to develop foundations for a peer-to-peer search engine with Google-or-better scale, functionality, and quality, which will operate in a completely decentralized and self-organizing manner. The paper presents the architecture of such a system and the Minerva prototype testbed, and it discusses various core pieces of the approach: efficient execution of top-k ranking queries, strategies for query routing when a search request needs to be forwarded to other peers, maintaining a self-organizing semantic overlay network, and exploiting and coping with user and community behavior.
Abstract: We present the basic concepts behind the design and implementation of WebDust, a peer-to-peer platform for organizing, monitoring and controlling wireless sensor networks, along with a discussion of its application to an actual testbed. Our software architecture provides a range of services that allow developers to create customized applications with relatively low implementation overhead. WebDust aims to allow heterogeneous components to operate in the same sensor network, and to give the ability to manage and control large numbers of such networks, possibly on a global scale. We also give insight into several applications that can be implemented using our platform, and a description of our current testbed.