Aigaion: RACTI / RU1 Technical Report Series (Web Based)

[RACTI-RU1-2005-20] Chatzigiannakis, Ioannis and Nikoletseas, Sotiris, A Forward Planning Situated Protocol for Data Propagation in Wireless Sensor Networks based on Swarm Intelligence Techniques, in: 17th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2005), pages 214, ACM Press, ACM, Las Vegas, NV, USA, 2005.
Abstract: We present the Forward Planning Situated Protocol (FPSP) for scalable, energy-efficient and fault-tolerant data propagation in situated wireless sensor networks. To deal with the increased complexity of such deeply networked sensor systems, instead of emphasizing a particular aspect of the services provided (i.e., low-energy periodic, low-latency event-driven, or high-success query-based sensing), FPSP uses two novel mechanisms that allow the network operator to adjust the performance of the protocol in terms of energy, latency and success rate on a per-task basis. We emphasize distributedness, direct or indirect interactions among relatively simple agents, flexibility and robustness. The protocol operates by employing a series of plan & forward phases through which devices self-organize into forwarding groups that propagate data over discovered paths. FPSP performs a limited number of long-range, high-power data transmissions to collect information regarding the neighboring devices. The acquired information allows the protocol to plan a sequence (of parameterizable length λ) of short-range, low-power transmissions between nearby particles, based on certain optimization criteria. All particles that decide to respond (based on local criteria) to these long-range transmissions enter the forwarding phase, during which information is propagated via the acquired plan. Clearly, the duration of the forwarding phases is determined by the parameter λ, the transmission medium and the processing speed of the devices. In fact, the parameter λ provides a mechanism to adjust the protocol performance in terms of the latency-energy trade-off: reducing λ reduces the latency at the cost of spending extra energy, while increasing λ reduces the energy dissipation but increases the latency.
To control the success rate-energy trade-off, particles react locally to environment and context changes by using a set of rules based on response thresholds that relate individual-level plasticity with network-level resiliency, motivated by the nature-inspired method for dividing labor, a metaphor of social insect behavior for solving problems [1]. Each particle has an individual response threshold θ that is related to the "local" density (as observed by the particle, [2]); particles engage in propagation of events when the level of the task-associated stimuli exceeds their thresholds. Let s be the intensity of a stimulus associated with a particular sensing task, set by the human authorities. We adopt the response function T_θ(s) = s^n / (s^n + θ^n), the probability of performing the task as a function of s, where n > 1 determines the steepness of the threshold. Thus, when θ is small (i.e. the network is sparse) the response probability increases; when s increases (i.e. for critical sensing tasks) the response probability increases as well. This role-based approach, in which a select number of devices do the high-cost planning while the rest of the network operates in a low-cost state, leads to systems with increased energy efficiency and high fault tolerance, since the long-range planning phases make it possible to bypass obstacles (where no sensors are available) or faulty sensors (that have been disabled due to power failure or other natural events).
[RACTI-RU1-2006-10] Ntarmos, Nikos, Triantafillou, Peter and Weikum, Gerhard, Counting at large: Efficient cardinality estimation in Internet-scale data networks, in: 22nd International Conference on Data Engineering (ICDE 2006), 2006.
Abstract: Counting in general, and estimating the cardinality of (multi-) sets in particular, is highly desirable for a large variety of applications, representing a foundational block for the efficient deployment and access of emerging internet-scale information systems. Examples of such applications range from optimizing query access plans in internet-scale databases, to evaluating the significance (rank/score) of various data items in information retrieval applications. The key constraints that any acceptable solution must satisfy are: (i) efficiency: the number of nodes that need to be contacted for counting purposes must be small in order to enjoy small latency and bandwidth requirements; (ii) scalability, seemingly contradicting the efficiency goal: arbitrarily large numbers of nodes may need to add elements to a (multi-) set, which dictates the need for a highly distributed solution, avoiding server-based scalability, bottleneck, and availability problems; (iii) access and storage load balancing: counting and related overhead chores should be distributed fairly to the nodes of the network; (iv) accuracy: tunable, robust (in the presence of dynamics and failures) and highly accurate cardinality estimation; (v) simplicity and ease of integration: special, solution-specific indexing structures should be avoided. In this paper, we first contribute a highly distributed, scalable, efficient, and accurate (multi-) set cardinality estimator. Subsequently, we show how to use our solution to build and maintain histograms, which have been a basic building block for query optimization for centralized databases, facilitating their porting into the realm of internet-scale data networks.
[RACTI-RU1-2009-90] Neumann, Thomas, Bender, Matthias, Michel, Sebastian, Schenkel, Ralf, Triantafillou, Peter and Weikum, Gerhard, Distributed top-k aggregation queries at large, in: Distributed and Parallel Databases, DAPD, 2009.
Abstract: Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.