[RACTI-RU1-2011-22] Giannakopoulos, Yiannis and Koninis, Christos, Data Aggregation, chapter in the book "Distributed Self-organized Societies of Tiny Artefacts: Design & Implementation", Lulu Publishers, ISBN 5800059245538, 2011.
Abstract: With the proliferation of wireless sensor networks and mobile technologies in general, it is possible to provide improved medical services, to reduce costs, and to manage the shortage of specialized personnel. Monitoring a person's health condition using sensors offers many benefits but also exposes sensitive personal information to a number of privacy threats. By recording user-related data, it is often feasible for a malicious or negligent data provider to expose these data to an unauthorized user. One solution is to protect the patient's privacy by making it difficult to link specific measurements to a patient's identity. In this paper we present a privacy-preserving architecture which builds upon the concept of k-anonymity; we present a clustering-based anonymity scheme for effective network management and data aggregation, which also protects users' privacy by making an entity indistinguishable from k other similar entities. The presented algorithm is resource-aware, as it minimizes energy consumption with respect to other, more costly, cryptography-based approaches. The system is evaluated from an energy-consumption and network-performance perspective, under different simulation scenarios.
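
A minimal sketch of the cluster-level aggregation idea behind such a scheme (not the chapter's actual algorithm; the function names, the clustering input, and the choice of k are illustrative): readings are grouped into clusters of at least k sources, and only per-cluster aggregates are released, so no single measurement can be linked to one patient.

    # Illustrative sketch only: k-anonymous cluster-level aggregation,
    # assuming a precomputed clustering of sensor sources.
    from collections import defaultdict
    from statistics import mean

    def k_anonymous_aggregate(readings, cluster_of, k=5):
        """readings: list of (source_id, value); cluster_of: source_id -> cluster id."""
        clusters = defaultdict(list)
        for source, value in readings:
            clusters[cluster_of(source)].append(value)
        # Suppress clusters smaller than k: releasing their aggregate
        # could make an individual source identifiable.
        return {cid: mean(vals) for cid, vals in clusters.items() if len(vals) >= k}
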
Abstract: Counting items in a distributed system, and estimating the cardinality of multisets in particular,
is important for a large variety of applications and a fundamental building block for emerging Internet-scale information systems. Examples of such applications range from optimizing query access plans in peer-to-peer data sharing, to computing the significance (rank/score) of data items in distributed information retrieval. The general formal problem addressed in this article is computing the network-wide distinct number of items with some property (e.g., distinct files with file name
containing “spiderman”) where each node in the network holds an arbitrary subset, possibly overlapping the subsets of other nodes. The key requirements that a viable approach must satisfy are:
(1) scalability towards very large network size, (2) efficiency regarding messaging overhead, (3) load
balance of storage and access, (4) accuracy of the cardinality estimation, and (5) simplicity and easy
integration in applications. This article contributes the DHS (Distributed Hash Sketches) method
for this problem setting: a distributed, scalable, efficient, and accurate multiset cardinality estimator.
DHS is based on hash sketches for probabilistic counting, but distributes the bits of each counter
across network nodes in a judicious manner based on principles of Distributed Hash Tables, paying
careful attention to fast access and aggregation as well as update costs. The article discusses various
design choices, exhibiting tunable trade-offs between estimation accuracy, hop-count efficiency, and
load distribution fairness. We further contribute a full-fledged, publicly available, open-source implementation of all our methods, and a comprehensive experimental evaluation for various settings.
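
For orientation, a minimal single-node sketch of the hash-sketch primitive (Flajolet-Martin probabilistic counting) that DHS builds on; the distribution of the bitmap bits across DHT nodes, which is the article's actual contribution, is omitted, and the parameters are illustrative.

    # Sketch of Flajolet-Martin probabilistic counting with stochastic
    # averaging; DHS additionally spreads each bitmap's bits over DHT nodes.
    import hashlib

    PHI = 0.77351  # Flajolet-Martin correction factor

    def _rho(h):
        """Index of the least significant 1-bit of h (0-based)."""
        return (h & -h).bit_length() - 1 if h else 32

    def _lowest_zero(bitmap):
        i = 0
        while bitmap >> i & 1:
            i += 1
        return i

    def estimate_cardinality(items, num_bitmaps=64):
        bitmaps = [0] * num_bitmaps
        for item in items:
            h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
            # Low bits pick a bitmap; remaining bits set one bit in it.
            bitmaps[h % num_bitmaps] |= 1 << _rho(h // num_bitmaps)
        # r = average index of the lowest zero bit across bitmaps.
        r = sum(_lowest_zero(b) for b in bitmaps) / num_bitmaps
        return num_bitmaps / PHI * 2 ** r
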
Abstract: Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
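
For context, a sketch of the TPUT-style threshold step that such optimizations start from (simplified; the input format and names are assumptions, and the paper's cost-model-driven scan depths and source sampling are not shown):

    # Illustrative sketch of TPUT's first-phase threshold computation.
    def tput_phase1_threshold(local_top_lists, k, num_nodes):
        """local_top_lists: one {item: local_score} dict of top-k items per node."""
        partial = {}
        for top in local_top_lists:
            for item, score in top.items():
                partial[item] = partial.get(item, 0.0) + score
        # tau1: the k-th largest partial sum is a lower bound on the
        # true k-th aggregate score.
        tau1 = sorted(partial.values(), reverse=True)[k - 1]
        # Phase 2 asks every node for all items scoring at least tau1/m locally.
        return tau1, tau1 / num_nodes
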
Abstract: We propose new burst assembly schemes and fast reservation (FR) protocols for Optical Burst Switched (OBS) networks that are based on traffic prediction. The burst assembly schemes aim at minimizing (for a given burst size) the average delay of the packets incurred during the burst assembly process, while the fast reservation protocols aim at further reducing the end-to-end delay of the data bursts. The burst assembly techniques use a linear prediction filter to estimate the number of packet arrivals at the ingress node in the following interval, and launch a new burst into the network when a certain criterion, different for each proposed scheme, is met. The fast reservation protocols use prediction filters to estimate the expected length of the burst and the time needed for the burst assembly process to complete. A Burst Header Packet (BHP) carrying these estimates is sent before the burst is completed, in order to reserve bandwidth at intermediate nodes for the time interval during which the burst is expected to pass through these nodes. Reducing the packet aggregation delay and the time required to perform the reservations reduces the total time needed for a packet to be transported over an OBS network, which is especially important for real-time applications. We evaluate the performance of the proposed burst assembly schemes and show that a number of them outperform the previously proposed timer-based, length-based, and average-delay-based burst assembly schemes. We also examine the performance of the fast reservation (FR) protocols in terms of the probability of successfully establishing the reservations required to transport the burst.
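
A minimal sketch of the prediction-driven launch decision (the filter coefficients and the launch criterion here are illustrative assumptions, not the paper's specific schemes):

    # Illustrative sketch: a linear filter predicts next-interval packet
    # arrivals from the last p intervals, and the assembler launches a
    # burst early when the prediction would overshoot the target size.
    def predict_next_arrivals(history, weights):
        """history: most recent packet counts, oldest first; len == len(weights)."""
        return sum(w * x for w, x in zip(weights, history))

    def should_launch(current_burst_len, history, weights, max_burst):
        return current_burst_len + predict_next_arrivals(history, weights) >= max_burst
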
Abstract: Switching in core optical networks is currently performed using high-speed electronic or all-optical circuit switches. Switching with high-speed electronics requires optical-to-electronic (O/E) conversion of the data stream, making the switch a potential bottleneck of the network: any effort (including parallelization) for electronics to approach optical speeds seems to be already reaching its practical limits. Furthermore, the store-and-forward approach of packet switching does not seem suitable for all-optical implementation due to the lack of practical optical random-access memories to buffer and resolve contentions. Circuit switching, on the other hand, involves a pre-transmission delay for call setup and requires the aggregation of microflows into circuits, sacrificing the granularity and the control over individual flows, and is inefficient for bursty traffic. Optical burst switching (OBS) has been proposed by Qiao and Yoo (1999) to combine the advantages of both packet and circuit switching and is considered a promising technology for the next-generation optical internet.
Abstract: Wireless sensor network research usually focuses on the reliable and efficient collection of data. In this paper we focus on the next step in the lifetime of traces: we aim at investigating and evaluating, by qualitative and quantitative means, data repositories of already collected measurements. Concerning the collected datasets, several important topics arise, such as the need to exchange traces between researchers using a common representation and the need for a common classification of traces based on a commonly agreed set of statistical characteristics for retrospective use. In order to address these issues qualitatively, we propose a novel set of metrics focusing on the in-network data aggregation problem class. These metrics enable reliable evaluation of algorithms using the same benchmark traces (both in average cases and in "stressful" setups), removing the need to run algorithms in a real testbed, at least in the initial development stage. We present the results of our research as a first approach to this problem, and in order to confirm our method, we characterized several traces with the proposed metrics. We validate the metrics by predicting the performance of three data-aggregation schemes using the available traces and checking the results by actually running the algorithms.
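
Since the abstract does not define the metrics themselves, the following is only an illustrative sketch of the kind of trace statistic such a benchmark might compute: lag-1 autocorrelation of a sensor's readings, a rough indicator of how compressible a trace is for in-network aggregation.

    # Illustrative sketch only; not one of the paper's proposed metrics.
    from statistics import mean

    def lag1_autocorrelation(values):
        """Correlation between consecutive readings of one sensor trace."""
        m = mean(values)
        num = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
        den = sum((v - m) ** 2 for v in values)
        return num / den if den else 0.0
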