[RACTI-RU1-2011-22] Giannakopoulos, Yiannis and Koninis, Christos, Data Aggregation, chapter in the book"Distributed Self-organized Societies of Tiny Artefacts: Design & Implementation", Lulu Publishers, ISBN 5800059245538, 2011.
Abstract: With the proliferation of wireless sensor net-
works and mobile technologies in general, it is possible to
provide improved medical services and also to reduce costs
as well as to manage the shortage of specialized personnel.
Monitoring a person’s health condition using sensors pro-
vides a lot of benefits but also exposes personal sensitive
information to a number of privacy threats. By recording
user-related data, it is often feasible for a malicious or
negligent data provider to expose these data to an unau-
thorized user. One solution is to protect the patient’s pri-
vacy by making difficult a linkage between specific
measurements with a patient’s identity. In this paper we
present a privacy-preserving architecture which builds
upon the concept of
k
-anonymity; we present a clustering-
based anonymity scheme for effective network manage-
ment and data aggregation, which also protects user’s
privacy by making an entity indistinguishable from other
k
similar entities. The presented algorithm is resource
aware, as it minimizes energy consumption with respect to
other more costly, cryptography-based approaches. The
system is evaluated from an energy-consuming and net-
work performance perspective, under different simulation
scenarios.
Abstract: Counting items in a distributed system, and estimating the cardinality of multisets in particular,
is important for a large variety of applications and a fundamental building block for emerging Internet-scale information systems. Examples of such applications range from optimizing query access plans in peer-to-peer data sharing, to computing the significance (rank/score) of data items in distributed information retrieval. The general formal problem addressed in this article is computing the network-wide distinct number of items with some property (e.g., distinct files with file name
containing “spiderman”) where each node in the network holds an arbitrary subset, possibly overlapping the subsets of other nodes. The key requirements that a viable approach must satisfy are:
(1) scalability towards very large network size, (2) efficiency regarding messaging overhead, (3) load
balance of storage and access, (4) accuracy of the cardinality estimation, and (5) simplicity and easy
integration in applications. This article contributes the DHS (Distributed Hash Sketches) method
for this problem setting: a distributed, scalable, efficient, and accurate multiset cardinality estimator.
DHSis based on hash sketches for probabilistic counting, but distributes the bits of each counter
across network nodes in a judicious manner based on principles of Distributed Hash Tables, paying
careful attention to fast access and aggregation as well as update costs. The article discusses various
design choices, exhibiting tunable trade-offs between estimation accuracy, hop-count efficiency, and
load distribution fairness. We further contribute a full-fledged, publicly available, open-source implementation of all our methods, and a comprehensive experimental evaluation for various settings.
Abstract: Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
Abstract: Evolutionary dynamics have been traditionally studied in the context of homogeneous populations, mainly described by the Moran process [15]. Recently, this approach has been generalized in [13] by arranging individuals on the nodes of a network (in general, directed). In this setting, the existence of directed arcs enables the simulation of extreme phenomena, where the fixation probability of a randomly placed mutant (i.e. the probability that the offsprings of the mutant eventually spread over the whole population) is arbitrarily small or large. On the other hand, undirected networks (i.e. undirected graphs) seem to have a smoother behavior, and thus it is more challenging to find suppressors/amplifiers of selection, that is, graphs with smaller/greater fixation probability than the complete graph (i.e. the homogeneous population). In this paper we focus on undirected graphs. We present the first class of undirected graphs which act as suppressors of selection, by achieving a fixation probability that is at most one half of that of the complete graph, as the number of vertices increases. Moreover, we provide some generic upper and lower bounds for the fixation
probability of general undirected graphs. As our main contribution, we introduce the natural alternative of the model proposed in [13]. In our new evolutionary model, all individuals interact simultaneously and the result is a compromise between aggressive and non-aggressive individuals. That is, the behavior of the individuals in our new model and in the model of [13] can be interpreted as an “aggregation” vs. an “all-or-nothing” strategy, respectively. We prove that our new model of mutual influences admits a potential function, which guarantees the convergence of the system for any graph topology and any initial fitness vector of the individuals. Furthermore, we prove fast convergence to the stable state for the case of the complete graph, as well as we provide almost tight bounds on the limit fitness of the individuals. Apart from being important on its own, this new evolutionary model appears to be useful also in the abstract modeling of control mechanisms over invading populations in networks. We demonstrate this by introducing and analyzing two alternative control approaches, for which we bound the time needed to stabilize to the “healthy” state of the system.
Abstract: We propose new burst assembly schemes and fast reservation (FR) protocols for Optical Burst Switched (OBS) networks that are based on traffic prediction. The burst assembly schemes aim at minimizing (for a given burst size) the average delay of the packets incurred during the burst assembly process, while the fast reservation protocols aim at further reducing the end-to-end delay of the data bursts. The burst assembly techniques use a linear prediction filter to estimate the number of packet arrivals at the ingress node in the following interval, and launch a new burst into the network when a certain criterion, different for each proposed scheme, is met. The fast reservation protocols use prediction filters to estimate the expected length of the burst and the time needed for the burst assembly process to complete. A Burst Header Packet (BHP) packet carrying these estimates is sent before the burst is completed, in order to reserve bandwidth at intermediate nodes for the time interval the burst is expected to pass from these nodes. Reducing the packet aggregation delay and the time required to perform the reservations, reduces the total time needed for a packet to be transported over an OBS network and is especially important for real-time applications. We evaluate the performance of the proposed burst assembly schemes and show that a number of them outperform the previously proposed timer-based, length-based and average delay-based burst assembly schemes. We also look at the performance of the fast reservation (FR) protocols in terms of the probability of successfully establishing the reservations required to transport the burst.
Abstract: We propose new burst assembly techniques that aim at reducing the average delay experienced by the packets during the burstification process in optical burst switched (OBS) networks, for a given average size of the bursts produced. These techniques use a linear prediction filter to estimate the number of packet arrivals at the ingress node in the following interval, and launch a new burst into the network when a certain criterion, which is different for each proposed scheme, is met. Reducing the packet burstification delay, for a given average burst size, is essential for real-time applications; correspondingly, increasing the average burst size for a given packet burstification delay is important for reducing the number of bursts injected into the network and the associated overhead imposed on the core nodes. We evaluate the performance of the proposed schemes and show that two of them outperform the previously proposed timer - based, length - based and average delay-based burst aggregation schemes in terms of the average packet burstification delay for a given average burst size.
Abstract: This article studies the transmission control
protocol (TCP) synchronization effect in optical burst
switched networks.Synchronization of TCP flows appears
when optical bursts with segments from different flows inside
are dropped in the network causing flow congestion windows decreasing simultaneously. In this article,this imminent
effect is studied with different assembly schemes and network scenarios.Different metrics are applied to quantitatively assess synchronization with classical assembly
schemes.A new burst assembly scheme is proposed that
statically or dynamically allocates flows to multiple assembly queues to control flow aggregation within the assembly
cycle.The effectiveness of the scheme has been evaluated,
showing a good improvement in optical link utilization
Abstract: Switching in core optical networks is currently being
performed using high-speed electronic or all-optical
circuit switches. Switching with high-speed electronics
requires optical-to-electronic (O/E) conversion of the
data stream, making the switch a potential bottleneck
of the network: any effort (including parallelization) for
electronics to approach the optical speeds seems to be
already reaching its practical limits. Furthermore, the
store-and-forward approach of packet-switching does
not seem suitable for all-optical implementation due to
the lack of practical optical random-access-memories
to buffer and resolve contentions. Circuit switching on
the other hand, involves a pre-transmission delay for
call setup and requires the aggregation of microlows
into circuits, sacriicing the granularity and the control
over individual lows, and is ineficient for bursty traf-
ic. Optical burst switching (OBS) has been proposed
by Qiao and Yoo (1999) to combine the advantages of
both packet and circuit switching and is considered a
promising technology for the next generation optical
internet.
Abstract: Wireless sensor network research usually focuses on the reliable and efficient collection of data. In
this paper we focus on the next step in the lifetime of traces: we aim at investigating and evaluating, by
qualitative and quantitative means, data repositories of already collected measurements. Concerning the
collected datasets, several important topics arise like the need of exchanging traces between researchers
using a common representation of the traces and the need for common classication of the traces based on
a commonly-agreed set of statistical characteristics for in retrospect utilization. In order to qualitatively
address these issues, we propose the use of a novel set of metrics focusing on the in-network data aggregation
problem class. These metrics enable reliable evaluation of algorithms using the same benchmark traces (both
in average cases and \stressful" setups) removing the need for running algorithms in a real testbed, at least
in the initial development stage. We present the results of our research as a rst approach for addressing this
problem, and in order to conrm our method, we characterized several traces with the proposed metrics.
We validate the metrics by predicting the performance of three data-aggregation schemes using the available
traces and checking the results by actually running the algorithms
Abstract: We propose information aggregation as a method for summarizing the resource-related information, used by the task scheduler. Through this method the information of a set of resources can be uniformly represented, reducing at the same time the amount of information transferred in a Grid network. A number of techniques are described for aggregating the information of the resources belonging to a hierarchical Grid domain. This information includes the cpu and storage capacities at a site, the number of tasks queued, and other resource-related parameters. The quality of the aggregation scheme affects the efficiency of the scheduler{\^a}€™s decisions. We use as a metric of aggregation efficiency the Stretch Factor (SF), defined as the ratio of the task delay when the task is scheduled using complete resource information over the task delay when an aggregation scheme is used. The simulation experiments performed show that the proposed aggregation schemes achieve large information reduction, while enabling good task scheduling decisions as indicated by the SF achieved.
Abstract: We consider information aggregation as a method for reducing the information exchanged in a Grid network and used by the resource manager in order to make scheduling decisions. In this way, information is summarized across nodes and sensitive or detailed information can be kept private, while resources are still publicly available for use. We present a general framework for information aggregation, trying to identify issues that relate to aggregation in Grids. In this context, we describe a number of techniques, including single point and intra-domain aggregation, define appropriate grid-specific domination relations and operators for aggregating static and dynamic resource information, and discuss resource selection optimization functions. The quality of an aggregation scheme is measured both by its effects on the efficiency of the scheduler¢s decisions and also by the reduction it brings on the amount of resource information recorded, a tradeoff that we examine in detail. Simulation experiments demonstrate that the proposed schemes achieve significant information reduction, either in the amount of information exchanged, or in the frequency of the updates, while at the same time maintaining most of the value of the original information as expressed by a stretch factor metric we introduce.