Abstract: We describe the design and implementation
of a secure and robust architectural data management
model suitable for cultural environments. Usage and exploitation
of the World Wide Web is a critical requirement
for a series of administrative tasks such as collecting, managing
and distributing valuable cultural and artistic information.
This requirement introduces a great number of
Internet threats for cultural organizations that may cause
huge data and/or financial losses, harm their reputation
and public acceptance as well as people’s trust on them.
Our model addresses a list of fundamental operational
and security requirements. It utilizes a number of cryptographic
primitives and techniques that provide data safety
and secure user interaction on especially demanding online
collaboration environments. We provide a reference
implementation of our architectural model and discuss
the technical issues. It is designed as a standalone solution
but it can be flexibly adapted in broader management
infrastructures.
Abstract: Counting in general, and estimating the cardinality of (multi-) sets in particular, is highly desirable for a large variety of applications, representing a foundational block for the efficient deployment and access of emerging internet-scale information systems. Examples of such applications range from optimizing query access plans in internet-scale databases, to evaluating the significance (rank/score) of various data items in information retrieval applications. The key constraints that any acceptable solution must satisfy are: (i) efficiency: the number of nodes that need be contacted for counting purposes must be small in order to enjoy small latency and bandwidth requirements; (ii) scalability, seemingly contradicting the efficiency goal: arbitrarily large numbers of nodes nay need to add elements to a (multi-) set, which dictates the need for a highly distributed solution, avoiding server-based scalability, bottleneck, and availability problems; (iii) access and storage load balancing: counting and related overhead chores should be distributed fairly to the nodes of the network; (iv) accuracy: tunable, robust (in the presence of dynamics and failures) and highly accurate cardinality estimation; (v) simplicity and ease of integration: special, solution-specific indexing structures should be avoided. In this paper, first we contribute a highly-distributed, scalable, efficient, and accurate (multi-) set cardinality estimator. Subsequently, we show how to use our solution to build and maintain histograms, which have been a basic building block for query optimization for centralized databases, facilitating their porting into the realm of internet-scale data networks.
Abstract: A fundamental approach in finding efficiently best routes or optimal itineraries in traffic information
systems is to reduce the search space (part of graph visited) of the most commonly used
shortest path routine (Dijkstra¢s algorithm) on a suitably defined graph. We investigate reduction
of the search space while simultaneously retaining datastructures, created during a preprocessing
phase, of size linear (i.e., optimal) to the size of the graph. We show that the search space of
Dijkstra¢s algorithm can be significantly reduced by extracting geometric information from a given
layout of the graph and by encapsulating precomputed shortest-path information in resulted geometric
objects (containers). We present an extensive experimental study comparing the impact of
different types of geometric containers using test data from real-world traffic networks. We also
present new algorithms as well as an empirical study for the dynamic case of this problem, where
edge weights are subject to change and the geometric containers have to be updated and show that
our new methods are two to three times faster than recomputing everything from scratch. Finally,
in an appendix, we discuss the software framework that we developed to realize the implementations
of all of our variants of Dijkstra¢s algorithm. Such a framework is not trivial to achieve as our
goal was to maintain a common code base that is, at the same time, small, efficient, and flexible,
as we wanted to enhance and combine several variants in any possible way.
Abstract: We describe algorithms for finding shortest paths and distances in outerplanar and planar digraphs
that exploit the particular topology of the input graph. An important feature of our algorithms is that they can
work in a dynamic environment, where the cost of any edge can be changed or the edge can be deleted. In the
case of outerplanar digraphs, our datastructures can be updated after any such change in only logarithmic time.
A distance query is also answered in logarithmic time. In the case of planar digraphs, we give an interesting
tradeoff between preprocessing, query, and update times depending on the value of a certain topological
parameter of the graph. Our results can be extended to n-vertex digraphs of genus O.n1¡"/ for any " > 0.
Abstract: Abstract— Numerous smart city testbeds and system deployments have surfaced around the world, aiming to provide services over unified large heterogeneous IoT infrastructures. Although we have achieved new scales in smart city installations and systems, so far the focus has been to provide diverse sources of data to smart city services consumers, while neglecting to provide ways to simplify making good use of them. We believe that knowledge creation in smart cities through data annotation, supported in both an automated and a crowdsourced manner, is an aspect that will bring additional value to smart cities. We present here our approach, aiming to utilize an existing smart city deployment and the OrganiCity software ecosystem. We discuss key challenges along with characteristic use cases, and report on our design and implementation, along with preliminary results.
Abstract: The promises inherent in users coming together to form data
sharing network communities, bring to the foreground new problems formulated
over such dynamic, ever growing, computing, storage, and networking
infrastructures. A key open challenge is to harness these highly
distributed resources toward the development of an ultra scalable, efficient
search engine. From a technical viewpoint, any acceptable solution
must fully exploit all available resources dictating the removal of any
centralized points of control, which can also readily lead to performance
bottlenecks and reliability/availability problems. Equally importantly,
however, a highly distributed solution can also facilitate pluralism in informing
users about internet content, which is crucial in order to preclude
the formation of information-resource monopolies and the biased visibility
of content from economically-powerful sources. To meet these challenges,
the work described here puts forward MINERVA{\^a}{\"i}¿½{\"i}¿½, a novel search
engine architecture, designed for scalability and efficiency. MINERVA{\^a}{\"i}¿½{\"i}¿½
encompasses a suite of novel algorithms, including algorithms for creating
data networks of interest, placing data on network nodes, load balancing,
top-k algorithms for retrieving data at query time, and replication algorithms
for expediting top-k query processing. We have implemented the
proposed architecture and we report on our extensive experiments with
real-world, web-crawled, and synthetic data and queries, showcasing the
scalability and efficiency traits of MINERVA{\^a}{\"i}¿½{\"i}¿½.
Abstract: This paper studies the data gathering problem in wireless networks, where data generated at the nodes has to be collected at a single sink. We investigate the relationship between routing optimality and fair resource management. In particular, we prove that for energy balanced data propagation, Pareto optimal routing and flow maximization are equivalent, and also prove that flow maximization is equivalent to maximizing the network lifetime. We algebraically characterize the network structures in which energy balanced data flows are maximal. Moreover, we algebraically characterize communication links which are not used by an optimal flow. This leads to the characterization of minimal network structures supporting the maximal flows.
We note that energy balance, although implying global optimality, is a local property that can be computed efficiently and in a distributed manner. We suggest online distributed algorithms for energy balance in different optimal network structures and numerically show their stability in particular setting. We remark that although the results obtained in this paper have a direct consequence in energy saving for wireless networks they do not limit themselves to this type of networks neither to energy as a resource. As a matter of fact, the results are much more general and can be used for any type of network and different type of resources.
Abstract: This paper studies the data gathering problem in wireless networks, where data generated at the nodes has to be collected at a single sink. We investigate the relationship between routing optimality and fair resource management. In particular, we prove that for energy-balanced data propagation, Pareto optimal routing and flow maximization are equivalent, and also prove that flow maximization is equivalent to maximizing the network lifetime. We algebraically characterize the network structures in which energy-balanced data flows are maximal. Moreover, we algebraically characterize communication links which are not used by an optimal flow. This leads to the characterization of minimal network structures supporting the maximal flows.
We note that energy-balance, although implying global optimality, is a local property that can be computed efficiently and in a distributed manner. We suggest online distributed algorithms for energy-balance in different optimal network structures and numerically show their stability in particular setting. We remark that although the results obtained in this paper have a direct consequence in energy saving for wireless networks they do not limit themselves to this type of networks neither to energy as a resource. As a matter of fact, the results are much more general and can be used for any type of network and different types of resources.
Abstract: The content-based publish/subscribe (pub/sub) paradigm for system design is becoming increasingly popular, offering
unique benefits for a large number of data-intensive applications. Coupled with the peer-to-peer technology, it can serve as
a central building block for developing data-dissemination applications deployed over a large-scale network infrastructure.
A key open problem towards the creation of large-scale content-based pub/sub infrastructures relates to efficiently and accu-
rately matching subscriptions with substring predicates to incoming events. This work addresses this issue.
Abstract: The content-based publish/subscribe (pub/sub)paradigm for system design is becoming increasingly popular, offering unique benefits for a large number of data-intensive applications. Coupled with the peer-to-peer technology, it can serve as a central building block for such applications
deployed over a large-scale network infrastructure. A key problem toward the creation of large-scale contentbased pub/sub infrastructures relates to dealing efficiently with continuous queries (subscriptions) with rich predicates on string attributes; In particular, efficiently and accurately
matching substring queries to incoming events is an open problem. In this work we study this problem. We provide and analyze novel algorithms for processing subscriptions with substring predicates and events in a variety of environments. We provide experimental data demonstrating the
relative performance behavior of the proposed algorithms using as key metrics the network bandwidth requirements, number of messages, load balancing, as well as requirements for extra routing state (and related maintenance) and design flexibility.
Abstract: We consider the problem of preprocessing an n-vertex digraph with real edge weights so that subsequent queries for the shortest path or distance between any two vertices can be efficiently answered. We give parallel algorithms for the EREW PRAM model of computation that depend on the treewidth of the input graph. When the treewidth is a constant, our algorithms can answer distance queries in O({\'a}(n)) time using a single processor, after a preprocessing of O(log2n) time and O(n) work, where {\'a}(n) is the inverse of Ackermann's function. The class of constant treewidth graphs contains outerplanar graphs and series-parallel graphs, among others. To the best of our knowledge, these are the first parallel algorithms which achieve these bounds for any class of graphs except trees. We also give a dynamic algorithm which, after a change in an edge weight, updates our datastructures in O(log n) time using O(n{\^a}) work, for any constant 0 < {\^a} < 1. Moreover, we give an algorithm of independent interest: computing a shortest path tree, or finding a negative cycle in O(log2n) time using O(n) work.
Abstract: A key issue when designing and implementing large-scale publish/subscribe systems is how to efficiently propagate subscriptions among the brokers of the system. Brokers require this information in order to forward incoming events only to interested users, filtering out unrelated events, which can save significant overheads (particularly network bandwidth and processing time at the brokers). In this paper we contribute the notion of subscription summaries, a mechanism appropriately compacting subscription information. We develop the associated datastructures and matching algorithms. The proposed mechanism can handle event/subscription schemata that are rich in terms of their attribute types and powerful in terms of the allowed operations on them. Our major results are that the proposed mechanism (i) is scalable, with the bandwidth required to propagate subscriptions increasing only slightly, even at huge-scales, and (ii) is significantly more efficient, up to orders of magnitude, depending on the scale, with respect to the bandwidth requirements for propagating subscriptions.
Abstract: The content-based publish/subscribe (pub/sub)
paradigm for system design is becoming increasingly
popular, offering unique benefits for a large number of
data-intensive applications. Coupled with the peer-to-peer
technology, it can serve as a central building block for
such applications deployed over a large-scale network
infrastructure. A key problem toward the creation of
large-scale content-based pub/sub infrastructures relates to
dealing efficiently with continuous queries (subscriptions)
with rich predicates on string attributes; in this work we
study the problem of efficiently and accurately matching
substring queries to incoming events.
Abstract: The Internet of Things (IoT) and smart cities are two of the most popular directions the research community is pursuing very actively. But although we have made great progress in many fields, we are still trying to figure out how we can utilize our smart city and IoT infrastructures, in order to produce reliable, economically sustainable solutions that create public value, and even more so in the field of education.
GAIA1, a Horizon2020 EC-funded project, has developed an IoT infrastructure across school buildings in Europe. Its primary aim has been to raise awareness about energy consumption and sustainability, based on real-world sensor data produced inside the school buildings where students and teachers live and work. Today's students are the citizens of tomorrow, and they should have the skills to understand and respond to challenges like climate change. Currently, 25 educational building sites participate in GAIA, located in Sweden, Italy, and Greece. An IoT infrastructure [1] is installed in these buildings, monitoring in real-time their power consumption, as well as several indoor and outdoor environmental parameters.
Abstract: There exists a great amount of algorithms for wireless sensor networks (WSNs) that have never been tried in practice. This is due to the fact that programming sensor nodes still happens on a very technical level. We remedy the situation by introducing our algorithm library Wiselib, which allows for simple implementations of algorithms. It can adopt to a large variety of hardware and software. This is achieved by employing advanced C++ techniques such as templates and inline functions, which allow to write generic code that is resolved and bound at compile time, resulting in virtually no memory or computation overhead at run time. The Wiselib runs on different host operating systems such as Contiki, iSense OS, and ScatterWeb. Furthermore, it runs on virtual nodes simulated by Shawn. The Wiselib provides an algorithm with datastructures that suit the specific properties of the target platform. Algorithm code does not contain any platform-specific specializations, allowing a single implementation to run natively on heterogeneous networks. In this paper, we describe the building blocks of the Wiselib, analyze the overhead, and show how cryptographically secured routing algorithms can be implemented. We also report on results from experiments with real sensor node hardware.