Abstract: The paper is concerned with the time efficient
processing of spatiotemporal predicates, i.e. spatial
predicates associated with an exact temporal
constraint. A set of such predicates forms a buffer
query or a Spatio-temporal Pattern (STP) Query with
time. In the more general case of an STP query, the
temporal dimension is introduced via the relative order
of the spatial predicates (STP queries with order).
Therefore, the efficient processing of a spatiotemporal
predicate is crucial for the efficient implementation of
more complex queries of practical interest. We propose
an extension of a known approach, suitable for
processing spatial predicates, which has been used for
the efficient manipulation of STP queries with order.
The extended method is supported by efficient indexing
structures. We also provide experimental results that
show the efficiency of the technique.
Abstract: An ever growing emphasis is put nowadays in developing personalized journey planning and renewable mobility services in smart cities. These services combine means of scheduled-based public transport and electric vehicles or bikes, using crowdsourcing techniques for collecting real-time traffic information and for assessing the recommended routes. The goal is to develop an information system that will allow the fast, real-time computation of best routes.
The main challenges in developing such an information system are both technological and algorithmic. The technological challenge concerns the collection, storage, management, and updating of a huge volume of transport data that are usually time-dependent, and the provision (through these data) of personalized renewable mobility services in smartphones. This challenge is typically confronted by creating a cloud infrastructure that on the one hand will support the storage, management, and updating of data, while on the other hand it will handle the necessary data feed to the smartphone applications for providing the users with the requested best routes.
The algorithmic challenge concerns the development of innovative algorithms for the efficient provision of journey planning services in smartphones, based on data they will receive from the cloud infrastructure. These services guarantee the computation of realistic and useful best routes, as well as the updating of the precomputed (route and timetable) information, in case of delays of scheduled public transport vehicles, so that the users can online update their routes to destination. The goal is to develop an algorithmic basis for supporting modern renewable mobility services (information systems), such as "mobility on demand'' (where the next leg of a journey is decided in real-time) and "door-to-door'' personalized mobility, in urban scheduled-based public transport environments. Scheduled-based public transport information systems should not only compute in real-time end-user queries requesting best routes, but also to update the timetable information in case of delays.
The core algorithmic issues of mobility and journey planning (regarding the computation of optimal routes under certain criteria) in scheduled-based public transport systems concern the efficient solution of the fundamental earlier arrival (EA) problem (compute a journey from station S to station T minimizing the overall traveling time required to complete the journey), the minimum number of
transfers (MNT) problem (compute a journey from station S to station T minimizing the number of times a passenger is required to change vehicle), and the efficient updating of timetable information system in case of vehicle delays. The EA and MNT problems have been extensively studied in the literature under two main approaches: the array-based modeling (where the timetable is represented as an array) and the graph-based modeling (where the timetable is represented as a graph). Experimental results have shown so far that the array-based approaches are faster in terms of query time than graph-based ones, as they are able to better exploit data locality and do not rely on priority queues. On the other hand, the array-based approaches have not been theoretically or experimentally studied as far as the efficient updating of timetable information, in case of delays, is concerned.
In this thesis, new graph-based models are being developed that solve efficiently the aforementioned fundamental algorithmic mobility problems in urban scheduled-based public transport information systems, along with a mobile application (journey planner) running on Android-based smartphones that includes a service for the evaluation of the recommended routes by the users. In particular:
(a) An extensive comparative evaluation was conducted on graph-based dynamic models that represent big data volumes regarding their suitability for representing timetable information. The study confirmed that the realistic time-expanded model is the most suitable for representing timetable information.
(b) Two new graph-based models have been developed for representing timetable information (in a timetable information system), the reduced time-expanded model and the dynamic timetable model (DTM), both of which are more space-efficient with respect to the realistic time-expanded model. For both of the new models, new efficient algorithms were developed for fast answering of EA and MNT queries, as well as for updating the timetable information representation in case of delays.
(c)An experimental evaluation was conducted with the new graph-based models and their associated query and update algorithms on a set of 14 real-world scheduled-based transportation systems, including the metropolitan areas of Berlin, Athens, London, Rome, and Madrid. The experimental results showed that the query algorithms of the reduced time-expanded model are superior to those of the DTM model, while the reverse is true regarding the update algorithms. In addition, the experimental study showed that the query algorithms of the new graph-based models compete favorably with those of the best array-based models.
(d) A mobile, cloud-based, journey planner (information system) was developed whose core algorithmic engine builds upon the new graph-based models. The mobile application is accompanied by a service that allows the users to assess the recommended journeys. The journey planner demonstrates the practicality of the new graph-based models and their associated query and update algorithms.
Abstract: Urban road networks are represented as directed graphs, accompanied by a metric which assigns cost functions (rather than scalars) to the arcs, e.g. representing time-dependent arc-traversal-times. In this work, we present oracles for providing time-dependent min-cost route plans, and conduct their experimental evaluation on a real-world data set (city of Berlin). Our oracles are based on precomputing all landmark-to-vertex shortest travel-time functions, for properly selected landmark sets. The core of this preprocessing phase is based on a novel, quite efficient and simple oneto-all approximation method for creating approximations of shortest travel-time functions. We then propose three query algorithms, including a PTAS, to efficiently provide mincost route plan responses to arbitrary queries. Apart from the purely algorithmic challenges, we deal also with several
implementation details concerning the digestion of raw traffic data, and we provide heuristic improvements of both the preprocessing phase and the query algorithms. We conduct an extensive, comparative experimental study with all query algorithms and six landmark sets. Our results are quite encouraging, achieving remarkable speedups (at least by two orders of magnitude) and quite small approximation guarantees, over the time-dependent variant of Dijkstra¢s algorithm.
Abstract: Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
Abstract: Using a set of geometric containers to speed up shortest path queries in a weighted graph has been proven a useful tool for dealing with large sparse graphs. Given a layout of a graph G=(V,E), we store, for each edge (u,v)set membership, variantE, the bounding box of all nodes tset membership, variantV for which a shortest u-t-path starts with (u,v). Shortest path queries can then be answered by DijkstraImage restricted to edges where the corresponding bounding box contains the target.
In this paper, we present new algorithms as well as an empirical study for the dynamic case of this problem, where edge weights are subject to change and the bounding boxes have to be updated. We evaluate the quality and the time for different update strategies that guarantee correct shortest paths in an interesting application to railway information systems, using real-world data from six European countries.
Abstract: Many efforts have been done in the last years to model public transport timetables in order to
find optimal routes. The proposed models can be classified into two types: those representing the
timetable as an array, and those representing it as a graph. The array-based models have been
shown to be very effective in terms of query time, while the graph-based models usually answer
queries by computing shortest paths, and hence they are suitable to be used in combination with
speed-up techniques developed for road networks.
In this paper, we focus on the dynamic behavior of graph-based models considering the case
where transportation systems are subject to delays with respect to the given timetable. We
make three contributions: (i) we give a simplified and optimized update routine for the wellknown
time-expanded model along with an engineered query algorithm; (ii) we propose a new
graph-based model tailored for handling dynamic updates; (iii) we assess the effectiveness of
the proposed models and algorithms by an experimental study, which shows that both models
require negligible update time and a query time which is comparable to that required by some
array-based models.
Abstract: Building efficient internet-scale data management services is the main focus of this chapter. In particular, we aim to show how to leverage DHT technology and extend it with novel algorithms and architectures in order to (i) improve efficiency and reliability for traditional DHT (exact-match) queries, particularly exploiting the abundance of altruism witnessed in real-life P2P networks, (ii) speedup range queries for data stored on DHTs, and (iii) support efficiently and scalably the publish/subscribe paradigm over DHTs, which crucially depends on algorithms for supporting rich queries on string-attribute data.
Abstract: Information retrieval (IR) in peer-to-peer (P2P) networks,
where the corpus is spread across many loosely coupled
peers, has recently gained importance. In contrast to IR
systems on a centralized server or server farm, P2P IR faces
the additional challenge of either being oblivious to global
corpus statistics or having to compute the global measures
from local statistics at the individual peers in an efficient,
distributed manner. One specific measure of interest is the
global document frequency for different terms, which would
be very beneficial as term-specific weights in the scoring and
ranking of merged search results that have been obtained
from different peers.
This paper presents an efficient solution for the problem
of estimating global document frequencies in a large-scale
P2P network with very high dynamics where peers can join
and leave the network on short notice. In particular, the
developed method takes into account the fact that the lo-
cal document collections of autonomous peers may arbitrar-
ily overlap, so that global counting needs to be duplicate-
insensitive. The method is based on hash sketches as a
technique for compact data synopses. Experimental stud-
ies demonstrate the estimator?s accuracy, scalability, and
ability to cope with high dynamics. Moreover, the benefit
for ranking P2P search results is shown by experiments with
real-world Web data and queries.
Abstract: With this work we aim to make a three-fold contribution.
We first address the issue of supporting efficiently queries
over string-attributes involving prefix, suffix, containment,
and equality operators in large-scale data networks. Our
first design decision is to employ distributed hash tables
(DHTs) for the data network?s topology, harnessing their
desirable properties. Our next design decision is to derive
DHT-independent solutions, treating DHT as a black box.
Second, we exploit this infrastructure to develop efficient
content based publish/subscribe systems. The main con-
tribution here are algorithms for the efficient processing of
queries (subscriptions) and events (publications). Specifi-
cally, we show that our subscription processing algorithms
require O(logN) messages for a N-node network, and our
event processing algorithms require O(l ? logN) messages
(with l being the average string length).
Third, we develop algorithms for optimizing the proces-
sing of multi-dimensional events, involving several string at-
tributes. Further to our analysis, we provide simulation-
based experiments showing promising performance results
in terms of number of messages, required bandwidth, load
balancing, and response times.
Abstract: Information retrieval (IR) in peer-to-peer (P2P) networks,
where the corpus is spread across many loosely coupled
peers, has recently gained importance. In contrast to IR
systems on a centralized server or server farm, P2P IR faces
the additional challenge of either being oblivious to global
corpus statistics or having to compute the global measures
from local statistics at the individual peers in an efficient,
distributed manner. One specific measure of interest is the
global document frequency for different terms, which would
be very beneficial as term-specific weights in the scoring and
ranking of merged search results that have been obtained
from different peers.
This paper presents an efficient solution for the problem
of estimating global document frequencies in a large-scale
P2P network with very high dynamics where peers can join
and leave the network on short notice. In particular, the
developed method takes into account the fact that the lo-
cal document collections of autonomous peers may arbitrar-
ily overlap, so that global counting needs to be duplicate-
insensitive. The method is based on hash sketches as a
technique for compact data synopses. Experimental stud-
ies demonstrate the estimator?s accuracy, scalability, and
ability to cope with high dynamics. Moreover, the benefit
for ranking P2P search results is shown by experiments with
real-world Web data and queries.
Abstract: Wireless sensor networks can be very useful in applications that require the detection of crucial events, in physical environments subjected to critical conditions, and the propagation of data reporting their realization to a control center. In this paper we propose jWebDust, a generic and modular application environment for developing and managing applications that are based on wireless sensor networks. Our software architecture provides a range of services that allow to create customized applications with minimum implementation effort that are easy to administrate. We move beyond the ?networking-centric? view of sensor network research and focus on how the end user (administrator, control center supervisor, etc.) will visualize and interact with the system.
We here present its open architecture, the most important design decisions, and discuss its distinct features and functionalities. jWebDust allows heterogeneous components to interoperate (real world sensor networks will rarely be homogeneous) and allows the integrated management and control of multiple such networks by also defining web-based mechanisms to visualize the network state, the results of queries, and a means to inject queries in the network. The architecture also illustrates how existing protocols for various services can interoperate in a bigger framework - such as the tree construction, query routing, etc.
Abstract: This paper addresses the efficient processing of
top-k queries in wide-area distributed data
repositories where the index lists for the attribute
values (or text terms) of a query are distributed
across a number of data peers and the
computational costs include network latency,
bandwidth consumption, and local peer work.
We present KLEE, a novel algorithmic
framework for distributed top-k queries,
designed for high performance and flexibility.
KLEE makes a strong case for approximate top-k
algorithms over widely distributed data sources.
It shows how great gains in efficiency can be
enjoyed at low result-quality penalties. Further,
KLEE affords the query-initiating peer the
flexibility to trade-off result quality and expected
performance and to trade-off the number of
communication phases engaged during query
execution versus network bandwidth
performance. We have implemented KLEE and
related algorithms and conducted a
comprehensive performance evaluation. Our
evaluation employed real-world and synthetic
large, web-data collections, and query
benchmarks. Our experimental results show that
KLEE can achieve major performance gains in
terms of network bandwidth, query response
times, and much lighter peer loads, all with small
errors in result precision and other result-quality
measures
Abstract: We consider the problem of searching for a piece of
information in a fully interconnected computer network
(also called a complete network or clique) by exploiting
advice about its location from the network nodes. Each
node contains a database that ?knows? what kind of
documents or information are stored in other nodes
(e.g., a node could be a Web server that answers queries
about documents stored on the Web). The databases in
each node, when queried, provide a pointer that leads to
the node that contains the information. However, this
information is up-to-date (or correct) with some
bounded probability. While, in principle, one may always
locate the information by simply visiting the network
nodes in some prescribed ordering, this requires a time
complexity in the order of the number of nodes of the
network. In this paper, we provide algorithms for locating
an information node in the complete communication
network, which take advantage of advice given from
network nodes. The nodes may either give correct advice,
by pointing directly to the information node, or give
wrong advice, by pointing elsewhere. On the lowerbounds?
side, we show that no fixed-memory (i.e., with
memory independent of the network size) deterministic
algorithm may locate the information node in a constant
(independent of the network size) expected number of
steps. Moreover, if p (1/n) is the probability that a
node of an n-node clique gives correct advice, we show
that no algorithm may locate the information node in an
expected number of steps less than 1/p o(1). To study
how the expected number of steps is affected by the
amount of memory allowed to the algorithms, we give a
memoryless randomized algorithm with expected number
of steps 4/p o(1/p) o(1) and a 1-bit randomized
algorithm requiring on the average at most 2/p o(1)
steps. In addition, in the memoryless case, we also
prove a 4/p lower bound for the expected number of
steps in the case where the nodes giving faulty advice
may decide on the content of this advice in any possible
way and not merely at random (adversarial fault model).
Finally, for the case where faulty nodes behave randomly,
we give an optimal, unlimited memory deterministic
algorithm with expected number of steps bounded
from above by 1/p o(1/p) 1.
Abstract: We consider the problem of searching for a piece of information in a fully interconnected computer network or clique by exploiting
advice about its location from the network nodes Each node contains a
database that knows what kind of documents or information are stored
in other nodes e.g. a node could be a Web server that answers queries
about documents stored on the Web. The databases in each node when
queried provide a pointer that leads to the node that contains the information. However this information is up to date or correct with some
bounded probability. While in principle one may always locate the information by simply visiting the network nodes in some prescribed ordering
this requires a time complexity in the order of the number of nodes of the
network. In this paper we provide algorithms for locating an information node in the complete communication network that take advantage
of advice given from network nodes The nodes may either give correct
advice by pointing directly to the information node or give wrong advice
by pointing elsewhere We show that on the averageif the probability that a node provides correct advice is asymptotically larger than
where is the number of the computer nodes then the average time complexity for locating the information node is asymptotically or depending on the available memory.The probability may in general be a function of the number of network nodes . On the lower bounds
side we prove that noxed memory deterministic algorithm can locate
the information node in nite expected number of steps. We also prove
a lower bound of
for the expected number of steps of any algorithm
that locates the information node in the complete network.
Abstract: The promises inherent in users coming together to form data
sharing network communities, bring to the foreground new problems formulated
over such dynamic, ever growing, computing, storage, and networking
infrastructures. A key open challenge is to harness these highly
distributed resources toward the development of an ultra scalable, efficient
search engine. From a technical viewpoint, any acceptable solution
must fully exploit all available resources dictating the removal of any
centralized points of control, which can also readily lead to performance
bottlenecks and reliability/availability problems. Equally importantly,
however, a highly distributed solution can also facilitate pluralism in informing
users about internet content, which is crucial in order to preclude
the formation of information-resource monopolies and the biased visibility
of content from economically-powerful sources. To meet these challenges,
the work described here puts forward MINERVA{\^a}{\"i}¿½{\"i}¿½, a novel search
engine architecture, designed for scalability and efficiency. MINERVA{\^a}{\"i}¿½{\"i}¿½
encompasses a suite of novel algorithms, including algorithms for creating
data networks of interest, placing data on network nodes, load balancing,
top-k algorithms for retrieving data at query time, and replication algorithms
for expediting top-k query processing. We have implemented the
proposed architecture and we report on our extensive experiments with
real-world, web-crawled, and synthetic data and queries, showcasing the
scalability and efficiency traits of MINERVA{\^a}{\"i}¿½{\"i}¿½.
Abstract: Recently, an interest has arisen for network worms that propagate using Domain Name Servers (DNS) in order to discover victim hosts.These worms generate random strings, as possible network domain names, and then query Domain Name Servers in order to discover the corresponding IP addresses. In this paper we present models for the dynamics of the co-evolution of worm agents in the presence of anti-worm agents that move in the network in order to stop worm propagation. The proposed models consider anti-worm agents who know the network and anti-worm agents that do not know it and need to issue queries in order to discover valid IP addresses. We,further, introduce "honeypot'' domain name servers that attempt to lure worms, introducing only a delay and providing no answer.We show that by simply delaying the response to DNS queries issued by the worm has little positive effect on the worms propagation.
Abstract: The proliferation of peertopeer
(P2P) systems has come with various
compelling applications including file sharing based on distributed
hash tables (DHTs) or other kinds of overlay networks.
Searching the content of files (especially Web Search) requires
multikeyword
querying with scoring and ranking. Existing approaches
have no way of taking into account the correlation between
the keywords in the query. This paper presents our solution
that incorporates the queries and behavior of the users in the P2P
network such that interesting correlations can be inferred.
Abstract: MINERVA1 is a novel approach towards P2P Web search
that connects an a-priori unlimited number of peers, each of which maintains
a personal local database and a local search facility. Each peer posts
a small amount of metadata to a physically distributed directory layered
on top of a DHT-based overlay network that is used to efficiently select
promising peers from across the peer population that can best locally execute
a query. This paper proposes a live demonstration of MINERVA,
showcasing the full information lifecycle: crawling web pages, disseminating
metadata to a distributed directory, and executing queries online. We
additionally invite all visitors to instantly join the network by executing
a small piece of software.
Abstract: In this work we propose and develop a comprehensive
infrastructure, coined PastryStrings, for supporting rich
queries on both numerical (with range, and comparison
predicates) and string attributes, (accommodating equality,
prefix, suffix, and containment predicates) over DHT networks
utilising prefix-based routing. As event-based, publish/
subscribe information systems are a champion application
class, we formulate our solution in terms of this environment.
Abstract: In this work we propose and develop a comprehensive infrastructure, coined PastryStrings, for supporting rich queries on
both numerical (with range, and comparison predicates) and string attributes, (accommodating equality, prefix, suffix, and
containment predicates) over DHT networks utilising prefix-based routing. As event-based, publish/subscribe information
systems are a champion application class, we formulate our solution in terms of this environment.
Abstract: The content-based publish/subscribe (pub/sub)paradigm for system design is becoming increasingly popular, offering unique benefits for a large number of data-intensive applications. Coupled with the peer-to-peer technology, it can serve as a central building block for such applications
deployed over a large-scale network infrastructure. A key problem toward the creation of large-scale contentbased pub/sub infrastructures relates to dealing efficiently with continuous queries (subscriptions) with rich predicates on string attributes; In particular, efficiently and accurately
matching substring queries to incoming events is an open problem. In this work we study this problem. We provide and analyze novel algorithms for processing subscriptions with substring predicates and events in a variety of environments. We provide experimental data demonstrating the
relative performance behavior of the proposed algorithms using as key metrics the network bandwidth requirements, number of messages, load balancing, as well as requirements for extra routing state (and related maintenance) and design flexibility.
Abstract: In this work we address the issue of efficient processing of range queries in DHT-based P2P data networks. The novelty of the proposed approach lies on architectures, algorithms, and mechanisms for identifying and appropriately exploiting powerful nodes in such networks. The existence of such nodes has been well documented in the literature and plays a key role in the architecture of most successful real-world P2P applications. However, till now, this heterogeneity has not been taken into account when architecting solutions for complex query processing, especially in DHT networks. With this work we attempt to fill this gap for optimizing the processing of range queries. Significant performance improvements are achieved due to (i) ensuring a much smaller hop count performance for range queries, and (ii) avoiding the dangers and inefficiencies of relying for range query processing on weak nodes, with respect to processing, storage, and communication capacities, and with intermittent connectivity. We present detailed experimental results validating our performance claims.
Abstract: We consider the conflicting problems of ensuring data-access load balancing and efficiently processing range queries on peer-to-peer data networks maintained over Distributed Hash Tables (DHTs). Placing consecutive data values in neighboring peers is frequently used in DHTs since it accelerates range query processing. However, such a placement is highly susceptible to load imbalances, which are preferably handled by replicating data (since replication also introduces fault tolerance benefits). In this paper, we present HotRoD, a DHT-based architecture that deals effectively with this combined problem through the use of a novel locality-preserving hash function, and a tunable data replication mechanism which allows trading off replication costs for fair load distribution. Our detailed experimentation study shows strong gains in both range query processing efficiency and data-access load balancing, with low replication overhead. To our knowledge, this is the first work that concurrently addresses the two conflicting problems using data replication.
Abstract: We consider the conflicting problems of ensuring data-access load balancing and efficiently processing range queries on peer-to-peer data networks maintained over Distributed Hash Tables (DHTs). Placing consecutive data values in neighboring peers is frequently used in DHTs since it accelerates range query processing. However, such a placement is highly susceptible to load imbalances, which are preferably handled by replicating data (since replication also introduces fault tolerance benefits). In this paper, we present HotRoD, a DHT-based architecture that deals effectively with this combined problem through the use of a novel locality-preserving hash function, and a tunable data replication mechanism which allows trading off replication costs for fair load distribution. Our detailed experimentation study shows strong gains in both range query processing efficiency and data-access load balancing, with low replication overhead. To our knowledge, this is the first work that concurrently addresses the two conflicting problems using data replication.
Abstract: Our position is that a key to research efforts on ensuring high
performance in very large scale sharing networks is the idea of
volunteering; recognizing that such networks are comprised of
largely heterogeneous nodes in terms of their capacity and
behaviour, and that, in many real-world manifestations, a few
nodes carry the bulk of the request service load. In this paper we
outline how we employ volunteering as the basic idea using
which we develop altruism-endowed self-organizing sharing
networks to help solve two open problems in large-scale peer-topeer
networks: (i) to develop an overlay topology structure that
enjoys better performance than DHT-structured networks and,
specifically, to offer O(log log N) routing performance in a
network of N nodes, instead of O(log N), and (ii) to efficiently
process complex queries and range queries, in particular.
Abstract: We consider the problem of preprocessing an n-vertex digraph with real edge weights so that
subsequent queries for the shortest path or distance between any two vertices can be efficiently answered. We
give algorithms that depend on the treewidth of the input graph. When the treewidth is a constant, our algorithms
can answer distance queries in O ({\'a}(n) ) time after O.n/ preprocessing. This improves upon previously known
results for the same problem.We also give a dynamic algorithm which, after a change in an edge weight, updates
the data structure in time O.n¯ /, for any constant 0 < ¯ < 1. Furthermore, an algorithm of independent interest
is given: computing a shortest path tree, or finding a negative cycle in linear time.
Abstract: We consider the problem of preprocessing an n-vertex digraph with real edge weights so that subsequent queries for the shortest path or distance between any two vertices can be efficiently answered. We give parallel algorithms for the EREW PRAM model of computation that depend on the treewidth of the input graph. When the treewidth is a constant, our algorithms can answer distance queries in O({\'a}(n)) time using a single processor, after a preprocessing of O(log2n) time and O(n) work, where {\'a}(n) is the inverse of Ackermann's function. The class of constant treewidth graphs contains outerplanar graphs and series-parallel graphs, among others. To the best of our knowledge, these are the first parallel algorithms which achieve these bounds for any class of graphs except trees. We also give a dynamic algorithm which, after a change in an edge weight, updates our data structures in O(log n) time using O(n{\^a}) work, for any constant 0 < {\^a} < 1. Moreover, we give an algorithm of independent interest: computing a shortest path tree, or finding a negative cycle in O(log2n) time using O(n) work.
Abstract: In emerging pervasive scenarios, data is collected by sensing devices in streams that occur at several distributed points of observation. The size of the data typically far exceeds the storage and computational capabilities of the tiny devices that have to collect and process them. A general and challenging task is to allow (some of) the nodes of a pervasive network to collectively perform monitoring of a neighbourhood of interest by issuing continuous aggregate queries on the streams observed in its vicinity. This class of algorithms is fully decentralized and diffusive in nature: collecting all the data at a few central nodes of the network is unfeasible in networks of low capability devices or in the presence of massive data sets. Two main problems arise in this scenario: (i) the intrinsic complexity of maintaining statistics over a data stream whose size greatly exceeds the capabilities of the device that performs the computation; (ii) composing the partial outcomes computed at different points of observation into an accurate, global statistic over a neighbourhood of interest, which entails coping with several problems, last but not least the receipt of duplicate information along multiple paths of diffusion.
Streaming techniques have emerged as powerful tools to achieve the general goals described above, in the first place because they assume a computational model in which computational and storage resources are assumed to be far exceeded by the amount of data on which computation occurs. In this contribution, we review the main streaming techniques and provide a classification of the computational problems and the applications they effectively address, with an emphasis on decentralized scenarios, which are of particular interest in pervasive networks
Abstract: In this paper, we present and study solutions for the efficient processing of queries over string attributes in a large P2P data network implemented with DHTs. The proposed solutions support queries with equality, prefix, suffix, and containment predicates over string attributes. Currently, no known solution to this problem exists.
We propose and study algorithms for processing such queries and their optimizations. As event-based, Publish/Subscribe information systems are a champion application class where string attribute (continuous) queries are very common, we pay particular attention to this type of data networks, formulating our solution in terms of this environment. A major design decision behind the proposed solution is our intention to provide a solution that is general (DHT-independent), capable of being implemented on top of any particular DHT.
Abstract: The content-based publish/subscribe (pub/sub)
paradigm for system design is becoming increasingly
popular, offering unique benefits for a large number of
data-intensive applications. Coupled with the peer-to-peer
technology, it can serve as a central building block for
such applications deployed over a large-scale network
infrastructure. A key problem toward the creation of
large-scale content-based pub/sub infrastructures relates to
dealing efficiently with continuous queries (subscriptions)
with rich predicates on string attributes; in this work we
study the problem of efficiently and accurately matching
substring queries to incoming events.
Abstract: In this paper we present the design of jWebDust, a generic and modular application environment for developing and managing applications based on wireless sensor networks that are accessible via the internet. Our software architecture provides a range of services that allow to create customized web-based applications with minimum implementation effort that are easy to administrate. We here present its open architecture, the most important design decisions, and discuss its distinct features and functionalities. jWebDust allows heterogeneous components to interoperate and the integrated management and control of multiple such networks by defining web-based mechanisms to visualize the network state, the results of queries, and a means to inject queries in the network.
Abstract: In this work we present the design of jWebDust, a
software environment for monitoring and controlling sensor networks via a web interface. Our software architecture provides a range of services that allow to create customized applications with minimum implementation effort that are easy to administrate. We present its open architecture, the most important design decisions, and discuss its distinct features and functionalities. jWebDust will allow heterogeneous components to operate in the same sensor network, and the integrated management and control of multiple such networks by defining web-based mechanisms to visualize the network state, the results of queries, and a means to inject queries in the network.
Abstract: Content integration of web data sources is becoming increasingly
important for the next generation information
systems. However, all proposed solutions are faced with
the same performance bottleneck: the network overhead
incurred when contacting the integrated e-sites. With this
demo paper we shall demonstrate the functionality of HyperHotel.
HyperHotel is used for finding appropriate hotel
rooms when travelling. Its novetlies are that it is designed
and implemented as an internet web-hotel content integration
application and that it is built on top of D.I.C.E. and
Co.In.S.; a novel content integration infrastructure consisting
of a domain-independent COntent INtegration System
and its Data Integration Cache Engine. We¢ll show how the
infrastructure of D.I.C.E. and Co.In.S. can be applied and
exploited in HyperHotel in order to improve the response
time of complex user queries. This exemplifies the significance
of this infrastructure since HyperHotel is representative
of a large class of e-commerce, content integration
applications.
Abstract: In this work we study how to process complex queries in DHT-based
Peer-to-Peer (P2P) data networks. Queries are made over tuples and relations
and are expressed in a query language, such as SQL. We describe existing
research approaches for query processing in P2P systems, we suggest
improvements and enhancements, and propose a unifying framework that
consists of a modified DHT architecture, data placement and search algorithms,
and provides efficient support for processing a variety of query types, including
queries with one or more attributes, queries with selection operators (involving
equality and range queries), and queries with join operators. To our knowledge,
this is the first work that puts forth a framework providing support for all these
query types.
Abstract: We consider optimal itinerary problems in time-table information systems supporting a vast number of on-line queries. We exhibit two important extensions of the time-dependent approach to model realistic versions of the Earliest Arrival and Minimum Number of Transfer problems, as well as of a combination of them, that could not be modeled by the original version of the time-dependent approach. We also provide heuristics that speed up implementations and present preliminary experimental results with real-world data.
Abstract: The peer-to-peer computing paradigm is an intriguing alternative to Google-style search
engines for querying and ranking Web content. In a network with many thousands or
millions of peers the storage and access load requirements per peer are much lighter
than for a centralized Google-like server farm; thus more powerful techniques from information
retrieval, statistical learning, computational linguistics, and ontological reasoning
can be employed on each peer¢s local search engine for boosting the quality
of search results. In addition, peers can dynamically collaborate on advanced and particularly
difficult queries. Moroever, a peer-to-peer setting is ideally suited to capture
local user behavior, like query logs and click streams, and disseminate and aggregate
this information in the network, at the discretion of the corresponding user, in order to
incorporate richer cognitive models.
This paper gives an overview of ongoing work in the EU Integrated Project DELIS
that aims to develop foundations for a peer-to-peer search engine with Google-or-better
scale, functionality, and quality, which will operate in a completely decentralized and
self-organizing manner. The paper presents the architecture of such a system and the
Minerva prototype testbed, and it discusses various core pieces of the approach: efficient
execution of top-k ranking queries, strategies for query routing when a search request
needs to be forwarded to other peers, maintaining a self-organizing semantic overlay
network, and exploiting and coping with user and community behavior.
Abstract: In many fields of application, shortest path finding problems
in very large graphs arise. Scenarios where large numbers of on-line
queries for shortest paths have to be processed in real-time appear for example
in traffic information systems. In such systems, the techniques considered
to speed up the shortest path computation are usually based on
precomputed information. One approach proposed often in this context
is a space reduction, where precomputed shortest paths are replaced by
single edges with weight equal to the length of the corresponding shortest
path. In this paper, we give a first systematic experimental study of
such a space reduction approach. We introduce the concept of multi-level
graph decomposition. For one specific application scenario from the field
of timetable information in public transport, we perform a detailed analysis
and experimental evaluation of shortest path computations based
on multi-level graph decomposition.