Aigaion: RACTI / RU1 Technical Report Series (Web Based)

[RACTI-RU1-2004-2] Ntarmos, Nikos and Triantafillou, Peter, AESOP: Altruism-Endowed Self-Organizing Peers, in: 2nd International Workshop on Databases, Information Systems, and Peer-to-Peer Computing (DBISP2P 2004), pages 151-165, Toronto, Canada, 2004.
Abstract: We argue the case for a new paradigm for architecting structured P2P overlay networks, coined AESOP. AESOP consists of 3 layers: (i) an architecture, PLANES, that ensures significant performance speedups, assuming knowledge of altruistic peers; (ii) an accounting/auditing layer, AltSeAl, that identifies and validates altruistic peers; and (iii) SeAledPLANES, a layer that facilitates the coordination/collaboration of the previous two components. We briefly present these components along with experimental and analytical data of the promised significant performance gains and the related overhead. In light of these very encouraging results, we put this three-layer architecture paradigm forth as the way to structure the P2P overlay networks of the future.
[RACTI-RU1-2012-25] Michail, Othon, Chatzigiannakis, Ioannis and Spirakis, Paul, Brief Announcement: Naming and Counting in Anonymous Unknown Dynamic Networks, in: 26th international conference on Distributed Computing (DISC), pages 437-438, Springer-Verlag Berlin Heidelberg, Salvador, Brazil, 2012. [DOI]
Abstract: We study the fundamental naming and counting problems in networks that are anonymous, unknown, and possibly dynamic. Network dynamicity is modeled by the 1-interval connectivity model [KLO10]. We first prove that on static networks with broadcast counting is impossible to solve without a leader and that naming is impossible to solve even with a leader and even if nodes know n. These impossibilities carry over to dynamic networks as well. With a leader we solve counting in linear time. Then we focus on dynamic networks with broadcast. We show that if nodes know an upper bound on the maximum degree that will ever appear then they can obtain an upper bound on n. Finally, we replace broadcast with one-to-each, in which a node may send a different message to each of its neighbors. This variation is then proved to be computationally equivalent to a full-knowledge model with unique names.
[RACTI-RU1-2012-26] Michail, Othon, Chatzigiannakis, Ioannis and Spirakis, Paul, Causality, Influence, and Computation in Possibly Disconnected Synchronous Dynamic Networks, in: 16th International Conference On Principles Of DIstributed Systems (OPODIS), pages 269-283, Springer-Verlag Berlin Heidelberg, Rome, Italy, 2012. [DOI]
Abstract: In this work, we study the propagation of influence and computation in dynamic networks that are possibly disconnected at every instant. We focus on a synchronous message passing communication model with broadcast and bidirectional links. To allow for bounded end-to-end communication we propose a set of minimal temporal connectivity conditions that bound from the above the time it takes for information to make progress in the network. We show that even in dynamic networks that are disconnected at every instant information may spread as fast as in networks that are connected at every instant. Further, we investigate termination criteria when the nodes know some upper bound on each of the temporal connectivity conditions. We exploit our termination criteria to provide efficient protocols (optimal in some cases) that solve the fundamental counting and all-to-all token dissemination (or gossip) problems. Finally, we show that any protocol that is correct in instantaneous connectivity networks can be adapted to work in temporally connected networks.
[RACTI-RU1-2014-6] Michail, Othon, Chatzigiannakis, Ioannis and Spirakis, Paul, Causality, Influence, and Computation in Possibly Disconnected Synchronous Dynamic Networks, in: Journal of Parallel and Distributed Computing (JPDC), volume 74, number 1, pages 2016-2026, 2014. [DOI]
Abstract: In this work, we study the propagation of influence and computation in dynamic distributed computing systems that are possibly disconnected at every instant. We focus on a synchronous message-passing communication model with broadcast and bidirectional links. Our network dynamicity assumption is a worst-case dynamicity controlled by an adversary scheduler, which has received much attention recently. We replace the usual (in worst-case dynamic networks) assumption that the network is connected at every instant by minimal temporal connectivity conditions. Our conditions only require that another causal influence occurs within every time window of some given length. Based on this basic idea, we define several novel metrics for capturing the speed of information spreading in a dynamic network. We present several results that correlate these metrics. Moreover, we investigate termination criteria in networks in which an upper bound on any of these metrics is known. We exploit our termination criteria to provide efficient (and optimal in some cases) protocols that solve the fundamental counting and all-to-all token dissemination (or gossip) problems.
[RACTI-RU1-2008-35] Chatzigiannakis, Ioannis, Communication Algorithms for Ad Hoc Mobile Networks Using Random Walks, Encyclopedia of Algorithms, pages 161-165, Springer Verlag, ISBN 978-0-387-30770-1, 2008. [DOI]
Abstract: The problem of communication among mobile nodes is one of the most fundamental problems in ad hoc mobile networks and is at the core of many algorithms, such as for counting the number of nodes, electing a leader, data processing etc. For an exposition of several important problems in ad hoc mobile networks. The work of Chatzigiannakis, Nikoletseas and Spirakis focuses on wireless mobile networks that are subject to highly dynamic structural changes created by mobility, channel fluctuations and device failures. These changes affect topological connectivity, occur with high frequency and may not be predictable in advance. Therefore, the environment where the nodes move (in three-dimensional space with possible obstacles) as well as the motion that the nodes perform are \textit{input} to any distributed algorithm.
[RACTI-RU1-2015-5] Michail, Othon, Chatzigiannakis, Ioannis and Spirakis, Paul, Computing in Dynamic Networks, in: Computational Network Theory: Theoretical Foundations and Applications, First Edition, Wiley-VCH Verlag GmbH & Co. KGaA, 2015.
Abstract: In this chapter, our focus is on computational network analysis from a theoretical point of view. In particular, we study the \emph{propagation of influence and computation in dynamic distributed computing systems}. We focus on a \emph{synchronous message passing} communication model with bidirectional links. Our network dynamicity assumption is a \emph{worst-case dynamicity} controlled by an adversary scheduler, which has received much attention recently. We first study the fundamental \emph{naming} and \emph{counting} problems (and some variations) in networks that are \emph{anonymous}, \emph{unknown}, and possibly dynamic. Network dynamicity is modeled here by the \emph{1-interval connectivity model}, in which communication is synchronous and a (worst-case) adversary chooses the edges of every round subject to the condition that each instance is connected. We then replace this quite strong assumption by minimal \emph{temporal connectivity} conditions. These conditions only require that \emph{another causal influence occurs within every time-window of some given length}. Based on this basic idea we define several novel metrics for capturing the speed of information spreading in a dynamic network. We present several results that correlate these metrics. Moreover, we investigate \emph{termination criteria} in networks in which an upper bound on any of these metrics is known. We exploit these termination criteria to provide efficient (and optimal in some cases) protocols that solve the fundamental \emph{counting} and \emph{all-to-all token dissemination} (or \emph{gossip}) problems. Finally, we propose another model of worst-case temporal connectivity, called \emph{local communication windows}, that assumes a fixed underlying communication network and restricts the adversary to allow communication between local neighborhoods in every time-window of some fixed length. We prove some basic properties and provide a protocol for counting in this model.
[RACTI-RU1-2014-1] Chatzigiannakis, Ioannis, di Luna, G. A, Baldoni, Roberto and Bonomi, Silvia, Conscious and Unconscious Counting on Anonymous Dynamic Networks, 2014.
Abstract: This paper addresses the problem of counting the size of a network where (i) processes have the same identifiers (anonymous nodes) and (ii) the et- work topology constantly changes (dynamic network). Changes are riven by a powerful adversary that can look at internal process states and add and remove edges in order to contrast the convergence of the algorithm to the correct count. The paper proposes two leader-based counting algorithms. Such algorithms are based on a technique that mimics an energy-transfer between network nodes. The first algorithm assumes that the adversary cannot generate either disconnected network graphs or network graphs where nodes have degree greater than D. In such algorithm, the leader can count the size of the network and detect the counting termination in a finite time (i.e., conscious counting algorithm). The second algorithm assumes that the adversary only keeps the network graph connected at any time and we prove that the leader can still converge to a correct count in a finite number of rounds, but it is not conscious when this convergence happens.
[RACTI-RU1-2006-10] Ntarmos, Nikos, Triantafillou, Peter and Weikum, Gerhard, Counting at large: Efficient cardinality estimation in Internet-scale data networks, in: 22nd International Conference on Data Engineering (ICDE 2006), 2006.
Abstract: Counting in general, and estimating the cardinality of (multi-) sets in particular, is highly desirable for a large variety of applications, representing a foundational block for the efficient deployment and access of emerging internet-scale information systems. Examples of such applications range from optimizing query access plans in internet-scale databases, to evaluating the significance (rank/score) of various data items in information retrieval applications. The key constraints that any acceptable solution must satisfy are: (i) efficiency: the number of nodes that need be contacted for counting purposes must be small in order to enjoy small latency and bandwidth requirements; (ii) scalability, seemingly contradicting the efficiency goal: arbitrarily large numbers of nodes nay need to add elements to a (multi-) set, which dictates the need for a highly distributed solution, avoiding server-based scalability, bottleneck, and availability problems; (iii) access and storage load balancing: counting and related overhead chores should be distributed fairly to the nodes of the network; (iv) accuracy: tunable, robust (in the presence of dynamics and failures) and highly accurate cardinality estimation; (v) simplicity and ease of integration: special, solution-specific indexing structures should be avoided. In this paper, first we contribute a highly-distributed, scalable, efficient, and accurate (multi-) set cardinality estimator. Subsequently, we show how to use our solution to build and maintain histograms, which have been a basic building block for query optimization for centralized databases, facilitating their porting into the realm of internet-scale data networks.
[RACTI-RU1-2013-10] di Luna, G. A, Baldoni, Roberto, Bonomi, Silvia and Chatzigiannakis, Ioannis, Counting the Number of Homonyms in Dynamic Networks, in: 15th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS 2013), Springer, Osaka, Japan, 2013.
Abstract: We consider a synchronous distributed system with n processes that communicate through a dynamic network guaranteeing 1-interval connectivity i.e., the network topology graph might change at each interval while keeping the graph connected at any time. The processes belonging to the distributed system are identified through a set of labels L = {l1 , l2 . . . , lk } (with 1 ≤ k < n). In this challenging system model, the paper addresses the following problem: �counting the number of processes with the same label�. We provide a distributed algorithm that is able solve the problem based on the notion of energy transfer. Each process owns a fixed energy charge, and tries to discharge itself exchanging, at each round, at most half of its charge with neighbors. The paper also discusses when such counting is possible in the presence of failures. Counting processes with the same label in dynamic networks with homonyms is of great importance because it is as difficult as computing generic aggregating functions.
[RACTI-RU1-2009-87] Ntarmos, Nikos, Triantafillou, Peter and Weikum, Gerhard, Distributed Hash Sketches: Scalable, Efficient, and Accurate Cardinality Estimation for Distributed Multisets, in: ACM Transactions on Computer Systems, ACM TOCS, 2009.
Abstract: Counting items in a distributed system, and estimating the cardinality of multisets in particular, is important for a large variety of applications and a fundamental building block for emerging Internet-scale information systems. Examples of such applications range from optimizing query access plans in peer-to-peer data sharing, to computing the significance (rank/score) of data items in distributed information retrieval. The general formal problem addressed in this article is computing the network-wide distinct number of items with some property (e.g., distinct files with file name containing “spiderman”) where each node in the network holds an arbitrary subset, possibly overlapping the subsets of other nodes. The key requirements that a viable approach must satisfy are: (1) scalability towards very large network size, (2) efficiency regarding messaging overhead, (3) load balance of storage and access, (4) accuracy of the cardinality estimation, and (5) simplicity and easy integration in applications. This article contributes the DHS (Distributed Hash Sketches) method for this problem setting: a distributed, scalable, efficient, and accurate multiset cardinality estimator. DHSis based on hash sketches for probabilistic counting, but distributes the bits of each counter across network nodes in a judicious manner based on principles of Distributed Hash Tables, paying careful attention to fast access and aggregation as well as update costs. The article discusses various design choices, exhibiting tunable trade-offs between estimation accuracy, hop-count efficiency, and load distribution fairness. We further contribute a full-fledged, publicly available, open-source implementation of all our methods, and a comprehensive experimental evaluation for various settings.
[RACTI-RU1-2009-18] Resvanis, Michalis and Chatzigiannakis, Ioannis, Experimental Evaluation of Duplicate Insensitive Counting Algorithms, in: 13th Panhellenic Conference on Informatics (PCI 2009), IEEE, PCI, Corfu, Greece, 2009.
Abstract: Online and Realtime counting and estimating the cardinality of sets is highly desirable for a large variety of applications, representing a foundational block for the efficient deployment and access of emerging internet scale information systems. In this work we implement three well known duplicate insensitive counting algorithms and evaluate their performance in a testbed of resource-limited commercial off-the-shelf hardware devices. We focus on devices that can be used in wireless mobile and sensor applications and evaluate the memory complexity, time complexity and absolute error of the algorithms under different realistic scenaria. Our findings indicate the suitability of each algorithm depending on the application characteristics.
[RACTI-RU1-2006-15] Michel, Sebastian, Bender, Matthias, Triantafillou, Peter and Weikum, Gerhard, Global Document Frequency Estimation in Peer-to-Peer Web Search, in: 9th International Workshop on the Web and Databases (WebDB 2006), pages 62-67, 2006.
Abstract: Information retrieval (IR) in peer-to-peer (P2P) networks, where the corpus is spread across many loosely coupled peers, has recently gained importance. In contrast to IR systems on a centralized server or server farm, P2P IR faces the additional challenge of either being oblivious to global corpus statistics or having to compute the global measures from local statistics at the individual peers in an efficient, distributed manner. One specific measure of interest is the global document frequency for different terms, which would be very beneficial as term-specific weights in the scoring and ranking of merged search results that have been obtained from different peers. This paper presents an efficient solution for the problem of estimating global document frequencies in a large-scale P2P network with very high dynamics where peers can join and leave the network on short notice. In particular, the developed method takes into account the fact that the lo- cal document collections of autonomous peers may arbitrar- ily overlap, so that global counting needs to be duplicate- insensitive. The method is based on hash sketches as a technique for compact data synopses. Experimental stud- ies demonstrate the estimator?s accuracy, scalability, and ability to cope with high dynamics. Moreover, the benefit for ranking P2P search results is shown by experiments with real-world Web data and queries.
[RACTI-RU1-2006-17] Michel, Sebastian, Bender, Matthias, Triantafillou, Peter and Weikum, Gerhard, IQN Routing: Integrating Quality and Novelty in P2P Querying and Ranking, in: 10th International Conference on Extending Database Technology (EDBT 2006), pages 62-67, 2006.
Abstract: Information retrieval (IR) in peer-to-peer (P2P) networks, where the corpus is spread across many loosely coupled peers, has recently gained importance. In contrast to IR systems on a centralized server or server farm, P2P IR faces the additional challenge of either being oblivious to global corpus statistics or having to compute the global measures from local statistics at the individual peers in an efficient, distributed manner. One specific measure of interest is the global document frequency for different terms, which would be very beneficial as term-specific weights in the scoring and ranking of merged search results that have been obtained from different peers. This paper presents an efficient solution for the problem of estimating global document frequencies in a large-scale P2P network with very high dynamics where peers can join and leave the network on short notice. In particular, the developed method takes into account the fact that the lo- cal document collections of autonomous peers may arbitrar- ily overlap, so that global counting needs to be duplicate- insensitive. The method is based on hash sketches as a technique for compact data synopses. Experimental stud- ies demonstrate the estimator?s accuracy, scalability, and ability to cope with high dynamics. Moreover, the benefit for ranking P2P search results is shown by experiments with real-world Web data and queries.
[RACTI-RU1-2013-8] Michail, Othon, Chatzigiannakis, Ioannis and Spirakis, Paul, Naming and Counting in Anonymous Unknown Dynamic Networks, in: 15th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS), pages 281-295, Springer International Publishing Switzerland, Osaka, Japan, 2013.
Abstract: In this work, we study the fundamental naming and counting problems (and some variations) in networks that are anonymous, unknown, and possibly dynamic. In counting, nodes must determine the size of the network n and in naming they must end up with unique identities. By anonymous we mean that all nodes begin from identical states apart possibly from a unique leader node and by unknown that nodes have no a priori knowledge of the network (apart from some minimal knowledge when necessary) including ignorance of n. Network dynamicity is modeled by the 1-interval connectivity model [KLO10], in which communication is synchronous and a (worst-case) adversary chooses the edges of every round subject to the condition that each instance is connected. We first focus on static networks with broadcast where we prove that, without a leader, counting is impossible to solve and that naming is impossible to solve even with a leader and even if nodes know n. These impossibilities carry over to dynamic networks as well. We also show that a unique leader suffices in order to solve counting in linear time. Then we focus on dynamic networks with broadcast. We conjecture that dynamicity renders nontrivial computation impossible. In view of this, we let the nodes know an upper bound on the maximum degree that will ever appear and show that in this case the nodes can obtain an upper bound on n. Finally, we replace broadcast with one-to-each, in which a node may send a different message to each of its neighbors. Interestingly, this natural variation is proved to be computationally equivalent to a full-knowledge model, in which unique names exist and the size of the network is known.
[RACTI-RU1-2001-7] Chatzigiannakis, Ioannis, Nikoletseas, Sotiris and Spirakis, Paul, On the Average and Worst-case Efficiency of Some New Distributed Communication and Control Algorithms for Ad-hoc Mobile Networks, in: 1st ACM International Workshop on Principles of Mobile Computing (POMC), pages 1-19, ACM Press, Rhode Island, USA, 2001.
Abstract: An ad-hoc mobile network is a collection of mobile hosts, with wireless communication capabilities, forming a temporary network without the aid of any established fixed infrastructure. In such networks, topological connectivity is subject to frequent, unpredictable change. Our work focuses on networks with high rate of such changes to connectivity. For such dynamic changing networks we propose protocols which exploit the co-ordinated (by the protocol) motion of a small part of the network. We show that such protocols can be designed to work correctly and efficiently even in the case of arbitrary (but not malicious) movements of the hosts not affected by the protocol. We also propose a methodology for the analysis of the expected behaviour of protocols for such networks, based on the assumption that mobile hosts (whose motion is not guided by the protocol) conduct concurrent random walks in their motion space. Our work examines some fundamental problems such as pair-wise communication, election of a leader and counting, and proposes distributed algorithms for each of them. We provide their proofs of correctness, and also give rigorous analysis by combinatorial tools and also via experiments.
[RACTI-RU1-2010-64] Kokkinos, Panagiotis, Manousakis, Kostas and Varvarigos, Emmanouel, Path Protection in WDM Networks with Quality of Transmission Limitations, in: International Conference on Communications, ICC 2010., 2010.
Abstract: We consider path protection in the routing and wavelength assignment (RWA) problem for impairment constrained WDM optical networks. The proposed multicost RWA algorithms select the primary and the backup lightpaths by accounting for physical layer impairments. The backup lightpath may either be activated (1+1 protection) or it may be reserved and not activated, with activation taking place when/if needed (1:1 protection). In case of 1:1 protection the period of time where the quality of its transmission (QoT) is valid, despite the possible establishment of future connections, should be preserved, so as to be used in case the primary lightpath fails. We show that, by using the multicost approach for solving the RWA with protection problem, great benefits can be achieved both in terms of the connection blocking rate and in terms of the validity period of the backup lightpath. Moreover the multicost approach, by providing a set of candidate lightpaths for each source destination pair, instead of a single one, offers ease and flexibility in selecting the primary and the backup lightpaths.
[RACTI-RU1-2008-54] Christodoulopoulos, Konstantinos and Varvarigos, Emmanouel, Routing and Scheduling in Grids, in: 10th Anniversary International Conference on Transparent Optical Networks, pages 170-174, ICTON 3, Athens, Greece, 2008. [DOI]
Abstract: We propose QoS-aware scheduling algorithms for Grid Networks that are capable of optimally or near-optimally assigning computation and communication tasks to grid resources. The routing and scheduling algorithms to be presented take as input the resource utilization profiles and the task characteristics and QoS requirements, and co-allocate resources while accounting for the dependencies between communication and computation tasks. Keywords: communication and computation utilization profiles, multicost routing and scheduling, grid computing.
[RACTI-RU1-2004-41] Ntarmos, Nikos and Triantafillou, Peter, SeAl: Managing Accesses and Data in Peer-to-Peer Sharing Networks, in: 3rd Hellenic Data Management Symposium (HDMS 2004), 2004.
Abstract: We present SeAl1, a novel data/resource and data-access management infrastructure designed for the purpose of addressing a key problem in P2P data sharing networks, namely the problem of wide-scale selfish peer behavior. Selfish behavior has been manifested and well documented and it is widely accepted that unless this is dealt with, the scalability, efficiency, and the usefulness of P2P sharing networks will be diminished. SeAl essentially consists of a monitoring/accounting subsystem, an auditing/verification subsystem, and incentive mechanisms. The monitoring subsystem facilitates the classification of peers into selfish/altruistic. The auditing/verification layer provides a shield against perjurer/slandering and colluding peers that may try to cheat the monitoring subsystem. The incentives mechanisms efectively utilize these layers so to increase the computational/networking and data resources that are available to the community. Our extensive performance results show that SeAl performs its tasks swiftly, while the overhead introduced by our accounting and auditing mechanisms in terms of response time, network, and storage overheads are very small.
[RACTI-RU1-2004-3] Ntarmos, Nikos and Triantafillou, Peter, SeAl: Managing Accesses and Data in Peer-to-Peer Sharing Networks, in: 4th IEEE International Conference on Peer-to-Peer Computing (P2P 2004), Zurich, Switzerland, 2004.
Abstract: We present SeAl, a novel data/resource and data-access management infrastructure designed for the purpose of addressing a key problem in P2P data sharing networks, namely the problem of wide-scale selfish peer behavior. Selfish behavior has been manifested and well documented and it is widely accepted that unless this is dealt with, the scalability, efficiency, and the usefulness of P2P sharing networks will be diminished. SeAl essentially consists of a monitoring/accounting subsystem, an auditing/verification subsystem, and incentive mechanisms. The monitoring subsystem facilitates the classification of peers into selfish/altruistic. The auditing/verification layer provides a shield against perjurer/slandering and colluding peers that may try to cheat the monitoring subsystem. The incentives mechanisms effectively utilize these layers so to increase the computational/networking and data resources that are available to the community. Our extensive performance results show that SeAl performs its tasks swiftly, while the overhead introduced by our accounting and auditing mechanisms in terms of response time, network, and storage overheads are very small.
[RACTI-RU1-2015-4] Michail, Othon, Terminating Distributed Construction of Shapes and Patterns in a Fair Solution of Automata, in: 34th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC), pages 37-46, ACM, Donostia-San Sebastián, Spain, 2015.
Abstract: In this work, we consider a \emph{solution of automata} similar to \emph{Population Protocols} and \emph{Network Constructors}. The automata (also called \emph{nodes}) move passively in a well-mixed solution without being capable of controlling their movement. However, the nodes can \emph{cooperate} by interacting in pairs. Every such interaction may result in an update of the local states of the nodes. Additionally, the nodes may also choose to connect to each other in order to start forming some required structure. We may think of such nodes as the \emph{smallest possible programmable pieces of matter}, like tiny nanorobots or programmable molecules. The model that we introduce here is a more applied version of Network Constructors, imposing \emph{physical} (or \emph{geometrical}) \emph{constraints} on the connections that the nodes are allowed to form. Each node can connect to other nodes only via a very limited number of \emph{local ports}, which implies that at any given time it has only a \emph{bounded number of neighbors}. Connections are always made at \emph{unit distance} and are \emph{perpendicular to connections of neighboring ports}. Though such a model cannot form abstract networks like Network Constructors, it is still capable of forming very practical \emph{2D or 3D shapes}. We provide direct constructors for some basic shape construction problems, like \emph{spanning line}, \emph{spanning square}, and \emph{self-replication}. We then develop \emph{new techniques} for determining the computational and constructive capabilities of our model. One of the main novelties of our approach, concerns our attempt to overcome the inability of such systems to detect termination. In particular, we exploit the assumptions that the system is well-mixed and has a unique leader, in order to \emph{give terminating protocols that are correct with high probability}. This allows us to develop terminating subroutines that can be \emph{sequentially composed} to form larger \emph{modular protocols} (which has not been the case in the relevant literature). One of our main results is a \emph{terminating protocol counting the size $n$ of the system} with high probability. We then use this protocol as a subroutine in order to develop our \emph{universal constructors}, establishing that \emph{it is possible for the nodes to become self-organized with high probability into arbitrarily complex shapes while still detecting termination of the construction}.
[RACTI-RU1-2003-35] Triantafillou, Peter and Aekaterinidis, Ioannis, Web Proxy Cache Replacement: Dos, Donts, and Expectations, in: 2nd International IEEE Symposium on Network Computing and Applications, pages 59-66, 2003.
Abstract: Numerous research efforts have produced a large number of algorithms and mechanisms for web proxy caches. In order to build powerful web proxies and understand their performance, one must be able to appreciate the impact and significance of earlier contributions and how they can be integrated. To do this we employ a cache replacement algorithm, 'CSP', which integrates key knowledge from previous work. CSP utilizes the communication Cost to fetch web objects, the objects' Sizes, their Popularities, an auxiliary cache and a cache admission control algorithm. We study the impact of these components with respect to hit ratio, latency, and bandwidth requirements. Numerous research efforts have produced a large number of algorithms and mechanisms for web proxy caches. In order to build powerful web proxies and understand their performance, one must be able to appreciate the impact and significance of earlier contributions and how they can be integrated To do this we employ a cache replacement algorithm, 'CSP, which integrates key knowledge from previous work. CSP utilizes the communication Cost to fetch web objects, the objects' Sizes, their Popularifies, an auxiliary cache and a cache admission control algorithm. We study the impact of these components with respect to hit ratio, latency, and bandwidth requirements. Our results show that there are clear performance gains when utilizing the communication cost, the popularity of objects, and the auxiliary cache. In contrast, the size of objects and the admission controller have a negligible performance impact. Our major conclusions going against those in related work are that (i) LRU is preferable to CSP for important parameter values, (ii) accounting for the objects' sizes does not improve latency and/or bandwidth requirements, and (iii) the collaboration of nearby proxies is not very beneficial. Based on these results, we chart the problem solution space, identifying which algorithm is preferable and under which conditions. Finally, we develop a dynamic replacement algorithm that continuously utilizes the best algorithm as the problem-parameter values (e.g., the access distributions) change with time.