Abstract: Implementing a commercial application on a
grid infrastructure introduces new challenges in managing
quality-of-service (QoS) requirements, most of which stem from the fact
that the QoS negotiated between the user and the service provider
must be strictly satisfied. An interesting commercial application
with a wide impact on a variety of fields, which can benefit from
computational grid technologies, is three-dimensional (3-D)
rendering. To implement 3-D rendering on a
grid infrastructure, however, we must develop appropriate scheduling
and resource allocation mechanisms so that the negotiated QoS
requirements are met. Efficient scheduling schemes require
modeling and prediction of rendering workload. In this paper
workload prediction is addressed based on a combined fuzzy
classification and neural network model. Initially, appropriate
descriptors are extracted to represent the synthetic world. The
descriptors are obtained by parsing RIB-formatted files; the RIB format
provides a general structure for describing computer-generated
images. Fuzzy classification is used to organize the rendering
descriptors so that a reliable representation is obtained, which
increases the prediction accuracy. A neural network performs
workload prediction by modeling the nonlinear input-output
relationship between rendering descriptors and the respective
computational complexity. To increase prediction accuracy, a
constructive algorithm is adopted in this paper to train the neural
network so that network weights and size are simultaneously
estimated. Then, a grid scheduling scheme is proposed to estimate
the order in which the queued tasks should be executed and the
most appropriate processor assignment so that the demanded
QoS is satisfied as far as possible. A fair scheduling policy is
considered as the most appropriate. Experimental results on a real
grid infrastructure are presented to illustrate the efficiency of the
proposed workload prediction and scheduling algorithm compared
to other approaches presented in the literature.
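
The scheduling step described above (choose an execution order and a processor assignment from predicted workloads) can be illustrated with a minimal sketch. This is not the paper's fair scheduling policy: it swaps in a plain earliest-deadline-first list scheduler, and the task tuples, the identical-processor assumption, and the function name `schedule` are all hypothetical.

```python
import heapq

def schedule(tasks, num_processors):
    """Order tasks by deadline and assign each to the earliest-free processor.

    tasks: list of (name, predicted_workload, deadline); the workload is
    assumed to come from some predictor and to equal the run time on any
    processor (identical processors are an assumption of this sketch).
    Returns a list of (name, processor, start, finish, deadline_met).
    """
    # Min-heap of (time the processor becomes free, processor id).
    free_at = [(0.0, p) for p in range(num_processors)]
    heapq.heapify(free_at)
    plan = []
    # Earliest deadline first: an illustrative ordering, not the paper's policy.
    for name, work, deadline in sorted(tasks, key=lambda t: t[2]):
        start, proc = heapq.heappop(free_at)
        finish = start + work
        plan.append((name, proc, start, finish, finish <= deadline))
        heapq.heappush(free_at, (finish, proc))
    return plan

if __name__ == "__main__":
    demo = [("a", 4, 5), ("b", 2, 3), ("c", 3, 6)]
    for row in schedule(demo, num_processors=2):
        print(row)
```

The last field of each row records whether the predicted finish time meets the task's deadline, which is the point where an inaccurate workload prediction would translate into a missed QoS requirement.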

Abstract: In this paper we present a platform for developing mobile, locative and collaborative distributed games comprised of small programmable object technologies (e.g., wireless sensor networks) and traditional networked processors.
The platform is implemented using a combination of Java
Standard and Mobile editions, and also targets mobile phones
that have some kind of sensor installed. We briefly present
the architecture of our platform and demonstrate its capabilities by reporting on two pervasive multiplayer games. The key
characteristic of these games is that players interact with each
other and their surrounding environment by moving, running
and gesturing as a means to perform game-related actions, using small programmable object technologies.

Abstract: In this paper, we present a Programmable Packet Processing Engine suitable for deep header processing in high-speed networking systems.
The engine, which has been fabricated as part of a complete network processor, consists of a typical RISC CPU, whose register
file has been modified in order to support efficient context switching, and two simple special-purpose processing units. The engine can be
used in a number of network processing units (NPUs), as an alternative to the typical design practice of employing a large number of simple
general purpose processors, or in any other embedded system designed to process mainly network protocols. To assess the performance
of the engine, we profiled typical networking applications and carried out a series of experiments. Further, we
compared the performance of our processing engine to that of two widely used NPUs and show that our proposed packet-processing
engine can run specific applications up to three times faster. Moreover, the engine is simpler to fabricate and has lower
hardware complexity, while it remains very easy to program.

Abstract: In this paper we describe a new simulation platform for heterogeneous distributed systems comprised of small programmable objects (e.g., wireless sensor networks) and traditional networked processors. Simulating such systems is complicated because of the need to coordinate compilers and simulators, often with very different interfaces, options, and fidelities.
Our platform (which we call ADAPT) is a flexible and extensible environment that provides a highly scalable simulator with unique characteristics. While the platform provides advanced functionality such as real-time simulation monitoring, custom topologies and scenarios, and mixing of real and simulated nodes, the effort required from the user and the impact on her code are minimal. Here we present its architecture and the most important design decisions, and discuss its distinct features and functionalities. We integrate our simulator with the Sun SPOT platform to enable simulation of sensing applications that employ both low-end and high-end devices programmed in different languages and internetworked with heterogeneous technologies. We believe that ADAPT will make the development of applications that use small programmable objects more widely accessible and will enable researchers to pursue a research approach that combines theory and practice.

Abstract: Andrews et al. [Automatic method for hiding latency in high bandwidth networks, in: Proceedings of the ACM Symposium on Theory of Computing, 1996, pp. 257–265; Improved methods for hiding latency in high bandwidth networks, in: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, 1996, pp. 52–61] introduced a number of techniques for automatically hiding latency when performing simulations of networks with unit delay links on networks with arbitrary unequal delay links. In their work, they assume that processors of the host network are identical in computational power to those of the guest network being simulated. They further assume that the links of the host are able to pipeline messages, i.e., they are able to deliver P packets in time O(P+d) where d is the delay on the link.
In this paper we examine the effect of eliminating one or both of these assumptions. In particular, we provide an efficient simulation of a linear array of homogeneous processors connected by unit-delay links on a linear array of heterogeneous processors connected by links with arbitrary delay. We show that the slowdown achieved by our simulation is optimal. We then consider the case of simulating cliques by cliques; i.e., a clique of heterogeneous processors with arbitrary delay links is used to simulate a clique of homogeneous processors with unit delay links. We reduce the slowdown from the obvious bound of the maximum link delay to the average of the link delays. In the case of the linear array we consider links both with and without pipelining. For the clique simulation the links are not assumed to support pipelining.
The main motivation of our results (as was the case with Andrews et al.) is to mitigate the degradation of performance when executing parallel programs designed for different architectures on a network of workstations (NOW). In such a setting it is unlikely that the links provided by the NOW will support pipelining and it is quite probable the processors will be heterogeneous. Combining our result on clique simulation with well-known techniques for simulating shared memory PRAMs on distributed memory machines provides an effective automatic compilation of a PRAM algorithm on a NOW.
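
To make the two link models concrete, the following minimal sketch contrasts delivery times with and without pipelining and computes the two clique quantities named above (maximum versus average link delay). The non-pipelined cost of P·d is an assumption of this sketch (each packet must arrive before the next is sent); the abstract only states the pipelined bound O(P+d). The function names are hypothetical.

```python
from statistics import mean

def delivery_time(packets, delay, pipelined):
    """Time steps to deliver `packets` packets over one link with latency `delay`."""
    if packets == 0:
        return 0
    if pipelined:
        # One packet is injected per step; the last one, injected at step
        # packets - 1, arrives `delay` steps later, i.e. O(P + d) overall.
        return (packets - 1) + delay
    # Assumed non-pipelined model: the link carries one packet at a time.
    return packets * delay

def clique_delay_bounds(link_delays):
    """Return (maximum link delay, average link delay) for a set of links."""
    return max(link_delays), mean(link_delays)

if __name__ == "__main__":
    print(delivery_time(10, 5, pipelined=True))
    print(delivery_time(10, 5, pipelined=False))
    print(clique_delay_bounds([1, 1, 2, 8]))
```

When a few links are much slower than the rest, the average delay is far below the maximum, which is where the improvement from the obvious bound matters most.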

Abstract: A constraint network is arc consistent if any value of any of its variables is compatible with at
least one value of any other variable. The Arc Consistency Problem (ACP) consists in filtering out values of
the variables of a given network to obtain one that is arc consistent, without eliminating any solution. ACP is
known to be inherently sequential, or P-complete, so in this paper we examine some weaker versions of it and
their parallel complexity. We propose several natural approximation schemes for ACP and show that they are also
P-complete. In an attempt to overcome these negative results, we turn our attention to the problem of filtering
out values from the variables so that each value in the resulting network is compatible with at least one value of
not necessarily all, but a constant fraction of the other variables. We call such a network partially arc consistent.
We give a parallel algorithm that, for any constraint network, outputs a partially arc consistent subnetwork of it in
sublinear ($O(\sqrt{n}\log n)$) parallel time using $O(n^2)$ processors. This is the first (to our knowledge) sublinear-time
parallel algorithm with polynomially many processors that guarantees that in the resulting network every value is
compatible with at least one value in at least a constant fraction of the remaining variables. Finally, we generalize
the notion of partiality to the k-consistency problem.
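
The two definitions above (arc consistency and partial arc consistency) can be illustrated with a minimal sequential sketch. It is not the parallel algorithm of the paper: it is a plain fixpoint filter, and the data layout (`domains` as a dict of sets, `allowed` as a dict of compatible value pairs, unconstrained variable pairs treated as fully compatible) and all function names are assumptions made for illustration.

```python
def compatible(allowed, x, a, y, b):
    """True if value a of variable x is compatible with value b of variable y."""
    if (x, y) in allowed:
        return (a, b) in allowed[(x, y)]
    if (y, x) in allowed:
        return (b, a) in allowed[(y, x)]
    return True  # assumption: no constraint between x and y means compatible

def support_fraction(domains, allowed, x, a):
    """Fraction of the other variables that hold at least one value compatible with x=a."""
    others = [y for y in domains if y != x]
    if not others:
        return 1.0
    hits = sum(
        any(compatible(allowed, x, a, y, b) for b in domains[y]) for y in others
    )
    return hits / len(others)

def is_partially_arc_consistent(domains, allowed, fraction):
    """Every value is supported in at least `fraction` of the other variables."""
    return all(
        support_fraction(domains, allowed, x, a) >= fraction
        for x in domains
        for a in domains[x]
    )

def filter_arc_consistent(domains, allowed):
    """Sequentially delete unsupported values until the network is arc consistent."""
    doms = {v: set(vals) for v, vals in domains.items()}
    changed = True
    while changed:
        changed = False
        for x in doms:
            keep = {a for a in doms[x] if support_fraction(doms, allowed, x, a) >= 1.0}
            if keep != doms[x]:
                doms[x] = keep
                changed = True
    return doms

if __name__ == "__main__":
    values = {1, 2, 3}
    less_than = {(a, b) for a in values for b in values if a < b}
    domains = {"x": set(values), "y": set(values), "z": set(values)}
    allowed = {("x", "y"): less_than, ("y", "z"): less_than}  # x < y < z
    print(filter_arc_consistent(domains, allowed))
```

Calling `is_partially_arc_consistent` with `fraction=1.0` checks full arc consistency, while a smaller constant gives the relaxed notion described in the abstract.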

Abstract: The Partitioned Optimal Passive Stars network, POPS(d,g), is an optical interconnection network of N processors (N=dg) which uses $g^2$ optical passive star couplers. The processors of this network are partitioned into g groups of d processors each, and the $g^2$ couplers are used for connecting each group with each of the groups, including itself. In this paper, we present an optimal embedding of the hypercube on this network for all combinations of values of d and g. Specifically, we show how to optimally simulate the most common hypercube communication pattern where each hypercube node sends a packet along the same dimension. Optimal simulation of this communication on the POPS(d,g) network has already been presented for d ≤ g in the literature, but for the case d > g, the optimality remained an open problem. Now, we show that an optimal simulation is feasible in this case too.

Abstract: The Partitioned Optimal Passive Stars network, POPS(d,g), is an optical interconnection network of N processors (N=dg) with $g^2$ optical passive star couplers. In this network, there are g groups of d processors each, and the $g^2$ couplers are used for connecting each group with each of the groups, including itself. In this paper, we present a technique for optimally simulating a frequently arising hypercube communication pattern on this network for all combinations of values of d and g. Specifically, we show that one-hop movements on the hypercube along the same dimension can be simulated on the POPS(d,g) network in $\lceil \frac{d}{g}\rceil$ slots for d≠g and in 2 slots for d=g.
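
As a quick arithmetic check of the quantities stated above, the following minimal sketch computes the network size, the coupler count, and the slot count for a one-hop hypercube movement along a single dimension. The function names are hypothetical, and the code only evaluates the stated formulas; it does not perform the simulation itself.

```python
def pops_size(d, g):
    """Return (number of processors N = d*g, number of couplers g^2) for POPS(d, g)."""
    return d * g, g * g

def one_hop_slots(d, g):
    """Slots stated in the abstract: ceil(d/g) when d != g, and 2 when d == g."""
    if d == g:
        return 2
    return -(-d // g)  # integer ceiling of d / g

if __name__ == "__main__":
    for d, g in [(2, 8), (4, 4), (8, 4), (16, 2)]:
        print((d, g), pops_size(d, g), one_hop_slots(d, g))
```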

Abstract: In this paper we present an efficient general simulation strategy for
computations designed for fully operational BSP machines of n ideal processors,
on n-processor dynamic-fault-prone BSP machines. The fault occurrences are failstop
and fully dynamic, i.e., they are allowed to happen on-line at any point of the
computation, subject to the constraint that the total number of faulty processors
may never exceed a known fraction. The computational paradigm can be exploited
for robust computations over virtual parallel settings with a volatile underlying
infrastructure, such as a network of workstations (where workstations may be
taken out of the virtual parallel machine by their owner).
Our simulation strategy is Las Vegas (i.e., it may never fail, due to backtracking
operations to robustly stored instances of the computation, in case of locally
unrecoverable situations). It adopts an adaptive balancing scheme of the workload
among the currently live processors of the BSP machine.
Our strategy is efficient in the sense that, compared with an optimal off-line
adversarial computation under the same sequence of fault occurrences, it achieves an
$O((\log n \cdot \log\log n)^2)$
multiplicative factor times the optimal work (namely, this
measure is in the sense of the “competitive ratio” of on-line analysis). In addition,
our scheme is modular, integrated, and considers many implementation points.
We comment that, to our knowledge, no previous work on robust parallel computations
has considered fully dynamic faults in the BSP model, or in general distributed
memory systems. Furthermore, this is the first time an efficient Las Vegas
simulation in this area is achieved.