In this paper we present an efficient general simulation strategy for
computations designed for fully operational BSP machines of n ideal processors,
on n-processor dynamic-fault-prone BSP machines. The fault occurrences are failstop
and fully dynamic, i.e., they are allowed to happen on-line at any point of the
computation, subject to the constraint that the total number of faulty processors
may never exceed a known fraction. The computational paradigm can be exploited
for robust computations over virtual parallel settings with a volatile underlying
infrastructure, such as a NETWORK OF WORKSTATIONS (where workstations may be
taken out of the virtual parallel machine by their owner).
Our simulation strategy is Las Vegas (i.e., it may never fail, due to backtracking
operations to robustly stored instances of the computation, in case of locally
unrecoverable situations). It adopts an adaptive balancing scheme of the workload
among the currently live processors of the BSP machine.
Our strategy is efficient in the sense that, compared with an optimal off-line
adversarial computation under the same sequence of fault occurrences, it achieves an O
.log n ¢ log log n/2¢
multiplicative factor times the optimal work (namely, this
measure is in the sense of the “competitive ratio” of on-line analysis). In addition,
our scheme is modular, integrated, and considers many implementation points.
We comment that, to our knowledge, no previous work on robust parallel computations
has considered fully dynamic faults in the BSP model, or in general distributed
memory systems. Furthermore, this is the first time an efficient Las Vegas
simulation in this area is achieved.