L. Gasieniec, A. Pelc, Adaptive broadcasting with faulty nodes,
*Parallel Computing* 22 (1996), 903-912.

Abstract:

We consider broadcasting from a fault-free source to all nodes of a completely
connected *n*-node network in the presence of *k* faulty nodes.
Every node can communicate with at most one other node in a unit
of time and during this period every pair of communicating nodes can exchange information packets. Faulty nodes cannot send information.
Broadcasting is adaptive, i.e., a node schedules its next communication
on the basis of information currently available to it.
Assuming that the fraction of faulty nodes is bounded by a constant
smaller than 1, we construct a broadcasting algorithm working in worst-case
time *O(log ^{2} n)*.

A. Pelc, Fault-tolerant broadcasting and gossiping in
communication networks,
*Networks* 28 (1996), 143-156.

A preliminary version appeared as: A. Pelc, Fast fault-tolerant broadcasting and gossiping (invited paper), Proc. 2nd Colloquium on Structural Information and Communication Complexity, SIROCCO'95, June 1995, Olympia, Greece, pp. 159-172.

Abstract:

Broadcasting and gossiping are fundamental tasks in network communication. In broadcasting, or one-to-all communication, information originally held in one node of the network (called the source) must be transmitted to all other nodes. In gossiping, or all-to-all communication, every node holds a message which has to be transmitted to all other nodes. As communication networks grow in size, they become increasingly vulnerable to component failures. Thus, capabilities for fault-tolerant broadcasting and gossiping gain importance. The present paper is a survey of the fast growing area of research investigating these capabilities. We focus on two most important efficiency measures of broadcasting and gossiping algorithms: running time and number of elementary transmissions required by the communication process. We emphasize the unifying thread in most results from the research in fault-tolerant communication: the trade-offs between efficiency of communication schemes and their fault-tolerance.

B. S. Chlebus, K. Diks, A. Pelc, Reliable broadcasting in hypercubes with random link and node failures,
*Combinatorics, Probability & Computing* 5 (1996), 337-350.

Abstract:

We consider the problem of broadcasting in an *n*-node hypercube
whose links and nodes fail independently with given probabilities
*p < 1* and *q < 1*, respectively. Information held in a fault-free
node, called the * source *, has to reach all other fault-free nodes.
Messages may be directly transmitted to adjacent nodes only, and every node
may communicate with at most one neighbor in a unit of time. A message
can be transmitted only if both communicating neighbors and the link joining
them are fault free. For parameters *p * and * q* satisfying
*(1-p)(1-q) > 0.99 * (e.g., *p = q = 0.5% *), we give an algorithm
working in time *O(log n) * and broadcasting source information to all fault-free nodes with probability exceeding *1-cn ^{-d} *,
for some positive constants

K. Diks, A. Pelc, Fault-tolerant linear broadcasting,
*Nordic Journal of Computing* 3 (1996), 188-201.

A preliminary version appeared in: Proc. First Canada-France Conference on Parallel and Distributed Computing, Theory and Practice, Montreal, Canada, May 1994, LNCS 805, pp. 207-217.

Abstract:

In linear broadcasting, packets originally stored in one node, called the source,
have to visit all other nodes of the network. Every packet has a predetermined
route indicating in which order it visits the nodes. A faulty link or node
of the network destroys all packets passing through it. A linear broadcasting
scheme consisting of packets' routes is *f*-fault-tolerant if every fault-free
node is visited by at least one packet for any configuration of at most
*f* link or node failures. We estimate the minimum number of packets for
which there exists an *f*-fault-tolerant linear broadcasting scheme in complete
networks, and we construct schemes using few packets. Variations of this
problem when faults can occur only in links or only in nodes are also
considered.

K. Diks, A. Pelc, Reliable computations on faulty EREW PRAM,
*Theoretical Computer Science* 164 (1996), 107-122.

Abstract:

We consider the problem of efficient and reliable computing on EREW PRAM
whose processors are subject to random independent stop-failures with constant
probability * p < 1*. An algorithm for such a fault-prone machine is called safe
if it solves a problem of size *n* with probability exceeding *1 -
d/n*,
for some constant *d* independent of *n*. Our main contribution is a safe algorithm
for the well known list ranking problem, working in time *O(log n)* on an
*O(n log n)*-processor EREW PRAM. We also show an optimal safe algorithm
for computing prefix sums, which works in time *O(log n)* on an
*O(n/log n)*-processor EREW PRAM. The methods presented in this
paper can be applied to a wide class of EREW PRAM algorithms making them
safe and simultaneously preserving their complexity.

B. S. Chlebus, K. Diks, A. Pelc, Broadcasting in synchronous
networks with dynamic faults,
*Networks* 27 (1996), 309-318.

Abstract:

The problem of broadcasting in a network is to disseminate information
from one node to all other nodes by transmitting it over
communication links that connect nodes.
We consider the time of broadcasting in the presence of at most *k* dynamic
link failures.
If a node knows source information, then in the next step
all its neighbors connected by operational links also get to know it.
Faults are dynamic, in the sense that a link may alternate arbitrarily
between being operational or faulty, provided that, at every time step,
the number of faulty links does not exceed *k*.
The time bounds on broadcasting are considered with respect to two parameters:
the number *k* of faulty links and the diameter *d* of the underlying graph.
Broadcasting is guaranteed to be successful if and only if the edge
connectivity of the network exceeds *k*; and we consider only such networks.
For a fixed *k*, it is shown that broadcasting is
always completed in time *O(d ^{k+1})*, where the bound is a function
of diameter

K. Diks, A. Pelc, Broadcasting with universal lists,
* Networks* 27 (1996), 183-196.

A preliminary version appeared in: Proc. 28th Annual Hawaii International Conference on System Sciences, January 1995, Vol.2, pp. 564-573.

Abstract:

In broadcasting, information originally held in one node of a communication network (called the source) has to be transmitted to all other nodes. In a unit of time, every node which already received the source message can transmit it to one neighbor. In classical broadcasting, the choice of neighbors to be informed by a node and the order in which they are informed may depend on the source. Thus, nodes need to store many transmission lists corresponding to different possible sources and need to know the source to adapt their behavior accordingly. In this paper we consider a variant of broadcasting in which every node is given a priori a single ordered list containing some of its neighbors. This list is meant to be universal for all possible sources. Upon obtaining the source message, a node transmits it to the neighbors from its list in prescribed order and then stops. This requires substantially less local memory devoted to schedule communication but usually increases broadcasting time. We compare broadcasting time in this and in the classical model and design optimal broadcasting schemes in the universal-list model for trees, rings and grids. For tori and for complete graphs we give upper bounds on broadcasting time.

K. Diks, A. Pelc, Efficient gossiping by packets in networks with random faults,
* SIAM Journal on Discrete Mathematics* 9 (1996), pp. 8-17.

Abstract:

Every node of a communication network has a constant size value which should be
made known to all other nodes. Nodes and links fail independently with constant probabilities *p < 1* and *q < 1*, respectively. Faults are
permanent and of crash type: a faulty link does not transmit messages and
a faulty node neither sends nor receives messages. In a unit of time every
node can send a packet of information to at most one neighbor and
receive a packet from at most one neighbor. The size of each packet does not
exceed *b(n)*, where *n* is the number of nodes. For every
*d > 0* we present an algorithm to exchange values between all fault-free
nodes of an *n*-node network in time *O(n/b(n) + log n) *, with
probability exceeding *1-n ^{-d}*, for sufficiently large

A. Pelc, Efficient fault location with small risk, Proc. 3rd Coll. on Structural Information and Communication Complexity, SIROCCO'96, Certosa di Pontignano, Siena, June 1996, p. 292-300.

Abstract:

We consider the problem of fault diagnosis in multiprocessor systems.
Processors perform tests on their neighbors; fault-free
testers correctly identify the fault status of tested neighbors, while
faulty testers can give arbitrary test results. Processors fail
independently with constant probability * p < 1/2 * and the goal is to identify
correctly the status of all processors, based on the set of test results.
For * 0 < q < 1 *, *q*-diagnosis is a fault diagnosis algorithm whose probability of
error does nor exceed *q*. Since testing is costly in terms of local
computation time and communication between neighboring processors, it is
important to design diagnosis algorithms combining high reliability with a small
number of tests.
We show that the minimum number of tests to perform
*q*-diagnosis in a *n*-processor system is of the order of
*n log (1/q)* and show *q*-diagnosis achieving this bound.
We also prove that the minimum degree of a multiprocessor system
permitting *q*-diagnosis is of the order of *log n + log (1/q)*
and show *q*-diagnosis with a low number of tests that can work
in such systems.
Our results show precisely the trade-off between reliability of
system diagnosis and the
cost of the testing process.