Abstracts of papers published in 1993


D. Blough, A. Pelc, Optimal communication in networks with randomly distributed Byzantine faults, Networks 23 (1993), pp. 691-701.

Abstract:

We consider the problem of efficient information exchange in a communication network whose nodes and/or links are subject fo Byzantine faults that are randomly and independently distributed through the network. The goal is almost safe communication, i.e. getting to every fault-free node information about every other fault-free node, with probability converging to one as the number of nodes grows. We present nonadaptive almost-safe communication schemes working for various networks in asymptotically optimal time and using an asymptotically optimal number of message bits.

C. Kenyon, P. Fraigniaud, A. Pelc, Finding a target subnetwork in sparse networks with random faults, Information Processing Letters 48 (1993), pp. 297-303.

Abstract:

Given a network G = (V, E) we look for sparse extensions G* = (V*, E*) of G with |V*|=O(|V|) and such that if nodes of G* fail independently with constant probability p, G* almost certainly contains a fault-free isomorphic copy of G.

K. Diks, A. Pelc, Fast diagnosis of multiprocessor systems with random faults, RAIRO, Theoretical Informatics and Applications 27 (1993), pp. 391-401.

Abstract:

Processors in a multiprocessor system fail independently with constant probability 0 < p < 1/2. They can test one another and faulty processors have to be identified on the basis of test results. Fault-free processors diagnose other faulty-free processors correctly and find faults in faulty ones with probability q > 0 in each test. Faulty testers are unreliable: they may even behave maliciously. Tests are independent and in every time unit a processor can be involved in at most one test. We propose testing schemes which are fast, use few tests and are correct with probability converging to 1 as the size of the system grows.

D. Blough, A. Pelc, A clustered failure model for the memory array reconfiguration problem, IEEE Transactions on Computers 42 (1993), pp. 518-528.

The results of this paper were announced in: D. Blough, A. Pelc, Some results and open problems concerning memory reconfiguration under clustered fault models, Proc. IEEE International Workshop on Defect and Fault Tolerance on VLSI Systems, Hidden Valley, Pennsylvania, November 1991, pp. 292-295.

Abstract:

Reconfiguration of memory arrays using spare rows and spare columns has been shown to be a useful technique for yield enhancement of memories. This problem is NP-hard in general and hence, previous work has focused on branch-and-bound algorithms for smaller problems and approximation algorithms for larger problems. Recently, the performances of several heuristics have been evaluated by Shi and Fuchs under a probabilistic model for memory defects that assumes faults in memory cells occur independently. In this paper, we propose a clustered failure model under which the problem of array reconfiguration is studied. This model adopts the center-satellite approach of Meyer and Pradhan. We utilize this model to show that the total number of faulty cells that can be tolerated when clustering occurs is larger than when faults are independent. We also show that an optimal solution to the reconfiguration problem can be found in polynomial time for a special case of our clustering model. We give an efficient approximation algorithm for the general case of our probabilistic model. Finally, we show, through simulation, that the computation time required by this algorithm to repair large arrays containing a significant number of clustered faults is small.

D. Blough, A. Pelc, Diagnosis and repair in multiprocessor systems, IEEE Transactions on Computers 42 (1993), pp. 205-217.

The preliminary version appeared as: D. Blough, A. Pelc, Reliable diagnosis and repair in constant-degree multiprocessor systems, Proc. 20-th Annual International Symposium on Fault-Tolerant Computing, Newcastle upon Tyne, UK, June 1990, pp. 316-323.

Abstract:

Diagnosis of multiprocessor systems in which faulty processors can be replaced by spares or repaired is known as sequential diagnosis. This paper considers a generalization of classical sequential diagnosis, referred to as diagnosis and repair, under a probabilistic model for the faults and test outcomes in a system. It is shown that correct diagnosis and repair of all faulty processors can be achieved with high probability in a large class of systems including, for example, rings, grids, meshes, tori, and hypercubes. These results show, for the first time without restrictive assumptions on the behavior of faulty processors, that correct diagnosis can be achieved in these widely-used, low-degree systems when a fixed percentage of the processors in the system are faulty.

A. Pelc, Efficient distributed diagnosis in the presence of random faults, Proc. 23rd Annual International Symposium on Fault-Tolerant Computing, Toulouse, France, June 1993, pp. 462-469.

Abstract:

We study a probabilistic setting for distributed system-level diagnosis. Processors in a multiprocessor system fail independently with constant probability 0 < p < 1/2. They can test their neighbors and a fault-free processor has probability 0 < q < 1 of detecting a fault of a failed processor in a single test. The aim of distributed diagnosis is for every fault-free processor to know the status of every other processor. This is achieved by communicating test results throughout the system. Faulty processors can behave arbitrarily, even maliciously, both as testers and as message transmitters. We propose a diagnosis scheme using O(n) time, O(nlogn) tests, O(n2) message bits and working correctly with probability converging to 1, as n grows. We show that each of these three characteristics has the lowest possible order of magnitude.