CALDI

Introduction

The Research Chair in Distributed Computing (the acronym CALDI comes from the words ``calcul distribué'' which is the French name of Distributed Computing) was created at the Université du Québec à Hull in February 2001, in order to promote the specialization in a research domain consistent with the technological environment of the National Capital Region of Canada. The research work of the Chair concentrates on information processing in distributed systems and on related problems of communication in networks.

Distributed computing is a domain of computer science whose growth significantly accelerated in recent years. This is due to the increasing role that interconnection networks and telecommunication play in the life of modern society. Distributed computing is a foundation of these two fields of application of computer science. It concerns environments where several processors situated in different sites have to collaborate in order to perform a computational task. The nature of tasks to be accomplished in a distributed way may vary from numerical computations too complex for one computer, to the coordination of the work of all branches of a bank, spread over a continent, to the search for data in the Internet, and many others. Collaborating processors may be computers of a local network of a university, situated close to each other, but may also be Internet servers at distances of thousands of kilometers. Clearly, each of these situations yields different problems and challenges concerning organization and division of work.

Every processor has a certain degree of autonomy and can carry out parts of the task locally, independently of other processors. It has a memory and a control unit distinct from others. However, in order to accomplish the entire task, processors must share some resources, which requires coordination. This is done by means of communication among processors which exchange messages during the execution of the distributed task. Hence problems of communication in networks are an important issue in distributed computing. Sharing resources among processors concerns the following aspects.

Data of the problem are initially distributed among processors and must be exchanged between them. For example, bank operations performed by a client in different branches must be known at each of them.

Partial results of computations of one processor must be communicated to others to be used in turn as data in their computations. For example, large tasks of matrix computations are performed in a distributed way, assigning computations concerning different parts of the matrix to different processors. Since these computations may be mutually dependent, processors must exchange partial results interactively in order to accomplish the global task.

Final results of computation are distributed among processors, each of them often keeping only a part of the global result. This is due either to the size of the result, too large for the local memory of each processor, or to the fact that each processor will need only a part of the result relevant to it in performing future tasks. For example, after accomplishing the fundamental task of finding a minimum spanning tree of the network in a distributed fashion, every processor keeps in its local memory only incident edges of the tree.

The above indicated crucial features of distributed systems explain specific challenges of distributed computing, as compared to centralized computing, performed by a single processor. New problems, which do not appear in a centralized environment, concern sharing tasks among processors in order to take maximum advantage of the computational power of each of them, organizing communication among processors in order to coordinate their work efficiently, and maintaining a balance between the load of local computations in each of the processors and the load of communication in the entire system. These new challenges often increase the complexity of distributed problems as compared to those concerning centralized processing: many tasks for which centralized solutions are well known present major difficulties when they are to be solved in a distributed way. However, the additional conceptual work invested in overcoming these new obstacles is usually very profitable: distributed methods are in general much faster than those using centralized computing. In our time, when the amount of processed information grows exponentially, distributed processing becomes unavoidable: no individual computer has sufficient ressources to accomplish huge information processing tasks required now and in the future. Distributed computing becomes an indispensable tool and it is almost sure that its role will increase in time.

Research directions

The research activity of the Chair CALDI concentrates on algorithmic and software aspects of distributed computing. It is important to underline the relations between these two domains, as well as relations between each of them and hardware issues. On the one hand, research results in distributed algorithms can be used, e.g., in the development of communication protocols or in the construction of distributed data bases. The complexity analysis of communication algorithms conceived for various network topologies may and should influence the choice of topologies for communication networks to be used in the future. These examples show the impact of distributed algorithms on software and hardware issues in distributed systems. On the other hand, the impact is also reciprocal. New technologies in the domain of telecommunication, both in its software and hardware part, significantly influence theoretical models of communication and algorithmic problems to be considered. Two particularly convincing examples of the impact of technology on distributed algorithms are the following. Increasing use of optic fiber in communication networks yields growing interest in communication models that consider characteristics of optical transmission channels. Likewise, growing popularity of the Internet in modern society contributes to the creation of new subdomains of distributed algorithms, such as exploration of unknown networks by mobile agents, or searching for data on the Web. This shows that distributed algorithms which form a focus of the research activity of the Chair, are tightly related to other aspects of distributed computing, as well as to problems of its two main application domains: interconnection networks and telecommunication.

Below we describe the main research directions of the Chair CALDI. Each of them comprises a research project spanning over several years. Most of them are tightly connected and will be developed in parallel. This description should not be considered in a restrictive way: technological progress to be foreseen in the next decade and the development of distributed computing following it, will doubtless modify some of these directions and will add others. However, the following description gives representative examples of types of problems that will be investigated in the Chair.

Communication in partially unknown environments
In many applications, the topology of the communication network and its parameters, such as the diameter, the size, or the maximum degree, may be partially or totally unknown to processors. Often, information available to every processor is limited to the part of the network situated in its vicinity. In such situations, communication algorithms that can be used may only depend on locally available information. The main problem that we want to study concerns optimization of communication algorithms working in partially unknown networks, and the influence of the amount of available information on communication efficiency.

Communication in faulty networks
The growing size of communication networks and the huge amount of transmitted information increases the vulnerability of these networks to component failures. Faults of links or nodes of the network may be permanent or intermittent, may be restricted to a part of the network or randomly distributed throughout it, and may alter the work of the components in various ways. The main problem in this context is to design communication algorithms which work efficiently and reliably in spite of the faults and without knowing a priori their location. One of important aspects of such algorithms is fault diagnosis whose aim is to locate faults using distributed methods, without relying on a central monitor coordonating actions of processors. These methods use results of tests that processors perform on their neighbors. The difficulty of this task is augmented by the fact that information generated by faulty processors is not reliable and it is not known a priori which processors are faulty and which are functional. Refined methods of selection of tests and of their analysis permit, however, to extract from test results information necessary to perform reliable diagnosis.

Network exploration
In many applications it is important to learn the topology of an unknown network, such as a part of the Internet or a particular mobile network. In such cases, one or more mobile agents (which are programs specifically designed for network exploration) have to construct a map of the network, (i.e., to find the underlying graph), by traversing all nodes and links of the network. The problem is to organize the work of the agents in the most efficient way, i.e., so that they complete the task in minimum time, sharing work and avoiding redundant traversals. Scheduling trajectories of these agents in an unknown environment and scheduling communication between them are important challenges that need to be faced.

Communication in wireless and mobile networks
Growing popularity of cellular phones yields increasing interest of the research community in a particular type of communication networks: wireless and mobile networks. This type of networks whose main application until quite recently was limited to military needs, became indispensable in everyday life and its role will further grow when wireless access to the Internet becomes popular. Wireless and mobile networks are a source of new important algorithmic problems. The choice of wavelengths minimizing interference, dynamic localizing of nodes in a mobile network, communication in an unknown environment that changes frequently, are only some examples. Distributed algorithms are particularly well adapted to solve these problems. Indeed, classic networks are relatively stable and their topology can be learned by nodes of the network after some time of functioning. Hence in this environment, centralized algorithms can be sometimes used. On the other hand, the structure of mobile networks changes frequently during the execution of a communication protocol. This does not permit the use of algorithms requiring global knowledge of the network, and hence distributed methods based only on local knowledge must be applied.

Optimization of distributed object-oriented systems
Advanced computing applications are more and more reliant on two fundamental technologies: object-orientation and distribution (e.g., client/server). Existing object-oriented design methods are of little help in the systematic and efficient development of large distributed systems. One of the main impediments to an effective distribution of objects is the high level of network overload resulting from the numerous message exchanges characterizing the object-oriented paradigm. One of our research subjects tackles the quantification of coupling among objects and the uncovering of appropriate object permutations and processor-centered groupings in order to minimize interprocessor traffic.

Behavioral modeling of network components
Physical components of current networks are more and more complex, rendering their modeling rather difficult and posing problems during the planning and management of such networks. Thus modeling and management of these networks require more suitable tools and methodologies in order to obtain a realistic outcome. Our goal in this research is to define a new representation methodology to model these components and their behaviors in a realistic way. We focus first on protocol specifications to be used to define architectures and communications of these components in order to be specified formally and validated. Examples of components considered include switches and routers since they are directly responsible for the network performance and enable sustained quality of services.

Global performance management in complex networks
Existing communication networks are made of components distributed over large spaces. These components must communicate in a continuous way to guarantee the required performance even in dynamic contexts. The global performance of communication networks operating in a dynamic environment is still an open problem. Our goal is to define the most suitable strategies to deal with this problem in the context of network changes. We will consider different routers as network components, and will study the problem of load balancing between them, taking into account different network topologies.