Fault tolerance protocols for parallel programs based on replication
Abstract:
In this work we propose a fault-tolerance mechanism for parallel programs based on task replication. We use a sequential discrete-event simulator of a distributed system subject to failures to compare semi-active and passive variants of the protocol. In our model, each time a task of a given parallel program is allocated, a copy of it is stored on a second processor, called the buddy processor. If the original processor fails, the copies of its tasks held at the buddy processor are processed instead, providing fault tolerance. Performance measures, such as program execution times and processor utilization factors, are given for the different versions of the mechanism. Performance has been studied as a function of processor degradation, program size, and system size.
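
The buddy-processor idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' simulator or protocol: the `Processor` class, the helper names, and the ring rule for choosing a buddy (`buddy_of`) are assumptions made for illustration only, and the sketch covers just the basic takeover step, not the semi-active and passive variants compared in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    pid: int
    alive: bool = True
    tasks: list = field(default_factory=list)     # tasks this processor will run
    replicas: dict = field(default_factory=dict)  # owner pid -> copies held as buddy


def buddy_of(pid, n):
    # Assumption: the buddy is the next processor in a ring of n processors.
    return (pid + 1) % n


def allocate(procs, pid, task):
    """Place a task on its original processor and store a copy at the buddy."""
    procs[pid].tasks.append(task)
    buddy = procs[buddy_of(pid, len(procs))]
    buddy.replicas.setdefault(pid, []).append(task)


def fail(procs, pid):
    """Mark a processor as failed; its buddy takes over the stored copies."""
    failed = procs[pid]
    failed.alive = False
    failed.tasks.clear()
    buddy = procs[buddy_of(pid, len(procs))]
    if buddy.alive:
        # The copies become regular tasks of the buddy processor.
        buddy.tasks.extend(buddy.replicas.pop(pid, []))


def run_demo():
    procs = [Processor(i) for i in range(3)]
    allocate(procs, 0, "t0")
    allocate(procs, 0, "t1")
    allocate(procs, 1, "t2")
    fail(procs, 0)  # processor 0 fails; processor 1 is its buddy
    return procs
```

Calling `run_demo()` returns the processor list after processor 0 has failed, so the task queue of processor 1 can be inspected to see the copies it took over.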
Year of publication:
2000
Keywords:
Source:

Document type:
Conference Object
Status:
Restricted access
Knowledge areas:
- Computer science
Subject areas:
- Computer programming, programs, data, security
- Computer science
- Special computer methods