Fault tolerance protocols for parallel programs based on replication


Abstract:

In this work we propose a fault-tolerance mechanism for parallel programs based on task replication. We use a sequential discrete-event simulator of a distributed system subject to failures to compare a semi-active and a passive variant of the protocol. In our model, each time a task of a given parallel program is allocated, a copy of it is stored on a second processor, called the buddy processor. If the original processor fails, the copies of its tasks held at the buddy processor are processed instead, providing fault tolerance. Performance measures, such as program execution times and processor utilization factors, are given for the different versions of the mechanism. Performance is studied as a function of processor degradation, program size, and system size.
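
The buddy-processor idea described in the abstract can be illustrated with a minimal sketch. It is not the simulator used in the work. It assumes a ring layout in which the buddy of processor i is processor (i + 1) mod n, and it shows only a passive-style recovery, where the buddy runs its stored copies once the owner is known to have failed. The names `Processor`, `buddy_of`, `allocate`, `fail`, and `run` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    pid: int
    alive: bool = True
    primary: list = field(default_factory=list)  # tasks allocated to this processor
    backup: list = field(default_factory=list)   # (owner pid, task) copies held as buddy


def buddy_of(pid, n):
    # Assumption: the buddy is the next processor in a ring of n processors.
    return (pid + 1) % n


def allocate(procs, task, pid):
    # Allocate the task and store a copy on the buddy processor.
    procs[pid].primary.append(task)
    procs[buddy_of(pid, len(procs))].backup.append((pid, task))


def fail(procs, pid):
    procs[pid].alive = False


def run(procs):
    # Return a mapping task -> pid of the processor that executed it.
    done = {}
    for p in procs:
        if not p.alive:
            continue
        for task in p.primary:
            done[task] = p.pid
        for owner, task in p.backup:
            # The buddy processes a copy only if the original processor failed.
            if not procs[owner].alive:
                done[task] = p.pid
    return done


procs = [Processor(i) for i in range(3)]
for i, task in enumerate(["t0", "t1", "t2"]):
    allocate(procs, task, i)
fail(procs, 1)
print(run(procs))  # {'t0': 0, 't2': 2, 't1': 2}
```

In this sketch a task is lost if both its processor and the buddy fail. How the actual protocol handles that case is not stated in the abstract.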

Year of publication:

2000

Keywords:

    Source:

    Scopus

    Document type:

    Conference Object

    Status:

    Restricted access

    Knowledge areas:

    • Computer science

    Subject areas:

    • Computer programming, programs, data, security
    • Computer science
    • Special computer methods