NMR-MPar: A fault-tolerance approach for multi-core and many-core processors


Abstract:

Multi-core and many-core processors are a promising solution for achieving high performance while maintaining lower power consumption. However, their degree of miniaturization makes them more sensitive to soft errors. To improve system reliability, this work proposes a fault-tolerance approach based on redundancy and partitioning principles, called N-Modular Redundancy and M-Partitions (NMR-MPar). By combining both principles, this approach allows multi-/many-core processors to perform critical functions in mixed-criticality systems. Benefiting from the capabilities of these devices, NMR-MPar creates different partitions that perform independent functions. For critical functions, N partitions with the same configuration participate in an N-modular redundancy system. To validate the approach, a case study is implemented on the KALRAY Multi-Purpose Processing Array (MPPA)-256 many-core processor running two parallel benchmark applications. The traveling salesman problem and matrix multiplication applications were selected to exercise different resources of the device. The effectiveness of NMR-MPar is assessed by software-implemented fault injection. For evaluation purposes, the system is assumed to be intended for use in avionics. Results show that implementing NMR-MPar on the system improves application reliability by two orders of magnitude. Finally, this work opens the possibility of using massive parallelism for dependable applications in embedded systems.
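
The redundancy principle described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the MPPA-256 toolchain: each "partition" is stood in for by an ordinary Python callable, and the names `majority_vote`, `run_nmr`, `matmul`, and `faulty_matmul` are hypothetical. It assumes a simple strict-majority voter over the N replica outputs, with one replica deliberately corrupted to mimic a soft error.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None


def run_nmr(replicas: Sequence[Callable], *args) -> Optional[Hashable]:
    """Run the same critical function on every replica and vote on the outputs.

    Each callable stands in for one partition (assumption for illustration).
    """
    results = [replica(*args) for replica in replicas]
    return majority_vote(results)


def matmul(a, b):
    """Multiply two matrices given as tuples of tuples (hashable, so votable)."""
    return tuple(
        tuple(sum(x * y for x, y in zip(row, col)) for col in zip(*b))
        for row in a
    )


def faulty_matmul(a, b):
    """Same computation, but with one bit flipped in the first result element."""
    good = matmul(a, b)
    corrupted_first_row = (good[0][0] ^ 1,) + good[0][1:]
    return (corrupted_first_row,) + good[1:]


a = ((1, 2), (3, 4))
b = ((5, 6), (7, 8))

# N = 3: two healthy replicas outvote the corrupted one.
voted = run_nmr([matmul, faulty_matmul, matmul], a, b)
```

With N = 3, a single corrupted replica is masked because the two healthy replicas still form a strict majority; if no value reaches a majority, the voter returns `None` so the caller can treat the result as unreliable.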

Publication year:

2018

Keywords:

  • Fault injection
  • Reliability
  • Multi-core
  • Redundancy
  • Partitioning
  • Many-core
  • Fault tolerance

Source:

Scopus
Google

Document type:

Article

Status:

Open access

Knowledge areas:

  • Computer architecture
  • Computer science

Subject areas:

  • Computer science