Performance Analysis of Matrix Multiplication for Deep Learning on the Edge


Abstract:

The devices designed for the Internet-of-Things encompass a large variety of distinct processor architectures, forming a highly heterogeneous zoo. In order to tackle this, we employ a simulator to estimate the performance of the matrix-matrix multiplication (gemm) kernel on processors designed to operate at the edge. Our simulator adheres to the modern implementations of gemm, advocated by GotoBLAS2, BLIS, OpenBLAS, etc., to carefully account for the amount of data transfers across the memory hierarchy of different algorithmic variants of the kernel. A small collection of experiments provide the necessary data to calibrate the simulator and deliver highly accurate estimations of the execution time for a given processor architecture.

Año de publicación:

2022

Keywords:

  • IoT processors
  • Performance Analysis
  • Matrix multiplication
  • high performance

Fuente:

scopusscopus

Tipo de documento:

Conference Object

Estado:

Acceso restringido

Áreas de conocimiento:

  • Aprendizaje automático
  • Ciencias de la computación
  • Ciencias de la computación

Áreas temáticas:

  • Ciencias de la computación