Improving Memory Utilization by Sharing DNN Models for Serverless Inference


Abstract:

Serving deep neural network (DNN) models requires substantial memory and has relatively long response times. Researchers are therefore integrating DNN inference services with serverless architectures, whose short-lived, bursty workloads are well suited to the serverless model. In addition, DNN services are increasingly deployed on edge clouds to reduce response time. However, serving DNN models in a serverless manner suffers from excessive memory usage caused by data duplication, a problem that is even more serious on strongly resource-constrained edge clouds. To address this problem, we designed ShmFaas on an open-source serverless platform running on Kubernetes with minimal code changes. First, we implemented a serverless system with lightweight memory isolation that shares DNN models in memory, avoiding the model duplication problem. We also designed an LRU-based model eviction algorithm for efficient memory usage on the edge cloud. In our experiments, the system's memory usage is reduced by more than 29.4% compared to the existing system, and the overhead introduced by the proposed system is negligible enough for real-world workloads.
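
The abstract names two mechanisms (a shared in-memory model store and LRU-based eviction) without reproducing the paper's implementation. As a minimal illustrative sketch only, the Python code below keeps serialized model weights in POSIX shared-memory segments that concurrent function instances can attach to without copying, and unlinks the least recently used segment when a byte budget is exceeded. All names here (SharedModelCache, the "model_" segment prefix, capacity_bytes) are hypothetical and not taken from ShmFaas.

```python
from collections import OrderedDict
from multiprocessing import shared_memory


class SharedModelCache:
    """LRU cache that publishes serialized model weights in POSIX
    shared memory, so multiple function instances map one copy."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Insertion order doubles as recency order: oldest entry first.
        self._segments = OrderedDict()  # model name -> SharedMemory

    def put(self, name: str, weights: bytes) -> shared_memory.SharedMemory:
        """Publish one model's weights, evicting LRU entries to fit.
        (A sketch: a single model larger than the budget would still
        be admitted after evicting everything else.)"""
        if name in self._segments:
            self._segments.move_to_end(name)  # refresh recency
            return self._segments[name]
        while self._segments and self.used_bytes + len(weights) > self.capacity_bytes:
            self._evict_lru()
        shm = shared_memory.SharedMemory(
            name=f"model_{name}", create=True, size=len(weights))
        shm.buf[:len(weights)] = weights
        self._segments[name] = shm
        self.used_bytes += len(weights)
        return shm

    def get(self, name: str):
        """Return the shared segment for `name`, refreshing recency."""
        shm = self._segments.get(name)
        if shm is not None:
            self._segments.move_to_end(name)  # mark most recently used
        return shm

    def _evict_lru(self) -> None:
        """Unlink the least recently used segment from /dev/shm."""
        _, shm = self._segments.popitem(last=False)
        self.used_bytes -= shm.size
        shm.close()
        shm.unlink()
```

A function instance in another process would attach to a published model by name, e.g. `shared_memory.SharedMemory(name="model_resnet50")`, and read the weights through the zero-copy `buf` memoryview; how ShmFaas actually isolates and maps models inside its Kubernetes-based platform is described only in the full paper.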

Year of publication:

2023

Keywords:

  • Serverless
  • DNN inference
  • Cloud computing

Source:

Scopus

Document type:

Conference Object

Status:

Restricted access

Knowledge areas:

  • Machine learning
  • Computer science

Subject areas:

  • Computer programming, programs, data, security