Cached Machine Learning - A Local Caching System to Reduce
Cloud-Communication Dependency During the Inference Phase
Abstract
Ongoing research on TinyML has increased the number of Machine
Learning (ML) models capable of running their inference phase on a
Resource-Scarce Embedded System (RSES). As a result, part of the
network's intelligence services now runs on devices at the edge of the
network. However, many ML models are too complex to run on such tiny
devices, and a Cloud system is necessary to implement the network's
intelligence inference layer. Every communication between an RSES and
the Cloud is expensive in terms of power consumption, money, and time.
This work addresses how to reduce the number of times an RSES
communicates with the Cloud system while maintaining the same ML
inference rate and without reducing the model's accuracy. The results
show that, by building a cache system in which the RSES stores previous
samples and their predictions, the device can reuse this information to
avoid Cloud communication. The solution has been shown to work,
reducing communication between the Cloud system and the RSES by 30%.
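To make the idea concrete, the following is a minimal sketch of such a local prediction cache, not the paper's actual implementation: it assumes a hypothetical `cloud_infer` function, a Euclidean-distance lookup, and an illustrative similarity `threshold`, none of which are specified in the abstract.

```python
# Conceptual sketch (assumptions labeled): a local cache on the RSES that
# reuses predictions of previously seen samples to skip Cloud calls.
import math

class LocalPredictionCache:
    def __init__(self, threshold=0.1, max_entries=128):
        self.threshold = threshold      # assumed max distance to treat samples as equivalent
        self.max_entries = max_entries  # keep memory bounded on the embedded device
        self.entries = []               # list of (sample, prediction) pairs

    def _distance(self, a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def lookup(self, sample):
        """Return a cached prediction if a close-enough sample was seen before."""
        for cached_sample, prediction in self.entries:
            if self._distance(sample, cached_sample) <= self.threshold:
                return prediction
        return None

    def store(self, sample, prediction):
        if len(self.entries) >= self.max_entries:
            self.entries.pop(0)         # simple FIFO eviction to fit RSES memory
        self.entries.append((sample, prediction))


def predict(sample, cache, cloud_infer):
    """Answer from the cache when possible; otherwise query the Cloud and cache the result."""
    cached = cache.lookup(sample)
    if cached is not None:
        return cached                   # Cloud communication avoided
    prediction = cloud_infer(sample)    # expensive: power, money, latency
    cache.store(sample, prediction)
    return prediction
```

Under these assumptions, every cache hit removes one round trip to the Cloud, which is the mechanism behind the reported reduction in communications.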