Cached Machine Learning - A Local Caching System to Reduce
Cloud-Communication Dependency During the Inference Phase
Abstract
Ongoing research on TinyML has increased the number of Machine
Learning (ML) models capable of running their inference phase on a
Resource-Scarce Embedded System (RSES). As a result, part of the
network's intelligence services now runs on devices at the edge of the
network. However, many ML models are too complex to run on such tiny
devices, and a Cloud system is necessary to implement the network's
intelligence inference layer. Every communication between an RSES and
the Cloud is expensive in terms of power consumption, money, and time.
This work addresses how to reduce the number of times an RSES
communicates with the Cloud system while maintaining the same ML
inference rate and without reducing the model's accuracy. The results
show that, by building a cache system in which the RSES stores previous
samples and their predictions, the device can reuse this information to
avoid Cloud communication. The solution has been shown to work,
reducing communication between the Cloud system and the RSES by 30%.
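To make the idea concrete, the following is a minimal sketch of such a local prediction cache, not the paper's actual implementation: it assumes a hypothetical `cloud_infer` function, a Euclidean-distance lookup, and an illustrative similarity `threshold`, none of which are specified in the abstract.

```python
# Conceptual sketch (assumptions labeled): a local cache on the RSES that
# reuses predictions of previously seen samples to skip Cloud calls.
import math

class LocalPredictionCache:
    def __init__(self, threshold=0.1, max_entries=128):
        self.threshold = threshold      # assumed max distance to treat samples as equivalent
        self.max_entries = max_entries  # keep memory bounded on the embedded device
        self.entries = []               # list of (sample, prediction) pairs

    def _distance(self, a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def lookup(self, sample):
        """Return a cached prediction if a close-enough sample was seen before."""
        for cached_sample, prediction in self.entries:
            if self._distance(sample, cached_sample) <= self.threshold:
                return prediction
        return None

    def store(self, sample, prediction):
        if len(self.entries) >= self.max_entries:
            self.entries.pop(0)         # simple FIFO eviction to fit RSES memory
        self.entries.append((sample, prediction))


def predict(sample, cache, cloud_infer):
    """Answer from the cache when possible; otherwise query the Cloud and cache the result."""
    cached = cache.lookup(sample)
    if cached is not None:
        return cached                   # Cloud communication avoided
    prediction = cloud_infer(sample)    # expensive: power, money, latency
    cache.store(sample, prediction)
    return prediction
```

Under these assumptions, every cache hit removes one round trip to the Cloud, which is the mechanism behind the reported reduction in communications.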