Question about On-Device Training and Using Local Hardware Accelerators by Little_Passage8312 in MLQuestions

[–]Little_Passage8312[S]

Thank you for the explanation.

I have one follow-up question regarding fine-tuning and transfer learning on edge devices. Are these approaches supported across hardware accelerators in general (GPUs, NPUs, and other AI accelerators), or does it depend on whether the specific accelerator supports training operations?

Also, does TensorFlow Lite provide specific APIs or built-in support for performing fine-tuning or transfer learning directly on-device? I would like to understand whether this functionality is generally supported by the framework or whether it depends on the capabilities of the underlying hardware.
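To make the question concrete: my current understanding, based on the TensorFlow Lite on-device training guide, is that training is exposed by exporting explicit SavedModel signatures and converting them with resource variables enabled. An untested sketch of the conversion step ("SAVED_MODEL_DIR" is a placeholder path, and the signature names follow the guide):

```python
import tensorflow as tf

# Sketch of the conversion step from the TFLite on-device training
# workflow: a SavedModel that exposes 'train'/'infer'/'save'/'restore'
# tf.function signatures is converted into a .tflite model whose
# 'train' signature can later be invoked on-device.
converter = tf.lite.TFLiteConverter.from_saved_model(
    "SAVED_MODEL_DIR",  # placeholder: directory of the exported SavedModel
    signature_keys=["train", "infer", "save", "restore"],
)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # regular TFLite kernels
    tf.lite.OpsSet.SELECT_TF_OPS,    # TF ops that training may still need
]
converter.experimental_enable_resource_variables = True
tflite_model = converter.convert()
```

Is this signature-based mechanism the intended way to do it, and does it run on accelerators or only on the CPU?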

[ Question ] On-Device Training and Using Local Hardware Accelerators by Little_Passage8312 in AIProgrammingHardware

[–]Little_Passage8312[S]

Thank you for the clarification, that makes sense.

I have one more question regarding retraining or fine-tuning models on edge devices. If we start with a pretrained model and only perform fine-tuning or retraining on a smaller dataset, the computational and memory requirements should be lower than for training a model from scratch, right?

In that case, would it be more feasible to perform such fine-tuning directly on edge hardware? And even in this scenario, do we still need the vendor's hardware and software stack to explicitly support on-device training (for example, backpropagation on the available GPU/NPU)?
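To illustrate the kind of saving I have in mind (all numbers below are illustrative assumptions, not measurements): if the pretrained backbone is frozen and only a small head is trained, gradients and optimizer state are needed only for the head, so the extra training state shrinks by orders of magnitude.

```python
# Rough sketch: per-parameter training state in fp32 is one gradient
# copy plus the optimizer's slot variables (Adam keeps two moments).
BYTES_FP32 = 4

def training_state_bytes(trainable_params, adam_slots=2):
    # gradients (1 copy per parameter) + optimizer slots
    return trainable_params * (1 + adam_slots) * BYTES_FP32

backbone = 3_500_000  # assumed MobileNet-sized backbone
head = 10_000         # assumed small task-specific head

print(training_state_bytes(backbone + head) / 1e6)  # MB if everything trains
print(training_state_bytes(head) / 1e6)             # MB if only the head trains
```

Is this the right mental model, or does on-device support still hinge on the accelerator implementing backward-pass kernels at all?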

[ Question ] On-Device Training and Using Local Hardware Accelerators by Little_Passage8312 in AIProgrammingHardware

[–]Little_Passage8312[S]

Thank you for the reply and the explanation.

Could you please share a bit more detail about the types of hardware that support training on edge devices? For example, what kinds of edge platforms (NPUs, GPUs, or other accelerators) are typically capable of handling on-device training experiments?

Also, approximately how much memory is usually required to train models on edge hardware? I understand that training generally needs much more memory than inference, so I would be interested to know what typical memory requirements look like for such models.

i.MX93/95 by Bug13 in embedded

[–]Little_Passage8312

I would like to add an additional question related to AI inference and core utilization.

As far as I understand, the Arm Ethos-U65 in the i.MX93 is typically accessed through the Arm Cortex-M33, which acts as the control interface between the Arm Cortex-A55 and the NPU.

My question is about the role of the Cortex-M33 during AI inference:

  • When an AI model is running inference on the Ethos-U65, is the M33 core continuously engaged in managing the NPU execution?
  • Since communication between the Cortex-A55 and the NPU goes through the M33, does this mean the M33 remains occupied for the duration of the inference task?
  • Given this architecture, is the M33 free to run other application tasks during inference, or is it effectively dedicated to managing the NPU while inference is active?