Product: Benchmarking on the MLPerf™ Tiny suite
Date: 21-Jun-2024
Revision: 1.0
Changes: Initial release

Even on a minimal SoC, ComputeRAM™ achieves up to 30x faster processing and 32x greater energy efficiency than the best public hardware configuration under standard testing conditions.

For each benchmark, the best-performing hardware is a dedicated AI accelerator.

Integrating ComputeRAM™ eliminates the need for dedicated AI and DSP accelerators in embedded applications, which simplifies system design, lowers silicon costs, and accelerates device development and production cycles.

In our previous article, we demonstrated how integrating ComputeRAM™ with an Arm Cortex-M0 could enhance both speed and energy efficiency by over 100 times for the MVM primitive. This leap in performance stems from three critical enhancements:

  1. Almost all (~99%) operations are offloaded from the CPU to ComputeRAM;
  2. Bus activity is reduced by approximately two orders of magnitude, allowing an AHB-like bus to outperform more complex systems;
  3. The arithmetic engine in ComputeRAM delivers higher efficiency and throughput than the CPU can achieve.
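For readers less familiar with the primitive discussed above, the following is a minimal reference sketch, assuming MVM denotes matrix-vector multiplication. It is plain Python for illustration only and does not represent ComputeRAM's interface or implementation; the function name `mvm` and the sample values are hypothetical. It also counts multiply-accumulate (MAC) operations, which are the operations an in-memory engine would take over from the CPU.

```python
def mvm(matrix, vector):
    """Reference matrix-vector multiply.

    Returns (result, mac_count). Illustrative only: the name and
    signature are assumptions, not part of any ComputeRAM API.
    """
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each matrix row must match the vector length")

    result = []
    mac_count = 0
    for row in matrix:
        acc = 0
        for weight, value in zip(row, vector):
            acc += weight * value  # one multiply-accumulate (MAC)
            mac_count += 1
        result.append(acc)
    return result, mac_count


# Hypothetical 2x3 weight matrix and 3-element input vector.
weights = [[1, -2, 3], [0, 4, -1]]
activations = [2, 1, -1]
outputs, macs = mvm(weights, activations)
```

An M-by-N matrix requires M*N MACs per input vector, so this inner loop dominates the arithmetic in the dense layers of a small neural network.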

Building on this foundation, our latest application note focuses on benchmarking ComputeRAM™ within a set of representative machine learning workloads, using the MLPerf™ Tiny suite to compare its performance against other low-power, compact embedded microcontrollers and accelerators.

Employing our in-house system-level modeling framework, we simulated a minimal ComputeRAM™-enabled SoC, which includes a microcontroller, an AMBA AHB-like 32-bit bus, a vector addition accumulator, and DMA. The results were striking: ComputeRAM™ enabled the system to achieve up to 32x better energy efficiency and 30x lower latency compared to the best-performing MLPerf™ Tiny submissions under standard testing conditions. Furthermore, simply modifying the underlying network architecture for one of the benchmarks yielded a 55% reduction in latency and a 35% improvement in energy efficiency without loss of accuracy. These results are all the more notable because the best-performing hardware configuration for each benchmark is a dedicated, specialized AI accelerator.
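The figures above mix two conventions: ratios ("30x lower latency") and percentages ("55% reduction in latency"). The sketch below converts between them so the numbers can be compared on one scale. The helper names are hypothetical, and the code assumes that "energy efficiency" means work done per unit of energy (for example, inferences per joule), which the text does not state explicitly.

```python
def speedup_from_latency_reduction(reduction):
    """Convert a fractional latency reduction (0.55 for 55%) to a speedup ratio."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def latency_reduction_from_speedup(speedup):
    """Convert a speedup ratio (30 for 30x) to a fractional latency reduction."""
    if speedup < 1:
        raise ValueError("speedup must be at least 1")
    return 1.0 - 1.0 / speedup


def energy_ratio_from_efficiency_gain(gain):
    """Energy per inference relative to baseline, given a fractional
    efficiency gain (0.35 for 35%). Assumes efficiency = work per joule."""
    if gain < 0:
        raise ValueError("gain must be non-negative")
    return 1.0 / (1.0 + gain)


# Figures quoted in the text, re-expressed on the other scale.
network_speedup = speedup_from_latency_reduction(0.55)       # about 2.2x
network_energy_ratio = energy_ratio_from_efficiency_gain(0.35)  # about 0.74 of baseline
headline_latency_cut = latency_reduction_from_speedup(30)    # about 96.7% lower latency
```

Under these assumptions, the 55% latency reduction from the modified network corresponds to roughly a 2.2x speedup, and a 35% efficiency gain corresponds to about 26% less energy per inference.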

This application note not only reaffirms the transformative potential of ComputeRAM™ in embedded chips, but also highlights its capacity to accelerate AI, signal processing, and communication algorithms. By harnessing its advanced in-memory computing capabilities, ComputeRAM™ boosts performance and energy efficiency substantially. Moreover, its programmable interface allows developers to adapt to various workloads easily without extensive software modifications. By eliminating the need for specialized accelerators, ComputeRAM™ reduces silicon area and system complexity, broadening the application range for microcontroller-based SoCs.

If you would like to know more, reach out for a free copy of our TinyML benchmarking application note using the contact form below.
