Dr. Dominic Binks is Vice-President of Technology at Audio Analytic. He recently discussed the decision to bring the company's sound-recognition AI software to the Arm Cortex-M0+ processor. Binks explained that the ai3 software relies on AuditoryNET, a deep neural network designed to model the temporal and acoustic features of sounds. The goal, he said, was to grant a sense of hearing to all machines, even the tiniest ones with constrained processing and power budgets.
For the M0+ implementation, ai3 required 181 kB of combined ROM and RAM, comfortably within the 224 kB of ROM and RAM available on the NXP M0+-based chip they used. Additional acceleration strategies further reduce the computational demands and, with them, the footprint.
According to Binks, the technology was designed from the outset to run on such constrained targets, and those constraints shape every stage of the pipeline: data collection, labeling, model training, compression, and evaluation. The software was likewise designed to stay flexible when targeting very constrained devices, as the compression sketch below suggests.
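The article does not say which compression scheme ai3 uses, so the following is only a generic, assumption-labeled illustration of one common approach for constrained targets: symmetric post-training quantization of float weights to int8. None of the names or values below come from Audio Analytic.

```c
#include <stdint.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical sketch of one common compression step: symmetric
 * post-training quantization of float weights to int8. The article
 * only says models are compressed; this scheme is an assumption. */

/* Compute a per-tensor scale so the largest weight maps to +/-127. */
static float quant_scale(const float *w, unsigned n)
{
    float max_abs = 0.0f;
    for (unsigned i = 0; i < n; i++) {
        float a = fabsf(w[i]);
        if (a > max_abs) max_abs = a;
    }
    return (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
}

/* Quantize: q = round(w / scale), clamped to the int8 range. */
static void quantize_int8(const float *w, int8_t *q, unsigned n, float s)
{
    for (unsigned i = 0; i < n; i++) {
        long v = lroundf(w[i] / s);
        if (v > 127) v = 127;
        if (v < -128) v = -128;
        q[i] = (int8_t)v;
    }
}

int main(void)
{
    const float w[4] = { 0.50f, -0.25f, 0.125f, -1.0f }; /* invented weights */
    int8_t q[4];
    float s = quant_scale(w, 4);
    quantize_int8(w, q, 4, s);
    printf("scale=%f q=[%d %d %d %d]\n", s, q[0], q[1], q[2], q[3]);
    return 0;
}
```

Quantizing weights this way shrinks storage by 4x relative to 32-bit floats, which is one reason compressed models can fit into a footprint as small as the one quoted above.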
Since ai3 is fundamentally a signal-processing application, running it on the Cortex-M4, with its DSP instructions and optional floating-point unit, was a natural fit. The Cortex-M0+ posed three main challenges: a smaller instruction set architecture, limited RAM, and no floating-point support.
Because the Armv6-M architecture provides a reduced instruction set, more of the work had to be implemented in software, and the mathematics became more labour-intensive. These chips are built for devices with modest processing needs, which also made development and debugging tricky. The team resolved these issues with a fixed-point implementation, by selecting a chip with enough Flash, and by working with the appropriate NXP evaluation board.
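As a concrete illustration of what a fixed-point implementation involves, here is a minimal C sketch of a Q15 dot product, the kind of kernel that stands in for floating-point multiply-accumulates on an FPU-less Armv6-M core. The function and number formats are illustrative assumptions, not Audio Analytic's actual code.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: a Q15 fixed-point dot product of the kind that
 * replaces floating-point multiply-accumulates on an FPU-less core
 * such as the Cortex-M0+. Not Audio Analytic's actual code. */

/* Dot product of two Q15 vectors; accumulate in 64 bits so the Q30
 * partial products cannot overflow, then round back to Q15. */
static int16_t dot_q15(const int16_t *a, const int16_t *b, unsigned n)
{
    int64_t acc = 0;
    for (unsigned i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];   /* Q15 x Q15 -> Q30 */
    acc = (acc + (1 << 14)) >> 15;              /* round Q30 -> Q15 */
    if (acc > INT16_MAX) acc = INT16_MAX;       /* saturate on overflow */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}

int main(void)
{
    /* 0.5 and 0.25 in Q15 are 16384 and 8192. */
    const int16_t a[2] = { 16384, 16384 };
    const int16_t b[2] = { 8192, 8192 };
    /* 0.5*0.25 + 0.5*0.25 = 0.25, i.e. 8192 in Q15. */
    printf("%d\n", dot_q15(a, b, 2));
    return 0;
}
```

Accumulating in 64 bits keeps the intermediate Q30 products exact; precision is only traded away at the final shift, which is why fixed-point ports like this one demand careful scaling choices throughout the signal chain.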
Thanks to the scalable and flexible nature of the ai3 architecture, not much code had to change: the team mostly optimized existing code, which already functioned within the limits of the platform.