Published 2025-07-30

This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
As intelligent workloads proliferate in embedded systems, demand is growing for customized, low-power, open-source processor architectures that can execute inference tasks efficiently close to the data source. This paper presents a hardware-software co-design methodology for optimizing RISC-V-based processors tailored for inference at the edge. We describe a scalable pipeline architecture that incorporates lightweight matrix operation units, SIMD extensions, and a high-efficiency memory subsystem, all designed to meet the requirements of embedded and resource-constrained environments. Benchmarking with the MLPerf Tiny suite shows that the proposed optimized core achieves a 35% reduction in energy consumption and up to a 2.4× improvement in computational throughput compared with standard RISC-V implementations. These results highlight the potential of domain-specific enhancements and open instruction set architectures to advance the performance and efficiency of next-generation edge computing platforms.