Architectural Support for the Orchestration of Fine-Grained Multiprocessing for Portable Streaming Applications

Jani Boutellier (1), Alessandro Cevrero (2), Philip Brisk (2), Paolo Ienne (2)
(1) University of Oulu, Finland; (2) École Polytechnique Fédérale de Lausanne, Switzerland


Abstract

Handheld devices are expected to start using fine-grained ASIC accelerators to meet the energy-efficiency requirements of increasingly complex applications such as video decoding and reconfigurable radio. To avoid run-time overhead, static multiprocessor schedules are preferable for orchestrating fine-grained accelerators. However, because modern applications use accelerators in irregular patterns, static scheduling leads to low hardware utilization.

Run-time scheduling for fine-grained accelerators solves the utilization problem, but easily incurs significant overhead. We propose an efficient Accelerator Management Unit (AMU), implemented in hardware, to address this. In video decoding, for example, the AMU takes 3 to 18 cycles to compute a macroblock decoding schedule. Because the AMU dispatches tasks independently, the CPU remains free to perform useful work.

Two experiments are performed with the AMU integrated into an FPGA-based multiprocessing prototype system. The first performs AMU-orchestrated MPEG-4 video decoding. The second demonstrates that the AMU enables low-overhead dynamic scheduling and yields a significant performance advantage over static scheduling.
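
The trade-off between static and run-time scheduling can be illustrated with a toy model. The sketch below is not the AMU algorithm: the accelerator names, their latencies, and the strictly serial execution model are assumptions chosen for illustration. Only the 18-cycle scheduling overhead is taken from the text, as the upper end of the stated 3 to 18 cycle range. A static schedule reserves a slot for every accelerator in every macroblock, whereas a run-time schedule runs only the tasks a macroblock actually needs and pays a scheduling cost each time.

```python
# Hypothetical accelerator latencies in cycles (illustrative, not from the paper).
LATENCY = {"dequant": 32, "idct": 64, "mc": 96}

# Each macroblock lists the accelerators it needs; the pattern is irregular.
EXAMPLE = [["dequant", "idct", "mc"], ["mc"], [], ["dequant", "idct"]]


def static_cycles(macroblocks):
    """Static schedule: every macroblock reserves a slot on every accelerator."""
    slot = sum(LATENCY.values())
    return slot * len(macroblocks)


def dynamic_cycles(macroblocks, sched_overhead=18):
    """Run-time schedule: only needed tasks run, plus overhead per macroblock.

    The overhead is charged serially here, which is pessimistic if the
    scheduler works in parallel with the accelerators.
    """
    return sum(sched_overhead + sum(LATENCY[t] for t in mb) for mb in macroblocks)


def utilization(macroblocks, total_cycles):
    """Fraction of the total cycles in which an accelerator does useful work."""
    busy = sum(LATENCY[t] for mb in macroblocks for t in mb)
    return busy / total_cycles


if __name__ == "__main__":
    s = static_cycles(EXAMPLE)
    d = dynamic_cycles(EXAMPLE)
    print("static :", s, "cycles, utilization", utilization(EXAMPLE, s))
    print("dynamic:", d, "cycles, utilization", round(utilization(EXAMPLE, d), 3))
```

In this toy workload the static schedule spends half of its cycles on reserved but unused slots, while the run-time schedule stays ahead as long as the per-macroblock scheduling cost is small relative to the skipped work.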