Hardware Implementation and Quantization of Tiny-Yolo-v2 using OpenCL
Yap June Wai¹, Zulkalnain bin Mohd Yussof², Sani Irwan bin Md Salim³

¹Yap June Wai, Center for Telecommunication Research and Innovation, Faculty of Electronic and Computer Engineering, Universiti Teknikal Malaysia Melaka, Malaysia.
²Zulkalnain bin Mohd Yussof, Center for Telecommunication Research and Innovation, Faculty of Electronic and Computer Engineering, Universiti Teknikal Malaysia Melaka, Malaysia.
³Sani Irwan bin Md Salim, Center for Telecommunication Research and Innovation, Faculty of Electronic and Computer Engineering, Universiti Teknikal Malaysia Melaka, Malaysia.
Manuscript received on 22 August 2019 | Revised Manuscript received on 03 September 2019 | Manuscript Published on 16 September 2019 | PP: 808-813 | Volume-8 Issue-2S6 July 2019 | Retrieval Number: B11500782S619/2019©BEIESP | DOI: 10.35940/ijrte.B1150.0782S619
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The trend toward increasingly large models in Deep Neural Network (DNN) algorithms has boosted the performance of visual recognition tasks. These gains have come at the cost of increased computational complexity and memory bandwidth. Recent studies have explored fixed-point implementations of DNN algorithms such as AlexNet and VGG on Field Programmable Gate Arrays (FPGAs) to facilitate deployment on embedded systems. However, research on DNN object detection algorithms on FPGAs is still lacking. Consequently, we propose an implementation of Tiny-Yolo-v2 on a Cyclone V PCIe FPGA board using a High-Level Synthesis tool, the Intel FPGA Software Development Kit (SDK) for OpenCL. In this work, a systematic approach is proposed to convert the floating-point Tiny-Yolo-v2 algorithm into 8-bit fixed-point. Our experiments show that the 8-bit fixed-point Tiny-Yolo-v2 significantly reduces hardware consumption with only a 0.3% loss in accuracy. Finally, our implementation achieves a peak performance of 31.34 Giga Operations per Second (GOPS) and a performance density of 0.28 GOPS/DSP, comparable to prior works, at a working frequency of 120 MHz.
Keywords: DNN, FPGA, Tiny-Yolo-v2, Quantization.
Scope of the Article: FPGAs
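
The abstract mentions converting floating-point values to 8-bit fixed-point but does not describe the procedure. The following is a minimal sketch of one common scheme (per-tensor dynamic fixed-point with a power-of-two scale), offered only as an illustration and not as the authors' method. The function names `choose_frac_bits`, `quantize`, and `dequantize` are hypothetical.

```python
import math

def choose_frac_bits(values, total_bits=8):
    """Pick the number of fractional bits so the largest magnitude
    fits in a signed `total_bits` word (assumed per-tensor scheme)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    # Bits needed for the integer part; negative when max_abs < 0.5.
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits  # one bit is reserved for the sign

def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round to nearest, and saturate to the signed range."""
    scale = 2.0 ** frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    # Python's round() uses round-half-to-even; hardware may round differently.
    return [max(lo, min(hi, round(v * scale))) for v in values]

def dequantize(codes, frac_bits):
    """Map integer codes back to real values for error inspection."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]

if __name__ == "__main__":
    weights = [0.5, -0.25, 0.9, -1.5]  # illustrative values, not from the paper
    fb = choose_frac_bits(weights)
    codes = quantize(weights, fb)
    print(fb, codes, dequantize(codes, fb))
```

Because the scale is a power of two, rescaling between layers reduces to bit shifts, which is one reason this style of quantization is often chosen for FPGA targets.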