Convolutional Neuronal Nets (CNNs) are state-of-the art Neuronal Networks, which are used in many fields like video analysis, face detection or image classification. Due to high requirements regarding computational resources and memory bandwidth, CNNs are mainly executed on special accelerator hardware which is more powerful and energy efficient than general purpose processors. This paper will give an overview of the usage of FPGAs for the acceleration of computation intensive CNNs with OpenCL, proposing two different implementation alternatives.

The first approach is based on nested loops, which are inspired by the mathematical formula of multidimensional convolutions. The second strategy transforms the computational problem into a matrix multiplication problem on the fly. The approaches are followed by common optimization techniques used for FPGA designs based on high level synthesis (HLS). Afterwards, the proposed implementations are compared to a CNN implementation on an Intel Xeon CPU in order to demonstrate the advantages in terms of performance and energy efficiency.

Excerpt

Inhaltsverzeichnis (Table of Contents)

Introduction
Related Work
Background
- FPGA
- CNNs
Implementation
- OpenCL Stack on FPGA
- Nested Loop Implementation
- Matrix Multiplication Implementation
Optimization Strategies
Results
Conclusion

Zielsetzung und Themenschwerpunkte (Objectives and Key Themes)

This paper investigates the use of Field Programmable Gate Arrays (FPGAs) for accelerating computationally intensive Convolutional Neural Networks (CNNs) using OpenCL. The main objective is to demonstrate the potential of FPGAs for energy-efficient and high-performance CNN execution compared to traditional CPU implementations.

FPGA Acceleration for CNNs
OpenCL Programming Model for FPGA Design
Implementation Strategies: Nested Loops and Matrix Multiplication
Optimization Techniques for FPGA-based Accelerators
Performance and Energy Efficiency Comparison with CPU Implementation

Zusammenfassung der Kapitel (Chapter Summaries)

Introduction: Introduces the concept of Convolutional Neural Networks (CNNs) and their growing use in various applications. Highlights the need for specialized hardware accelerators due to high computational requirements and memory bandwidth demands. The paper focuses on leveraging FPGAs for accelerating CNNs with OpenCL.
Related Work: Discusses existing research and approaches for accelerating CNNs. Emphasizes methods using nested loops and matrix multiplication for implementing convolution operations.
Background: Provides background information on FPGA technology, outlining their architecture, components, and advantages. Discusses CNNs, their structure, and the importance of convolution layers for computational efficiency.
Implementation: Presents two distinct approaches for implementing CNNs on FPGAs: nested loop-based implementation and matrix multiplication-based implementation. Introduces the OpenCL framework for FPGA programming and its advantages in terms of development speed and portability.
Optimization Strategies: Explores common optimization techniques for FPGA designs based on high-level synthesis (HLS). This section delves into data reuse techniques, data representation optimizations, and reduction of floating-point data size to improve performance.

Schlüsselwörter (Keywords)

This work focuses on FPGA-based accelerators for CNNs, utilizing OpenCL for programming. Key themes include: FPGA technology, CNN architecture, nested loop and matrix multiplication implementations, optimization techniques, performance analysis, and energy efficiency comparisons with CPU implementations.

Frequently Asked Questions

Why are FPGAs used for accelerating CNNs?

FPGAs offer high performance and energy efficiency compared to general-purpose CPUs, making them ideal for the computational and memory demands of Convolutional Neural Networks.

What is the role of OpenCL in FPGA design?

OpenCL provides a high-level programming framework that allows developers to implement hardware accelerators on FPGAs without needing deep expertise in hardware description languages like VHDL.

What are the two implementation strategies discussed?

The paper proposes a 'nested loop' approach based on the mathematical formula of convolutions and a strategy that transforms the problem into a 'matrix multiplication'.

How can FPGA designs be optimized using High-Level Synthesis (HLS)?

Optimization techniques include data reuse, optimizing data representation, and reducing floating-point data sizes to improve throughput and resource usage.

How does FPGA performance compare to an Intel Xeon CPU?

The paper demonstrates that FPGAs can achieve better performance and significantly higher energy efficiency for specific CNN tasks than traditional high-end CPUs.

Excerpt out of 16 pages - scroll top

Details

Title: The usage of FPGAs for the acceleration of Convolutional Neuronal Nets (CNNs) with OpenCL. Two alternatives for implementation
College: University of Paderborn
Grade: 1.0
Author: Christian Lienen (Author)
Publication Year: 2018
Pages: 16
Catalog Number: V451366
ISBN (eBook): 9783668861299
ISBN (Book): 9783668861305
Language: English
Tags: fpgas convolutional neuronal nets cnns opencl
Product Safety: GRIN Publishing GmbH

Quote paper: Christian Lienen (Author), 2018, The usage of FPGAs for the acceleration of Convolutional Neuronal Nets (CNNs) with OpenCL. Two alternatives for implementation, Munich, GRIN Verlag, https://www.hausarbeiten.de/document/451366

The usage of FPGAs for the acceleration of Convolutional Neuronal Nets (CNNs) with OpenCL. Two alternatives for implementation