FPGA Plugin

Introducing FPGA Plugin

The FPGA plugin enables high-performance inference (scoring) of neural networks on Intel® FPGA devices.

NOTE: Before using the FPGA plugin, ensure that you have installed and configured one of the following: the Intel® Arria® 10 GX FPGA Development Kit, the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA, or the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA. For installation and configuration details, see FPGA installation.

Heterogeneous execution

When your topology contains layers that are not supported by the Intel® FPGA plugin, use the Heterogeneous plugin with a dedicated fallback device.

If a network has layers that are not supported by the Intel® FPGA plugin or by a fallback plugin, you can implement a custom layer for the CPU/GPU using the extensibility mechanism described in Inference Engine Kernels Extensibility. In addition to adding custom kernels, you must still specify the CPU or GPU plugin as a fallback device for the Heterogeneous plugin.
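As an illustrative sketch (not part of the original document), the fallback device is expressed as a device string such as HETERO:FPGA,CPU. The helper below builds that string, and the commented lines show how it could be passed to the classic Inference Engine Python API (IEPlugin); the model file names are hypothetical:

```python
def hetero_device(primary, *fallbacks):
    """Build a HETERO device string, e.g. "HETERO:FPGA,CPU".

    Layers unsupported on the primary device fall back to the listed
    devices in priority order.
    """
    return "HETERO:" + ",".join((primary,) + fallbacks)


# Hardware-dependent usage (requires OpenVINO with the FPGA plugin configured):
# from openvino.inference_engine import IENetwork, IEPlugin
# plugin = IEPlugin(device=hetero_device("FPGA", "CPU"))
# net = IENetwork(model="model.xml", weights="model.bin")  # hypothetical paths
# exec_net = plugin.load(network=net)

print(hetero_device("FPGA", "CPU"))  # HETERO:FPGA,CPU
```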

Supported Networks

The following network topologies are supported in heterogeneous mode, running on FPGA with fallback to CPU or GPU devices.

Important: Use only bitstreams from the current version of the OpenVINO toolkit. Bitstreams from older versions are incompatible with later versions of the toolkit. For example, you cannot use the 1-0-1_A10DK_FP16_Generic bitstream when the current OpenVINO toolkit supports the 4-0_A10DK_FP16_VGG_Generic bitstream.
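The toolkit version a bitstream belongs to is encoded in the leading component of its file name (for example, 4-0 in 4-0_A10DK_FP16_VGG_Generic). A minimal sketch of checking that prefix; the helper names are our own invention, not part of the toolkit:

```python
def bitstream_toolkit_version(name):
    """Return the toolkit-version prefix of a bitstream name, e.g. "4-0"."""
    return name.split("_", 1)[0]


def is_compatible(name, expected="4-0"):
    """True if the bitstream name carries the expected toolkit-version prefix."""
    return bitstream_toolkit_version(name) == expected


print(is_compatible("4-0_A10DK_FP16_VGG_Generic"))  # True
print(is_compatible("1-0-1_A10DK_FP16_Generic"))    # False
```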

| Network | Bitstreams (Intel® Arria® 10 GX FPGA Development Kit) | Bitstreams (Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA) | Bitstreams (Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA) |
|---|---|---|---|
| AlexNet | 4-0_A10DK_FP16_AlexNet_GoogleNet, 4-0_A10DK_FP11_AlexNet_GoogleNet | 4-0_RC_FP16_AlexNet_GoogleNet_SqueezeNet, 4-0_RC_FP11_AlexNet | 4-0_PL1_FP16_Generic_AlexNet_GoogleNet_VGG, 4-0_PL1_FP11_Generic_AlexNet |
| GoogleNet v1 | 4-0_A10DK_FP16_AlexNet_GoogleNet, 4-0_A10DK_FP11_AlexNet_GoogleNet | 4-0_RC_FP16_AlexNet_GoogleNet_SqueezeNet, 4-0_RC_FP11_GoogleNet | 4-0_PL1_FP16_Generic_AlexNet_GoogleNet_VGG, 4-0_PL1_FP11_GoogleNet |
| VGG-16 | 4-0_A10DK_FP16_VGG_Generic, 4-0_A10DK_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_RC_FP16_Generic_VGG, 4-0_RC_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_PL1_FP16_Generic_AlexNet_GoogleNet_VGG, 4-0_PL1_FP11_MobileNet_ResNet_VGG_Clamp |
| VGG-19 | 4-0_A10DK_FP16_VGG_Generic, 4-0_A10DK_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_RC_FP16_Generic_VGG, 4-0_RC_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_PL1_FP16_Generic_AlexNet_GoogleNet_VGG, 4-0_PL1_FP11_MobileNet_ResNet_VGG_Clamp |
| SqueezeNet v1.0 | 4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp, 4-0_A10DK_FP11_SqueezeNet | 4-0_RC_FP16_AlexNet_GoogleNet_SqueezeNet, 4-0_RC_FP11_SqueezeNet | 4-0_PL1_FP16_ResNet_SqueezeNet, 4-0_PL1_FP11_SqueezeNet |
| SqueezeNet v1.1 | 4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp, 4-0_A10DK_FP11_SqueezeNet | 4-0_RC_FP16_AlexNet_GoogleNet_SqueezeNet, 4-0_RC_FP11_SqueezeNet | 4-0_PL1_FP16_ResNet_SqueezeNet, 4-0_PL1_FP11_SqueezeNet |
| ResNet-18 | 4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp, 4-0_A10DK_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_RC_FP16_ResNet, 4-0_RC_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_PL1_FP16_ResNet_SqueezeNet, 4-0_PL1_FP11_MobileNet_ResNet_VGG_Clamp |
| ResNet-50 | 4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp, 4-0_A10DK_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_RC_FP16_ResNet, 4-0_RC_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_PL1_FP16_ResNet_SqueezeNet, 4-0_PL1_FP11_MobileNet_ResNet_VGG_Clamp |
| ResNet-101 | 4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp, 4-0_A10DK_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_RC_FP16_ResNet, 4-0_RC_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_PL1_FP16_ResNet_SqueezeNet, 4-0_PL1_FP11_MobileNet_ResNet_VGG_Clamp |
| ResNet-152 | 4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp, 4-0_A10DK_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_RC_FP16_ResNet, 4-0_RC_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_PL1_FP16_ResNet_SqueezeNet, 4-0_PL1_FP11_MobileNet_ResNet_VGG_Clamp |
| MobileNet (Caffe) | 4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp, 4-0_A10DK_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_RC_FP16_MobileNet_Clamp, 4-0_RC_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_PL1_FP16_MobileNet_Clamp, 4-0_PL1_FP11_MobileNet_ResNet_VGG_Clamp |
| MobileNet (TensorFlow) | 4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp, 4-0_A10DK_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_RC_FP16_MobileNet_Clamp, 4-0_RC_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_PL1_FP16_MobileNet_Clamp, 4-0_PL1_FP11_MobileNet_ResNet_VGG_Clamp |
| SqueezeNet-based variant of the SSD* | 4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp, 4-0_A10DK_FP11_SqueezeNet | 4-0_RC_FP16_AlexNet_GoogleNet_SqueezeNet, 4-0_RC_FP11_SqueezeNet | 4-0_PL1_FP16_ResNet_SqueezeNet, 4-0_PL1_FP11_SqueezeNet |
| GoogleNet-based variant of SSD | 4-0_A10DK_FP16_AlexNet_GoogleNet, 4-0_A10DK_FP11_AlexNet_GoogleNet | 4-0_RC_FP16_AlexNet_GoogleNet_SqueezeNet, 4-0_RC_FP11_GoogleNet | 4-0_PL1_FP16_Generic_AlexNet_GoogleNet_VGG, 4-0_PL1_FP11_GoogleNet |
| ResNet-based variant of SSD | 4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp, 4-0_A10DK_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_RC_FP16_ResNet, 4-0_RC_FP11_MobileNet_ResNet_VGG_Clamp | 4-0_PL1_FP16_ResNet_SqueezeNet, 4-0_PL1_FP11_MobileNet_ResNet_VGG_Clamp |

In addition to the networks listed above, arbitrary topologies that contain large continuous subgraphs of FPGA-supported layers are also good candidates for execution on the FPGA plugin.

Translation from Architecture to FPGA Bitstream Files

Various FPGA bitstreams that support CNN are available in the OpenVINO™ toolkit package for FPGA.

To select the correct bitstream (.aocx) file for an architecture, find your network (for example, ResNet-18) in the table above for your board (the Intel® Arria® 10 GX FPGA Development Kit, the Intel® Programmable Acceleration Card (PAC) with Intel® Arria® 10 GX FPGA, or the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA) and note the corresponding bitstreams.

For example, for the Intel® Arria® 10 GX FPGA Development Kit, the suitable bitstreams for ResNet-18 are 4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp and 4-0_A10DK_FP11_MobileNet_ResNet_VGG_Clamp.

The following table describes several parameters that can help you select the proper bitstream for your needs.

| Name | Board | Precision | LRN Support | Leaky ReLU Support | PReLU Support | Clamp Support | ELU Support |
|---|---|---|---|---|---|---|---|
| 4-0_A10DK_FP16_TinyYolo_SSD300 | Intel® Arria® 10 GX FPGA Development Kit | FP16 | Yes | Yes | No | No | No |
| 4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp | Intel® Arria® 10 GX FPGA Development Kit | FP16 | No | No | No | Yes | No |
| 4-0_A10DK_FP16_ELU | Intel® Arria® 10 GX FPGA Development Kit | FP16 | No | No | No | No | Yes |
| 4-0_A10DK_FP16_VGG_Generic | Intel® Arria® 10 GX FPGA Development Kit | FP16 | Yes | Yes | Yes | No | No |
| 4-0_A10DK_FP16_AlexNet_GoogleNet | Intel® Arria® 10 GX FPGA Development Kit | FP16 | Yes | No | No | No | No |
| 4-0_A10DK_FP11_TinyYolo_SSD300 | Intel® Arria® 10 GX FPGA Development Kit | FP11 | Yes | Yes | No | No | No |
| 4-0_A10DK_FP11_SqueezeNet | Intel® Arria® 10 GX FPGA Development Kit | FP11 | No | No | No | No | No |
| 4-0_A10DK_FP11_MobileNet_ResNet_VGG_Clamp | Intel® Arria® 10 GX FPGA Development Kit | FP11 | No | No | No | Yes | No |
| 4-0_A10DK_FP11_Generic | Intel® Arria® 10 GX FPGA Development Kit | FP11 | Yes | Yes | Yes | No | No |
| 4-0_A10DK_FP11_ELU | Intel® Arria® 10 GX FPGA Development Kit | FP11 | No | No | No | No | Yes |
| 4-0_A10DK_FP11_AlexNet_GoogleNet | Intel® Arria® 10 GX FPGA Development Kit | FP11 | Yes | No | No | No | No |
| 4-0_RC_FP16_TinyYolo_SSD300 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | Yes | Yes | No | No | No |
| 4-0_RC_FP16_ResNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | No | Yes | No | No | No |
| 4-0_RC_FP16_AlexNet_GoogleNet_SqueezeNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | Yes | No | No | No | No |
| 4-0_RC_FP16_Generic_VGG | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | Yes | Yes | Yes | No | No |
| 4-0_RC_FP16_ELU | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | No | No | No | No | Yes |
| 4-0_RC_FP16_MobileNet_Clamp | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | No | No | No | Yes | No |
| 4-0_RC_FP11_TinyYolo_SSD300 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | Yes | Yes | No | No | No |
| 4-0_RC_FP11_SqueezeNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | No | No | No | No | No |
| 4-0_RC_FP11_AlexNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | Yes | No | No | No | No |
| 4-0_RC_FP11_GoogleNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | Yes | No | No | No | No |
| 4-0_RC_FP11_Generic | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | Yes | Yes | Yes | No | No |
| 4-0_RC_FP11_ELU | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | No | No | No | No | Yes |
| 4-0_RC_FP11_MobileNet_ResNet_VGG_Clamp | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | No | No | No | Yes | No |
| 4-0_PL1_FP16_TinyYolo_SSD300 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP16 | Yes | Yes | No | No | No |
| 4-0_PL1_FP16_ResNet_SqueezeNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP16 | No | Yes | No | No | No |
| 4-0_PL1_FP16_Generic_AlexNet_GoogleNet_VGG | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP16 | Yes | Yes | Yes | No | No |
| 4-0_PL1_FP16_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP16 | No | No | No | No | Yes |
| 4-0_PL1_FP16_MobileNet_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP16 | No | No | No | Yes | No |
| 4-0_PL1_FP11_TinyYolo_SSD300 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | Yes | Yes | No | No | No |
| 4-0_PL1_FP11_SqueezeNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | No | No | No | No | No |
| 4-0_PL1_FP11_GoogleNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | Yes | No | No | No | No |
| 4-0_PL1_FP11_Generic_AlexNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | Yes | Yes | Yes | No | No |
| 4-0_PL1_FP11_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | No | No | No | No | Yes |
| 4-0_PL1_FP11_MobileNet_ResNet_VGG_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA | FP11 | No | No | No | Yes | No |
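To make the selection concrete, the sketch below encodes a few rows of the table above and picks the bitstreams for a given board and precision that support a required activation feature. The dictionary is a small hand-copied subset of the table, and the function and board identifiers are our own shorthand:

```python
# A few rows hand-copied from the table above:
# name -> (board shorthand, precision, set of supported features)
BITSTREAMS = {
    "4-0_A10DK_FP16_VGG_Generic": ("A10DK", "FP16", {"LRN", "LeakyReLU", "PReLU"}),
    "4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp": ("A10DK", "FP16", {"Clamp"}),
    "4-0_A10DK_FP16_ELU": ("A10DK", "FP16", {"ELU"}),
    "4-0_RC_FP11_Generic": ("RC", "FP11", {"LRN", "LeakyReLU", "PReLU"}),
}


def find_bitstreams(board, precision, required):
    """Return bitstream names for the board/precision supporting all required features."""
    return sorted(
        name
        for name, (b, p, features) in BITSTREAMS.items()
        if b == board and p == precision and required <= features
    )


print(find_bitstreams("A10DK", "FP16", {"Clamp"}))
# ['4-0_A10DK_FP16_MobileNet_ResNet_SqueezeNet_Clamp']
```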

Set the Environment for Running the FPGA Plugin

To run the FPGA plugin directly or through the Heterogeneous plugin, set up the environment:

  1. Set up environment to access Intel® FPGA RTE for OpenCL:
    source /opt/altera/aocl-pro-rte/aclrte-linux64/init_opencl.sh

  2. Set additional environment variables for the FPGA plugin from the following table:
| Variable | Setting |
|---|---|
| DLA_AOCX | Path to the bitstream that can be programmed to the card. See the Translation from Architecture to FPGA Bitstream Files section for choosing a bitstream for your network and board. Avoid programming the bitstream at run time; program the FPGA in advance. If you do want to program the bitstream at run time, set CL_CONTEXT_COMPILER_MODE_INTELFPGA=1. |
| CL_CONTEXT_COMPILER_MODE_INTELFPGA | To prevent the host application from programming the FPGA, set this variable to a value of 3 and program the bitstream in advance. Refer to the Program a Bitstream section in the Installation Guide. |
| ACL_PCIE_USE_JTAG_PROGRAMMING | Set this variable to a value of 1 to force FPGA reprogramming using JTAG. |
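For instance, a host application written in Python could export these variables before the Inference Engine is initialized. The bitstream path below is hypothetical; substitute the .aocx file chosen for your board:

```python
import os

# Hypothetical path -- substitute the .aocx bitstream selected for your board.
os.environ["DLA_AOCX"] = "/opt/bitstreams/4-0_A10DK_FP16_AlexNet_GoogleNet.aocx"

# Prevent run-time FPGA programming (the bitstream was programmed in advance).
os.environ["CL_CONTEXT_COMPILER_MODE_INTELFPGA"] = "3"

print(os.environ["CL_CONTEXT_COMPILER_MODE_INTELFPGA"])  # 3
```

Note that these variables must be set before the FPGA plugin is loaded; exporting them in the shell (as in step 1) has the same effect.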

Multiple FPGA Devices Support

The Inference Engine FPGA plugin supports loading different networks onto multiple FPGA devices. For example, to load the AlexNet and MobileNet v2 networks on two different FPGA devices, follow the steps below:

  1. Program each FPGA device with a corresponding bitstream:
    aocl program acl0 4-0_A10DK_FP16_Generic_AlexNet_GoogleNet_VGG.aocx
    aocl program acl1 4-0_A10DK_FP16_MobileNet_Clamp.aocx
    For more information about bitstream programming, refer to the Installation Guide for Linux* with Support for FPGA.
  2. All FPGA devices are enumerated with a unique ID starting from 0. By default, all networks are loaded to the default device with ID 0. To load a network on a particular non-default device, specify the KEY_DEVICE_ID parameter for C++ or the DEVICE_ID parameter for Python*. The following code snippets demonstrate how to load the AlexNet network on the FPGA device with ID 0 and the MobileNet v2 network on the device with ID 1:
    • With C++:
      // Load AlexNet network on the first FPGA device programmed with a bitstream supporting AlexNet.
      // Assumes `plugin` (an FPGA plugin instance) and `response` (a ResponseDesc) are declared,
      // and KEY_DEVICE_ID refers to InferenceEngine::PluginConfigParams::KEY_DEVICE_ID.
      CNNNetReader reader1;
      reader1.ReadNetwork("alexnet.xml");
      reader1.ReadWeights("alexnet.bin");
      CNNNetwork network1 = reader1.getNetwork();
      IExecutableNetwork::Ptr exeNetwork1;
      StatusCode sts = plugin->LoadNetwork(exeNetwork1, network1, { { KEY_DEVICE_ID, "0" } }, &response);
      // Load MobileNet network on the second FPGA device programmed with MobileNet bitstream
      CNNNetReader reader2;
      reader2.ReadNetwork("mobilenet_v2.xml");
      reader2.ReadWeights("mobilenet_v2.bin");
      CNNNetwork network2 = reader2.getNetwork();
      IExecutableNetwork::Ptr exeNetwork2;
      sts = plugin->LoadNetwork(exeNetwork2, network2, { { KEY_DEVICE_ID, "1" } }, &response);
    • With Python:
      # Load AlexNet network on the first FPGA device programmed with bitstream supporting AlexNet
      net1 = IENetwork(model="alexnet.xml", weights="alexnet.bin")
      plugin.load(network=net1, config={"DEVICE_ID": "0"})
      # Load MobileNet network on the second FPGA device programmed with MobileNet bitstream
      net2 = IENetwork(model="mobilenet_v2.xml", weights="mobilenet_v2.bin")
      plugin.load(network=net2, config={"DEVICE_ID": "1"})
      Note that you must use asynchronous infer requests to utilize several FPGA devices; otherwise, execution on the devices is performed sequentially.

How to Interpret Performance Counters

Collecting performance counters using InferenceEngine::IInferencePlugin::GetPerformanceCounts provides performance data about execution on the FPGA, pre-processing and post-processing, and data transfers to and from the FPGA card.

If a network is split so that some parts are executed on the CPU, you can also find performance data about the Intel® MKL-DNN kernels, their types, and other useful information.
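As an illustrative sketch, per-layer counters in this shape can be aggregated by execution type to see how much time was spent on the FPGA versus the CPU fallback. The helper name is our own, and the sample dictionary is made up for illustration (it mimics the per-layer dict shape of the Python API's get_perf_counts(), which is an assumption here):

```python
def total_time_by_exec_type(perf_counts):
    """Sum reported real_time (microseconds) per execution type."""
    totals = {}
    for layer, stats in perf_counts.items():
        exec_type = stats["exec_type"]
        totals[exec_type] = totals.get(exec_type, 0) + stats["real_time"]
    return totals


# Illustrative, made-up counter data:
sample = {
    "conv1": {"exec_type": "FPGA", "layer_type": "Convolution", "real_time": 320},
    "fc8": {"exec_type": "MKLDNN", "layer_type": "FullyConnected", "real_time": 45},
}

print(total_time_by_exec_type(sample))  # {'FPGA': 320, 'MKLDNN': 45}
```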

Limitations of the FPGA Support for CNN

The Inference Engine FPGA plugin has limitations on network topologies, kernel parameters, and batch size.

See Also