FPGA Plugin

Introducing FPGA Plugin

The FPGA plugin enables high-performance scoring of neural networks on Intel® FPGA devices.

NOTE: Before using the FPGA plugin, ensure that you have installed and configured one of the following: the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1), the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2), or the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA. For installation and configuration details, see FPGA installation.

Heterogeneous Execution

When your topology contains layers that the Intel® FPGA plugin does not support, use the Heterogeneous plugin with a dedicated fallback device.

If a network has layers that are supported neither by the Intel® FPGA plugin nor by a fallback plugin, you can implement a custom layer on the CPU or GPU and use the extensibility mechanism described in Inference Engine Kernels Extensibility. In addition to adding custom kernels, you must still specify the CPU plugin or the GPU plugin as the fallback device for the Heterogeneous plugin.
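A heterogeneous target is written as a device string of the form HETERO:FPGA,CPU, which is the form passed to compile_tool later in this document. The snippet below is a minimal sketch of assembling such a string; the hetero_device helper is hypothetical and not part of the Inference Engine API.

```python
def hetero_device(primary, *fallbacks):
    """Build a heterogeneous device string such as 'HETERO:FPGA,CPU'.

    Hypothetical helper: devices are listed with the preferred device first,
    followed by one or more fallback devices.
    """
    if not fallbacks:
        raise ValueError("heterogeneous execution needs at least one fallback device")
    return "HETERO:" + ",".join((primary,) + fallbacks)


print(hetero_device("FPGA", "CPU"))  # prints HETERO:FPGA,CPU
```

The resulting string is used wherever a device name is expected, for example as the device argument when loading a network.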

Supported Networks

The following network topologies are supported in heterogeneous mode, running on FPGA with fallback to CPU or GPU devices.

IMPORTANT: Use only bitstreams from the current version of the OpenVINO toolkit. Bitstreams from older versions of the OpenVINO toolkit are incompatible with later versions. For example, you cannot use the 1-0-1_A10DK_FP16_Generic bitstream when the OpenVINO toolkit supports the 2019R2_PL2_FP16_InceptionV1_SqueezeNet_VGG_YoloV3.aocx bitstream.

Network | Bitstreams (Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1)) | Bitstreams (Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2)) | Bitstreams (Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA)
AlexNet | 2019R3_PV_PL1_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic, 2019R3_PV_PL1_FP11_AlexNet_GoogleNet_Generic | 2019R3_PV_PL2_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic, 2019R3_PV_PL2_FP11_AlexNet_GoogleNet_Generic | 2019R3_PV_RC_FP16_AlexNet_GoogleNet_Generic, 2019R3_PV_RC_FP11_AlexNet_GoogleNet_Generic
GoogleNet v1 | 2019R3_PV_PL1_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic, 2019R3_PV_PL1_FP11_AlexNet_GoogleNet_Generic | 2019R3_PV_PL2_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic, 2019R3_PV_PL2_FP11_AlexNet_GoogleNet_Generic | 2019R3_PV_RC_FP16_AlexNet_GoogleNet_Generic, 2019R3_PV_RC_FP11_AlexNet_GoogleNet_Generic
VGG-16 | 2019R3_PV_PL1_FP16_SqueezeNet_VGG, 2019R3_PV_PL1_FP11_ResNet_VGG | 2019R3_PV_PL2_FP16_SqueezeNet_VGG, 2019R3_PV_PL2_FP11_ResNet_VGG | 2019R3_PV_RC_FP16_InceptionV1_VGG, 2019R3_PV_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_VGG
VGG-19 | 2019R3_PV_PL1_FP16_SqueezeNet_VGG, 2019R3_PV_PL1_FP11_ResNet_VGG | 2019R3_PV_PL2_FP16_SqueezeNet_VGG, 2019R3_PV_PL2_FP11_ResNet_VGG | 2019R3_PV_RC_FP16_InceptionV1_VGG, 2019R3_PV_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_VGG
SqueezeNet v 1.0 | 2019R3_PV_PL1_FP16_SqueezeNet_VGG, 2019R3_PV_PL1_FP11_InceptionV1_SqueezeNet | 2019R3_PV_PL2_FP16_SqueezeNet_VGG, 2019R3_PV_PL2_FP11_InceptionV1_SqueezeNet | 2019R3_PV_RC_FP16_SqueezeNet_TinyYolo, 2019R3_PV_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_VGG
SqueezeNet v 1.1 | 2019R3_PV_PL1_FP16_SqueezeNet_VGG, 2019R3_PV_PL1_FP11_InceptionV1_SqueezeNet | 2019R3_PV_PL2_FP16_SqueezeNet_VGG, 2019R3_PV_PL2_FP11_InceptionV1_SqueezeNet | 2019R3_PV_RC_FP16_SqueezeNet_TinyYolo, 2019R3_PV_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_VGG
ResNet-18 | 2019R3_PV_PL1_FP16_ResNet_TinyYolo_YoloV3, 2019R3_PV_PL1_FP11_ResNet_VGG | 2019R3_PV_PL2_FP16_ResNet_TinyYolo_YoloV3, 2019R3_PV_PL2_FP11_ResNet_VGG | 2019R3_PV_RC_FP16_ResNet_YoloV3, 2019R3_PV_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_VGG
ResNet-50 | 2019R3_PV_PL1_FP16_ResNet_TinyYolo_YoloV3, 2019R3_PV_PL1_FP11_ResNet_VGG | 2019R3_PV_PL2_FP16_ResNet_TinyYolo_YoloV3, 2019R3_PV_PL2_FP11_ResNet_VGG | 2019R3_PV_RC_FP16_ResNet_YoloV3, 2019R3_PV_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_VGG
ResNet-101 | 2019R3_PV_PL1_FP16_ResNet_TinyYolo_YoloV3, 2019R3_PV_PL1_FP11_ResNet_VGG | 2019R3_PV_PL2_FP16_ResNet_TinyYolo_YoloV3, 2019R3_PV_PL2_FP11_ResNet_VGG | 2019R3_PV_RC_FP16_ResNet_YoloV3, 2019R3_PV_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_VGG
ResNet-152 | 2019R3_PV_PL1_FP16_ResNet_TinyYolo_YoloV3, 2019R3_PV_PL1_FP11_ResNet_VGG | 2019R3_PV_PL2_FP16_ResNet_TinyYolo_YoloV3, 2019R3_PV_PL2_FP11_ResNet_VGG | 2019R3_PV_RC_FP16_ResNet_YoloV3, 2019R3_PV_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_VGG
MobileNet (Caffe) | 2019R3_PV_PL1_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic, 2019R3_PV_PL1_FP11_MobileNet_TinyYolo_Clamp | 2019R3_PV_PL2_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic, 2019R3_PV_PL2_FP11_ResNet_VGG | 2019R3_PV_RC_FP16_AlexNet_GoogleNet_Generic, 2019R3_PV_RC_FP11_AlexNet_GoogleNet_Generic
MobileNet (TensorFlow) | 2019R3_PV_PL1_FP16_MobileNet_Clamp, 2019R3_PV_PL1_FP11_MobileNet_TinyYolo_Clamp | 2019R3_PV_PL2_FP16_MobileNet_Clamp, 2019R3_PV_PL2_FP11_MobileNet_TinyYolo_Clamp | 2019R3_PV_RC_FP16_MobileNet_Clamp, 2019R3_PV_RC_FP11_MobileNet_Clamp
SqueezeNet-based variant of the SSD* | 2019R3_PV_PL1_FP16_SqueezeNet_VGG, 2019R3_PV_PL1_FP11_InceptionV1_SqueezeNet | 2019R3_PV_PL2_FP16_SqueezeNet_VGG, 2019R3_PV_PL2_FP11_InceptionV1_SqueezeNet | 2019R3_PV_RC_FP16_SqueezeNet_TinyYolo, 2019R3_PV_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_VGG
GoogleNet-based variant of SSD | 2019R3_PV_PL1_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic, 2019R3_PV_PL1_FP11_AlexNet_GoogleNet_Generic | 2019R3_PV_PL2_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic, 2019R3_PV_PL2_FP11_AlexNet_GoogleNet_Generic | 2019R3_PV_RC_FP16_AlexNet_GoogleNet_Generic, 2019R3_PV_RC_FP11_AlexNet_GoogleNet_Generic
ResNet-based variant of SSD | 2019R3_PV_PL1_FP16_ResNet_TinyYolo_YoloV3, 2019R3_PV_PL1_FP11_ResNet_VGG | 2019R3_PV_PL2_FP16_ResNet_TinyYolo_YoloV3, 2019R3_PV_PL2_FP11_ResNet_VGG | 2019R3_PV_RC_FP16_ResNet_YoloV3, 2019R3_PV_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_VGG
RMNet | 2019R3_PV_PL1_FP16_RMNet, 2019R3_PV_PL1_FP11_RMNet | 2019R3_PV_PL2_FP16_RMNet, 2019R3_PV_PL2_FP11_RMNet | 2019R3_PV_RC_FP16_RMNet, 2019R3_PV_RC_FP11_RMNet
Yolo v3 | 2019R3_PV_PL1_FP16_ResNet_TinyYolo_YoloV3, 2019R3_PV_PL1_FP11_YoloV3_ELU | 2019R3_PV_PL2_FP16_ResNet_TinyYolo_YoloV3, 2019R3_PV_PL2_FP11_YoloV3_ELU | 2019R3_PV_RC_FP16_ResNet_YoloV3, 2019R3_PV_RC_FP11_YoloV3_ELU

In addition to the list above, arbitrary topologies that contain large contiguous subgraphs of layers supported by the FPGA plugin are recommended for execution on the FPGA plugin.

Translate from Architecture to FPGA Bitstream Files

Various FPGA bitstreams that support CNNs are available in the OpenVINO™ toolkit package for FPGA.

To select the correct bitstream (.aocx) file for an architecture, select a network (for example, ResNet-18) from the table above for either the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1), the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2), or the Intel® Programmable Acceleration Card (PAC) with Intel® Arria® 10 GX FPGA, and note the corresponding architecture.

The following table describes several parameters that might help you to select the proper bitstream for your needs:

Name | Board | Precision | LRN Support | Leaky ReLU Support | PReLU Support | Clamp Support | ELU Support
2019R3_PV_PL1_FP11_AlexNet_GoogleNet_Generic | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1) | FP11 | true | true | true | false | false
2019R3_PV_PL1_FP11_InceptionV1_SqueezeNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1) | FP11 | false | true | true | false | false
2019R3_PV_PL1_FP11_MobileNet_TinyYolo_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1) | FP11 | false | true | true | true | false
2019R3_PV_PL1_FP11_ResNet_VGG | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1) | FP11 | false | false | false | false | false
2019R3_PV_PL1_FP11_RMNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1) | FP11 | false | true | true | false | true
2019R3_PV_PL1_FP11_SSD300 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1) | FP11 | true | true | true | false | false
2019R3_PV_PL1_FP11_YoloV3_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1) | FP11 | false | true | true | false | true
2019R3_PV_PL1_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1) | FP16 | true | true | true | false | false
2019R3_PV_PL1_FP16_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1) | FP16 | false | true | true | false | true
2019R3_PV_PL1_FP16_MobileNet_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1) | FP16 | false | true | true | true | false
2019R3_PV_PL1_FP16_ResNet_TinyYolo_YoloV3 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1) | FP16 | false | true | true | false | false
2019R3_PV_PL1_FP16_RMNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1) | FP16 | false | true | true | false | true
2019R3_PV_PL1_FP16_SqueezeNet_VGG | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 1) | FP16 | false | true | true | false | false
2019R3_PV_PL2_FP11_AlexNet_GoogleNet_Generic | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | true | true | true | false | false
2019R3_PV_PL2_FP11_InceptionV1_SqueezeNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | false | false
2019R3_PV_PL2_FP11_MobileNet_TinyYolo_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | true | false
2019R3_PV_PL2_FP11_ResNet_VGG | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | false | false | false | false
2019R3_PV_PL2_FP11_RMNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | false | true
2019R3_PV_PL2_FP11_SSD300 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | true | true | true | false | false
2019R3_PV_PL2_FP11_YoloV3_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | false | true
2019R3_PV_PL2_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | true | true | true | false | false
2019R3_PV_PL2_FP16_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | true
2019R3_PV_PL2_FP16_MobileNet_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | true | false
2019R3_PV_PL2_FP16_ResNet_TinyYolo_YoloV3 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | false
2019R3_PV_PL2_FP16_RMNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | true
2019R3_PV_PL2_FP16_SqueezeNet_VGG | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | false
2019R3_PV_RC_FP11_AlexNet_GoogleNet_Generic | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | true | true | true | false | false
2019R3_PV_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_VGG | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | false | false
2019R3_PV_RC_FP11_MobileNet_Clamp | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | true | false
2019R3_PV_RC_FP11_RMNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | false | true
2019R3_PV_RC_FP11_Streaming | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | true | false | false | false | false
2019R3_PV_RC_FP11_Streaming_Slicing | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | true | false | false | false | false
2019R3_PV_RC_FP11_YoloV3_ELU | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | false | true
2019R3_PV_RC_FP16_AlexNet_GoogleNet_Generic | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | true | true | true | false | false
2019R3_PV_RC_FP16_ELU | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | true
2019R3_PV_RC_FP16_InceptionV1_VGG | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | false
2019R3_PV_RC_FP16_MobileNet_Clamp | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | true | false
2019R3_PV_RC_FP16_ResNet_YoloV3 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | false
2019R3_PV_RC_FP16_RMNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | true
2019R3_PV_RC_FP16_SqueezeNet_TinyYolo | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | false
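
Filtering the table above by board, precision, and required layer support can be sketched as follows. The Bitstream record and the candidates helper are hypothetical, and reading the third name field as a board code (PL1 and PL2 for Speed Grade 1 and 2, RC for the Programmable Acceleration Card) is an assumption inferred from the names in the table, not a documented naming rule. Only a few rows are copied in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bitstream:
    name: str
    precision: str
    lrn: bool
    leaky_relu: bool
    prelu: bool
    clamp: bool
    elu: bool

    @property
    def board_code(self):
        # Assumed pattern: <release>_PV_<board>_<precision>_<topologies>
        return self.name.split("_")[2]


# A handful of rows copied from the table above.
TABLE = [
    Bitstream("2019R3_PV_PL1_FP11_ResNet_VGG", "FP11", False, False, False, False, False),
    Bitstream("2019R3_PV_PL1_FP16_MobileNet_Clamp", "FP16", False, True, True, True, False),
    Bitstream("2019R3_PV_PL1_FP16_RMNet", "FP16", False, True, True, False, True),
    Bitstream("2019R3_PV_PL2_FP16_MobileNet_Clamp", "FP16", False, True, True, True, False),
    Bitstream("2019R3_PV_RC_FP11_YoloV3_ELU", "FP11", False, True, True, False, True),
]


def candidates(board_code, precision, **required):
    """Return names of bitstreams for a board and precision that support
    every feature passed as True (for example clamp=True)."""
    return [
        b.name
        for b in TABLE
        if b.board_code == board_code
        and b.precision == precision
        and all(getattr(b, feat) for feat, needed in required.items() if needed)
    ]


print(candidates("PL1", "FP16", clamp=True))
# prints ['2019R3_PV_PL1_FP16_MobileNet_Clamp']
```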

Set Environment for Running the FPGA Plugin

To run the FPGA plugin directly or through the Heterogeneous plugin, set up the environment:

  1. Set up the environment to access the Intel® FPGA RTE for OpenCL:
    source /opt/altera/aocl-pro-rte/aclrte-linux64/init_opencl.sh
  2. Set additional environment variables for the FPGA plugin from the following table:
    Variable | Setting
    DLA_AOCX | Path to the bitstream that can be programmed to the card. See the section Translate from Architecture to FPGA Bitstream Files to choose a bitstream for your network and board. Avoid programming the bitstream at run time; program the FPGA beforehand. If you want to program the bitstream at run time, set CL_CONTEXT_COMPILER_MODE_INTELFPGA=1.
    CL_CONTEXT_COMPILER_MODE_INTELFPGA | To prevent the host application from programming the FPGA, set this variable to 3 and program the bitstream in advance. Refer to the Program a Bitstream section in the Installation Guide.
    ACL_PCIE_USE_JTAG_PROGRAMMING | Set this variable to 1 to force FPGA reprogramming using JTAG.
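
When the host application is launched from a script, the variables from the table can be collected in one place. The sketch below only builds an environment dictionary; the fpga_env helper and the bitstream path are hypothetical placeholders, and the OpenCL init script from step 1 still has to be sourced in the launching shell.

```python
import os


def fpga_env(aocx_path, program_at_runtime=False, force_jtag=False, base=None):
    """Return a copy of the environment with the FPGA plugin variables set.

    Hypothetical helper. By default the host application is prevented from
    programming the FPGA (mode 3), which assumes the bitstream was programmed
    in advance.
    """
    env = dict(os.environ if base is None else base)
    env["DLA_AOCX"] = aocx_path
    env["CL_CONTEXT_COMPILER_MODE_INTELFPGA"] = "1" if program_at_runtime else "3"
    if force_jtag:
        env["ACL_PCIE_USE_JTAG_PROGRAMMING"] = "1"
    return env


# Placeholder path; substitute the bitstream chosen for your network and board.
env = fpga_env("/path/to/bitstream.aocx", base={})
for key, value in sorted(env.items()):
    print(f"{key}={value}")
```

The returned dictionary can be passed as the env argument of subprocess.run when starting the inference application.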

Analyzing Heterogeneous Execution

In addition to generating .dot files, you can use the error listening mechanism:

class FPGA_ErrorListener : public InferenceEngine::IErrorListener
{
public:
    virtual void onError(const char *msg) noexcept override {
        std::cout << msg;
    }
};
...
FPGA_ErrorListener err_listener;
core.SetLogCallback(err_listener); // will be used for FPGA device as well

If, during network loading, some layers are assigned to a fallback plugin, a message like the following is printed:

Layer (Name: detection_out, Type: DetectionOutput) is not supported:
custom or unknown.
Has (3) sets of inputs, must be 1, or 2.
Input dimensions (2) should be 4.
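
If the listener output is captured as text, the layers that fell back can be listed with a small parser. This is a sketch that assumes every report starts with a line in exactly the format shown above; the unsupported_layers helper is hypothetical, and the message format may differ between releases.

```python
import re

# Matches the first line of a report such as the one shown above.
LAYER_RE = re.compile(
    r"Layer \(Name: (?P<name>[^,]+), Type: (?P<type>[^)]+)\) is not supported:"
)


def unsupported_layers(log_text):
    """Return (layer name, layer type) pairs found in captured listener output."""
    return [(m.group("name"), m.group("type")) for m in LAYER_RE.finditer(log_text)]


sample = """Layer (Name: detection_out, Type: DetectionOutput) is not supported:
custom or unknown.
Has (3) sets of inputs, must be 1, or 2.
Input dimensions (2) should be 4.
"""

print(unsupported_layers(sample))
# prints [('detection_out', 'DetectionOutput')]
```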

Multiple FPGA Devices Support

The Inference Engine FPGA plugin can load different networks on multiple FPGA devices. For example, to load two networks, AlexNet and MobileNet v2, on two different FPGA devices, follow the steps below:

  1. Program each FPGA device with a corresponding bitstream:
    aocl program acl0 2019R3_PV_PL1_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic.aocx
    aocl program acl1 2019R3_PV_PL1_FP16_MobileNet_Clamp.aocx
    For bitstream programming instructions, refer to the Installation Guide for Linux* with Support for FPGA.
  2. All FPGA devices are enumerated with a unique ID starting from 0. By default, all networks are loaded to the default device with ID 0. To load a network on a particular non-default device, specify the KEY_DEVICE_ID parameter for C++ or the DEVICE_ID parameter for Python*. The following code snippets demonstrate how to load the AlexNet network on the FPGA device with ID 0 and the MobileNet v2 network on the device with ID 1:
    • With C++:
      // Load AlexNet network on the first FPGA device programmed with bitstream supporting AlexNet
      CNNNetReader reader1;
      reader1.ReadNetwork("alexnet.xml");
      reader1.ReadWeights("alexnet.bin");
      auto exeNetwork1 = core.LoadNetwork(reader1.getNetwork(), "FPGA.0");
      // Load MobileNet network on the second FPGA device programmed with MobileNet bitstream
      CNNNetReader reader2;
      reader2.ReadNetwork("mobilenet_v2.xml");
      reader2.ReadWeights("mobilenet_v2.bin");
      auto exeNetwork2 = core.LoadNetwork(reader2.getNetwork(), "FPGA", { { KEY_DEVICE_ID, "1" } });
    • With Python:
      # Load AlexNet network on the first FPGA device programmed with bitstream supporting AlexNet
      net1 = IENetwork(model="alexnet.xml", weights="alexnet.bin")
      plugin.load(network=net1, config={"DEVICE_ID": "0"})
      # Load MobileNet network on the second FPGA device programmed with MobileNet bitstream
      net2 = IENetwork(model="mobilenet_v2.xml", weights="mobilenet_v2.bin")
      plugin.load(network=net2, config={"DEVICE_ID": "1"})
    Note that you must use asynchronous infer requests to utilize several FPGA devices; otherwise, execution on the devices is performed sequentially.
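
The difference between sequential and overlapping execution can be illustrated without any FPGA hardware. The sketch below is only a simulation: fake_infer is a hypothetical stand-in that sleeps instead of running inference, and a thread pool stands in for asynchronous infer requests. It does not use the Inference Engine API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(device_id, latency_s=0.2):
    """Hypothetical stand-in for a blocking inference call on one device."""
    time.sleep(latency_s)
    return f"result from device {device_id}"


def run_sequential(device_ids):
    # Each call waits for the previous one, so latencies add up.
    return [fake_infer(d) for d in device_ids]


def run_concurrent(device_ids):
    # Both calls are in flight at the same time, so latencies overlap.
    with ThreadPoolExecutor(max_workers=len(device_ids)) as pool:
        return list(pool.map(fake_infer, device_ids))


for runner in (run_sequential, run_concurrent):
    start = time.perf_counter()
    results = runner(["0", "1"])
    print(runner.__name__, results, f"{time.perf_counter() - start:.2f}s")
```

On a typical machine the sequential run takes roughly the sum of the two simulated latencies, while the concurrent run takes roughly one.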

Import and Export Network Flow

Since the 2019 R4 release, the FPGA and HETERO plugins support the export and import flow, which lets you export a compiled network from a plugin to a binary blob by running the command below:

$ ./compile_tool -m resnet.xml -DLA_ARCH_NAME 4x2x16x32_fp16_sb9408_fcd1024_actk4_poolk4_normk1_owk2_image300x300x8192_mbfr -d HETERO:FPGA,CPU
Inference Engine:
API version ............ 2.1
Build .................. 6db44e09a795cb277a63275ea1395bfcb88e46ac
Description ....... API
Done

Once the command is executed, a binary blob named resnet.blob is created in the working directory. Refer to the Compile tool documentation for more details.
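
For scripted builds, the command line above can be assembled programmatically. The sketch below uses only the three flags shown in the example. Both helpers are hypothetical, and deriving the blob name from the model file name is an assumption generalized from the single resnet.xml example.

```python
import shlex
from pathlib import Path


def compile_tool_command(model_xml, arch_name, device="HETERO:FPGA,CPU"):
    """Assemble the compile_tool invocation as an argument list."""
    return ["./compile_tool", "-m", model_xml, "-DLA_ARCH_NAME", arch_name, "-d", device]


def expected_blob_name(model_xml):
    """Assumed naming rule: resnet.xml produces resnet.blob."""
    return Path(model_xml).with_suffix(".blob").name


cmd = compile_tool_command(
    "resnet.xml",
    "4x2x16x32_fp16_sb9408_fcd1024_actk4_poolk4_normk1_owk2_image300x300x8192_mbfr",
)
print(shlex.join(cmd))
print(expected_blob_name("resnet.xml"))  # prints resnet.blob
```

The argument list can be passed directly to subprocess.run, which avoids shell quoting issues.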

A compiled binary blob can be later imported via InferenceEngine::Core::Import:

std::ifstream strm("resnet.blob");
auto execNetwork = core.Import(strm);

How to Interpret Performance Counters

By collecting performance counters with InferenceEngine::InferRequest::GetPerformanceCounts, you can obtain performance data about execution on the FPGA, about pre-processing and post-processing, and about data transfer to and from the FPGA card.

If the network is split so that some parts are executed on the CPU, you can also find performance data about the Intel® MKL-DNN kernels, their types, and other useful information.
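
One way to read the counters is to sum the time per execution type, which shows how the work divides between the FPGA and the fallback device. The sketch below runs on a made-up dictionary; its layout (status, exec_type, real_time per layer) is an assumption modeled on per-layer counter records, and the layer names, exec_type labels, and timings are invented for illustration only.

```python
from collections import defaultdict

# Invented sample data; not measured results.
sample_counts = {
    "conv1": {"status": "EXECUTED", "exec_type": "fpga_kernel", "real_time": 1200},
    "fc": {"status": "EXECUTED", "exec_type": "fpga_kernel", "real_time": 300},
    "detection_out": {"status": "EXECUTED", "exec_type": "cpu_kernel", "real_time": 450},
    "dropout": {"status": "NOT_RUN", "exec_type": "undef", "real_time": 0},
}


def time_by_exec_type(perf_counts):
    """Sum real_time over executed layers, grouped by exec_type."""
    totals = defaultdict(int)
    for stats in perf_counts.values():
        if stats["status"] == "EXECUTED":
            totals[stats["exec_type"]] += stats["real_time"]
    return dict(totals)


print(time_by_exec_type(sample_counts))
# prints {'fpga_kernel': 1500, 'cpu_kernel': 450}
```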

Limitations of the FPGA Support for CNN

The Inference Engine FPGA plugin has limitations on network topologies, kernel parameters, and batch size.

See Also