Dataset Types

DL Workbench supports three dataset types. Your dataset does not need to contain images from the original ImageNet, Pascal VOC, or COCO databases, but it must follow one of the supported dataset formats.

To learn how to download and prepare datasets, refer to Download and Cut Datasets.

NOTE: For Pascal VOC and COCO datasets, only the object detection task is supported.

ImageNet

ImageNet is a well-known dataset used to train classification models. A dataset in ImageNet format consists of an annotation file and a set of images:

|-- annotation.txt
|-- 0001.jpg
|-- 0002.jpg
...
|-- n.jpg

Each line of the annotation file contains an image file name followed by its label ID:

0001.jpg <label ID>
0002.jpg <label ID>
...
n.jpg <label ID>
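To make the format concrete, here is a minimal sketch of reading such an annotation file in Python. It assumes that the file name and label ID on each line are separated by whitespace, that label IDs are integers, and that file names contain no spaces; the function names `parse_imagenet_annotation` and `missing_images` are hypothetical helpers, not part of DL Workbench.

```python
from pathlib import Path


def parse_imagenet_annotation(text: str) -> dict[str, int]:
    """Map each image file name to its label ID.

    Assumption: one "<file name> <label ID>" pair per line, separated by
    whitespace, with integer label IDs.
    """
    labels: dict[str, int] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected '<file> <label ID>', got {line!r}")
        file_name, label = parts
        labels[file_name] = int(label)
    return labels


def missing_images(dataset_dir: Path, labels: dict[str, int]) -> list[str]:
    """Return annotated file names that have no matching image file in dataset_dir."""
    return sorted(name for name in labels if not (dataset_dir / name).is_file())


print(parse_imagenet_annotation("0001.jpg 3\n0002.jpg 7\n"))  # {'0001.jpg': 3, '0002.jpg': 7}
```

Checking that every annotated file actually exists next to `annotation.txt` is a cheap way to catch a mismatched archive before uploading it.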

Pascal Visual Object Classes (Pascal VOC)

Pascal VOC is a well-known dataset used to train object detection and semantic segmentation models. VOC datasets consist of several folders that contain annotation files, image set indices, images, and segmentation masks.

A VOC dataset archive is organized as follows:

|-- VOCdevkit
    |-- VOC
        |-- Annotations
            |-- 0001.xml
            |-- 0002.xml
            ...
            |-- n.xml
        |-- ImageSets
            |-- Layout
                |-- test.txt
            |-- Main
                |-- 0001_test.txt
                |-- 0002_test.txt
                ...
                |-- n_test.txt
            |-- Segmentation
                |-- test.txt
        |-- Images
            |-- 0001.jpg
            |-- 0002.jpg
            ...
            |-- n.jpg
        |-- SegmentationClass
            |-- 0001.png
            |-- 0002.png
            ...
            |-- n.png
        |-- SegmentationObject
            |-- 0001.png
            |-- 0002.png
            ...
            |-- n.png
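For object detection, each .xml file in the Annotations folder describes the objects in one image. The following sketch reads such a file with Python's standard `xml.etree.ElementTree`, assuming the standard Pascal VOC annotation tags (`<object>`, `<name>`, `<bndbox>` with `<xmin>`, `<ymin>`, `<xmax>`, `<ymax>`). DL Workbench may read additional fields, and `parse_voc_annotation` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text: str) -> list[dict]:
    """Extract object class names and bounding boxes from one VOC annotation file.

    Assumes the standard Pascal VOC tags; each box is (xmin, ymin, xmax, ymax).
    """
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            # Parse as float because some annotation files store fractional coordinates.
            "bbox": tuple(float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")),
        })
    return objects


voc_sample = """<annotation>
  <filename>0001.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""

print(parse_voc_annotation(voc_sample))  # [{'name': 'dog', 'bbox': (48.0, 240.0, 195.0, 371.0)}]
```

An annotation file without `<object>` elements yields an empty list, which corresponds to an image with no labeled objects.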

Common Objects in Context (COCO)

The COCO dataset is used for object detection, segmentation, person keypoint detection, stuff segmentation, and caption generation.

A COCO dataset is downloaded as two separate archives, one with images and one with annotations. Combine them into a single archive as described in the Download COCO Dataset section in Download and Cut Datasets. To upload a COCO dataset to DL Workbench, make sure the archive contains the following folders and files:

|-- val
    |-- 0001.jpg
    |-- 0002.jpg
    ...
    |-- n.jpg
|-- annotations
    |-- instances_val.json
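Before uploading, you can check the combined archive against this layout. The sketch below uses Python's standard `zipfile` module and assumes that the archive is a .zip file with `val/` and `annotations/` at its root and that images use the .jpg extension; `check_coco_archive` is a hypothetical helper, and DL Workbench's own validation may be stricter.

```python
import io
import zipfile

# Assumed layout, taken from the listing above.
REQUIRED_ANNOTATION = "annotations/instances_val.json"
IMAGE_DIR = "val/"


def check_coco_archive(archive) -> list[str]:
    """Return a list of layout problems in a COCO-style .zip archive (empty if none).

    `archive` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    problems = []
    if REQUIRED_ANNOTATION not in names:
        problems.append(f"missing {REQUIRED_ANNOTATION}")
    if not any(n.startswith(IMAGE_DIR) and n.lower().endswith(".jpg") for n in names):
        problems.append(f"no .jpg images under {IMAGE_DIR}")
    return problems


# Build a tiny in-memory archive to demonstrate the check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("val/0001.jpg", b"")
    zf.writestr("annotations/instances_val.json", "{}")
buf.seek(0)
print(check_coco_archive(buf))  # []
```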

The .json annotation file is organized as follows:

{
    "info": <info>,
    "images": [<images>],
    "licenses": [<licenses>],
    "annotations": [<annotations>]
}
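The entries of `images` and `annotations` are linked by image ID. The sketch below groups bounding boxes by image file name. The field names it uses (`id`, `file_name`, `image_id`, `bbox` as `[x, y, width, height]`, `category_id`) come from the public COCO format rather than from this page. Standard COCO instance files also contain a `categories` array, which this sketch does not need. `boxes_per_image` is a hypothetical helper.

```python
import json
from collections import defaultdict


def boxes_per_image(coco: dict) -> dict[str, list[list[float]]]:
    """Map each image file name to its list of [x, y, width, height] boxes.

    Images that have no annotations do not appear in the result.
    """
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        # Each annotation points back to its image through image_id.
        boxes[file_names[ann["image_id"]]].append(ann["bbox"])
    return dict(boxes)


coco_sample = json.loads("""{
  "info": {},
  "licenses": [],
  "images": [{"id": 1, "file_name": "0001.jpg", "width": 640, "height": 480}],
  "annotations": [{"id": 10, "image_id": 1, "category_id": 18, "bbox": [48.0, 240.0, 147.0, 131.0]}]
}""")

print(boxes_per_image(coco_sample))  # {'0001.jpg': [[48.0, 240.0, 147.0, 131.0]]}
```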