Download and Cut Datasets

To download the original ImageNet, Pascal Visual Object Classes (VOC), and Common Objects in Context (COCO) datasets, follow the instructions for each dataset type below. These datasets are large. To save time when loading them into the DL Workbench, cut them as described in the following sections.

To learn more about dataset types supported by the DL Workbench and their structure, refer to Dataset Types.

ImageNet Dataset

Download ImageNet Dataset

To download images from ImageNet, you need to have an account and agree to the Terms of Access. Follow the steps below:

  1. Go to the ImageNet homepage:
    imagenet_register_01-b.png
  2. If you have an account, click Login. Otherwise, click Signup in the upper-right corner, provide your data, and wait for a confirmation email:
    imagenet_register_01-m-b.png
  3. Once you receive the confirmation email and log in, go to the Download page:
    imagenet_download_00-m-b.png
  4. Select Download Original Images:
    imagenet_download_01-m-b.png
  5. This will redirect you to the Terms of Access page. If you agree to the Terms, continue by clicking Agree and Sign:
    imagenet_terms_of_access_02-m-b.png
  6. Click one of the links in the Download as one tar file section to select it:
    imagenet_download_02-b.png
  7. Save the archive to the following path:
    C:\Users\Work\imagenet.zip

Cut ImageNet Dataset

Download the script for cutting datasets. In a terminal where Python* is available, run the following command after specifying the parameters:

python C:/Users/Downloads/cut_dataset.py \
--source_archive_dir=C:\Users\Work\imagenet.zip \
--output_size=20 \
--output_archive_dir=C:\Users\Work\subsets \
--dataset_type=imagenet \
--first_image=10

This command runs the script with the following arguments:

Parameter                                         Explanation
--source_archive_dir=C:\Users\Work\imagenet.zip   Full path to the downloaded archive, including the name
--output_size=20                                  Number of images to keep in the smaller dataset
--output_archive_dir=C:\Users\Work\subsets        Full path to the directory for the smaller dataset, excluding the archive name
--dataset_type=imagenet                           Type of the source dataset
--first_image=10                                  Optional. The index of the image to start cutting from. Specify it if you want to split your dataset into train/val subsets. The default is 0.
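
As a sketch of the train/val use case mentioned for --first_image, the two commands can be assembled in Python from the parameters in the table above. The helper name build_cut_command, the 80/20 sizes, and the train and val output folders are illustrative assumptions. So is the reading that the script keeps output_size consecutive images starting at first_image, which the table suggests but does not state outright.

```python
def build_cut_command(script, source_archive, output_dir, dataset_type,
                      output_size, first_image=0):
    """Return the argument list for one cut_dataset.py invocation.

    Only the flags documented in the parameter table are used.
    """
    return [
        "python",
        script,
        f"--source_archive_dir={source_archive}",
        f"--output_size={output_size}",
        f"--output_archive_dir={output_dir}",
        f"--dataset_type={dataset_type}",
        f"--first_image={first_image}",
    ]


SCRIPT = "C:/Users/Downloads/cut_dataset.py"
ARCHIVE = r"C:\Users\Work\imagenet.zip"

# Assumed split: images 0..79 for training, images 80..99 for validation.
# Separate output folders (hypothetical) keep the two subsets apart.
train_cmd = build_cut_command(SCRIPT, ARCHIVE, r"C:\Users\Work\subsets\train",
                              "imagenet", output_size=80, first_image=0)
val_cmd = build_cut_command(SCRIPT, ARCHIVE, r"C:\Users\Work\subsets\val",
                            "imagenet", output_size=20, first_image=80)

for cmd in (train_cmd, val_cmd):
    # Print for review; pass the list to subprocess.run(cmd, check=True) to execute.
    print(" ".join(cmd))
```

Passing the arguments as a list avoids the shell line-continuation characters, which behave differently across Windows and Linux shells.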

Pascal Visual Object Classes (VOC) Dataset

Download Pascal VOC Dataset

To download test data from Pascal VOC, you need to have an account. Follow the steps below:

  1. Go to the PASCAL Visual Object Classes Homepage:
    voc_homepage-b.png
  2. Click PASCAL VOC Evaluation Server under the Pascal VOC data sets heading:
    voc_evaluation_server_01-m-b.png
  3. If you have an account, click Login in the upper-left corner. Otherwise, click Registration, provide your data, and wait for a confirmation email:
    voc_login_register-m-b.png
  4. Once you receive the confirmation email and log in, click Downloads:
    voc_download_01-m-b.png
  5. Select a dataset:
    voc_download_02-b.png
  6. Save the archive to the following path:
    C:\Users\Work\voc.tar.gz

Cut Pascal VOC Dataset

Download the script for cutting datasets. In a terminal where Python* is available, run the following command after specifying the parameters:

python C:/Users/Downloads/cut_dataset.py \
--source_archive_dir=C:\Users\Work\voc.tar.gz \
--output_size=20 \
--output_archive_dir=C:\Users\Work\subsets \
--dataset_type=voc \
--first_image=10

This command runs the script with the following arguments:

Parameter                                         Explanation
--source_archive_dir=C:\Users\Work\voc.tar.gz     Full path to the downloaded archive, including the name
--output_size=20                                  Number of images to keep in the smaller dataset
--output_archive_dir=C:\Users\Work\subsets        Full path to the directory for the smaller dataset, excluding the archive name
--dataset_type=voc                                Type of the source dataset
--first_image=10                                  Optional. The index of the image to start cutting from. Specify it if you want to split your dataset into train/val subsets. The default is 0.
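
Before choosing --output_size and --first_image, it can help to know how many images the downloaded archive holds. The sketch below counts them with the standard tarfile module. It assumes the usual Pascal VOC layout, where images are .jpg files inside a JPEGImages folder. The helper names count_voc_images and check_cut_range are hypothetical, and the range check reflects an assumed interpretation of the two parameters rather than the script's documented behavior.

```python
import os
import tarfile


def count_voc_images(archive_path):
    """Count .jpg files stored under a JPEGImages folder in a tar archive."""
    count = 0
    # "r:*" lets tarfile detect the compression (gzip for .tar.gz) on its own.
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive:
            parts = member.name.split("/")
            if (member.isfile() and "JPEGImages" in parts
                    and member.name.lower().endswith(".jpg")):
                count += 1
    return count


def check_cut_range(total_images, output_size, first_image=0):
    """Return True if images first_image .. first_image + output_size - 1 exist."""
    return first_image >= 0 and first_image + output_size <= total_images


archive_path = r"C:\Users\Work\voc.tar.gz"
if os.path.exists(archive_path):  # skip quietly when the archive is not present
    total = count_voc_images(archive_path)
    print("images in archive:", total)
    print("requested range fits:", check_cut_range(total, output_size=20, first_image=10))
```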

Common Objects in Context (COCO) Dataset

Download COCO Dataset

To use a dataset from the COCO website, download annotations and images archives separately. Choose one of the options:

Cut COCO Dataset

Download the script for cutting COCO datasets. In a terminal where Python* is available, run the following command after specifying the parameters:

python C:/Users/Downloads/cut_dataset.py \
--source_images_archive_dir=C:\Users\Work\val<year>.zip \
--source_annotations_archive_dir=C:\Users\Work\trainval_annotations<year>.zip \
--output_size=20 \
--output_archive_dir=C:\Users\Work\subsets \
--dataset_type=coco \
--first_image=10

This command runs the script with the following arguments:

Parameter                                                                    Explanation
--source_images_archive_dir=<full_path_to_source_images_archive>             Full path to the downloaded archive with images, including the name
--source_annotations_archive_dir=<full_path_to_source_annotations_archive>   Full path to the downloaded archive with annotations, including the name
--output_size=20                                                             Number of images to keep in the smaller dataset
--output_archive_dir=C:\Users\Work\subsets                                   Full path to the directory for the smaller dataset, excluding the archive name
--dataset_type=coco                                                          Type of the source dataset
--first_image=10                                                             Optional. The index of the image to start cutting from. Specify it if you want to split your dataset into train/val subsets. The default is 0.
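
COCO needs both archives because the annotations live in a separate JSON file that refers to images by ID, so a smaller dataset must keep the images and their annotations in sync. The sketch below illustrates that idea on a tiny in-memory example that follows the standard COCO JSON layout (images, annotations, categories). It is not the implementation of cut_dataset.py. The function name cut_coco_annotations and the sample data are hypothetical.

```python
def cut_coco_annotations(coco, output_size, first_image=0):
    """Keep output_size images starting at index first_image, plus their annotations."""
    images = coco["images"][first_image:first_image + output_size]
    kept_ids = {image["id"] for image in images}
    annotations = [ann for ann in coco.get("annotations", [])
                   if ann["image_id"] in kept_ids]
    subset = dict(coco)  # shallow copy; categories and other keys are reused as-is
    subset["images"] = images
    subset["annotations"] = annotations
    return subset


# Hypothetical miniature dataset in COCO annotation format.
sample = {
    "images": [
        {"id": 1, "file_name": "a.jpg"},
        {"id": 2, "file_name": "b.jpg"},
        {"id": 3, "file_name": "c.jpg"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 7},
        {"id": 11, "image_id": 2, "category_id": 7},
        {"id": 12, "image_id": 2, "category_id": 7},
        {"id": 13, "image_id": 3, "category_id": 7},
    ],
    "categories": [{"id": 7, "name": "example"}],
}

subset = cut_coco_annotations(sample, output_size=2, first_image=1)
print([image["file_name"] for image in subset["images"]])
print([ann["id"] for ann in subset["annotations"]])
```

Filtering annotations by image_id, not by list position, matters because one image can carry several annotations.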