The LITE series of image datasets

Lightweight datasets for experiments, derived from the ImageNet dataset.

Problem Setting

Training a CNN on the entire ILSVRC2012 training set of 1281167 images takes a substantial amount of time. When experimenting with multiple CNN architectures, or when training without sufficient computational resources (GPUs with enough memory), you have to use fewer images per class or a smaller input resolution. Training an architecture at tiny resolutions such as 32x32 or 64x64 produces an entirely different model with regard to its local feature extraction capabilities. It is therefore difficult to compare the performance of an experimental CNN architecture with that of state-of-the-art CNNs, since those are trained on the visual feature distribution of the ILSVRC2012 training set.

Less ImageNet Training Examples - LITE

To cope with the above, fewer images and/or classes are used at standard medium resolutions such as 227x227 and 299x299. Hence the Less ImageNet Training Examples (LITE) datasets, each of which has a ground truth set of images randomly selected from the ILSVRC2012 training set. For each LITE test set, all 50 images per class that are available in the ILSVRC2012 validation set are used. The ground truth set can be split into 90% training and 10% validation sets. There are currently 8 different LITE datasets, depending on the desired number of classes and images for the experiments:

  • LITE10: 10 classes, 480 samples/class, ground truth set (GTS): 4800 images, test set (TS): 1000 images
  • LITE20: 20 classes, 480 samples/class, GTS: 9600, TS: 2000
  • LITE30: 30 classes, 640 samples/class, GTS: 19200, TS: 3000
  • LITE50: 50 classes, 768 samples/class, GTS: 38400, TS: 5000
  • LITE100: 100 classes, 768 samples/class, GTS: 76800, TS: 10000
  • LITE200: 200 classes, 720 samples/class, GTS: 144000, TS: 20000
  • LITE250: 250 classes, 960 samples/class, GTS: 240000, TS: 25000
  • LITE1000: 1000 classes, 480 samples/class, GTS: 480000, TS: the ILSVRC2012 validation set
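
The random selection and the 90%/10% split described above could be sketched roughly as follows. This is only an illustration and not the script used to build the LITE datasets: the function name `build_ground_truth`, the `images_by_class` input format, the fixed seed, and the choice to split each class separately (so every class keeps the same 90/10 ratio) are all assumptions.

```python
import random


def build_ground_truth(images_by_class, samples_per_class,
                       train_fraction=0.9, seed=0):
    """Pick samples_per_class random images per class, then split each class.

    images_by_class: dict mapping a class label to a list of image paths
    (hypothetical input format). Returns (train, val) lists of (path, label).
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    cut = round(samples_per_class * train_fraction)
    train, val = [], []
    for label in sorted(images_by_class):
        # Sampling without replacement; raises ValueError if a class is too small.
        chosen = rng.sample(images_by_class[label], samples_per_class)
        train.extend((path, label) for path in chosen[:cut])
        val.extend((path, label) for path in chosen[cut:])
    return train, val


# Synthetic stand-in for 10 classes with 1000 candidate images each.
candidates = {
    f"class_{c:02d}": [f"class_{c:02d}/img_{i:04d}.JPEG" for i in range(1000)]
    for c in range(10)
}
train, val = build_ground_truth(candidates, samples_per_class=480)
print(len(train), len(val))  # 4320 480, matching a LITE10-sized ground truth set
```

Splitting per class is one possible design choice; a single shuffle over the whole ground truth set would give the same overall sizes but slightly uneven class counts in the validation part.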

For mini-batch SGD training with TALOS, which uses a disk cache of pages, some compatible mini-batch sizes are 5, 10, 15, 20, 24, 30, 40, 48, 60, 80, 120, 160 and 240.
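
The exact constraint that TALOS places on the mini-batch size is not stated here. As a rough sanity check, the sketch below assumes that a "compatible" size is one that divides both the 90% training split and the 10% validation split without a remainder. That criterion, and the helper names `split_sizes` and `fits`, are assumptions made for illustration only.

```python
# (classes, samples per class) for each LITE dataset, copied from the list above.
LITE = {
    "LITE10": (10, 480), "LITE20": (20, 480), "LITE30": (30, 640),
    "LITE50": (50, 768), "LITE100": (100, 768), "LITE200": (200, 720),
    "LITE250": (250, 960), "LITE1000": (1000, 480),
}
BATCH_SIZES = [5, 10, 15, 20, 24, 30, 40, 48, 60, 80, 120, 160, 240]


def split_sizes(classes, per_class):
    """Return (training, validation) image counts for a 90%/10% split."""
    total = classes * per_class
    train = total * 9 // 10
    return train, total - train


def fits(batch_size, classes, per_class):
    """Assumed criterion: the batch size divides both splits with no remainder."""
    return all(n % batch_size == 0 for n in split_sizes(classes, per_class))


for name, (classes, per_class) in LITE.items():
    train, val = split_sizes(classes, per_class)
    ok = [b for b in BATCH_SIZES if fits(b, classes, per_class)]
    print(f"{name}: train={train} val={val} batch sizes={ok}")
```

Under this assumed criterion, every listed mini-batch size fits every LITE dataset, since the smallest validation split holds 480 images and all listed sizes divide 480.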

LITE20

20 classes, 9600 ground truth images

LITE100

100 classes, 76800 ground truth images

LITE10

10 classes, 4800 ground truth images

LITE250

250 classes, 240000 ground truth images

Contact me

Feel free to contact me at my email