Which two methods must you override when creating a custom dataset by subclassing PyTorch’s Dataset abstract class?

  1. __len__ method – returns the number of samples in the dataset

  2. __getitem__ method – returns the sample at a given index
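
As a minimal sketch of the two methods above, here is a toy dataset. The class name SquaresDataset and its contents are hypothetical, chosen only for illustration; a real dataset would typically read samples from files or arrays instead.

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)
        self.y = self.x ** 2

    def __len__(self):
        # Number of samples in the dataset
        return len(self.x)

    def __getitem__(self, idx):
        # One (input, target) sample for the given index
        return self.x[idx], self.y[idx]


dataset = SquaresDataset(5)
size = len(dataset)   # calls __len__
x3, y3 = dataset[3]   # calls __getitem__
```

Because both methods are defined, len(dataset) and dataset[i] work like they do on a list, which is what DataLoader relies on for map-style datasets.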

What is the purpose of the DataLoader class?

The DataLoader is used for batching, sampling, and loading data during the training cycle. It takes in the Dataset object and other optional parameters such as shuffling, batch size, and the number of workers.

With its default collate function, DataLoader stacks the samples of a batch along a new first (batch) dimension, so each batch has shape (batch_size, ...). It also handles shuffling for you when shuffle=True.
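
A short sketch of this batching behaviour, assuming a small in-memory dataset built with TensorDataset (the tensor values and sizes here are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 samples with 2 features each, plus one integer label per sample
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# batch_size, shuffle and num_workers are the optional parameters mentioned above
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_shapes = []
for xb, yb in loader:
    # Samples are stacked along a new first (batch) dimension
    batch_shapes.append((tuple(xb.shape), tuple(yb.shape)))
```

With 10 samples and a batch size of 4, the loader yields two full batches and one smaller final batch; passing drop_last=True would discard that last partial batch.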

What do you need to do to add a transformation pipeline to the custom dataset?

  1. Create a transformation class for each transformation that you want to perform on your custom dataset

  2. Create a transformation pipeline using something similar to Compose (for torchvision)

  3. Pass the transformation pipeline to the custom dataset class as a parameter when initialising it

  4. Within the __getitem__ function, transform the data before returning it
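
The four steps above can be sketched as follows. Everything named here (Scale, AddOffset, RangeDataset and the small Compose class) is hypothetical and written for illustration; the Compose class is a minimal stand-in that plays the same role as torchvision.transforms.Compose, so that the example does not depend on torchvision.

```python
import torch
from torch.utils.data import Dataset


# Step 1: one callable class per transformation
class Scale:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, sample):
        return sample * self.factor


class AddOffset:
    def __init__(self, offset):
        self.offset = offset

    def __call__(self, sample):
        return sample + self.offset


# Step 2: a pipeline that applies the transformations in order
# (minimal stand-in for torchvision.transforms.Compose)
class Compose:
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, sample):
        for t in self.transforms:
            sample = t(sample)
        return sample


class RangeDataset(Dataset):
    # Step 3: accept the pipeline as a parameter at initialisation
    def __init__(self, n, transform=None):
        self.data = torch.arange(n, dtype=torch.float32)
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        # Step 4: transform the sample before returning it
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


pipeline = Compose([Scale(2.0), AddOffset(1.0)])
dataset = RangeDataset(4, transform=pipeline)
sample = dataset[3]  # raw value 3.0, scaled to 6.0, then offset to 7.0
```

Making transform optional (defaulting to None) keeps the dataset usable without a pipeline, which is a common convention but a design choice rather than a requirement.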


