SPRADB0 May 2023 AM62A3, AM62A3-Q1, AM62A7, AM62A7-Q1, AM67A, AM68A, AM69A

2 Creating the Dataset

The first step in creating a neural network (also called a "model") is to create or curate a dataset. Because machine learning and deep learning are data-driven approaches to solving problems, a model is only as good as the data it is trained on. Ideally, a model is trained on data tailored to the specific task it is designed for.

Public datasets like COCO or ImageNet provide a convenient path to developing and evaluating a deep learning model. Many publicly available datasets exist, and many of them can be found on sites like paperswithcode.com [5]. However, not every public dataset can be used under its license terms, and some contain too few high-quality data points. Custom datasets, on the other hand, are time-consuming to create.

For the retail scanner application, the online datasets that were usable and licensable were not of high enough quality to use as-is. Many images were crowd-sourced and poorly labeled, so the trained model performed poorly both on the validation subset and in practice within a well-lit checkout area. Creating a dataset from scratch was therefore necessary.