Food image analysis is the groundwork for image-based dietary assessment,
the process of monitoring the types of food and the amount of energy
consumed using captured food or eating scene images. Existing deep
learning-based methods learn visual representations for downstream tasks
from human annotations of each food image. However, most food images in real
life are captured without labels, and annotating data demands substantial time
and human effort, which is infeasible for real-world applications. To make
use of the vast number of unlabeled images, many existing works focus on
unsupervised or self-supervised learning of visual representations directly
from unlabeled data. However, none of these existing works focus on food
images, which are more challenging than general objects due to their high
inter-class similarity and intra-class variance.
In this paper, we focus on the implementation and analysis of existing
representative self-supervised learning methods on food images. Specifically,
we first compare the performance of six selected self-supervised learning
models on the Food-101 dataset. We then analyze the pros and cons of each
selected model when trained on food data to identify the key factors that can
help improve performance. Finally, we propose several directions for future work
on self-supervised visual representation learning for food images.