An ecologically motivated image dataset for deep learning yields better models of human vision
Source: Proceedings of the National Academy of Sciences USA, 118, 8, (2021), article e2011417118
Article / Letter to editor
SW OZ DCC AI
Proceedings of the National Academy of Sciences USA
Subject: Cognitive artificial intelligence
Inspired by core principles of information processing in the brain, deep neural networks (DNNs) have demonstrated remarkable success in computer vision applications. At the same time, networks trained on the task of object classification exhibit similarities to representations found in the primate visual system. This result is surprising because the datasets commonly used for training are designed to be engineering challenges. Here, we use linguistic corpus statistics and human concreteness ratings as guiding principles to design a resource that more closely mirrors categories that are relevant to humans. The result is ecoset, a collection of 1.5 million images from 565 basic-level categories. We show that ecoset-trained DNNs yield better models of human higher-level visual cortex and human behavior.
Deep neural networks provide the current best models of visual information processing in the primate brain. Drawing on work from computer vision, the most commonly used networks are pretrained on data from the ImageNet Large Scale Visual Recognition Challenge. This dataset comprises images from 1,000 categories, selected to provide a challenging testbed for automated visual object recognition systems. Moving beyond this common practice, we here introduce ecoset, a collection of >1.5 million images from 565 basic-level categories selected to better capture the distribution of objects relevant to humans. Ecoset categories were chosen to be both frequent in linguistic usage and concrete, thereby mirroring important physical objects in the world. We test the effects of training on this ecologically more valid dataset using multiple instances of two neural network architectures: AlexNet and vNet, a novel architecture designed to mimic the progressive increase in receptive field sizes along the human ventral stream.
We show that training on ecoset leads to significant improvements in predicting representations in human higher-level visual cortex and perceptual judgments, surpassing the previous state of the art. Significant and highly consistent benefits are demonstrated for both architectures on two separate functional magnetic resonance imaging (fMRI) datasets and behavioral data, jointly covering responses to 1,292 visual stimuli from a wide variety of object categories. These results suggest that computational visual neuroscience may take better advantage of the deep learning framework by using image sets that reflect the human perceptual and cognitive experience. Ecoset and trained network models are openly available to the research community. All materials presented in this paper (ecoset dataset, pretrained networks, and test stimuli) are openly available for research purposes via CodeOcean (52): https://dx.doi.org/10.24433/CO.4784989.v1.
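The selection criterion described in the abstract (categories that are both frequent in linguistic usage and concrete) can be illustrated with a minimal Python sketch. The `Noun` record, the `select_categories` helper, the thresholds, and all numeric values below are hypothetical and are not taken from the paper or from ecoset; this record does not specify the corpus statistics, rating norms, or cut-offs actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    word: str
    frequency_per_million: float  # hypothetical corpus frequency
    concreteness: float           # hypothetical human rating, assumed 1-5 scale


def select_categories(nouns, min_frequency, min_concreteness):
    """Keep nouns that are both frequent and concrete, most frequent first."""
    kept = [
        n for n in nouns
        if n.frequency_per_million >= min_frequency
        and n.concreteness >= min_concreteness
    ]
    kept.sort(key=lambda n: n.frequency_per_million, reverse=True)
    return [n.word for n in kept]


# Made-up values for illustration only; not taken from ecoset.
candidates = [
    Noun("dog", 120.0, 4.9),      # frequent and concrete: kept
    Noun("idea", 200.0, 1.6),     # frequent but abstract: dropped
    Noun("table", 90.0, 4.9),     # frequent and concrete: kept
    Noun("axolotl", 0.1, 4.8),    # concrete but rare: dropped
]

selected = select_categories(candidates, min_frequency=10.0, min_concreteness=4.0)
print(selected)
```

Requiring both conditions at once is the point of the sketch: a frequency filter alone would admit abstract words, and a concreteness filter alone would admit rare object names.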
This item appears in the following Collection(s)
- Academic publications 
- Electronic publications 
- Faculty of Social Sciences 
- Open Access publications 