Crowding and the architecture of the visual system
Lausanne : École Polytechnique Fédérale de Lausanne
Thesis ; 7582
École Polytechnique Fédérale de Lausanne, 7 February 2020
Supervisor: Herzog, M.H.
SW OZ DCC AI
Subject: Cognitive artificial intelligence
Classically, vision is seen as a cascade of local, feedforward computations. This framework has been tremendously successful, inspiring a wide range of ground-breaking findings in neuroscience and computer vision. Recently, feedforward Convolutional Neural Networks (ffCNNs), a kind of deep neural network inspired by this classic framework, have revolutionized computer vision and been adopted as tools in neuroscience. However, despite these successes, there is much more to vision. First, there are flagrant architectural differences between biological systems and the classic framework. For example, recurrence is abundant in the brain but absent from the classic framework and ffCNNs. Although there is widespread agreement about the importance of these recurrent connections, their computational role is still poorly understood. Second, these architectural differences lead to behavioural differences too, highlighted by psychophysical evidence. Relatedly, ffCNNs are extremely vulnerable to small changes to their inputs and do not generalize well beyond the dataset used to train them. Human vision, in contrast, is much more robust. New insights are needed to face up to these challenges. In this thesis, I use visual crowding and related psychophysical effects as probes into visual processes that go beyond the classic framework. In crowding, perception of a target deteriorates in clutter. I focus on global aspects of crowding, in which perception of a small target is strongly modulated by the global configuration of elements across the visual field. I show that models based on the classic framework, including ffCNNs, cannot explain these effects for principled reasons and identify recurrent grouping and segmentation as a key missing ingredient. Then, I show that capsule networks, a recent kind of deep learning architecture combining the power of ffCNNs with recurrent grouping and segmentation, naturally explain these effects. 
I provide psychophysical evidence that humans indeed use a similar recurrent grouping and segmentation strategy in global crowding effects. In crowding, visual elements interfere across space. To study how elements interfere over time, I use the Sequential Metacontrast psychophysical paradigm, in which perception of visual elements depends on elements presented hundreds of milliseconds later. I psychophysically characterize the temporal structure of this interference and propose a simple computational model. My results support the idea that perception is a discrete process. I lay out theoretical implications of these findings. Together, the results presented here provide stepping-stones towards a fuller understanding of the visual system by suggesting architectural changes needed for more human-like neural computations.
This item appears in the following Collection(s)
- Non RU Publications 