{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,12,7]],"date-time":"2022-12-07T14:30:36Z","timestamp":1670423436844},"reference-count":46,"publisher":"MIT Press - Journals","issue":"9","content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,19]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The insideness problem is an aspect of image segmentation that consists of determining which pixels are inside and outside a region. Deep neural networks (DNNs) excel in segmentation benchmarks, but it is unclear if they have the ability to solve the insideness problem as it requires evaluating long-range spatial dependencies. In this letter, we analyze the insideness problem in isolation, without texture or semantic cues, such that other aspects of segmentation do not interfere in the analysis. We demonstrate that DNNs for segmentation with few units have sufficient complexity to solve the insideness for any curve. Yet such DNNs have severe problems with learning general solutions. Only recurrent networks trained with small images learn solutions that generalize well to almost any curve. 
Recurrent networks can decompose the evaluation of long-range dependencies into a sequence of local operations, and learning with small images alleviates the common difficulties of training recurrent networks with a large number of unrolling steps.<\/jats:p>","DOI":"10.1162\/neco_a_01413","type":"journal-article","created":{"date-parts":[[2021,6,30]],"date-time":"2021-06-30T21:34:16Z","timestamp":1625088856000},"page":"2511-2549","update-policy":"http:\/\/dx.doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":2,"title":["Do Neural Networks for Segmentation Understand Insideness?"],"prefix":"10.1162","volume":"33","author":[{"given":"Kimberly","family":"Villalobos","sequence":"first","affiliation":[{"name":"Center for Brains, Minds and Machines, MIT, Cambridge, MA 02139, U.S.A. kimvc@mit.edu"}]},{"given":"Vilim","family":"\u0160tih","sequence":"additional","affiliation":[{"name":"Center for Brains, Minds and Machines, MIT, Cambridge, MA 02139, U.S.A., and Max Planck Institute of Neurobiology, 82152 Martinsried, Germany vilim@neuro.mpg.de"}]},{"given":"Amineh","family":"Ahmadinejad","sequence":"additional","affiliation":[{"name":"Center for Brains, Minds and Machines, MIT, Cambridge, MA 02139, U.S.A. amineh@mit.edu"}]},{"given":"Shobhita","family":"Sundaram","sequence":"additional","affiliation":[{"name":"Center for Brains, Minds and Machines, MIT, Cambridge, MA 02139, U.S.A. shobhita@mit.edu"}]},{"given":"Jamell","family":"Dozier","sequence":"additional","affiliation":[{"name":"Center for Brains, Minds and Machines, MIT, Cambridge, MA 02139, U.S.A. jamell@mit.edu"}]},{"given":"Andrew","family":"Francl","sequence":"additional","affiliation":[{"name":"Center for Brains, Minds and Machines, MIT, Cambridge, MA 02139, U.S.A. francl@mit.edu"}]},{"given":"Frederico","family":"Azevedo","sequence":"additional","affiliation":[{"name":"Center for Brains, Minds and Machines, MIT, Cambridge, MA 02139, U.S.A. 
fazevedo@mit.edu"}]},{"given":"Tomotake","family":"Sasaki","sequence":"additional","affiliation":[{"name":"Fujitsu Laboratories, Kawasaki 211-8588, Japan, and Center for Brains, Minds and Machines, MIT, Cambridge, MA 02139, U.S.A. tomotake.sasaki@fujitsu.com"}]},{"given":"Xavier","family":"Boix","sequence":"additional","affiliation":[{"name":"Center for Brains, Minds and Machines, MIT, Cambridge, MA 02139, U.S.A. xboix@mit.edu"}]}],"member":"281","published-online":{"date-parts":[[2021,8,19]]},"reference":[{"key":"2021081922222826700_B1","author":"A140517: Number of cycles in an n \u00d7 n grid. (N.d.)","journal-title":"The on-line encyclopedia of integer sequences"},{"key":"2021081922222826700_B2","author":"Alom","year":"2018","journal-title":"Recurrent residual convolutional neural network based on U-Net (R2U-Net) for medical image segmentation."},{"issue":"12","key":"2021081922222826700_B3","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"SegNet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"2","key":"2021081922222826700_B4","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1109\/72.279181","volume":"5","author":"Bengio","year":"1994","journal-title":"IEEE Transactions on Neural Networks"},{"key":"2021081922222826700_B5","first-page":"4013","article-title":"MaskLab: Instance segmentation by refining object detection with semantic and direction features.","author":"Chen","year":"2018","journal-title":"Proceedings of the 31st IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"issue":"4","key":"2021081922222826700_B6","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully 
connected CRFs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2021081922222826700_B7","first-page":"80","article-title":"Encoder-decoder with atrous separable convolution for semantic image segmentation.","author":"Chen","year":"2018","journal-title":"Proceedings of the 15th European Conference on Computer Vision"},{"issue":"4","key":"2021081922222826700_B8","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/BF02551274","article-title":"Approximation by superpositions of a sigmoidal function","volume":"2","author":"Cybenko","year":"1989","journal-title":"Mathematics of Control, Signals and Systems"},{"key":"2021081922222826700_B9","article-title":"The PASCAL Visual Object Classes Challenge 2012 (VOC2012) results.","author":"Everingham"},{"key":"2021081922222826700_B10","first-page":"249","article-title":"Understanding the difficulty of training deep feedforward neural networks.","author":"Glorot","year":"2010","journal-title":"Proceedings of the 13th International Conference on Artificial Intelligence and Statistics"},{"key":"2021081922222826700_B11","first-page":"4125","volume-title":"Advances in neural information processing systems","author":"Gruslys","year":"2016"},{"key":"2021081922222826700_B12","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1016\/B978-0-12-336156-1.50013-6","volume-title":"Graphics gems IV","author":"Haines","year":"1994"},{"key":"2021081922222826700_B13","doi-asserted-by":"crossref","DOI":"10.21236\/AD0705364","author":"Harary","year":"1969","journal-title":"Graph theory"},{"key":"2021081922222826700_B14","first-page":"2961","article-title":"Mask R-CNN.","author":"He","year":"2017","journal-title":"Proceedings of the 16th IEEE International Conference on Computer Vision"},{"issue":"8","key":"2021081922222826700_B15","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term 
memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"key":"2021081922222826700_B16","first-page":"4233","article-title":"Learning to segment every thing.","author":"Hu","year":"2018","journal-title":"Proceedings of the 31st IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"2021081922222826700_B17","article-title":"Fast computation of the number of paths in a grid graph.","author":"Iwashita","year":"2013","journal-title":"Proceedings of the 16th Japan Conference on Discrete and Computational Geometry and Graphs"},{"key":"2021081922222826700_B18","doi-asserted-by":"crossref","DOI":"10.7554\/eLife.14320","article-title":"Serial grouping of 2D-image regions with object-based attention in humans","volume":"5","author":"Jeurissen","year":"2016","journal-title":"Elife"},{"key":"2021081922222826700_B19","article-title":"Table of n, a(n) for n = 0..26","author":"Karavaev","year":"..","journal-title":"The on-line encyclopedia of integer sequences"},{"key":"2021081922222826700_B20","article-title":"Disentangling neural mechanisms for perceptual grouping.","author":"Kim","year":"2020","journal-title":"Proceedings of the 8th International Conference on Learning Representations"},{"issue":"4","key":"2021081922222826700_B21","doi-asserted-by":"crossref","DOI":"10.1098\/rsfs.2018.0011","article-title":"Not-So-CLEVR: Learning same\u2013different relations strains feedforward neural networks","volume":"8","author":"Kim","year":"2018","journal-title":"Interface Focus"},{"key":"2021081922222826700_B22","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1007\/978-1-4615-1529-6_3","volume-title":"Foundations of image understanding","author":"Kong","year":"2001"},{"key":"2021081922222826700_B23","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/j.neucom.2019.02.003","article-title":"Survey on semantic segmentation using deep learning 
techniques","volume":"338","author":"Lateef","year":"2019","journal-title":"Neurocomputing"},{"key":"2021081922222826700_B24","first-page":"3659","article-title":"Iterative instance segmentation.","author":"Li","year":"2016","journal-title":"Proceedings of the 29th IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"2021081922222826700_B25","first-page":"5745","article-title":"Referring image segmentation via recurrent refinement networks.","author":"Li","year":"2018","journal-title":"Proceedings of the 31st IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"2021081922222826700_B26","first-page":"2359","article-title":"Fully convolutional instance-aware semantic segmentation.","author":"Li","year":"2017","journal-title":"Proceedings of the 30th IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"2021081922222826700_B27","first-page":"152","volume-title":"Advances in neural information processing systems","author":"Linsley","year":"2018"},{"key":"2021081922222826700_B28","first-page":"9605","volume-title":"Advances in neural information processing systems","author":"Liu","year":"2018"},{"key":"2021081922222826700_B29","first-page":"8759","author":"Liu","year":"2018","journal-title":"Proceedings of the 31st IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"2021081922222826700_B30","first-page":"3431","article-title":"Fully convolutional networks for semantic segmentation.","author":"Long","year":"2015","journal-title":"Proceedings of the 28th IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"2021081922222826700_B31","first-page":"616","article-title":"Deep extreme cut: from extreme points to object segmentation.","author":"Maninis","year":"2018","journal-title":"Proceedings of the 31st IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"2021081922222826700_B32","author":"Minsky","year":"1969","journal-title":"Perceptrons: An 
introduction to computational geometry"},{"key":"2021081922222826700_B33","first-page":"1310","article-title":"On the difficulty of training recurrent neural networks.","author":"Pascanu","year":"2013","journal-title":"Proceedings of the 30th International Conference on Machine Learning"},{"key":"2021081922222826700_B34","first-page":"234","article-title":"U-Net: convolutional networks for biomedical image segmentation.","author":"Ronneberger","year":"2015","journal-title":"Proceedings of the 18th International Conference on Medical Image Computing and Computer Assisted Intervention"},{"issue":"1","key":"2021081922222826700_B35","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1145\/321556.321570","article-title":"Connectivity in digital pictures","volume":"17","author":"Rosenfeld","year":"1970","journal-title":"Journal of the ACM"},{"key":"2021081922222826700_B36","first-page":"3067","article-title":"Failures of gradient-based deep learning.","author":"Shalev-Shwartz","year":"2017","journal-title":"Proceedings of the 34th International Conference on Machine Learning"},{"key":"2021081922222826700_B37","article-title":"SeedNet: Automatic seed generation with deep reinforcement learning for robust interactive segmentation.","author":"Song","year":"2018","journal-title":"Proceedings of the 31st IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"issue":"1","key":"2021081922222826700_B38","first-page":"2822","article-title":"The implicit bias of gradient descent on separable data","volume":"19","author":"Soudry","year":"2018","journal-title":"Journal of Machine Learning Research"},{"key":"2021081922222826700_B39","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/0010-0277(84)90023-4","article-title":"Visual 
routines","volume":"18","author":"Ullman","year":"1984","journal-title":"Cognition"},{"key":"2021081922222826700_B40","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/3496.001.0001","author":"Ullman","year":"1996","journal-title":"High-level vision: Object recognition and visual cognition."},{"key":"2021081922222826700_B41","article-title":"ReSeg: A recurrent neural network-based model for semantic segmentation.","author":"Visin","year":"2016","journal-title":"Proceedings of the 29th IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops."},{"key":"2021081922222826700_B42","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v33i01.33011303","article-title":"Cognitive deficit of deep learning in numerosity.","author":"Wu","year":"2019","journal-title":"Proceedings of the 33rd AAAI Conference on Artificial Intelligence"},{"key":"2021081922222826700_B43","article-title":"Convolutional LSTM network: A machine learning approach for precipitation nowcasting.","volume":"28","author":"Xingjian","year":"2015","journal-title":"Advances in neural information processing systems"},{"key":"2021081922222826700_B44","article-title":"Multi-scale context aggregation by dilated convolutions.","author":"Yu","year":"2016","journal-title":"Proceedings of the 4th International Conference on Learning Representations"},{"key":"2021081922222826700_B45","article-title":"Visualizing and understanding convolutional networks.","author":"Zeiler","year":"2014","journal-title":"Proceedings of the 13th European Conference on Computer Vision"},{"key":"2021081922222826700_B46","article-title":"Unpaired image-to-image translation using cycle-consistent adversarial networks.","author":"Zhu","year":"2017","journal-title":"Proceedings of the 30th IEEE\/CVF Conference on Computer Vision and Pattern Recognition"}],"container-title":["Neural 
Computation"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/direct.mit.edu\/neco\/article-pdf\/33\/9\/2511\/1958091\/neco_a_01413.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/direct.mit.edu\/neco\/article-pdf\/33\/9\/2511\/1958091\/neco_a_01413.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,19]],"date-time":"2021-08-19T22:23:47Z","timestamp":1629411827000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/neco\/article\/33\/9\/2511\/102620\/Do-Neural-Networks-for-Segmentation-Understand"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,19]]},"references-count":46,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2021,8,19]]},"published-print":{"date-parts":[[2021,8,19]]}},"URL":"https:\/\/doi.org\/10.1162\/neco_a_01413","relation":{},"ISSN":["0899-7667","1530-888X"],"issn-type":[{"value":"0899-7667","type":"print"},{"value":"1530-888X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,9]]},"published":{"date-parts":[[2021,8,19]]}}}