{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,23]],"date-time":"2025-06-23T22:02:38Z","timestamp":1750716158815,"version":"3.37.3"},"reference-count":59,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T00:00:00Z","timestamp":1688083200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T00:00:00Z","timestamp":1688083200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The attention mechanism empowers deep learning to a broader range of applications, but the contribution of the attention module is highly controversial. Research on modern Hopfield networks indicates that the attention mechanism can also be used in shallow networks. Its automatic sample filtering facilitates instance extraction in Multiple Instances Learning tasks. Since the attention mechanism has a clear contribution and intuitive performance in shallow networks, this paper further investigates its optimization method based on the recurrent neural network. Through comprehensive comparison, we find that the Synergetic Neural Network has the advantage of more accurate and controllable convergences and revertible converging steps. Therefore, we design the Syn layer based on the Synergetic Neural Network and propose the novel invertible activation function as the forward and backward update formula for attention weights concentration or distraction. Experimental results show that our method outperforms other methods in all Multiple Instances Learning benchmark datasets. Concentration improves the robustness of the results, while distraction expands the instance observing space and yields better results. Codes available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/wzh134\/Syn\">https:\/\/github.com\/wzh134\/Syn<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/s40747-023-01133-0","type":"journal-article","created":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T04:01:40Z","timestamp":1688097700000},"page":"7381-7393","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Concentration or distraction? A synergetic-based attention weights optimization method"],"prefix":"10.1007","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9404-8627","authenticated-orcid":false,"given":"Zihao","family":"Wang","sequence":"first","affiliation":[]},{"given":"Haifeng","family":"Li","sequence":"additional","affiliation":[]},{"given":"Lin","family":"Ma","sequence":"additional","affiliation":[]},{"given":"Feng","family":"Jiang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,6,30]]},"reference":[{"doi-asserted-by":"publisher","unstructured":"Weston J, Chopra S, Bordes A (2015) Memory networks. In: 3rd Int conf learn represent ICLR 2015\u2014conf track proc. 
https:\/\/doi.org\/10.1007\/978-3-030-82184-5_11","key":"1133_CR1","DOI":"10.1007\/978-3-030-82184-5_11"},{"unstructured":"Sukhbaatar S, Szlam A, Weston J, Fergus R (2015) End-to-end memory networks. In: Advances in neural information processing systems","key":"1133_CR2"},{"unstructured":"Daniluk M, Rockt\u00e4schel T, Welbl J, Riedel S (2017) Frustratingly short attention spans in neural language modeling. CoRR abs\/1702.0","key":"1133_CR3"},{"unstructured":"Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. In: Advances in neural information processing systems. pp 5999\u20136009","key":"1133_CR4"},{"unstructured":"Radford A, Narasimhan K, Salimans T, Sutskever I (2018) improving language understanding by generative pre-training. Homol Homotopy Appl","key":"1133_CR5"},{"doi-asserted-by":"crossref","unstructured":"Peters ME, Neumann M, Iyyer M et al (2018) Deep contextualized word representations. In: NAACL HLT 2018\u20142018 conference of the North American chapter of the association for computational linguistics: human language technologies\u2014proceedings of the conference, pp 2227\u20132237","key":"1133_CR6","DOI":"10.18653\/v1\/N18-1202"},{"unstructured":"Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL HLT 2019\u20142019 Conf North Am chapter assoc comput linguist hum lang technol\u2014proc conf 1, pp 4171\u20134186","key":"1133_CR7"},{"unstructured":"Zaheer M, Guruganesh G, Dubey A et al (2020) Big bird: transformers for longer sequences. In: Advances in neural information processing systems","key":"1133_CR8"},{"unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A et al (2020) An image is worth 16\u00d716 words: transformers for image recognition at scale. CoRR abs\/2010.1","key":"1133_CR9"},{"unstructured":"Wang Y, Huang R, Song S et al (2021) Not all images are worth 16\u00d716 words: dynamic transformers for efficient image recognition. In: Advances in neural information processing systems, pp 11960\u201311973","key":"1133_CR10"},{"doi-asserted-by":"crossref","unstructured":"Chan W, Jaitly N, Le Q, Vinyals O (2016) Listen, attend and spell: a neural network for large vocabulary conversational speech recognition. In: ICASSP, IEEE international conference on acoustics, speech and signal processing\u2014proceedings, pp 4960\u20134964","key":"1133_CR11","DOI":"10.1109\/ICASSP.2016.7472621"},{"doi-asserted-by":"crossref","unstructured":"Park DS, Chan W, Zhang Y et al (2019) Specaugment: a simple data augmentation method for automatic speech recognition. In: Proceedings of the annual conference of the international speech communication association, Interspeech, pp 2613\u20132617","key":"1133_CR12","DOI":"10.21437\/Interspeech.2019-2680"},{"doi-asserted-by":"crossref","unstructured":"Rossenbach N, Zeyer A, Schluter R, Ney H (2020) Generating synthetic audio data for attention-based speech recognition systems. In: ICASSP, IEEE international conference on acoustics, speech and signal processing\u2014proceedings, pp 7069\u20137073","key":"1133_CR13","DOI":"10.1109\/ICASSP40776.2020.9053008"},{"unstructured":"Mehta S, Rastegari M (2021) MobileViT: light-weight, general-purpose, and mobile-friendly vision transformer. CoRR abs\/2110.0","key":"1133_CR14"},{"unstructured":"Touvron H, Cord M, Douze M et al (2020) Training data-efficient image transformers & distillation through attention. 
In: Int Conf Mach Learn, pp 10347\u201310357","key":"1133_CR15"},{"doi-asserted-by":"crossref","unstructured":"Graham B, El-Nouby A, Touvron H et al (2021) LeViT: a vision transformer in ConvNet's clothing for faster inference. In: Proceedings of the IEEE international conference on computer vision, pp 12239\u201312249","key":"1133_CR16","DOI":"10.1109\/ICCV48922.2021.01204"},{"doi-asserted-by":"crossref","unstructured":"Wu H, Xiao B, Codella N et al (2021) CvT: introducing convolutions to vision transformers. In: Proceedings of the IEEE international conference on computer vision, pp 22\u201331","key":"1133_CR17","DOI":"10.1109\/ICCV48922.2021.00009"},{"unstructured":"Cordonnier J-B, Loukas A, Jaggi M (2019) On the relationship between self-attention and convolutional layers. CoRR abs\/1911.0","key":"1133_CR18"},{"doi-asserted-by":"crossref","unstructured":"Bello I, Zoph B, Le Q et al (2019) Attention augmented convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 3285\u20133294","key":"1133_CR19","DOI":"10.1109\/ICCV.2019.00338"},{"unstructured":"Ramachandran P, Bello I, Parmar N et al (2019) Stand-alone self-attention in vision models. In: Advances in neural information processing systems","key":"1133_CR20"},{"unstructured":"Steiner A, Kolesnikov A, Zhai X et al (2021) How to train your ViT? Data, augmentation, and regularization in vision transformers. CoRR abs\/2106.1","key":"1133_CR21"},{"unstructured":"Radford A, Wu J, Child R, Luan D, Dario Amodei IS (2020) GPT2: language models are unsupervised multitask learners. In: OpenAI Blog, pp 1\u20137","key":"1133_CR22"},{"unstructured":"Brown TB, Mann B, Ryder N et al (2020) Language models are few-shot learners. Adv Neural Inf Process Syst","key":"1133_CR23"},{"unstructured":"Beltagy I, Peters ME, Cohan A (2020) Longformer: the long-document transformer. CoRR abs\/2004.0","key":"1133_CR24"},{"key":"1133_CR25","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1162\/tacl_a_00353","volume":"9","author":"A Roy","year":"2021","unstructured":"Roy A, Saffar M, Vaswani A, Grangier D (2021) Efficient content-based sparse attention with routing transformers. Trans Assoc Comput Linguist 9:53\u201368. https:\/\/doi.org\/10.1162\/tacl_a_00353","journal-title":"Trans Assoc Comput Linguist"},{"unstructured":"Tay Y, Bahri D, Yang L et al (2020) Sparse Sinkhorn attention. In: 37th Int Conf Mach Learn ICML 2020 Part F16814, pp 9380\u20139389","key":"1133_CR26"},{"key":"1133_CR27","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1016\/j.neucom.2021.04.134","volume":"484","author":"J Duan","year":"2022","unstructured":"Duan J, Liu Z, Li SE et al (2022) Adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints. Neurocomputing 484:128\u2013141. https:\/\/doi.org\/10.1016\/j.neucom.2021.04.134","journal-title":"Neurocomputing"},{"key":"1133_CR28","doi-asserted-by":"publisher","DOI":"10.3934\/dcdss.2021145","author":"V Djordjevic","year":"2022","unstructured":"Djordjevic V, Stojanovic V, Tao H et al (2022) Data-driven control of hydraulic servo actuator based on adaptive dynamic programming. Discret Contin Dyn Syst Ser. https:\/\/doi.org\/10.3934\/dcdss.2021145","journal-title":"Discret Contin Dyn Syst Ser"},{"key":"1133_CR29","doi-asserted-by":"publisher","DOI":"10.1155\/2020\/3152174","author":"J Xu","year":"2020","unstructured":"Xu J, Xu P, Wei Z et al (2020) DC-NNMN: across components fault diagnosis based on deep few-shot learning. Shock Vib. 
https:\/\/doi.org\/10.1155\/2020\/3152174","journal-title":"Shock Vib"},{"key":"1133_CR30","doi-asserted-by":"publisher","DOI":"10.1088\/1361-6501\/ac8368","volume":"33","author":"H Tao","year":"2022","unstructured":"Tao H, Cheng L, Qiu J, Stojanovic V (2022) Few shot cross equipment fault diagnosis method based on parameter optimization and feature mertic. Meas Sci Technol 33:115005. https:\/\/doi.org\/10.1088\/1361-6501\/ac8368","journal-title":"Meas Sci Technol"},{"doi-asserted-by":"crossref","unstructured":"Widrich M, Sch\u00e4fl B, Pavlovic M et al (2020) Modern Hopfield networks and attention for immune repertoire classification. In: Adv. Neural Inf. Process. Syst","key":"1133_CR31","DOI":"10.1101\/2020.04.12.038158"},{"unstructured":"Ramsauer H, Sch\u00e4fl B, Lehner J et al (2020) Hopfield networks is all you need","key":"1133_CR32"},{"key":"1133_CR33","doi-asserted-by":"publisher","first-page":"329","DOI":"10.1016\/j.patcog.2017.10.009","volume":"77","author":"MA Carbonneau","year":"2018","unstructured":"Carbonneau MA, Cheplygina V, Granger E, Gagnon G (2018) Multiple instance learning: a survey of problem characteristics and applications. Pattern Recognit 77:329\u2013353. https:\/\/doi.org\/10.1016\/j.patcog.2017.10.009","journal-title":"Pattern Recognit"},{"unstructured":"Widrich M, Sch\u00e4fl B, Pavlovi\u0107 M et al (2020) DeepRC: immune repertoire classification with attention-based deep massive multiple instance learning. bioRxiv 2020.04.12.038158","key":"1133_CR34"},{"unstructured":"Ilse M, Tomczak JM, Welling M (2018) Attention-based deep multiple instance learning. In: 35th international conference on machine learning, ICML 2018, pp 3376\u20133391","key":"1133_CR35"},{"key":"1133_CR36","doi-asserted-by":"publisher","DOI":"10.1007\/s00530-022-00992-w","author":"L Zhao","year":"2022","unstructured":"Zhao L, Yuan L, Hao K, Wen X (2022) Generalized attention-based deep multi-instance learning. Multimed Syst. https:\/\/doi.org\/10.1007\/s00530-022-00992-w","journal-title":"Multimed Syst"},{"unstructured":"Jain S, Wallace BC (2019) Attention is not explanation. CoRR abs\/1902.1","key":"1133_CR37"},{"doi-asserted-by":"crossref","unstructured":"Wiegreffe S, Pinter Y (2019) Attention is not explanation. In: EMNLP-IJCNLP 2019\u20142019 conference on empirical methods in natural language processing and 9th international joint conference on natural language processing, proceedings of the conference, pp 11\u201320","key":"1133_CR38","DOI":"10.18653\/v1\/D19-1002"},{"key":"1133_CR39","first-page":"9204","volume":"11","author":"H Liu","year":"2021","unstructured":"Liu H, Dai Z, So DR, Le QV (2021) Pay attention to MLPs. Adv Neural Inf Process Syst 11:9204\u20139215","journal-title":"Adv Neural Inf Process Syst"},{"key":"1133_CR40","doi-asserted-by":"publisher","first-page":"2344","DOI":"10.1609\/aaai.v36i2.20133","volume":"36","author":"C Tang","year":"2022","unstructured":"Tang C, Zhao Y, Wang G et al (2022) Sparse MLP for image recognition: is self-attention really necessary? Proc AAAI Conf Artif Intell 36:2344\u20132351. https:\/\/doi.org\/10.1609\/aaai.v36i2.20133","journal-title":"Proc AAAI Conf Artif Intell"},{"key":"1133_CR41","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-22450-2","volume-title":"Synergetic computers and cognition: a top-down approach to neural nets","author":"H Haken","year":"1991","unstructured":"Haken H (1991) Synergetic computers and cognition: a top-down approach to neural nets. 
Springer, Berlin"},{"key":"1133_CR42","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1109\/101.9569","volume":"4","author":"HPJ Haken","year":"1988","unstructured":"Haken HPJ (1988) Synergetics. IEEE Circ Devices Mag 4:3\u20137. https:\/\/doi.org\/10.1109\/101.9569","journal-title":"IEEE Circ Devices Mag"},{"unstructured":"Van Den Oord A, Vinyals O, Kavukcuoglu K (2017) Neural discrete representation learning. In: Advances in neural information processing systems, pp 6307\u20136316","key":"1133_CR43"},{"unstructured":"Razavi A, van den Oord A, Vinyals O (2019) Generating diverse high-fidelity images with VQ-VAE-2. In: Advances in neural information processing systems","key":"1133_CR44"},{"key":"1133_CR45","first-page":"394","volume":"26","author":"EH Moore","year":"1920","unstructured":"Moore EH (1920) On the reciprocal of the general algebraic matrix. Bull Am Math Soc 26:394\u2013395","journal-title":"Bull Am Math Soc"},{"key":"1133_CR46","doi-asserted-by":"publisher","first-page":"406","DOI":"10.1017\/S0305004100030401","volume":"51","author":"R Penrose","year":"1955","unstructured":"Penrose R (1955) A generalized inverse for matrices. Math Proc Camb Philos Soc 51:406\u2013413. https:\/\/doi.org\/10.1017\/S0305004100030401","journal-title":"Math Proc Camb Philos Soc"},{"doi-asserted-by":"crossref","unstructured":"Chan CS, Kong H, Liang G (2022) A comparative study of faithfulness metrics for model interpretability methods. In: Proceedings of the annual meeting of the association for computational linguistics, pp 5029\u20135038","key":"1133_CR47","DOI":"10.18653\/v1\/2022.acl-long.345"},{"doi-asserted-by":"crossref","unstructured":"C\u00f3rdova S\u00e1enz CA, Becker K (2021) Assessing the use of attention weights to interpret BERT-based stance classification. In: ACM international conference proceeding series, pp 194\u2013201","key":"1133_CR48","DOI":"10.1145\/3486622.3493966"},{"unstructured":"Andrews S, Tsochantaridis I, Hofmann T (2002) Support vector machines for multiple-instance learning. In: Proceedings of the 15th international conference on neural information processing systems. MIT Press, Cambridge, pp 577\u2013584","key":"1133_CR49"},{"key":"1133_CR50","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1016\/s0004-3702(96)00034-3","volume":"89","author":"TG Dietterich","year":"1997","unstructured":"Dietterich TG, Lathrop RH, Lozano-P\u00e9rez T (1997) Solving the multiple instance problem with axis-parallel rectangles. Artif Intell 89:31\u201371. https:\/\/doi.org\/10.1016\/s0004-3702(96)00034-3","journal-title":"Artif Intell"},{"key":"1133_CR51","first-page":"228","volume":"17","author":"M Kandemir","year":"2014","unstructured":"Kandemir M, Zhang C, Hamprecht FA (2014) Empowering multiple instance histopathology cancer diagnosis by cell graphs. Med Image Comput Comput Assist Interv 17:228\u2013235","journal-title":"Med Image Comput Comput Assist Interv"},{"key":"1133_CR52","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1007\/s10489-005-5602-z","volume":"22","author":"ZH Zhou","year":"2005","unstructured":"Zhou ZH, Jiang K, Li M (2005) Multi-instance learning based web mining. Appl Intell 22:135\u2013147. https:\/\/doi.org\/10.1007\/s10489-005-5602-z","journal-title":"Appl Intell"},{"unstructured":"Loshchilov I, Hutter F (2019) Decoupled weight decay regularization. 
In: 7th international conference on learning representations, ICLR 2019","key":"1133_CR53"},{"key":"1133_CR54","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1016\/j.ins.2018.08.020","volume":"467","author":"E \u015eeyma K\u00fc\u00e7\u00fcka\u015fc\u0131","year":"2018","unstructured":"\u015eeyma K\u00fc\u00e7\u00fcka\u015fc\u0131 E, G\u00f6k\u00e7e Baydo\u011fan M (2018) Bag encoding strategies in multiple instance learning problems. Inf Sci (Ny) 467:559\u2013578. https:\/\/doi.org\/10.1016\/j.ins.2018.08.020","journal-title":"Inf Sci (Ny)"},{"key":"1133_CR55","doi-asserted-by":"publisher","DOI":"10.1007\/s10898-021-01120-0","author":"E\u015e K\u00fc\u00e7\u00fcka\u015fc\u0131","year":"2022","unstructured":"K\u00fc\u00e7\u00fcka\u015fc\u0131 E\u015e, Baydo\u011fan MG, Ta\u015fk\u0131n ZC (2022) Multiple instance classification via quadratic programming. J Glob Optim. https:\/\/doi.org\/10.1007\/s10898-021-01120-0","journal-title":"J Glob Optim"},{"key":"1133_CR56","doi-asserted-by":"publisher","first-page":"1379","DOI":"10.1109\/TNNLS.2015.2424254","volume":"27","author":"V Cheplygina","year":"2016","unstructured":"Cheplygina V, Tax DMJ, Loog M (2016) Dissimilarity-Based Ensembles for Multiple Instance Learning. IEEE Trans Neural Netw Learn Syst 27:1379\u20131391. https:\/\/doi.org\/10.1109\/TNNLS.2015.2424254","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"1133_CR57","doi-asserted-by":"publisher","first-page":"1931","DOI":"10.1109\/TPAMI.2006.248","volume":"28","author":"Y Chen","year":"2006","unstructured":"Chen Y, Bi J, Wang JZ (2006) MILES: multiple-instance learning via embedded instance selection. IEEE Trans Pattern Anal Mach Intell 28:1931\u20131947. https:\/\/doi.org\/10.1109\/TPAMI.2006.248","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"unstructured":"Wang J, Zucker J-D (2000) Solving multiple-instance problem: a lazy learning approach. In: Proc 17th Int Conf Mach Learn, pp 1119\u20131125","key":"1133_CR58"},{"doi-asserted-by":"crossref","unstructured":"Edunov S, Ott M, Auli M, Grangier D (2018) Understanding backtranslation at scale. 
In: Proceedings of the 2018 conference on empirical methods in natural\nlanguage processing, EMNLP, pp 489\u2013500","key":"1133_CR59","DOI":"10.18653\/v1\/D18-1045"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01133-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01133-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01133-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,27]],"date-time":"2023-10-27T19:08:19Z","timestamp":1698433699000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01133-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,30]]},"references-count":59,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,12]]}},"alternative-id":["1133"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01133-0","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","value":"2198-6053"}],"subject":[],"published":{"date-parts":[[2023,6,30]]},"assertion":[{"value":"14 November 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 May 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 June 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}
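
Editor's note: the abstract above describes an invertible forward/backward update that concentrates or distracts attention weights in a Multiple Instance Learning (MIL) setting. The sketch below is purely illustrative and is NOT the paper's Syn layer (see https://github.com/wzh134/Syn for the authors' code): it shows generic attention-based MIL pooling in the spirit of Ilse et al. (ref 1133_CR35), with an invented exponent parameter gamma standing in for the concentration/distraction idea. The class name AttentionMILPooling and gamma are assumptions introduced here for illustration only.

# Hypothetical sketch, assuming PyTorch; not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionMILPooling(nn.Module):
    """Generic attention-based MIL pooling (after Ilse et al., ref 1133_CR35)."""

    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Two-layer scoring network that assigns each instance a scalar score.
        self.score = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, bag: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
        # bag: (num_instances, in_dim); returns a bag embedding of shape (in_dim,).
        a = F.softmax(self.score(bag).squeeze(-1), dim=0)  # attention weights, sum to 1
        # Illustrative stand-in for "concentration vs. distraction" (NOT the Syn
        # layer's update formula): raising the weights to gamma > 1 sharpens the
        # distribution, gamma < 1 flattens it, and the map is invertible up to
        # normalization (apply 1/gamma and renormalize to recover a).
        a = a.pow(gamma)
        a = a / a.sum()
        return a @ bag  # attention-weighted sum of instances

# Usage: pool a bag of 20 instances with concentrated (gamma=2.0) weights.
pool = AttentionMILPooling(in_dim=128)
bag = torch.randn(20, 128)
embedding = pool(bag, gamma=2.0)  # shape: (128,)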