{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T14:18:49Z","timestamp":1764857929744,"version":"3.46.0"},"reference-count":50,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T00:00:00Z","timestamp":1764720000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Science Foundation Grants","award":["CNS-2120350","III-2311598"],"award-info":[{"award-number":["CNS-2120350","III-2311598"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Contrastive learning improves model performance by differentiating between positive and negative sample pairs. However, its application is primarily confined to classification tasks, facing challenges with complex recognition tasks such as object detection and segmentation due to its limited capacity to capture spatial relationships and fine-grained features. To address this limitation, we propose LossTransform, a novel approach that redefines positive sample pairs and establishes a novel contrastive loss paradigm. LossTransform advances contrastive learning to the instance level, departing from the traditional sample level. Empirical evaluations on ImageNet, CIFAR, and object detection benchmarks indicate that LossTransform improves accuracy by +2.73% on CIFAR, +2.52% on ImageNet, and up to +5.2% in average precision on detection tasks, while maintaining efficiency. These results illustrate that LossTransform is compatible with large-scale training pipelines and exhibits robust performance across diverse and complex datasets. By optimizing model performance and significantly reducing training time, this research enables more efficient and accessible solutions for societal applications.<\/jats:p>","DOI":"10.3390\/info16121068","type":"journal-article","created":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T15:02:48Z","timestamp":1764774168000},"page":"1068","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["LossTransform: Reformulating the Loss Function for Contrastive Learning"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-2111-4047","authenticated-orcid":false,"given":"Zheng","family":"Li","sequence":"first","affiliation":[{"name":"Department of Computer Science, New York Institute of Technology, New York, NY 10023, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3968-9699","authenticated-orcid":false,"given":"Jerry","family":"Cheng","sequence":"additional","affiliation":[{"name":"Department of Computer Science, New York Institute of Technology, New York, NY 10023, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9510-6696","authenticated-orcid":false,"given":"Huanying Helen","family":"Gu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, New York Institute of Technology, New York, NY 10023, USA"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,3]]},"reference":[{"key":"ref_1","unstructured":"Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020, January 13\u201318). A simple framework for contrastive learning of visual representations. 
Proceedings of the International Conference on Machine Learning PMLR, Virtual."},{"key":"ref_2","first-page":"18661","article-title":"Supervised contrastive learning","volume":"33","author":"Khosla","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_3","unstructured":"Krizhevsky, A. (2025, November 29). Learning Multiple Layers of Features from Tiny Images; Technical Report, University of Toronto. Available online: https:\/\/www.cs.toronto.edu\/~kriz\/cifar.html."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_5","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster r-cnn: Towards real-time object detection with region proposal networks. Proceedings of the 29th International Conference on Neural Information Processing Systems\u2014Volume 1, Montreal, QC, Canada."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_7","unstructured":"Wang, L., Shi, J., Song, G., and Shen, I.-f. (2024, August 09). Penn-Fudan Database for Pedestrian Detection and Tracking. Available online: https:\/\/www.cis.upenn.edu\/~jshi\/ped_html\/."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (voc) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland. Proceedings, Part V 13.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1007\/s11704-019-8208-z","article-title":"A survey on ensemble learning","volume":"14","author":"Dong","year":"2020","journal-title":"Front. Comput. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Shanmugam, D., Blalock, D., Balakrishnan, G., and Guttag, J. (2021, January 10\u201317). Better aggregation in test-time augmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00125"},{"key":"ref_12","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1789","DOI":"10.1007\/s11263-021-01453-z","article-title":"Knowledge distillation: A survey","volume":"129","author":"Gou","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Galstyan, A., and Cohen, P.R. (2007, January 19\u201321). Empirical comparison of \u201chard\u201d and \u201csoft\u201d label propagation for relational classification. Proceedings of the International Conference on Inductive Logic Programming, Corvallis, OR, USA.","DOI":"10.1007\/978-3-540-78469-2_13"},{"key":"ref_17","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. arXiv."},{"key":"ref_18","first-page":"4163","article-title":"Learning loss for test-time augmentation","volume":"33","author":"Kim","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_19","unstructured":"Zbontar, J., Jing, L., Misra, I., LeCun, Y., and Deny, S. (2021, January 18\u201324). Barlow twins: Self-supervised learning via redundancy reduction. Proceedings of the International Conference on Machine Learning PMLR, Virtual."},{"key":"ref_20","unstructured":"Athiwaratkun, B., Finzi, M., Izmailov, P., and Wilson, A.G. (2018). There are many consistent explanations of unlabeled data: Why you should average. arXiv."},{"key":"ref_21","unstructured":"Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and J\u00e9gou, H. (2021, January 18\u201324). Training data-efficient image transformers & distillation through attention. Proceedings of the International Conference on Machine Learning PMLR, Virtual."},{"key":"ref_22","unstructured":"Balasubramanian, R., and Rathore, K. (2022). Contrastive learning for object detection. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Xie, E., Ding, J., Wang, W., Zhan, X., Xu, H., Sun, P., Li, Z., and Luo, P. (2021, January 10\u201317). Detco: Unsupervised contrastive learning for object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00828"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wu, W., Chang, H., Zheng, Y., Li, Z., Chen, Z., and Zhang, Z. (2022, January 18\u201324). Contrastive learning-based robust object detection under smoky conditions. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00475"},{"key":"ref_25","unstructured":"Terven, J., Cordova-Esparza, D.M., Ramirez-Pedraza, A., Chavez-Urbiola, E.A., and Romero-Gonzalez, J.A. (2023). Loss functions and metrics in deep learning. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Cubuk, E.D., Zoph, B., Shlens, J., and Le, Q.V. (2020, January 14\u201319). Randaugment: Practical automated data augmentation with a reduced search space. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00359"},{"key":"ref_27","unstructured":"DeVries, T., and Taylor, G.W. (2017). Improved regularization of convolutional neural networks with cutout. arXiv."},{"key":"ref_28","unstructured":"Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., and Yoo, Y. (November, January 27). 
Cutmix: Regularization strategy to train strong classifiers with localizable features. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv.","DOI":"10.1007\/978-1-4899-7687-1_79"},{"key":"ref_30","unstructured":"Liu, W., Wen, Y., Yu, Z., and Yang, M. (2016). Large-margin softmax loss for convolutional neural networks. arXiv."},{"key":"ref_31","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Zoph, B., Vasudevan, V., Shlens, J., and Le, Q.V. (2018, January 18\u201323). Learning transferable architectures for scalable image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00907"},{"key":"ref_33","unstructured":"Tan, M., and Le, Q. (2019, January 9\u201315). Efficientnet: Rethinking model scaling for convolutional neural networks. Proceedings of the International Conference on Machine Learning PMLR, Long Beach, CA, USA."},{"key":"ref_34","unstructured":"Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, D., Chen, M., Lee, H., Ngiam, J., Le, Q.V., and Wu, Y. (2019, January 8\u201314). Gpipe: Efficient training of giant neural networks using pipeline parallelism. Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zagoruyko, S., and Komodakis, N. (2016). Wide residual networks. arXiv.","DOI":"10.5244\/C.30.87"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., and Xie, S. (2022, January 3\u20137). A convnet for the 2020s. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Denver, CO, USA.","DOI":"10.1109\/CVPR52688.2022.01167"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016, January 27\u201330). Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.308"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., and Le, Q.V. (2019, January 15\u201320). Mnasnet: Platform-aware neural architecture search for mobile. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00293"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zhou, X., Lin, M., and Sun, J. (2018, January 18\u201323). Shufflenet: An extremely efficient convolutional neural network for mobile devices. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00716"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1016\/0304-3975(94)00262-2","article-title":"Two linear time union-find strategies for image processing","volume":"154","author":"Fiorio","year":"1996","journal-title":"Theor. Comput. Sci."},{"key":"ref_43","unstructured":"TorchVision Contributors (2024, August 09). TorchVision Models. Available online: https:\/\/pytorch.org\/vision\/master\/models.html."},{"key":"ref_44","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Chen, X., Xie, S., and He, K. (2021, January 10\u201317). An empirical study of training self-supervised vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00950"},{"key":"ref_46","unstructured":"Zhang, H., Li, F., Liu, S., Zhang, L., Su, H., Zhu, J., Ni, L.M., and Shum, H.Y. (2022). Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv."},{"key":"ref_47","unstructured":"Sheng, G., Zhang, C., Ye, Z., Wu, X., Zhang, W., Zhang, R., Peng, Y., Lin, H., and Wu, C. (April, January 30). Hybridflow: A flexible and efficient rlhf framework. Proceedings of the Twentieth European Conference on Computer Systems, Rotterdam, The Netherlands."},{"key":"ref_48","unstructured":"Team, G., Mesnard, T., Hardin, C., Dadashi, R., Bhupatiraju, S., Pathak, S., Sifre, L., Rivi\u00e8re, M., Kale, M.S., and Love, J. (2024). Gemma: Open models based on gemini research and technology. arXiv."},{"key":"ref_49","unstructured":"Bai, J., Bai, S., Chu, Y., Cui, Z., Dang, K., Deng, X., Fan, Y., Ge, W., Han, Y., and Huang, F. (2023). Qwen technical report. arXiv."},{"key":"ref_50","unstructured":"Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., Zhao, C., Deng, C., Zhang, C., and Ruan, C. (2024). Deepseek-v3 technical report. arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/12\/1068\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T14:13:52Z","timestamp":1764857632000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/12\/1068"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,3]]},"references-count":50,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["info16121068"],"URL":"https:\/\/doi.org\/10.3390\/info16121068","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2025,12,3]]}}}