{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T02:26:28Z","timestamp":1771467988727,"version":"3.50.1"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,8,2]],"date-time":"2021-08-02T00:00:00Z","timestamp":1627862400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,8,2]],"date-time":"2021-08-02T00:00:00Z","timestamp":1627862400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100014718","name":"Innovative Research Group Project of the National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61802394, U1836209"],"award-info":[{"award-number":["61802394, U1836209"]}],"id":[{"id":"10.13039\/100014718","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62032010"],"award-info":[{"award-number":["62032010"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Cybersecur"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Codes of Open Source Software (OSS) are widely reused during software development nowadays. However, reusing some specific versions of OSS introduces 1-day vulnerabilities of which details are publicly available, which may be exploited and lead to serious security issues. Existing state-of-the-art OSS reuse detection work can not identify the specific versions of reused OSS well. 
The features they selected are not distinguishable enough for version detection and the matching scores are only based on similarity. This paper presents B2SMatcher, a fine-grained version identification tool for OSS in commercial off-the-shelf (COTS) software. We first discuss five kinds of version-sensitive code features that are trackable in both binary and source code. We categorize these features into program-level features and function-level features and propose a two-stage version identification approach based on the two levels of code features. B2SMatcher also identifies different types of OSS version reuse based on matching scores and matched feature instances. In order to extract source code features as accurately as possible, B2SMatcher innovatively uses machine learning methods to obtain the source files involved in the compilation and uses function abstraction and normalization methods to eliminate the comparison costs on redundant functions across versions. We have evaluated B2SMatcher using 6351 candidate OSS versions and 585 binaries. The result shows that B2SMatcher achieves a high precision up to 89.2% and outperforms state-of-the-art tools. 
Finally, we show how B2SMatcher can be used to evaluate real-world software and find some security risks in practice.<\/jats:p>","DOI":"10.1186\/s42400-021-00085-7","type":"journal-article","created":{"date-parts":[[2021,8,1]],"date-time":"2021-08-01T23:03:25Z","timestamp":1627859005000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["B2SMatcher: fine-Grained version identification of open-Source software in binary files"],"prefix":"10.1186","volume":"4","author":[{"given":"Gu","family":"Ban","sequence":"first","affiliation":[]},{"given":"Lili","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Yang","family":"Xiao","sequence":"additional","affiliation":[]},{"given":"Xinhua","family":"Li","sequence":"additional","affiliation":[]},{"given":"Zimu","family":"Yuan","sequence":"additional","affiliation":[]},{"given":"Wei","family":"Huo","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,8,2]]},"reference":[{"key":"85_CR1","unstructured":"2020 Open Source Security and Risk Analysis Report (2020). https:\/\/www.synopsys.com\/zh-cn\/software-integrity\/resources\/reports\/2020-open-source-security-risk-analysis.html. Accessed 10 Apr 2021."},{"key":"85_CR2","doi-asserted-by":"publisher","first-page":"516","DOI":"10.1109\/SANER.2015.7081868","volume-title":"Proceedings of 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)","author":"M Cadariu","year":"2015","unstructured":"Cadariu, M, Bouwers E, Visser J, van Deursen A (2015) Tracking known security vulnerabilities in proprietary software systems In: Proceedings of 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), 516\u2013519.. 
Software Analysis, Evolution, and Reengineering, New York."},{"key":"85_CR3","doi-asserted-by":"publisher","first-page":"678","DOI":"10.1145\/2950290.2950350","volume-title":"Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering","author":"M Chandramohan","year":"2016","unstructured":"Chandramohan, M, Xue Y, Xu Z, Liu Y, Cho CY, Tan HBK (2016) Bingo: Cross-architecture cross-os binary search In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 678\u2013689.. USENIX Association, Kyoto."},{"key":"85_CR4","unstructured":"CVEDetails (2020) Free CVE security vulnerblity database source. https:\/\/www.cvedetails.com\/. Accessed 10 Apr 2021."},{"key":"85_CR5","doi-asserted-by":"crossref","unstructured":"Cybellum (2020) Uncover the Software Components Inside Your Vehicles and Identify All Vulnerabilities. https:\/\/cybellum.com\/. Accessed 10 Apr 2021.","DOI":"10.2307\/j.ctv10crcbg.5"},{"key":"85_CR6","unstructured":"Decision tree (2020). https:\/\/en.wikipedia.org\/wiki\/Decision_tree. Accessed 10 Apr 2021."},{"key":"85_CR7","unstructured":"Detailed datasets used in this paper (2020). https:\/\/github.com\/summerban\/B2SMatcher-cybersecurity. Accessed 10 Apr 2021."},{"key":"85_CR8","doi-asserted-by":"publisher","first-page":"472","DOI":"10.1109\/SP.2019.00003","volume-title":"Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP)","author":"SH Ding","year":"2019","unstructured":"Ding, SH, Fung BC, Charland P (2019) Asm2vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization In: Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), 472\u2013489.. 
Springer, Kyoto."},{"key":"85_CR9","doi-asserted-by":"crossref","unstructured":"Duan, R, Bijlani A, Ji Y, Alrawi O, Xiong Y, Ike M, Saltaformaggio B, Lee W (2019) Automating patching of vulnerable open-source software versions in application binaries In: Proceedings of the 2019 Annual Network and Distributed System Security Symposium (NDSS).","DOI":"10.14722\/ndss.2019.23126"},{"key":"85_CR10","doi-asserted-by":"crossref","unstructured":"Duan, R, Bijlani A, Xu M, Kim T, Lee W (2017) Identifying open-source license violation and 1-day security risk at large scale In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2169\u20132185.. ACM.","DOI":"10.1145\/3133956.3134048"},{"key":"85_CR11","volume-title":"Proceedings of the 27th Annual Network and Distributed System Security Symposium (NDSS\u201920)","author":"Y Duan","year":"2020","unstructured":"Duan, Y, Li X, Wang J, Yin H (2020) Deepbindiff: Learning program-wide code representations for binary diffing In: Proceedings of the 27th Annual Network and Distributed System Security Symposium (NDSS\u201920).. Springer, Shanghai."},{"key":"85_CR12","volume-title":"Proceedings of the 2016 Annual Network and Distributed System Security Symposium (NDSS)","author":"S Eschweiler","year":"2016","unstructured":"Eschweiler, S, Yakdan K, Gerhards-Padilla E (2016) discovRE: Efficient Cross-Architecture Identification of Bugs in Binary Code In: Proceedings of the 2016 Annual Network and Distributed System Security Symposium (NDSS).. The Internet Society, London."},{"key":"85_CR13","unstructured":"Euclidean Distance (2020). https:\/\/en.wikipedia.org\/wiki\/Euclidean_distance. 
Accessed 10 Apr 2021."},{"key":"85_CR14","doi-asserted-by":"publisher","first-page":"480","DOI":"10.1145\/2976749.2978370","volume-title":"Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security","author":"Q Feng","year":"2016","unstructured":"Feng, Q, Zhou R, Xu C, Cheng Y, Testa B, Yin H (2016) Scalable graph-based bug search for firmware images In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 480\u2013491.. Springer, Beijing."},{"key":"85_CR15","doi-asserted-by":"publisher","first-page":"238","DOI":"10.1007\/978-3-540-88625-9_16","volume-title":"Proceedings of the International Conference on Information and Communications Security","author":"D Gao","year":"2008","unstructured":"Gao, D, Reiter MK, Song D (2008) Binhunt: Automatically finding semantic differences in binary programs In: Proceedings of the International Conference on Information and Communications Security, 238\u2013255.. The Internet Society, London."},{"key":"85_CR16","unstructured":"GitHub (2020) Where the World Builds Software. https:\/\/github.com\/. Accessed 10 Apr 2021."},{"key":"85_CR17","unstructured":"Heartbleed (2020). https:\/\/en.wikipedia.org\/wiki\/Heartbleed. Accessed 10 Apr 2021."},{"key":"85_CR18","doi-asserted-by":"crossref","unstructured":"Hemel, A, Kalleberg KT, Vermaas R, Dolstra E (2011) Finding software license violations through binary code clone detection In: Proceedings of the 8th Working Conference on Mining Software Repositories, 63\u201372.","DOI":"10.1145\/1985441.1985453"},{"key":"85_CR19","unstructured":"IDAPython (2020). https:\/\/www.hex-rays.com\/products\/ida\/support\/idapython_docs\/. Accessed 10 Apr 2021."},{"key":"85_CR20","unstructured":"K-means Clustering (2020). https:\/\/en.wikipedia.org\/wiki\/K-means_clustering. 
Accessed 10 Apr 2021."},{"issue":"7","key":"85_CR21","doi-asserted-by":"publisher","first-page":"654","DOI":"10.1109\/TSE.2002.1019480","volume":"28","author":"T Kamiya","year":"2002","unstructured":"Kamiya, T, Kusumoto S, Inoue K (2002) Ccfinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Trans Softw Eng 28(7):654\u2013670.","journal-title":"IEEE Trans Softw Eng"},{"key":"85_CR22","unstructured":"Karta (2020). https:\/\/github.com\/CheckPointSW\/Karta. Accessed 10 Apr 2021."},{"key":"85_CR23","first-page":"595","volume-title":"Proceedings of the 38th IEEE Symposium on Security and Privacy (Oakland)","author":"S Kim","year":"2017","unstructured":"Kim, S, Woo S, Lee H, Oh H (2017) Vuddy: A scalable approach for vulnerable code clone discovery In: Proceedings of the 38th IEEE Symposium on Security and Privacy (Oakland), 595\u2013614.. IEEE, San Jose."},{"issue":"3","key":"85_CR24","doi-asserted-by":"publisher","first-page":"176","DOI":"10.1109\/TSE.2006.28","volume":"32","author":"Z Li","year":"2006","unstructured":"Li, Z, Lu S, Myagmar S, Zhou Y (2006) Cp-miner: Finding copy-paste and related bugs in large-scale software code. IEEE Trans Softw Eng 32(3):176\u2013192.","journal-title":"IEEE Trans Softw Eng"},{"key":"85_CR25","doi-asserted-by":"crossref","unstructured":"Li, M, Wang W, Wang P, Wang S, Wu D, Liu J, Xue R, Huo W (2017) Libd: scalable and precise third-party library detection in android markets In: 2017 IEEE\/ACM 39th International Conference on Software Engineering (ICSE), 335\u2013346.. IEEE.","DOI":"10.1109\/ICSE.2017.38"},{"key":"85_CR26","first-page":"266","volume":"13","author":"Z Li","year":"2018","unstructured":"Li, Z, Zou D, Xu S, Ou X, Jin H, Wang S, Deng Z, Zhong Y (2018) Vuldeepecker: A deep learning-based system for vulnerability detection. 
arXiv preprint arXiv:1801.01681 13:266\u2013267.","journal-title":"arXiv preprint arXiv:1801.01681"},{"key":"85_CR27","unstructured":"LibreOffice (2020) A Free and Open-source Office Suite, a Project of The Document Foundation. https:\/\/en.wikipedia.org\/wiki\/LibreOffice. Accessed 10 Apr 2021."},{"key":"85_CR28","doi-asserted-by":"publisher","first-page":"667","DOI":"10.1145\/3238147.3238199","volume-title":"Proceedings of the 33rd ACM\/IEEE International Conference on Automated Software Engineering","author":"B Liu","year":"2018","unstructured":"Liu, B, Huo W, Zhang C, Li W, Li F, Piao A, Zou W (2018) \u03b1diff: cross-version binary code similarity detection with dnn In: Proceedings of the 33rd ACM\/IEEE International Conference on Automated Software Engineering, 667\u2013678.. The Internet Society, San Diego."},{"key":"85_CR29","unstructured":"MATLAB (2020). https:\/\/en.wikipedia.org\/wiki\/MATLAB. Accessed 10 Apr 2021."},{"key":"85_CR30","first-page":"149","volume":"Suppl 3","author":"D Miyani","year":"2017","unstructured":"Miyani, D, Huang Z, Lie D (2017) Binpro: A tool for binary source code provenance. arXiv preprint arXiv:1711.00830 Suppl 3:149\u2013170.","journal-title":"arXiv preprint arXiv:1711.00830"},{"key":"85_CR31","unstructured":"One-hot Embedding (2020). https:\/\/en.wikipedia.org\/wiki\/One-hot. Accessed 10 Apr 2021."},{"key":"85_CR32","unstructured":"OWASP Top 10 Application Security Risks (2020). https:\/\/www.owasp.org\/index.php\/Category:OWASP_Top_Ten_Project. Accessed 10 Apr 2021."},{"issue":"7","key":"85_CR33","doi-asserted-by":"publisher","first-page":"789","DOI":"10.1002\/spe.4380250705","volume":"25","author":"TJ Parr","year":"1995","unstructured":"Parr, TJ, Quong RW (1995) Antlr: A predicated-ll (k) parser generator. 
Softw Pract Experience 25(7):789\u2013810.","journal-title":"Softw Pract Experience"},{"key":"85_CR34","volume-title":"Proceedings of the 36th IEEE Symposium on Security and Privacy (S&P)","author":"J Pewny","year":"2015","unstructured":"Pewny, J, Garmany B, Gawlik R, Rossow C, Holz T (2015) Cross-architecture bug search in binary executables In: Proceedings of the 36th IEEE Symposium on Security and Privacy (S&P).. Springer, Kyoto."},{"key":"85_CR35","doi-asserted-by":"publisher","first-page":"449","DOI":"10.1109\/ICSME.2018.00054","volume-title":"Proceedings of the 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME)","author":"SE Ponta","year":"2018","unstructured":"Ponta, SE, Plate H, Sabetta A (2018) Beyond metadata: Code-centric and usage-based analysis of known vulnerabilities in open-source software In: Proceedings of the 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), 449\u2013460.. International Conference on Software Maintenance and Evolution, New York."},{"key":"85_CR36","unstructured":"Repo Statistics on Github (2020). https:\/\/octoverse.github.com. Accessed 10 Apr 2021."},{"key":"85_CR37","unstructured":"Shahkar, A (2016) On matching binary to source code. PhD thesis, Concordia University."},{"key":"85_CR38","unstructured":"Storm Codec 7 (2020) A Video Codec Pack. https:\/\/storm-codec-7.en.uptodown.com\/windows. Accessed 10 Apr 2021."},{"key":"85_CR39","unstructured":"t-Distributed Stochastic Neighbor Embedding (2020). https:\/\/lvdmaaten.github.io\/tsne\/. 
Accessed 10 Apr 2021."},{"key":"85_CR40","first-page":"104","volume-title":"Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER)","author":"W Tang","year":"2020","unstructured":"Tang, W, Luo P, Fu J, Zhang D (2020) Libdx: A cross-platform and accurate system to detect third-party libraries in binary code In: Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), 104\u2013115.. Software Analysis, Evolution, and Reengineering, New York."},{"key":"85_CR41","unstructured":"TeamViewer (2020). https:\/\/www.teamviewer.com\/en\/. Accessed 10 Apr 2021."},{"key":"85_CR42","unstructured":"Tencent Software Download Official Version (2020). https:\/\/pc.qq.com\/. Accessed 10 Apr 2021."},{"key":"85_CR43","unstructured":"VMware Workstation Pro (2020). https:\/\/en.wikipedia.org\/wiki\/VMware_Workstation. Accessed 10 Apr 2021."},{"key":"85_CR44","first-page":"149","volume":"Suppl 3","author":"Y Wang","year":"2020","unstructured":"Wang, Y, Chen B, Huang K, Shi B, Xu C, Peng X, Liu Y, Wu Y (2020) An empirical study of usages, updates and risks of third-party libraries in java projects. arXiv preprint arXiv:2002.11028 Suppl 3:149\u2013170.","journal-title":"arXiv preprint arXiv:2002.11028"},{"key":"85_CR45","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1109\/ACSAC.2009.24","volume-title":"Proceedings of the 2009 Annual Computer Security Applications Conference","author":"X Wang","year":"2009","unstructured":"Wang, X, Jhi Y-C, Zhu S, Liu P (2009) Detecting software theft via system call based birthmarks In: Proceedings of the 2009 Annual Computer Security Applications Conference, 149\u2013158.. 
The Internet Society, San Diego."},{"key":"85_CR46","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1145\/3133956.3134018","volume-title":"Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security","author":"X Xu","year":"2017","unstructured":"Xu, X, Liu C, Feng Q, Yin H, Song L, Song D (2017) Neural network-based graph embedding for cross-platform binary code similarity detection In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 363\u2013376.. Springer, Beijing."},{"key":"85_CR47","first-page":"1145","volume":"34","author":"Z Yu","year":"2020","unstructured":"Yu, Z, Cao R, Tang Q, Nie S, Huang J, Wu S (2020) Order matters: Semantic-aware neural networks for binary code similarity detection. Proc AAAI Conf Artif Intell 34:1145\u20131152.","journal-title":"Proc AAAI Conf Artif Intell"},{"key":"85_CR48","doi-asserted-by":"crossref","unstructured":"Yuan, Z, Feng M, Li F, Ban G, Xiao Y, Wang S, Tang Q, Su H, Yu C, Xu J, et al. (2019) B2sfinder: detecting open-source software reuse in cots software In: Proceedings of the 2019 34th IEEE\/ACM International Conference on Automated Software Engineering (ASE), 1038\u20131049.. IEEE.","DOI":"10.1109\/ASE.2019.00100"},{"key":"85_CR49","first-page":"887","volume-title":"Proceedings of the 27th USENIX Security Symposium (Security)","author":"H Zhang","year":"2018","unstructured":"Zhang, H, Qian Z (2018) Precise and accurate patch presence test for binaries In: Proceedings of the 27th USENIX Security Symposium (Security), 887\u2013902.. Springer, Oakland."},{"key":"85_CR50","unstructured":"Zoom (2020). https:\/\/zoom.us\/. 
Accessed 10 Apr 2021."}],"container-title":["Cybersecurity"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42400-021-00085-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s42400-021-00085-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42400-021-00085-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,1]],"date-time":"2021-08-01T23:18:43Z","timestamp":1627859923000},"score":1,"resource":{"primary":{"URL":"https:\/\/cybersecurity.springeropen.com\/articles\/10.1186\/s42400-021-00085-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,2]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["85"],"URL":"https:\/\/doi.org\/10.1186\/s42400-021-00085-7","relation":{},"ISSN":["2523-3246"],"issn-type":[{"value":"2523-3246","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,2]]},"assertion":[{"value":"25 January 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 March 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 August 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"21"}}