{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T12:38:55Z","timestamp":1775047135091,"version":"3.50.1"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,1,24]],"date-time":"2021-01-24T00:00:00Z","timestamp":1611446400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,1,24]],"date-time":"2021-01-24T00:00:00Z","timestamp":1611446400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100010665","name":"H2020 Marie Sklodowska-Curie Actions","doi-asserted-by":"publisher","award":["799062"],"award-info":[{"award-number":["799062"]}],"id":[{"id":"10.13039\/100010665","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Comput Softw Big Sci"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The globally distributed computing infrastructure required to cope with the multi-petabyte datasets produced by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN comprises several subsystems, such as workload management, data management, data transfers, and submission of users\u2019 and centrally managed production requests. To guarantee the efficient operation of the whole infrastructure, CMS monitors all subsystems according to their performance and status. Moreover, we track key metrics to evaluate and study the system performance over time. The CMS monitoring architecture allows both real-time and historical monitoring of a variety of data sources. It relies on scalable and open source solutions tailored to satisfy the experiment\u2019s monitoring needs. We present the monitoring data flow and software architecture for the CMS distributed computing applications. We discuss the challenges, components, current achievements, and future developments of the CMS monitoring infrastructure.<\/jats:p>","DOI":"10.1007\/s41781-020-00051-x","type":"journal-article","created":{"date-parts":[[2021,1,24]],"date-time":"2021-01-24T10:02:51Z","timestamp":1611482571000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["The CMS monitoring infrastructure and applications"],"prefix":"10.1007","volume":"5","author":[{"given":"Christian","family":"Ariza-Porras","sequence":"first","affiliation":[]},{"given":"Valentin","family":"Kuznetsov","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1400-0709","authenticated-orcid":false,"given":"Federica","family":"Legger","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,1,24]]},"reference":[{"key":"51_CR1","doi-asserted-by":"crossref","unstructured":"Collaboration CMS (2008) The CMS Experiment at the CERN LHC. JINST 3:S08004","DOI":"10.1088\/1748-0221\/3\/08\/S08004"},{"key":"51_CR2","unstructured":"Bird I et al (2014) Update of the Computing Models of the WLCG and the LHC Experiments, CERN-LHCC-2014-014, LCG-TDR-002"},{"key":"51_CR3","first-page":"2428","volume":"2","author":"I Sfiligoi","year":"2009","unstructured":"Sfiligoi I et al (2009) The pilot way to grid resources using Glidein WMS. Proc WRI World Congress Comput Sci Inf Eng 2:2428\u2013432","journal-title":"Proc WRI World Congress Comput Sci Inf Eng"},{"issue":"2\u20134","key":"51_CR4","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1002\/cpe.938","volume":"17","author":"D Thain","year":"2005","unstructured":"Thain D, Tannenbaum T, Livny M (2005) Distributed computing in practice: the condor experience. Concur Comput Pract Exp 17(2\u20134):323\u2013356","journal-title":"Concur Comput Pract Exp"},{"key":"51_CR5","doi-asserted-by":"crossref","unstructured":"Ivanov T et al (2019) Improving efficiency of analysis jobs in CMS. EPJ Web Conf 03006","DOI":"10.1051\/epjconf\/201921403006"},{"key":"51_CR6","doi-asserted-by":"crossref","unstructured":"Giffels M, Guo Y, Kuznetsov V, Magini N, Wildish T (2014) The CMS data management system. J Phys Conf Ser Vol 513, Issue 4","DOI":"10.1088\/1742-6596\/513\/4\/042052"},{"key":"51_CR7","unstructured":"Apache Hadoop, http:\/\/hadoop.apache.org"},{"key":"51_CR8","unstructured":"InfluxDB, https:\/\/www.influxdata.com\/time-series-platform\/influxdb\/"},{"key":"51_CR9","unstructured":"Elasticsearch, http:\/\/elastic.co"},{"key":"51_CR10","unstructured":"Kibana, https:\/\/www.elastic.co\/products\/kibana"},{"key":"51_CR11","unstructured":"Grafana, http:\/\/grafana.org"},{"key":"51_CR12","unstructured":"VictoriaMetrics, https:\/\/victoriametrics.com\/"},{"key":"51_CR13","unstructured":"NATS https:\/\/nats.io\/"},{"key":"51_CR14","unstructured":"Prometheus, https:\/\/prometheus.io\/"},{"key":"51_CR15","unstructured":"ATLAS Collaboration, The ATLAS Experiment at the CERN Large Hadron Collider, JINST 3 S08003 (2008)"},{"key":"51_CR16","doi-asserted-by":"publisher","first-page":"032017","DOI":"10.1088\/1742-6596\/331\/3\/032017","volume":"331","author":"D Hufnagel","year":"2011","unstructured":"Hufnagel D et al (2011) The architecture and operation of the CMS Tier-0. J Phys Conf Ser 331:032017","journal-title":"J Phys Conf Ser"},{"key":"51_CR17","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1007\/s10723-010-9148-x","volume":"8","author":"J Andreeva","year":"2010","unstructured":"Andreeva J et al (2010) Experiment dashboard for monitoring computing activities of the LHC virtual organizations. J Grid Comp 8:323\u2013339","journal-title":"J Grid Comp"},{"key":"51_CR18","doi-asserted-by":"publisher","first-page":"092033","DOI":"10.1088\/1742-6596\/898\/9\/092033","volume":"898","author":"A Aimar","year":"2017","unstructured":"Aimar A et al (2017) Unified monitoring architecture for IT and grid services. J Phys Conf Ser 898:092033. https:\/\/doi.org\/10.1088\/1742-6596\/898\/9\/092033","journal-title":"J Phys Conf Ser"},{"key":"51_CR19","unstructured":"Apache ActiveMQ, http:\/\/activemq.apache.org"},{"key":"51_CR20","unstructured":"Logstash, https:\/\/www.elastic.co\/products\/logstash"},{"key":"51_CR21","unstructured":"Apache Kafka, http:\/\/kafka.apache.org"},{"key":"51_CR22","unstructured":"Apache Spark, http:\/\/spark.apache.org"},{"key":"51_CR23","unstructured":"Apache Flume, https:\/\/flume.apache.org\/"},{"key":"51_CR24","unstructured":"Apache Avro, https:\/\/avro.apache.org\/"},{"key":"51_CR25","unstructured":"Apache Parquet, https:\/\/parquet.apache.org\/"},{"key":"51_CR26","unstructured":"JSON (JavaScript Object Notation), https:\/\/www.json.org"},{"key":"51_CR27","doi-asserted-by":"publisher","first-page":"1071","DOI":"10.1016\/j.future.2016.11.035","volume":"78","author":"D Piparo","year":"2018","unstructured":"Piparo D et al (2018) SWAN: a service for interactive analysis in the cloud. Fut Gen Comput Syst 78:1071\u20131078. https:\/\/doi.org\/10.1016\/j.future.2016.11.035","journal-title":"Fut Gen Comput Syst"},{"key":"51_CR28","unstructured":"Graphite, https:\/\/graphiteapp.org\/"},{"key":"51_CR29","unstructured":"Open TSDB, http:\/\/opentsdb.net\/"},{"key":"51_CR30","unstructured":"MySQL, https:\/\/www.mysql.com\/"},{"key":"51_CR31","unstructured":"Jupyter, https:\/\/jupyter.org"},{"key":"51_CR32","doi-asserted-by":"publisher","first-page":"08031","DOI":"10.1051\/epjconf\/201921408031","volume":"214","author":"A Aimar","year":"2019","unstructured":"Aimar A et al (2019) MONIT: monitoring the CERN data centres and the WLCG infrastructure. EPJ Web Conf 214:08031","journal-title":"EPJ Web Conf"},{"key":"51_CR33","unstructured":"Kubernetes, https:\/\/kubernetes.io\/"},{"key":"51_CR34","unstructured":"Prometheus AlertManager https:\/\/prometheus.io\/docs\/alerting\/alertmanager\/"},{"key":"51_CR35","unstructured":"VictoriaMetrics benchmarks, https:\/\/victoriametrics.github.io\/Articles.html"},{"key":"51_CR36","unstructured":"The spider repository, https:\/\/github.com\/dmwm\/cms-htcondor-es"},{"key":"51_CR37","unstructured":"CMSSpark framework, https:\/\/github.com\/dmwm\/CMSSpark"},{"key":"51_CR38","unstructured":"Apache Sqoop, https:\/\/sqoop.apache.org\/"},{"key":"51_CR39","doi-asserted-by":"crossref","unstructured":"Castro Le\u00f3n J (2019) Advanced features of the CERN OpenStack Cloud, EPJ Web Conf. 214 07026","DOI":"10.1051\/epjconf\/201921407026"},{"key":"51_CR40","unstructured":"Filebeat, https:\/\/www.elastic.co\/beats\/filebeat"},{"key":"51_CR41","unstructured":"Kube-eagle Prometheus exporter for Kubernetes clusters, https:\/\/github.com\/cloudworkz\/kube-eagle"},{"key":"51_CR42","unstructured":"Slack, https:\/\/slack.com"},{"key":"51_CR43","unstructured":"ServiceNow, https:\/\/www.servicenow.com\/"},{"key":"51_CR44","doi-asserted-by":"publisher","first-page":"052002","DOI":"10.1088\/1742-6596\/119\/5\/052002","volume":"119","author":"T Antoni","year":"2008","unstructured":"Antoni T, Buhler W, Dres H, Grein G, Roth M (2008) Global grid user support: building a worldwide distributed user support infrastructure. J Phys Conf Ser 119:052002","journal-title":"J Phys Conf Ser"},{"key":"51_CR45","unstructured":"Grafterm, https:\/\/github.com\/slok\/grafterm\/"},{"key":"51_CR46","doi-asserted-by":"publisher","first-page":"042003","DOI":"10.1088\/1742-6596\/219\/4\/042003","volume":"219","author":"P Buncic","year":"2010","unstructured":"Buncic P, Aguado Sanchez C, Blomer J, Franco L, Harutyunian A, Mato P, Yao Y (2010) CernVM: a virtual software appliance for LHC applications. J Phys Conf Ser 219:042003","journal-title":"J Phys Conf Ser"},{"key":"51_CR47","unstructured":"CMSMonitoring framework, https:\/\/github.com\/dmwm\/CMSMonitoring"},{"key":"51_CR48","unstructured":"MIT license, https:\/\/opensource.org\/licenses\/MIT"}],"container-title":["Computing and Software for Big Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41781-020-00051-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41781-020-00051-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41781-020-00051-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,12,26]],"date-time":"2021-12-26T11:04:15Z","timestamp":1640516655000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41781-020-00051-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,24]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["51"],"URL":"https:\/\/doi.org\/10.1007\/s41781-020-00051-x","relation":{},"ISSN":["2510-2036","2510-2044"],"issn-type":[{"value":"2510-2036","type":"print"},{"value":"2510-2044","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,24]]},"assertion":[{"value":"13 July 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 December 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 January 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"5"}}