{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:16:56Z","timestamp":1760242616427,"version":"build-2065373602"},"reference-count":22,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2017,12,13]],"date-time":"2017-12-13T00:00:00Z","timestamp":1513123200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>To ensure seamless, programmatic access to data for High Performance Computing (HPC) and analysis across multiple research domains, it is vital to have a methodology for standardization of both data and services. At the Australian National Computational Infrastructure (NCI) we have developed a Data Quality Strategy (DQS) that currently provides processes for: (1) Consistency of data structures needed for a High Performance Data (HPD) platform; (2) Quality Control (QC) through compliance with recognized community standards; (3) Benchmarking cases of operational performance tests; and (4) Quality Assurance (QA) of data through demonstrated functionality and performance across common platforms, tools and services. By implementing the NCI DQS, we have seen progressive improvement in the quality and usefulness of the datasets across the different subject domains, and demonstrated the ease by which modern programmatic methods can be used to access the data, either in situ or via web services, and for uses ranging from traditional analysis methods through to emerging machine learning techniques. To help increase data re-usability by broader communities, particularly in high performance environments, the DQS is also used to identify the need for any extensions to the relevant international standards for interoperability and\/or programmatic access.<\/jats:p>","DOI":"10.3390\/informatics4040045","type":"journal-article","created":{"date-parts":[[2017,12,14]],"date-time":"2017-12-14T04:30:55Z","timestamp":1513225855000},"page":"45","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["A Data Quality Strategy to Enable FAIR, Programmatic Access across Large, Diverse Data Collections for High Performance Data Analysis"],"prefix":"10.3390","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6719-2671","authenticated-orcid":false,"given":"Ben","family":"Evans","sequence":"first","affiliation":[{"name":"National Computational Infrastructure, the Australian National University, Acton 2601, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kelsey","family":"Druken","sequence":"additional","affiliation":[{"name":"National Computational Infrastructure, the Australian National University, Acton 2601, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jingbo","family":"Wang","sequence":"additional","affiliation":[{"name":"National Computational Infrastructure, the Australian National University, Acton 2601, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rui","family":"Yang","sequence":"additional","affiliation":[{"name":"National Computational Infrastructure, the Australian National University, Acton 2601, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Clare","family":"Richards","sequence":"additional","affiliation":[{"name":"National Computational Infrastructure, the Australian National University, Acton 2601, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5976-4943","authenticated-orcid":false,"given":"Lesley","family":"Wyborn","sequence":"additional","affiliation":[{"name":"National Computational Infrastructure, the Australian National University, Acton 2601, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2017,12,13]]},"reference":[{"key":"ref_1","unstructured":"Wang, J., Evans, B., Bastrakova, I., Ryder, G., Martin, J., Duursma, D., Gohar, K., Mackey, T., Paget, M., and Siddeswara, G. (2014, January 13\u201317). Large-Scale Data Collection Metadata Management at the National Computation Infrastructure. Proceedings of the American Geophysical Union Fall Meeting, San Francisco, CA, USA."},{"key":"ref_2","unstructured":"(2017, August 23). The FAIR Data Principles. Available online: https:\/\/www.force11.org\/group\/fairgroup\/fairprinciples."},{"key":"ref_3","unstructured":"Evans, B., Wyborn, L., Druken, K., Richards, C., Trenham, C., and Wang, J. (2016, January 15\u201319). Extending the Common Framework for Earth Observation Data to other Disciplinary Data and Programmatic Access. Proceedings of the American Geophysical Union Fall Meeting, San Francisco, CA, USA."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Ramapriyan, H., Peng, G., Moroni, D., and Shie, C.L. (2017, October 18). Ensuring and Improving Information Quality for Earth Science Data and Products. D-Lib Magazine. Volume 23, No. 7\/8. Available online: https:\/\/doi.org\/10.1045\/july2017-ramapriyan.","DOI":"10.1045\/july2017-ramapriyan"},{"key":"ref_5","unstructured":"Atkin, B., and Brooks, A. (2005). Chapter 8: Service Specifications, Service Level Agreements and Performance. Total Facilities Management, Blackwell Publishing Ltd.. [2nd ed.]."},{"key":"ref_6","unstructured":"(2017, October 24). The CoreTrustSeal. Available online: https:\/\/www.coretrustseal.org\/why-certification\/requirements\/."},{"key":"ref_7","unstructured":"Stall, S. (2017, October 23). AGU\u2019s Data Management Maturity Model. Abstracts SciDataCon 2016. Available online: http:\/\/www.scidatacon.org\/2016\/sessions\/100\/paper\/278\/."},{"key":"ref_8","unstructured":"Stall, S., Hanson, B., and Wyborn, L. (2017, October 23). The American Geophysical Union Data Management Maturity Program. Abstracts for eResearch Australasia 2016. Available online: https:\/\/eresearchau.files.wordpress.com\/2016\/03\/eresau2016_paper_72.pdf."},{"key":"ref_9","unstructured":"(2014). Data Management Maturity Model, CMMI\u00ae Institute."},{"key":"ref_10","unstructured":"(2017, August 23). NCI\u2019s Data Catalogue Websites. Available online: https:\/\/datacatalogue.nci.org.au\/ and https:\/\/geonetwork.nci.org.au."},{"key":"ref_11","unstructured":"(2017, August 23). CMIP5 Data Reference Syntax, Available online: http:\/\/cmip-pcmdi.llnl.gov\/cmip5\/docs\/cmip5_data_reference_syntax.pdf."},{"key":"ref_12","unstructured":"(2017, August 23). NASA Landsat File Name Convention, Available online: https:\/\/landsat.usgs.gov\/what-are-naming-conventions-landsat-scene-identifiers."},{"key":"ref_13","unstructured":"(2016, May 25). ISO 2015 ISO19115-1:2014. Geographic Information\u2014Metadata\u2014Part 1: Fundamentals. Standards document. International Organization for Standardization, Geneva. Available online: http:\/\/www.iso.org\/iso\/home\/store\/catalogue_tc\/catalogue_detail.htm?csnumber=53798."},{"key":"ref_14","unstructured":"(2017, August 23). NASA Glossary, Available online: https:\/\/earthdata.nasa.gov\/user-resources\/glossary#ed-glossary-g."},{"key":"ref_15","unstructured":"(2017, August 23). NetCDF Climate and Forecast Metadata Conventions. Available online: http:\/\/cfconventions.org."},{"key":"ref_16","unstructured":"(2017, August 23). Attribute Convention for Data Discovery 1.3. Available online: http:\/\/wiki.esipfed.org\/index.php\/Attribute_Convention_for_Data_Discovery_(ACDD)."},{"key":"ref_17","unstructured":"(2017, November 22). IOOS Compliance Checker. Available online: https:\/\/github.com\/ioos\/compliance-checker."},{"key":"ref_18","unstructured":"Wang, J., Yang, R., and Evans, B.J.E. (2017, October 24). Improving Seismic Data Accessibility and Performance Using HDF Containers. Abstracts AGU 2017 Fall Meeting. Available online: https:\/\/agu.confex.com\/agu\/fm17\/meetingapp.cgi\/Paper\/222706."},{"key":"ref_19","unstructured":"(2017, November 06). ObsPy. Available online: https:\/\/github.com\/obspy\/obspy\/wiki."},{"key":"ref_20","unstructured":"(2017, November 06). SPECFEM3D. Available online: https:\/\/geodynamics.org\/cig\/software\/specfem3d\/."},{"key":"ref_21","unstructured":"(2017, October 18). PH5: What Is It? IRIS PASSCAL. Available online: https:\/\/www.passcal.nmt.edu\/content\/ph5-what-it."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1003","DOI":"10.1093\/gji\/ggw319","article-title":"An Adaptable Seismic Data Format","volume":"207","author":"Krischer","year":"2016","journal-title":"Geophys. J. Int."}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/4\/4\/45\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:53:48Z","timestamp":1760208828000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/4\/4\/45"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12,13]]},"references-count":22,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2017,12]]}},"alternative-id":["informatics4040045"],"URL":"https:\/\/doi.org\/10.3390\/informatics4040045","relation":{},"ISSN":["2227-9709"],"issn-type":[{"type":"electronic","value":"2227-9709"}],"subject":[],"published":{"date-parts":[[2017,12,13]]}}}