{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:57:11Z","timestamp":1760241431683,"version":"build-2065373602"},"reference-count":57,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2018,2,23]],"date-time":"2018-02-23T00:00:00Z","timestamp":1519344000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>Scientists routinely analyse and share data for others to use. Successful data (re)use relies on having metadata describing the context of analysis of data. In many disciplines the creation of contextual metadata is referred to as reporting. One method of implementing analyses is with workflows. A stand-out feature of workflows is their ability to record provenance from executions. Provenance is useful when analyses are executed with changing parameters (changing contexts) and results need to be traced to respective parameters. In this paper we investigate whether provenance can be exploited to support reporting. Specifically, we outline a case-study based on a real-world workflow and set of reporting queries. We observe that provenance, as collected from workflow executions, is of limited use for reporting, as it supports queries partially. We identify that this is due to the generic nature of provenance, its lack of domain-specific contextual metadata. We observe that the required information is available in implicit form, embedded in data. We describe LabelFlow, a framework comprised of four Labelling Operators for decorating provenance with domain-specific Labels. LabelFlow can be instantiated for a domain by plugging it with domain-specific metadata extractors. 
We provide a tool that takes as input a workflow, and produces as output a Labelling Pipeline for that workflow, comprised of Labelling Operators. We revisit the case-study and show how Labels provide a more complete implementation of reporting queries.<\/jats:p>","DOI":"10.3390\/informatics5010011","type":"journal-article","created":{"date-parts":[[2018,2,23]],"date-time":"2018-02-23T12:17:39Z","timestamp":1519388259000},"page":"11","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["LabelFlow Framework for Annotating Workflow Provenance"],"prefix":"10.3390","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2224-0780","authenticated-orcid":false,"given":"Pinar","family":"Alper","sequence":"first","affiliation":[{"name":"Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L 4365 Esch-sur-Alzette, Luxembourg"}]},{"given":"Khalid","family":"Belhajjame","sequence":"additional","affiliation":[{"name":"LAMSADE Research Lab, Universit\u00e9 Paris Dauphine, UMR CNRS 7243 Paris, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8308-2886","authenticated-orcid":false,"given":"Vasa","family":"Curcin","sequence":"additional","affiliation":[{"name":"Department of Population Health Sciences, King\u2019s College London, London SE1 1UL, UK"}]},{"given":"Carole","family":"Goble","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Manchester, Manchester M13 9PL, UK"}]}],"member":"1968","published-online":{"date-parts":[[2018,2,23]]},"reference":[{"key":"ref_1","unstructured":"Hey, T., Tansley, S., and Tolle, K.M. (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research."},{"key":"ref_2","unstructured":"(2018, February 22). Available online: http:\/\/www.nature.com\/sdata\/."},{"key":"ref_3","unstructured":"Davenhall, C. (2011). 
Curation Reference Manual, Chapter on Scientific Metadata, The Digital Curation Centre (DCC). Available online: http:\/\/www.dcc.ac.uk\/resources\/curation-reference-manual."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"889","DOI":"10.1038\/nbt.1411","article-title":"Promoting coherent minimum reporting guidelines for biological and biomedical investigations: The MIBBI project","volume":"26","author":"Taylor","year":"2008","journal-title":"Nat. Biotechnol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1038\/ng.1054","article-title":"Toward interoperable bioscience data","volume":"44","author":"Sansone","year":"2012","journal-title":"Nat. Genet."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1039","DOI":"10.1002\/cpe.994","article-title":"Scientific workflow management and the Kepler system","volume":"18","author":"Ludaescher","year":"2006","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1451","DOI":"10.1101\/gr.4086505","article-title":"Galaxy: A platform for interactive large-scale genome analysis","volume":"15","author":"Giardine","year":"2005","journal-title":"Genome Res."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1109\/MIS.2010.9","article-title":"Wings: Intelligent Workflow-Based Design of Computational Experiments","volume":"26","author":"Gil","year":"2011","journal-title":"IEEE Intell. Syst."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Callahan, S.P., Freire, J., Santos, E., Scheidegger, C.E., Silva, C.T., and Vo, H.T. (2006). Vistrails: Visualization meets data management. ACM SIGMOD, ACM Press.","DOI":"10.1145\/1142473.1142574"},{"key":"ref_10","unstructured":"R Core Team (2014). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing. Available online: https:\/\/www.r-project.org."},{"key":"ref_11","unstructured":"Rossum, G. (1995). 
Python Reference Manual, CWI (Centre for Mathematics and Computer Science). Technical Report."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Missier, P., Paton, N.W., and Belhajjame, K. (2010, January 22\u201326). Fine-grained and Efficient Lineage Querying of Collection-based Workflow Provenance. Proceedings of the 13th International Conference on Extending Database Technology, Lausanne, Switzerland.","DOI":"10.1145\/1739041.1739079"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1007\/s10619-009-7058-3","article-title":"Understanding provenance black boxes","volume":"27","author":"Chapman","year":"2010","journal-title":"Distrib. Parallel Databases"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A.U., Wu, L., Read, E., Manoff, M., and Frame, M. (2011). Data Sharing by Scientists: Practices and Perceptions. PLoS ONE, 6.","DOI":"10.1371\/journal.pone.0021101"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Missier, P., Sahoo, S.S., Zhao, J., Goble, C., and Sheth, A. (2010, January 15\u201316). Janus: From Workflows to Semantic Provenance and Linked Open Data. Proceedings of the 3rd International Provenance and Annotation Workshop (IPAW 2010), Troy, NY, USA.","DOI":"10.1007\/978-3-642-17819-1_16"},{"key":"ref_16","unstructured":"Cao, B., Plale, B., Subramanian, G., Missier, P., Goble, C.A., and Simmhan, Y. (2009, January 25). Semantically Annotated Provenance in the Life Science Grid. Proceedings of the 1st International Workshop on the role of Semantic Web in Provenance Management (SWPM 2009), Washington DC, USA."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1145\/1743546.1743568","article-title":"Managing Scientific Data","volume":"53","author":"Ailamaki","year":"2010","journal-title":"Commun. 
ACM"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Belhajjame, K., Zhao, J., Garijo, D., Garrido, A., Soiland-Reyes, S., Alper, P., and Corcho, O. (2013, January 18\u201322). A Workflow PROV-corpus Based on Taverna and Wings. Proceedings of the Joint EDBT\/ICDT 2013 Workshops, Genoa, Italy.","DOI":"10.1145\/2457317.2457376"},{"key":"ref_19","unstructured":"Hull, D., Stevens, R., Lord, P., Wroe, C., and Goble, C. (2004, January 8). Treating shimantic web syndrome with ontologies. Proceedings of the 1st Advanced Knowledge Technologies Workshop on Semantic Web Services (AKT-SWS04) KMi, Milton Keynes, UK."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Alagiannis, I., Borovica, R., Branco, M., Idreos, S., and Ailamaki, A. (2012, January 20\u201324). NoDB: Efficient Query Execution on Raw Data Files. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, Scottsdale, AZ, USA.","DOI":"10.1145\/2213836.2213864"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"556","DOI":"10.1088\/1742-6596\/16\/1\/077","article-title":"FastBit: An efficient indexing technology for accelerating data-intensive science","volume":"16","author":"Wu","year":"2005","journal-title":"J. Phys. Conf. Ser."},{"key":"ref_22","unstructured":"Alawini, A., Maier, D., Tufte, K., Howe, B., and Nandikur, R. (July, January 29). Towards Automated Prediction of Relationships Among Scientific Datasets. Proceedings of the 27th International Conference on Scientific and Statistical Database Management, La Jolla, CA, USA."},{"key":"ref_23","unstructured":"Sousa, V.S., de Oliveira, D., and Mattoso, M. (2014, January 22\u201324). Exploratory Analysis of Raw Data Files through Dataflows. 
Proceedings of the 2014 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW), Paris, France."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1016\/j.future.2013.09.018","article-title":"Common motifs in scientific workflows: An empirical analysis","volume":"36","author":"Garijo","year":"2014","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1109\/MIC.2011.7","article-title":"Extending Semantic Provenance into the Web of Data","volume":"15","author":"Zhao","year":"2011","journal-title":"IEEE Internet Comput."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Alper, P., Goble, C.A., and Belhajjame, K. (2013, January 17). On assisting scientific data curation in collection-based dataflows using labels. Proceedings of the 8th Workshop On Workflows in Support of Large-Scale Science, (WORKS), Denver, CO, USA.","DOI":"10.1145\/2534248.2534249"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Alper, P., Belhajjame, K., Goble, C.A., and Karagoz, P. (2014, January 9\u201313). LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance. Proceedings of the 5th International Provenance and Annotation Workshop (IPAW), Cologne, Germany.","DOI":"10.1007\/978-3-319-16462-5_7"},{"key":"ref_28","unstructured":"Exposito, S.S. (2018, February 22). 
Available online: http:\/\/www.myexperiment.org\/workflows\/2920\/versions\/2.html."},{"key":"ref_29","first-page":"471","article-title":"Taverna, Reloaded","volume":"Volume 6187","author":"Gertz","year":"2010","journal-title":"Proceedings of Scientific and Statistical Database Management Conference (SSDBM), Lecture Notes in Computer Science, Heidelberg, Germany, 30 June\u20132 July 2010"},{"key":"ref_30","first-page":"409","article-title":"The First Provenance Challenge","volume":"20","author":"Moreau","year":"2008","journal-title":"CCPE"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1016\/j.future.2017.01.004","article-title":"Static analysis of Taverna workflows to predict provenance patterns","volume":"75","author":"Alper","year":"2017","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.websem.2015.01.003","article-title":"Using a suite of ontologies for preserving workflow-centric research objects","volume":"32","author":"Belhajjame","year":"2015","journal-title":"Web Semant. Sci. Serv. Agents World Wide Web"},{"key":"ref_33","unstructured":"Wood, D., Lanthaler, M., and Cyganiak, R. (2018, February 22). Available online: https:\/\/www.w3.org\/TR\/rdf11-concepts\/."},{"key":"ref_34","unstructured":"Groth, P., and Moreau, L. (Eds.) (2018, February 22). Available online: http:\/\/www.w3.org\/TR\/2013\/NOTE-prov-overview-20130430\/."},{"key":"ref_35","unstructured":"Missier, P., Dey, S., Belhajjame, K., Cuevas-Vicentt\u00edn, V., and Lud\u00e4scher, B. (2013, January 2\u20133). D-PROV: Extending the PROV provenance model with workflow structure. Proceedings of the 5th USENIX Workshop on the Theory and Practice of Provenance, Lombard, IL, USA."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Brandizi, M., Melnichuk, O., Bild, R., Kohlmayer, F., Rodriguez-Castro, B., Spengler, H., Kuhn, K.A., Kuchinke, W., Ohmann, C., and Mustonen, T. (2017). 
Orchestrating differential data access for translational research: A pilot implementation. BMC Med. Inf. Decis. Mak., 17.","DOI":"10.1186\/s12911-017-0424-6"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1533","DOI":"10.14778\/3007263.3007302","article-title":"SPARQLByE: Querying RDF Data by Example","volume":"9","author":"Diaz","year":"2016","journal-title":"Proc. VLDB Endow."},{"key":"ref_38","unstructured":"Garijo, D., Alper, P., and Belhajjame, K. (2018, February 22). Available online: http:\/\/vocab.linkeddata.es\/motifs\/."},{"key":"ref_39","unstructured":"Booch, G., Rumbaugh, J., and Jacobson, I. (2005). Unified Modeling Language User Guide, Addison-Wesley Professional. [2nd ed.]."},{"key":"ref_40","unstructured":"Alper, P. (2018, February 22). LabelFlow Evaluation Datasets. Available online: https:\/\/github.com\/pinarpink\/phd-sources\/tree\/master\/labeling-workflow-generator."},{"key":"ref_41","unstructured":"Belhajjame, K., Cheney, J., Corsar, D., Garijo, D., Soiland-Reyes, S., Zednik, S., and Zhao, J. (2018, February 22). Available online: http:\/\/www.w3.org\/TR\/prov-o\/."},{"key":"ref_42","unstructured":"Provenance Working Group (2018, February 22). PROV Implementation Report. Available online: https:\/\/www.w3.org\/TR\/prov-implementations\/."},{"key":"ref_43","unstructured":"Carroll, J.J., Dickinson, I., Dollin, C., Reynolds, D., Seaborne, A., and Wilkinson, K. (2004, January 17\u201320). Jena: Implementing the Semantic Web Recommendations. Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters, New York, NY, USA."},{"key":"ref_44","unstructured":"Gnesi, S., and Rensink, A. (2014). An Online Validator for Provenance: Algorithmic Design, Testing, and API. 
Fundamental Approaches to Software Engineering, Springer."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1007\/978-3-642-17819-1_16","article-title":"Janus: From Workflows to Semantic Provenance and Linked Open Data","volume":"Volume 6378","author":"Missier","year":"2010","journal-title":"Provenance and Annotation of Data and Processes"},{"key":"ref_46","first-page":"92","article-title":"Using Semantic Web Technologies for Representing e-Science Provenance","volume":"Volume 3298","author":"Zhao","year":"2004","journal-title":"Proceedings of the ISWC 2004"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1109\/MIC.2008.86","article-title":"Semantic provenance for escience: Managing the deluge of scientific data","volume":"12","author":"Sahoo","year":"2008","journal-title":"IEEE Internet Comput."},{"key":"ref_49","unstructured":"De Oliveira, D., Silva, V., and Mattoso, M. How Much Domain Data Should Be in Provenance Databases? In Proceedings of the 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP 15), Edinburgh, UK, 8\u20139 July 2015; USENIX Association: Edinburgh, UK, 2015."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Halper, M., Geller, J., and Perl, Y. (1993, January 1\u20135). Value Propagation in Object-oriented Database Part Hierarchies. 
Proceedings of the Second International Conference on Information and Knowledge Management, ACM, CIKM\u201993, Washington, DC, USA.","DOI":"10.1145\/170088.170439"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1016\/S0169-023X(96)00013-4","article-title":"Part-whole Relations in Object-centered Systems: An Overview","volume":"20","author":"Artale","year":"1996","journal-title":"Data Knowl. Eng."},{"key":"ref_52","first-page":"380","article-title":"Theoretical Considerations of Lifecycle Modelling: An Analysis of the Dryad Repository Demonstrating Automatic Metadata Propagation, Inheritance, and Value System Adoption","volume":"47","author":"Greenberg","year":"2009","journal-title":"Cat. Classif. Q."},{"key":"ref_53","unstructured":"Nascimento, M.A., \u00d6zsu, M.T., Kossmann, D., Miller, R.J., Blakeley, J.A., and Schiefer, B. (September, January 31). An Annotation Management System for Relational Databases. Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, ON, Canada."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Bowers, S., and Lud\u00e4scher, B. (2006, January 26\u201331). A Calculus for Propagating Semantic Annotations Through Scientific Workflow Queries. Proceedings of the 2006 International Conference on Current Trends in Database Technology, Munich, Germany.","DOI":"10.1007\/11896548_54"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"689","DOI":"10.1093\/nar\/gkq394","article-title":"BioCatalogue: A universal catalogue of web services for the life sciences","volume":"38","author":"Bhagat","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"ref_56","unstructured":"Hitzler, P., Kr\u00f6tzsch, M., Parsia, B., and Rudolph, S. (2018, February 22). 
Available online: http:\/\/www.w3.org\/TR\/owl2-primer\/."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"599","DOI":"10.1016\/j.future.2011.08.004","article-title":"Why linked data is not enough for scientists. Special section: Recent advances in e-Science","volume":"29","author":"Bechhofer","year":"2013","journal-title":"Future Gener. Comput. Syst."}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/5\/1\/11\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T14:56:04Z","timestamp":1760194564000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/5\/1\/11"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,2,23]]},"references-count":57,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2018,3]]}},"alternative-id":["informatics5010011"],"URL":"https:\/\/doi.org\/10.3390\/informatics5010011","relation":{},"ISSN":["2227-9709"],"issn-type":[{"type":"electronic","value":"2227-9709"}],"subject":[],"published":{"date-parts":[[2018,2,23]]}}}