{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:05:59Z","timestamp":1760241959133,"version":"build-2065373602"},"reference-count":29,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2018,11,22]],"date-time":"2018-11-22T00:00:00Z","timestamp":1542844800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>We are living at a time that allows the generation of mass data in almost any field of science. For instance, in pharmacogenomics, there exist a number of big data repositories, e.g., the Library of Integrated Network-based Cellular Signatures (LINCS) that provide millions of measurements on the genomics level. However, to translate these data into meaningful information, the data need to be analyzable. The first step for such an analysis is the deliberate selection of subsets of raw data for studying dedicated research questions. Unfortunately, this is a non-trivial problem when millions of individual data files are available with an intricate connection structure induced by experimental dependencies. In this paper, we argue for the need to introduce such search capabilities for big genomics data repositories with a specific discussion about LINCS. 
Specifically, we suggest the introduction of smart interfaces allowing the exploitation of the connections among individual raw data files, giving rise to a network structure, by graph-based searches.<\/jats:p>","DOI":"10.3390\/make1010012","type":"journal-article","created":{"date-parts":[[2018,11,23]],"date-time":"2018-11-23T03:41:31Z","timestamp":1542944491000},"page":"205-210","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Exploiting Genomic Relations in Big Data Repositories by Graph-Based Search Methods"],"prefix":"10.3390","volume":"1","author":[{"given":"Aliyu","family":"Musa","sequence":"first","affiliation":[{"name":"Predictive Medicine and Data Analytics Lab, Department of Signal Processing, Tampere University of Technology, 33720 Tampere, Finland"},{"name":"Institute of Biosciences and Medical Technology, 33520 Tampere, Finland"}]},{"given":"Matthias","family":"Dehmer","sequence":"additional","affiliation":[{"name":"Department of Mechatronics and Biomedical Computer Science, UMIT, 6060 Hall in Tyrol, Austria"},{"name":"College of Computer and Control Engineering, Nankai University, Tianjin 300071, China"},{"name":"Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, 4400 Steyr Campus, Austria"}]},{"given":"Olli","family":"Yli-Harja","sequence":"additional","affiliation":[{"name":"Institute of Biosciences and Medical Technology, 33520 Tampere, Finland"},{"name":"Computational Systems Biology Lab, Tampere University of Technology, 33720 Tampere, Finland"},{"name":"Institute for Systems Biology, Seattle, WA 98109, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0745-5641","authenticated-orcid":false,"given":"Frank","family":"Emmert-Streib","sequence":"additional","affiliation":[{"name":"Predictive Medicine and Data Analytics Lab, Department of Signal Processing, Tampere University of Technology, 33720 Tampere, 
Finland"},{"name":"Institute of Biosciences and Medical Technology, 33520 Tampere, Finland"}]}],"member":"1968","published-online":{"date-parts":[[2018,11,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1093\/nar\/30.1.207","article-title":"Gene Expression Omnibus: NCBI gene expression and hybridization array data repository","volume":"30","author":"Edgar","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Holzinger, A., and Jurisica, I. (2014). Knowledge discovery and data mining in biomedical informatics: The future is in integrative, interactive machine learning solutions. Interactive Knowledge Discovery and Data Mining in Biomedical Informatics, Springer.","DOI":"10.1007\/978-3-662-43968-5"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1929","DOI":"10.1126\/science.1132939","article-title":"The Connectivity Map: Using gene-expression signatures to connect small molecules, genes, and disease","volume":"313","author":"Lamb","year":"2006","journal-title":"Science"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"450","DOI":"10.1016\/j.tips.2014.07.001","article-title":"Lean Big Data integration in systems biology and systems pharmacology","volume":"35","author":"Rouillard","year":"2014","journal-title":"Trends Pharmacol. Sci."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1126\/science.1158140","article-title":"Drug target identification using side-effect similarity","volume":"321","author":"Campillos","year":"2008","journal-title":"Science"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Subramanian, A., Narayan, R., Corsello, S.M., Peck, D.D., Natoli, T.E., Lu, X., Gould, J., Davis, J.F., Tubelli, A.A., and Asiedu, J.K. (2017). A Next Generation Connectivity Map: L1000 Platform And The First 1,000,000 Profiles. 
BioRxiv.","DOI":"10.1016\/j.cell.2017.10.049"},{"key":"ref_7","first-page":"506","article-title":"A Review of Connectivity Mapping and Computational Approaches in Pharmacogenomics","volume":"19","author":"Musa","year":"2017","journal-title":"Brief. Bioinform."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Musa, A., Tripathi, S., Kandhavelu, M., Dehmer, M., and Emmert-Streib, F. (2018). Harnessing the biological complexity of Big Data from LINCS gene expression signatures. PLoS ONE, 13.","DOI":"10.1371\/journal.pone.0201937"},{"key":"ref_9","first-page":"342","article-title":"Large-scale integration of small molecule-induced genome-wide transcriptional responses, Kinome-wide binding affinities and cell-growth inhibition profiles reveal global trends characterizing systems-level drug action","volume":"5","author":"Vidovic","year":"2014","journal-title":"Front. Genet."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"D1005","DOI":"10.1093\/nar\/gkq1184","article-title":"NCBI GEO: Archive for functional genomics data sets -10 years on","volume":"39","author":"Barrett","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1145\/362384.362685","article-title":"A Relational Model of Data for Large Shared Data Banks","volume":"13","author":"Codd","year":"1970","journal-title":"Commun. ACM"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wiese, L. (2015). Advanced Data Management: For SQL, NoSQL, Cloud and Distributed Databases, De Gruyter.","DOI":"10.1515\/9783110441413"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1322432.1322433","article-title":"Survey of Graph Database Models","volume":"40","author":"Angles","year":"2008","journal-title":"ACM Comput. 
Surv."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"886","DOI":"10.14778\/1687627.1687727","article-title":"Distance-join: Pattern match query in a large graph database","volume":"2","author":"Zou","year":"2009","journal-title":"Proc. VLDB Endowment"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"e26726","DOI":"10.7554\/eLife.26726","article-title":"Systematic integration of biomedical knowledge prioritizes drugs for repurposing","volume":"6","author":"Himmelstein","year":"2017","journal-title":"eLife"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"D619","DOI":"10.1093\/nar\/gkn863","article-title":"Reactome knowledgebase of human biological pathways and processes","volume":"37","author":"Matthews","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pone.0179130","article-title":"biochem4j: Integrated and extensible biochemical knowledge through graph databases","volume":"12","author":"Swainston","year":"2017","journal-title":"PLoS ONE"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Tour\u00e9, V., Mazein, A., Waltemath, D., Balaur, I., Saqi, M., Henkel, R., Pellet, J., and Auffray, C. (2016). STON: Exploring biological pathways using the SBGN standard and graph databases. BMC Bioinform., 17.","DOI":"10.1186\/s12859-016-1394-x"},{"key":"ref_19","unstructured":"Cormen, T., Leiserson, C., Rivest, R., and Stein, C. (2001). Introduction to Algorithms, MIT Press."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Loeckx, J. (1974). File organization, an application of graph theory. Automata, Languages and Programming: 2nd Colloquium, University of Saarbr\u00fccken 29 July\u2013 2 August 1974, Springer.","DOI":"10.1007\/978-3-662-21545-6"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1016\/0304-3975(76)90023-2","article-title":"Information storage and retrieval \u2013 
mathematical foundations II (combinatorial problems)","volume":"3","author":"Lipski","year":"1976","journal-title":"Theor. Comput. Sci."},{"key":"ref_22","unstructured":"Baeza-Yates, R., and Ribeiro-Neto, B. (1999). Modern Information Retrieval, ACM Press."},{"key":"ref_23","unstructured":"Chowdhury, G.G. (2010). Introduction to Modern Information Retrieval, Facet Publishing."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1145\/1365815.1365816","article-title":"Bigtable: A distributed storage system for structured data","volume":"26","author":"Chang","year":"2008","journal-title":"ACM Trans. Comput. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"813","DOI":"10.1038\/nrc1951","article-title":"The NCI60 human tumour cell line anticancer drug screen","volume":"6","author":"Shoemaker","year":"2006","journal-title":"Nat. Rev. Cancer"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1093\/nar\/gkg091","article-title":"ArrayExpress-a public repository for microarray gene expression data at the EBI","volume":"31","author":"Brazma","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1038\/nature11003","article-title":"The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity","volume":"483","author":"Barretina","year":"2012","journal-title":"Nature"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Dehmer, M., and Emmert-Streib, F. (2009). Analysis of Complex Networks: From Biology to Linguistics, Wiley-VCH.","DOI":"10.1002\/9783527627981"},{"key":"ref_29","first-page":"12","article-title":"The process of analyzing data is the emergent feature of data science","volume":"7","author":"Moutari","year":"2016","journal-title":"Front. 
Genet."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/12\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:31:29Z","timestamp":1760196689000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/12"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,11,22]]},"references-count":29,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["make1010012"],"URL":"https:\/\/doi.org\/10.3390\/make1010012","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2018,11,22]]}}}