{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T21:35:06Z","timestamp":1777498506967,"version":"3.51.4"},"reference-count":63,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2019,1,7]],"date-time":"2019-01-07T00:00:00Z","timestamp":1546819200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Having insight into the causal associations in a complex system facilitates decision making, e.g., for medical treatments, urban infrastructure improvements or financial investments. The amount of observational data grows, which enables the discovery of causal relationships between variables from observation of their behaviour in time. Existing methods for causal discovery from time series data do not yet exploit the representational power of deep learning. We therefore present the Temporal Causal Discovery Framework (TCDF), a deep learning framework that learns a causal graph structure by discovering causal relationships in observational time series data. TCDF uses attention-based convolutional neural networks combined with a causal validation step. By interpreting the internal parameters of the convolutional networks, TCDF can also discover the time delay between a cause and the occurrence of its effect. Our framework learns temporal causal graphs, which can include confounders and instantaneous effects. Experiments on financial and neuroscientific benchmarks show state-of-the-art performance of TCDF on discovering causal relationships in continuous time series data. Furthermore, we show that TCDF can circumstantially discover the presence of hidden confounders. Our broadly applicable framework can be used to gain novel insights into the causal dependencies in a complex system, which is important for reliable predictions, knowledge discovery and data-driven decision making.<\/jats:p>","DOI":"10.3390\/make1010019","type":"journal-article","created":{"date-parts":[[2019,1,9]],"date-time":"2019-01-09T03:06:06Z","timestamp":1547003166000},"page":"312-340","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":203,"title":["Causal Discovery with Attention-Based Convolutional Neural Networks"],"prefix":"10.3390","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0558-3810","authenticated-orcid":false,"given":"Meike","family":"Nauta","sequence":"first","affiliation":[{"name":"Faculty of EEMCS, University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4830-7162","authenticated-orcid":false,"given":"Doina","family":"Bucur","sequence":"additional","affiliation":[{"name":"Faculty of EEMCS, University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6776-3868","authenticated-orcid":false,"given":"Christin","family":"Seifert","sequence":"additional","affiliation":[{"name":"Faculty of EEMCS, University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands"}]}],"member":"1968","published-online":{"date-parts":[[2019,1,7]]},"reference":[{"key":"ref_1","unstructured":"Kleinberg, S. (2015). Why: A Guide to Finding and Using Causes, O\u2019Reilly."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kleinberg, S. (2013). Causality, Probability, and Time, Cambridge University Press.","DOI":"10.1017\/CBO9781139207799"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2327","DOI":"10.1109\/TAC.2015.2491678","article-title":"AR Identification of Latent-Variable Graphical Models","volume":"61","author":"Zorzi","year":"2016","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_4","first-page":"1643","article-title":"Introduction to causal inference","volume":"11","author":"Spirtes","year":"2010","journal-title":"J. Mach. Learn. Res."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1093\/nsr\/nwx137","article-title":"Learning causality and causality-related learning: Some recent progress","volume":"5","author":"Zhang","year":"2017","journal-title":"Natl. Sci. Rev."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Helen Beebee, C.H., and Menzies, P. (2009). The Psychology of Causal Perception and Reasoning. The Oxford Handbook of Causation, Oxford University Press. Chapter 21.","DOI":"10.1093\/oxfordhb\/9780199279739.001.0001"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Abdul, A., Vermeulen, J., Wang, D., Lim, B.Y., and Kankanhalli, M. (2018, January 21\u201326). Trends and trajectories for explainable, accountable and intelligible systems: An hci research agenda. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, Montreal, QC, Canada.","DOI":"10.1145\/3173574.3174156"},{"key":"ref_8","unstructured":"Runge, J., Sejdinovic, D., and Flaxman, S. (arXiv, 2017). Detecting causal associations in large nonlinear time series datasets, arXiv."},{"key":"ref_9","unstructured":"Huang, Y., and Kleinberg, S. (2015, January 18\u201320). Fast and Accurate Causal Inference from Time Series Data. Proceedings of the FLAIRS Conference, Hollywood, FL, USA."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1016\/j.neuroimage.2014.06.013","article-title":"A copula approach to assessing Granger causality","volume":"100","author":"Hu","year":"2014","journal-title":"NeuroImage"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1007\/s10614-015-9491-x","article-title":"Detecting causality in non-stationary time series using partial symbolic transfer entropy: Evidence in financial data","volume":"47","author":"Papana","year":"2016","journal-title":"Comput. Econ."},{"key":"ref_12","unstructured":"M\u00fcller, B., Reinhardt, J., and Strickland, M.T. (2012). Neural Networks: An Introduction, Springer."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Hyv\u00e4rinen, A., Shimizu, S., and Hoyer, P.O. (2008, January 5\u20139). Causal modelling combining instantaneous and lagged effects: An identifiable model based on non-Gaussianity. Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland.","DOI":"10.1145\/1390156.1390210"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"e12470","DOI":"10.1111\/phc3.12470","article-title":"Causal discovery algorithms: A practical guide","volume":"13","author":"Malinsky","year":"2018","journal-title":"Philos. Compass"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1007\/s10827-010-0247-2","article-title":"Estimating the directed information to infer causal relationships in ensemble neural spike train recordings","volume":"30","author":"Quinn","year":"2011","journal-title":"J. Comput. Neurosci."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"10580","DOI":"10.1016\/j.ifacol.2017.08.1310","article-title":"On the identifiability of dynamical networks","volume":"50","author":"Gevers","year":"2017","journal-title":"IFAC-PapersOnLine"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1016\/j.conb.2012.11.010","article-title":"Analysing connectivity with Granger causality and dynamic causal modelling","volume":"23","author":"Friston","year":"2013","journal-title":"Curr. Opin. Neurobiol."},{"key":"ref_18","unstructured":"Peters, J., Janzing, D., and Sch\u00f6lkopf, B. (2017). Elements of Causal Inference: Foundations and Learning Algorithms, MIT Press."},{"key":"ref_19","unstructured":"Papana, A., Kyrtsou, K., Kugiumtzis, D., and Diks, C. (2014). Identifying Causal Relationships in Case of Non-Stationary Time Series, Universiteit van Amsterdam. Technical Report."},{"key":"ref_20","first-page":"967","article-title":"Search for additive nonlinear time series causal models","volume":"9","author":"Chu","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_21","unstructured":"Entner, D., and Hoyer, P.O. (2010, January 13\u201315). On causal discovery from time series data using FCI. Proceedings of the Fifth European Workshop on Probabilistic Graphical Models, Helsinki, Finland."},{"key":"ref_22","unstructured":"Peters, J., Janzing, D., and Sch\u00f6lkopf, B. (2013). Causal inference on time series using restricted structural equation models. Advances in Neural Information Processing Systems, The MIT Press."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"6220","DOI":"10.1109\/TIT.2013.2267934","article-title":"Universal estimation of directed information","volume":"59","author":"Jiao","year":"2013","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_24","first-page":"424","article-title":"Investigating causal relations by econometric models and cross-spectral methods","volume":"37","author":"Granger","year":"1969","journal-title":"Econom. J. Econom. Soc."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1016\/j.jneumeth.2005.06.011","article-title":"Frequency decomposition of conditional Granger causality and application to multivariate neural field potential data","volume":"150","author":"Chen","year":"2006","journal-title":"J. Neurosci. Methods"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1016\/j.automatica.2016.08.014","article-title":"Sparse plus low rank network identification: A nonparametric approach","volume":"76","author":"Zorzi","year":"2017","journal-title":"Automatica"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"144103","DOI":"10.1103\/PhysRevLett.100.144103","article-title":"Kernel method for nonlinear Granger causality","volume":"100","author":"Marinazzo","year":"2008","journal-title":"Phys. Rev. Lett."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Luo, Q., Ge, T., Grabenhorst, F., Feng, J., and Rolls, E.T. (2013). Attention-dependent modulation of cortical taste circuits revealed by Granger causality with signal-dependent noise. PLoS Comput. Biol., 9.","DOI":"10.1371\/journal.pcbi.1003265"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/s40535-016-0018-x","article-title":"Causal discovery and inference: Concepts and recent methodological advances","volume":"Volume 3","author":"Spirtes","year":"2016","journal-title":"Applied Informatics"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Spirtes, P., Glymour, C.N., and Scheines, R. (2000). Causation, Prediction, and Search, MIT Press.","DOI":"10.7551\/mitpress\/1754.001.0001"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Liu, Y., and Aviyente, S. (2012, January 5\u20138). The relationship between transfer entropy and directed information. Proceedings of the Statistical Signal Processing Workshop (SSP), Ann Arbor, MI, USA.","DOI":"10.1109\/SSP.2012.6319809"},{"key":"ref_32","unstructured":"Guo, T., Lin, T., and Lu, Y. (May, January 30). An Interpretable LSTM Neural Network for Autoregressive Exogenous Model. Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada."},{"key":"ref_33","unstructured":"Louizos, C., Shalit, U., Mooij, J.M., Sontag, D., Zemel, R., and Welling, M. (2017). Causal effect inference with deep latent-variable models. Advances in Neural Information Processing Systems, The MIT Press."},{"key":"ref_34","unstructured":"Goudet, O., Kalainathan, D., Caillou, P., Guyon, I., Lopez-Paz, D., and Sebag, M. (arXiv, 2018). Causal Generative Neural Networks, arXiv."},{"key":"ref_35","unstructured":"Kalainathan, D., Goudet, O., Guyon, I., Lopez-Paz, D., and Sebag, M. (arXiv, 2018). SAM: Structural Agnostic Model, Causal Discovery and Penalized Adversarial Learning, arXiv."},{"key":"ref_36","unstructured":"Bai, S., Kolter, J.Z., and Koltun, V. (May, January 30). Convolutional Sequence Modeling Revisited. Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1109\/72.279181","article-title":"Learning long-term dependencies with gradient descent is difficult","volume":"5","author":"Bengio","year":"1994","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_38","unstructured":"Gehring, J., Auli, M., Grangier, D., Yarats, D., and Dauphin, Y.N. (2017, January 6\u201311). Convolutional Sequence to Sequence Learning. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_39","unstructured":"Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., Graves, A., and Kavukcuoglu, K. (2016). Conditional image generation with pixelCNN decoders. Advances in Neural Information Processing Systems, The MIT Press."},{"key":"ref_40","unstructured":"Borovykh, A., Bohte, S., and Oosterlee, C.W. (2017). Conditional time series forecasting with convolutional neural networks. Lecture Notes in Computer Science\/Lecture Notes in Artificial Intelligence, Springer."},{"key":"ref_41","unstructured":"Binkowski, M., Marti, G., and Donnat, P. (arXiv, 2017). Autoregressive Convolutional Neural Networks for Asynchronous Time Series, arXiv."},{"key":"ref_42","unstructured":"Walther, D., Rutishauser, U., Koch, C., and Perona, P. (2004, January 15). On the usefulness of attention for object recognition. Proceedings of the Workshop on Attention and Performance in Computational Vision at ECCV, Prague, Czech Republic."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1162\/tacl_a_00097","article-title":"ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs","volume":"4","author":"Yin","year":"2016","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015, January 7\u201313). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.123"},{"key":"ref_45","unstructured":"Van Den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (arXiv, 2016). Wavenet: A generative model for raw audio, arXiv."},{"key":"ref_46","unstructured":"Sifre, L., and Mallat, S. (2018, October 15). Rigid-Motion Scattering for Image Classification. Available online: http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.672.7091&rep=rep1&type=pdf."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Chollet, F. (2017, January 21\u201326). Xception: Deep Learning with Depthwise Separable Convolutions. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.195"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_49","unstructured":"Martins, A., and Astudillo, R. (2016, January 19\u201324). From softmax to sparsemax: A sparse model of attention and multi-label classification. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Shen, T., Zhou, T., Long, G., Jiang, J., Wang, S., and Zhang, C. (2018, January 13\u201319). Reinforced Self-Attention Network: A Hybrid of Hard and Soft Attention for Sequence Modeling. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden.","DOI":"10.24963\/ijcai.2018\/604"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Eichler, M. (2012). Causal inference in time series analysis. Causality: Statistical Perspectives and Applications, Wiley.","DOI":"10.1002\/9781119945710.ch22"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Woodward, J. (2005). Making Things Happen: A Theory of Causal Explanation, Oxford University Press.","DOI":"10.1093\/0195155270.001.0001"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Van der Laan, M.J. (2006). Statistical inference for variable importance. Int. J. Biostat., 2.","DOI":"10.2202\/1557-4679.1008"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Datta, A., Sen, S., and Zick, Y. (2016, January 23\u201325). Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems. Proceedings of the IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA.","DOI":"10.1109\/SP.2016.42"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"2324","DOI":"10.1214\/13-AOS1145","article-title":"Quantifying causal influences","volume":"41","author":"Janzing","year":"2013","journal-title":"Ann. Stat."},{"key":"ref_57","first-page":"427","article-title":"The cross-section of expected stock returns","volume":"47","author":"Fama","year":"1992","journal-title":"J. Financ."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1016\/j.neuroimage.2010.08.063","article-title":"Network modelling methods for FMRI","volume":"54","author":"Smith","year":"2011","journal-title":"Neuroimage"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"855","DOI":"10.1002\/mrm.1910390602","article-title":"Dynamics of blood flow and oxygenation changes during brain activation: The balloon model","volume":"39","author":"Buxton","year":"1998","journal-title":"Magn. Reson. Med."},{"key":"ref_60","unstructured":"Kingma, D.P., and Ba, J. (2015, January 7\u20139). Adam: A method for stochastic optimization, 2014. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"95405","DOI":"10.18637\/jss.v027.i03","article-title":"Automatic Time Series Forecasting: The forecast Package for R","volume":"27","author":"Hyndman","year":"2008","journal-title":"J. Stat. Softw."},{"key":"ref_62","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press. Available online: http:\/\/www.deeplearningbook.org."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1177\/2515245917745629","article-title":"Thinking clearly about correlations and causation: Graphical causal models for observational data","volume":"1","author":"Rohrer","year":"2018","journal-title":"Adv. Methods Pract. Psychol. Sci."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/19\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:24:03Z","timestamp":1760185443000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/19"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,7]]},"references-count":63,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["make1010019"],"URL":"https:\/\/doi.org\/10.3390\/make1010019","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,1,7]]}}}