{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T15:35:07Z","timestamp":1775057707197,"version":"3.50.1"},"reference-count":69,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T00:00:00Z","timestamp":1758672000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>Financial markets exhibit fundamental asymmetries in temporal causality, where policy interventions create asymmetric transmission patterns that traditional symmetric modeling approaches fail to capture. This work introduces a mathematical framework that exploits the inherent symmetries of transformer architectures while preserving essential asymmetric temporal relationships in financial causal inference. We develop CausalFormer, a symmetry-aware neural architecture that maintains the permutation equivariance properties of self-attention mechanisms while enforcing strict temporal asymmetry constraints for causal discovery. The framework incorporates three mathematically principled components: (1) a symmetric attention matrix construction with asymmetric temporal masking that preserves the mathematical elegance of transformer operations while ensuring causal consistency, (2) a multi-scale convolution module with symmetric kernel initialization but asymmetric temporal receptive fields that captures policy transmission effects across heterogeneous time horizons, and (3) enhanced Nelson\u2013Siegel decomposition that maintains the symmetric factor structure while modeling the evolution dynamics of asymmetric factors. Our mathematical formulation establishes the formal symmetry properties of the attention mechanism under temporal transformations while proving asymmetric convergence behaviors in policy transmission scenarios. The integration of symmetric optimization landscapes with asymmetric causal constraints enables simultaneous achievement of mathematical elegance and economic interpretability. Comprehensive experiments on monetary policy datasets demonstrate that the symmetry-aware design achieves a 15.3% improvement in the accuracy of causal effect estimations and a 12.7% enhancement in the predictive performance compared to those for existing methods while maintaining 91.2% causal consistency scores. The framework successfully identifies asymmetric policy transmission mechanisms, revealing that monetary tightening exhibits 40% faster propagation than easing policies, establishing new mathematical insights into the temporal asymmetries in financial systems. This work demonstrates how principled exploitation of architectural symmetries combined with domain-specific asymmetric constraints opens up new directions for mathematically rigorous yet economically interpretable deep learning in financial econometrics, with broad applications spanning computational finance, economic forecasting, and policy analysis.<\/jats:p>","DOI":"10.3390\/sym17101591","type":"journal-article","created":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T10:39:42Z","timestamp":1758710382000},"page":"1591","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Symmetry-Aware Transformers for Asymmetric Causal Discovery in Financial Time Series"],"prefix":"10.3390","volume":"17","author":[{"given":"Wenxia","family":"Zheng","sequence":"first","affiliation":[{"name":"Department of Economics, Texas A&M University, College Station, TX 77840, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4679-2958","authenticated-orcid":false,"given":"Wenhe","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA"}]}],"member":"1968","published-online":{"date-parts":[[2025,9,24]]},"reference":[{"key":"ref_1","unstructured":"Bernanke, B.S. (2022). 21st Century Monetary Policy: The Federal Reserve from the Great Inflation to COVID-19, WW Norton & Company."},{"key":"ref_2","unstructured":"Yellen, J. (2017). The Goals of Monetary Policy and How We Pursue Them: A Speech at the Commonwealth Club, San Francisco, California, January 18, 2017, Technical report."},{"key":"ref_3","unstructured":"Powell, J.H. (2020, August 27). New Economic Challenges and the Fed\u2019s Monetary Policy Review: A Speech at \u201cNavigating the Decade Ahead: Implications for Monetary Policy\u201d, an Economic Policy Symposium Sponsored by the Federal Reserve Bank of Kansas City, Jackson Hole, Wyoming, 27 August 2020. Technical Report, Available online: https:\/\/www.federalreserve.gov\/newsevents\/speech\/powell20200827a.htm."},{"key":"ref_4","first-page":"1","article-title":"Macroeconomics and reality","volume":"48","author":"Sims","year":"1980","journal-title":"Econom. J. Econom. Soc."},{"key":"ref_5","first-page":"655","article-title":"The dynamic effects of aggregate demand and supply disturbances","volume":"79","author":"Blanchard","year":"1989","journal-title":"Am. Econ. Rev."},{"key":"ref_6","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1002\/asmb.2209","article-title":"Deep learning for finance: Deep portfolios","volume":"33","author":"Heaton","year":"2017","journal-title":"Appl. Stoch. Model. Bus. Ind."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Pearl, J. (2009). Causality, Cambridge University Press.","DOI":"10.1017\/CBO9780511803161"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"688","DOI":"10.1037\/h0037350","article-title":"Estimating causal effects of treatments in randomized and nonrandomized studies","volume":"66","author":"Rubin","year":"1974","journal-title":"J. Educ. Psychol."},{"key":"ref_10","unstructured":"Sharma, A., and Kiciman, E. (2020). DoWhy: An end-to-end library for causal inference. arXiv."},{"key":"ref_11","first-page":"251","article-title":"Co-integration and error correction: Representation, estimation, and testing","volume":"55","author":"Engle","year":"1987","journal-title":"Econom. J. Econom. Soc."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"White, H. (1988, January 24\u201327). Economic prediction using neural networks: The case of IBM daily stock returns. Proceedings of the IEEE 1988 International Conference on Neural Networks (ICNN), San Diego, CA, USA.","DOI":"10.1109\/ICNN.1988.23959"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3234","DOI":"10.1016\/j.eswa.2014.12.003","article-title":"Recurrent neural network and a hybrid model for prediction of stock returns","volume":"42","author":"Rather","year":"2015","journal-title":"Expert Syst. Appl."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Nelson, D.M., Pereira, A.C., and De Oliveira, R.A. (2017, January 14\u201319). Stock market\u2019s price movement prediction with LSTM neural networks. Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA.","DOI":"10.1109\/IJCNN.2017.7966019"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"106181","DOI":"10.1016\/j.asoc.2020.106181","article-title":"Financial time series forecasting with deep learning: A systematic literature review: 2005\u20132019","volume":"90","author":"Sezer","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1145\/3309547","article-title":"Temporal relational ranking for stock prediction","volume":"37","author":"Feng","year":"2019","journal-title":"ACM Trans. Inf. Syst. (TOIS)"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Feng, F., Chen, H., He, X., Ding, J., Sun, M., and Chua, T.S. (2018). Enhancing stock movement prediction with adversarial training. arXiv.","DOI":"10.24963\/ijcai.2019\/810"},{"key":"ref_18","unstructured":"Wu, N., Green, B., Ben, X., and O\u2019Banion, S. (2020). Deep transformer models for time series forecasting: The influenza prevalence case. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1419","DOI":"10.1080\/14697688.2020.1730426","article-title":"Quant GANs: Deep generation of financial time series","volume":"20","author":"Wiese","year":"2020","journal-title":"Quant. Financ."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Eckerli, F., and Osterrieder, J. (2021). Generative adversarial networks in finance: An overview. arXiv.","DOI":"10.2139\/ssrn.3864965"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Imbens, G.W., and Rubin, D.B. (2015). Causal Inference in Statistics, Social, and Biomedical Sciences, Cambridge University Press.","DOI":"10.1017\/CBO9781139025751"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"669","DOI":"10.1093\/biomet\/82.4.669","article-title":"Causal diagrams for empirical research","volume":"82","author":"Pearl","year":"1995","journal-title":"Biometrika"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"103616","DOI":"10.1016\/j.clon.2024.07.002","article-title":"Causal inference in oncology: Why, what, how and when","volume":"38","author":"Elias","year":"2025","journal-title":"Clin. Oncol."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1080\/1350178X.2025.2468462","article-title":"Mostly harmless econometrics? Statistical paradigms in the \u2018top five\u2019 from 2000 to 2018","volume":"32","author":"Engler","year":"2025","journal-title":"J. Econ. Methodol."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Spirtes, P., Glymour, C.N., and Scheines, R. (2000). Causation, Prediction, and Search, MIT Press.","DOI":"10.7551\/mitpress\/1754.001.0001"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1873","DOI":"10.1016\/j.artint.2008.08.001","article-title":"On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias","volume":"172","author":"Zhang","year":"2008","journal-title":"Artif. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"7353","DOI":"10.1073\/pnas.1510489113","article-title":"Recursive partitioning for heterogeneous causal effects","volume":"113","author":"Athey","year":"2016","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1228","DOI":"10.1080\/01621459.2017.1319839","article-title":"Estimation and inference of heterogeneous treatment effects using random forests","volume":"113","author":"Wager","year":"2018","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"C1","DOI":"10.1111\/ectj.12097","article-title":"Double\/debiased machine learning for treatment and structural parameters","volume":"21","author":"Chernozhukov","year":"2018","journal-title":"Econom. J."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1257\/jep.24.2.3","article-title":"The credibility revolution in empirical economics: How better research design is taking the con out of econometrics","volume":"24","author":"Angrist","year":"2010","journal-title":"J. Econ. Perspect."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1055","DOI":"10.1257\/0002828042002651","article-title":"A new measure of monetary shocks: Derivation and implications","volume":"94","author":"Romer","year":"2004","journal-title":"Am. Econ. Rev."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1257\/mac.20130329","article-title":"Monetary policy surprises, credit costs, and economic activity","volume":"7","author":"Gertler","year":"2015","journal-title":"Am. Econ. J. Macroecon."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1283","DOI":"10.1093\/qje\/qjy004","article-title":"High-frequency identification of monetary non-neutrality: The information effect","volume":"133","author":"Nakamura","year":"2018","journal-title":"Q. J. Econ."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"2897","DOI":"10.1111\/j.1540-6261.2007.01296.x","article-title":"Vote trading and information aggregation","volume":"62","author":"Christoffersen","year":"2007","journal-title":"J. Financ."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"595","DOI":"10.2307\/3595013","article-title":"Partial adjustment to public information and IPO underpricing","volume":"37","author":"Bradley","year":"2002","journal-title":"J. Financ. Quant. Anal."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1635","DOI":"10.1093\/rfs\/hhw018","article-title":"Loan originations and defaults in the mortgage crisis: The role of the middle class","volume":"29","author":"Adelino","year":"2016","journal-title":"Rev. Financ. Stud."},{"key":"ref_37","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). Bert: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA."},{"key":"ref_38","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Yoo, J., Soun, Y., Park, Y.c., and Kang, U. (2021, January 14\u201318). Accurate multivariate stock movement prediction via data-axis transformer with multi-level contexts. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual.","DOI":"10.1145\/3447548.3467297"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"20111","DOI":"10.1007\/s00521-024-09805-9","article-title":"Combining transformer based deep reinforcement learning with Black-Litterman model for portfolio optimization","volume":"36","author":"Sun","year":"2024","journal-title":"Neural Comput. Appl."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Qin, Y., Song, D., Chen, H., Cheng, W., Jiang, G., and Cottrell, G. (2017). A dual-stage attention-based recurrent neural network for time series prediction. arXiv.","DOI":"10.24963\/ijcai.2017\/366"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1421","DOI":"10.1007\/s10994-019-05815-0","article-title":"Temporal pattern attention for multivariate time series forecasting","volume":"108","author":"Shih","year":"2019","journal-title":"Mach. Learn."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1016\/S0925-2312(03)00372-2","article-title":"Financial time series forecasting using support vector machines","volume":"55","author":"Kim","year":"2003","journal-title":"Neurocomputing"},{"key":"ref_44","unstructured":"Kitaev, N., Kaiser, \u0141., and Levskaya, A. (2020). Reformer: The efficient transformer. arXiv."},{"key":"ref_45","unstructured":"Wang, S., Li, B.Z., Khabsa, M., Fang, H., and Ma, H. (2020). Linformer: Self-attention with linear complexity. arXiv."},{"key":"ref_46","first-page":"17283","article-title":"Big bird: Transformers for longer sequences","volume":"33","author":"Zaheer","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1086\/296409","article-title":"Parsimonious modeling of yield curves","volume":"60","author":"Nelson","year":"1987","journal-title":"J. Bus."},{"key":"ref_48","first-page":"13","article-title":"Estimating and interpreting forward interest rates: Sweden 1992\u20131994","volume":"3","author":"Svensson","year":"1995","journal-title":"Sveriges Riksbank Q. Rev."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1016\/j.jeconom.2005.03.005","article-title":"Forecasting the term structure of government bond yields","volume":"130","author":"Diebold","year":"2006","journal-title":"J. Econom."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1016\/j.jimonfin.2013.12.007","article-title":"International channels of the Fed\u2019s unconventional monetary policy","volume":"44","author":"Bauer","year":"2014","journal-title":"J. Int. Money Financ."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"736","DOI":"10.1016\/j.ijforecast.2015.11.017","article-title":"Nonlinear forecasting with many predictors using kernel ridge regression","volume":"32","author":"Exterkate","year":"2016","journal-title":"Int. J. Forecast."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1017\/asb.2024.26","article-title":"Multiple yield curve modeling and forecasting using deep learning","volume":"54","author":"Richman","year":"2024","journal-title":"ASTIN Bull. J. IAA"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"3248","DOI":"10.1080\/00949655.2018.1505197","article-title":"Elements of causal inference: Foundations and learning algorithms","volume":"88","author":"Shanmugam","year":"2018","journal-title":"J. Stat. Comput. Simul."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Sch\u00f6lkopf, B. (2022). Causality for machine learning. Probabilistic and Causal Inference: The Works of Judea Pearl, Association for Computing Machinery.","DOI":"10.1145\/3501714.3501755"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Wang, T., Zhou, C., Sun, Q., and Zhang, H. (2021, January 10\u201317). Causal attention for unbiased visual recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00308"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Sui, Y., Wang, X., Wu, J., Lin, M., He, X., and Chua, T.S. (2022, January 14\u201318). Causal attention for interpretable and generalizable graph classification. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington DC, USA.","DOI":"10.1145\/3534678.3539366"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"1748","DOI":"10.1016\/j.ijforecast.2021.03.012","article-title":"Temporal fusion transformers for interpretable multi-horizon time series forecasting","volume":"37","author":"Lim","year":"2021","journal-title":"Int. J. Forecast."},{"key":"ref_58","unstructured":"Goudet, O., Kalainathan, D., Caillou, P., Guyon, I., Lopez-Paz, D., and Sebag, M. (2017). Causal generative neural networks. arXiv."},{"key":"ref_59","unstructured":"Khemakhem, I., Kingma, D., Monti, R., and Hyvarinen, A. (2020, January 26\u201328). Variational autoencoders and nonlinear ica: A unifying framework. Proceedings of the International Conference on Artificial Intelligence and Statistics. PMLR, Virtual."},{"key":"ref_60","first-page":"82","article-title":"D\u2019ya like dags? A survey on structure learning and causal discovery","volume":"55","author":"Vowels","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_62","unstructured":"Oreshkin, B.N., Carpov, D., Chapados, N., and Bengio, Y. (2019). N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. arXiv."},{"key":"ref_63","unstructured":"Kalainathan, D., Goudet, O., Guyon, I., Lopez-Paz, D., and Sebag, M. (2018). Sam: Structural agnostic model, causal discovery and penalized adversarial learning. arXiv."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1198\/073500102753410444","article-title":"Comparing predictive accuracy","volume":"20","author":"Diebold","year":"2002","journal-title":"J. Bus. Econ. Stat."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1257\/jep.9.4.27","article-title":"Inside the Black Box: The Credit Channel of Monetary Policy Transmission","volume":"9","author":"Bernanke","year":"1995","journal-title":"J. Econ. Perspect."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1257\/mac.4.2.1","article-title":"Are the Effects of Monetary Policy Shocks Big or Small?","volume":"4","author":"Coibion","year":"2012","journal-title":"Am. Econ. J. Macroecon."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1257\/mac.20150016","article-title":"Pushing on a String: US Monetary Policy Is Less Powerful in Recessions","volume":"8","author":"Tenreyro","year":"2016","journal-title":"Am. Econ. J. Macroecon."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1353\/eca.2003.0010","article-title":"The Zero Bound on Interest Rates and Optimal Monetary Policy","volume":"2003","author":"Eggertsson","year":"2003","journal-title":"Brookings Pap. Econ. Act."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Mishkin, F.S. (1996). The Channels of Monetary Transmission: Lessons for Monetary Policy. NBER Working Paper, National Bureau of Economic Research. No. 5464.","DOI":"10.3386\/w5464"}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/10\/1591\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:48:46Z","timestamp":1760035726000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/10\/1591"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,24]]},"references-count":69,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["sym17101591"],"URL":"https:\/\/doi.org\/10.3390\/sym17101591","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,24]]}}}