{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T21:32:26Z","timestamp":1780522346738,"version":"3.54.1"},"reference-count":285,"publisher":"ASME International","issue":"1","license":[{"start":{"date-parts":[[2023,11,24]],"date-time":"2023-11-24T00:00:00Z","timestamp":1700784000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.asme.org\/publications-submissions\/publishing-information\/legal-policies"}],"content-domain":{"domain":["asmedigitalcollection.asme.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>In the rapidly advancing field of multi-modal machine learning (MMML), the convergence of multiple data modalities has the potential to reshape various applications. This paper presents a comprehensive overview of the current state, advancements, and challenges of MMML within the sphere of engineering design. The review begins with a deep dive into five fundamental concepts of MMML: multi-modal information representation, fusion, alignment, translation, and co-learning. Following this, we explore the cutting-edge applications of MMML, placing a particular emphasis on tasks pertinent to engineering design, such as cross-modal synthesis, multi-modal prediction, and cross-modal information retrieval. Through this comprehensive overview, we highlight the inherent challenges in adopting MMML in engineering design, and proffer potential directions for future research. To spur on the continued evolution of MMML in engineering design, we advocate for concentrated efforts to construct extensive multi-modal design datasets, develop effective data-driven MMML techniques tailored to design applications, and enhance the scalability and interpretability of MMML models. MMML models, as the next generation of intelligent design tools, hold a promising future to impact how products are designed.<\/jats:p>","DOI":"10.1115\/1.4063954","type":"journal-article","created":{"date-parts":[[2023,11,1]],"date-time":"2023-11-01T05:42:50Z","timestamp":1698817370000},"update-policy":"https:\/\/doi.org\/10.1115\/crossmarkpolicy-asme","source":"Crossref","is-referenced-by-count":66,"title":["Multi-Modal Machine Learning in Engineering Design: A Review and Future Directions"],"prefix":"10.1115","volume":"24","author":[{"given":"Binyang","family":"Song","sequence":"first","affiliation":[{"name":"Virginia Tech Department of Industrial and Systems Engineering, , Blacksburg, VA 24060"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rui","family":"Zhou","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology Department of Mechanical Engineering, , Cambridge, MA 02139"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Faez","family":"Ahmed","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology Department of Mechanical Engineering, , Cambridge, MA 02139"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"33","published-online":{"date-parts":[[2023,11,24]]},"reference":[{"issue":"8","key":"2023112408330717500_CIT0001","doi-asserted-by":"publisher","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation Learning: A Review and New Perspectives","volume":"35","author":"Bengio","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"issue":"5","key":"2023112408330717500_CIT0002","doi-asserted-by":"publisher","first-page":"051403","DOI":"10.1115\/1.4039450","article-title":"Multiple Surrogate-Assisted Many-Objective Optimization for Computationally Expensive Engineering Design","volume":"140","author":"Bhattacharjee","year":"2018","journal-title":"ASME J. Mech. Des."},{"issue":"4","key":"2023112408330717500_CIT0003","doi-asserted-by":"publisher","first-page":"041409","DOI":"10.1115\/1.4056598","article-title":"Biologically Inspired Design Concept Generation Using Generative Pre-Trained Transformers","volume":"145","author":"Zhu","year":"2023","journal-title":"ASME J. Mech. Des."},{"issue":"4","key":"2023112408330717500_CIT0004","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1115\/1.4056220","article-title":"Generative Transformers for Design Concept Generation","volume":"23","author":"Zhu","year":"2023","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"key":"2023112408330717500_CIT0005","first-page":"610","article-title":"PcDGAN: A Continuous Conditional Diverse Generative Adversarial Network for Inverse Design","author":"Nobari","year":"2021"},{"key":"2023112408330717500_CIT0006","doi-asserted-by":"publisher","first-page":"106873","DOI":"10.1016\/j.knosys.2021.106873","article-title":"Guiding Data-Driven Design Ideation by Knowledge Distance","volume":"218","author":"Luo","year":"2021","journal-title":"Knowl. Based Syst."},{"issue":"1","key":"2023112408330717500_CIT0007","doi-asserted-by":"publisher","first-page":"011002","DOI":"10.1115\/1.4062454","article-title":"What\u2019s in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models Through User-Provided Names in Computer Aided Design Files","volume":"24","author":"Meltzer","year":"2024","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"issue":"4","key":"2023112408330717500_CIT0008","doi-asserted-by":"publisher","first-page":"041410","DOI":"10.1115\/1.4056669","article-title":"Attention-Enhanced Multimodal Learning for Conceptual Design Evaluations","volume":"145","author":"Song","year":"2023","journal-title":"ASME J. Mech. Des."},{"issue":"3","key":"2023112408330717500_CIT0009","doi-asserted-by":"publisher","first-page":"031002","DOI":"10.1115\/1.4049895","article-title":"A Digital Twin-Driven Method for Product Performance Evaluation Based on Intelligent Psycho-Physiological Analysis","volume":"21","author":"Feng","year":"2021","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"key":"2023112408330717500_CIT0010","first-page":"V03BT03A039","article-title":"Range-GAN: Range-Constrained Generative Adversarial Network for Conditioned Design Synthesis","volume-title":"Proceedings of the ASME Design Engineering Technical Conference","author":"Nobari","year":"2021"},{"key":"2023112408330717500_CIT0011","doi-asserted-by":"crossref","DOI":"10.1115\/DETC2023-117216","article-title":"Counterfactuals for Design: A Model-Agnostic Method For Design Recommendations","author":"Regenwetter","year":"2023"},{"key":"2023112408330717500_CIT0012","doi-asserted-by":"publisher","first-page":"1777","DOI":"10.1017\/pds.2022.180","article-title":"Assessing Machine Learnability of Image and Graph Representations for Drone Performance Prediction","volume":"2","author":"Song","year":"2022","journal-title":"Proc. Des. Soc."},{"issue":"4","key":"2023112408330717500_CIT0013","first-page":"26","article-title":"Design Prototypes: A Knowledge Representation Schema for Design","volume":"11","author":"Gero","year":"1990","journal-title":"AI Mag."},{"key":"2023112408330717500_CIT0014","first-page":"257","volume-title":"Design Creativity","author":"Tseng","year":"2011"},{"issue":"7","key":"2023112408330717500_CIT0015","doi-asserted-by":"publisher","first-page":"071408","DOI":"10.1115\/1.4030181","article-title":"Connections Between the Design Tool, Design Attributes, and User Preferences in Early Stage Design","volume":"137","author":"H\u00e4ggman","year":"2015","journal-title":"ASME J. Mech. Des."},{"key":"2023112408330717500_CIT0016","first-page":"V007T06A037","article-title":"How It Is Made Matters: Distinguishing Traits of Designs Created by Sketches, Prototypes, and CAD","volume-title":"International Design Engineering Technical Conferences and Computers and Information in Engineering Conference","author":"Tsai","year":"2017"},{"issue":"4","key":"2023112408330717500_CIT0017","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1016\/S0142-694X(98)00015-5","article-title":"Drawings and the Design Process: A Review of Protocol Studies in Design and Other Disciplines and Related Research in Cognitive Psychology","volume":"19","author":"Purcell","year":"1998","journal-title":"Des. Stud."},{"issue":"2","key":"2023112408330717500_CIT0018","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1016\/0097-8493(90)90037-X","article-title":"The Importance of Drawing in the Mechanical Design Process","volume":"14","author":"Ullman","year":"1990","journal-title":"Comput. Graph."},{"key":"2023112408330717500_CIT0019","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1016\/j.chb.2016.08.024","article-title":"Effects of 3D CAD Applications on the Design Creativity of Students With Different Representational Abilities","volume":"65","author":"Chang","year":"2016","journal-title":"Comput. Human Behav."},{"key":"2023112408330717500_CIT0020","doi-asserted-by":"publisher","first-page":"110","DOI":"10.1016\/j.destud.2015.10.005","article-title":"The Effects of Representation on Idea Generation and Design Fixation: A Study Comparing Sketches and Function Trees","volume":"42","author":"Atilola","year":"2016","journal-title":"Des. Stud."},{"issue":"1","key":"2023112408330717500_CIT0021","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1260\/1478077053739667","article-title":"An Assessment of the Effectiveness of Sketch Representations in Early Stage Digital Design","volume":"3","author":"Hannibal","year":"2016","journal-title":"Int. J. Archit. Comput."},{"issue":"2","key":"2023112408330717500_CIT0022","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1017\/S0890060415000049","article-title":"Representing Analogies to Influence Fixation and Creativity: A Study Comparing Computer-Aided Design, Photographs, and Sketches","volume":"29","author":"Atilola","year":"2015","journal-title":"Artif. Intell. Eng. Des. Anal. Manuf."},{"issue":"9","key":"2023112408330717500_CIT0023","doi-asserted-by":"publisher","first-page":"091008","DOI":"10.1115\/1.4024724","article-title":"Impact of Product Design Representation on Customer Judgment","volume":"135","author":"Reid","year":"2013","journal-title":"ASME J. Mech. Des."},{"issue":"6","key":"2023112408330717500_CIT0024","doi-asserted-by":"publisher","first-page":"649","DOI":"10.1016\/j.destud.2005.04.005","article-title":"A Study of Prototypes, Design Activity, and Design Outcome","volume":"26","author":"Yang","year":"2005","journal-title":"Des. Stud."},{"key":"2023112408330717500_CIT0025","first-page":"39","article-title":"Influence of Design Representation on Effectiveness of Idea Generation","author":"McKoy","year":"2020"},{"issue":"3\u20134","key":"2023112408330717500_CIT0026","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1080\/21650349.2014.943295","article-title":"Data-Intensive Evaluation of Design Creativity Using Novelty, Value, and Surprise","volume":"3","author":"Grace","year":"2014","journal-title":"Int. J. Des. Creat. Innov."},{"issue":"1","key":"2023112408330717500_CIT0027","doi-asserted-by":"publisher","first-page":"1413","DOI":"10.1017\/dsi.2019.147","article-title":"Assessing Concept Novelty Potential With Lexical and Distributional Word Similarity for Innovative Design","volume":"1","author":"Nomaguchi","year":"2019","journal-title":"Proc. Des. Soc. Int. Conf. Eng. Des."},{"issue":"5","key":"2023112408330717500_CIT0028","doi-asserted-by":"publisher","first-page":"051403","DOI":"10.1115\/1.4029768","article-title":"A Machine Learning-Based Design Representation Method for Designing Heterogeneous Microstructures","volume":"137","author":"Xu","year":"2015","journal-title":"ASME J. Mech. Des."},{"key":"2023112408330717500_CIT0029","volume-title":"Product Design: Techniques in Reverse Engineering and New Product Development.","author":"Wood","year":"2001"},{"issue":"5","key":"2023112408330717500_CIT0030","doi-asserted-by":"publisher","first-page":"051101","DOI":"10.1115\/1.4029519","article-title":"Integrating Function- and Affordance-Based Design Representations","volume":"137","author":"Ciavola","year":"2015","journal-title":"ASME J. Mech. Des."},{"key":"2023112408330717500_CIT0031","volume-title":"Product Design and Development","author":"Ulrich","year":"2000"},{"issue":"1","key":"2023112408330717500_CIT0032","doi-asserted-by":"publisher","first-page":"1067","DOI":"10.21278\/idc.2018.0118","article-title":"Issues Related to Missing Attributes in Aposteriori Novelty Assessments","volume":"3","author":"Fiorineschi","year":"2018","journal-title":"Proc. Int. Des. Conf."},{"issue":"3","key":"2023112408330717500_CIT0033","doi-asserted-by":"publisher","first-page":"785","DOI":"10.1115\/1.2919451","article-title":"Conversions of Feature-Based Design Representations Using Graph Grammar Parsing","volume":"116","author":"Rosen","year":"1994","journal-title":"ASME J. Mech. Des."},{"issue":"10","key":"2023112408330717500_CIT0034","doi-asserted-by":"publisher","first-page":"104501","DOI":"10.1115\/1.4046806","article-title":"Using Recurrent Neural Networks to Model Spatial Grammars for Design Creation","volume":"142","author":"Yukish","year":"2020","journal-title":"ASME J. Mech. Des."},{"issue":"1","key":"2023112408330717500_CIT0035","doi-asserted-by":"publisher","first-page":"011010","DOI":"10.1115\/1.4025961","article-title":"A Scheme for Numerical Representation of Graph Structures in Engineering Design","volume":"136","author":"Wyatt","year":"2014","journal-title":"ASME J. Mech. Des."},{"issue":"4","key":"2023112408330717500_CIT0036","doi-asserted-by":"publisher","first-page":"041411","DOI":"10.1115\/1.4056799","article-title":"Generative Design: Reframing the Role of the Designer in Early-Stage Design Process","volume":"145","author":"Saadi","year":"2023","journal-title":"ASME J. Mech. Des."},{"issue":"3","key":"2023112408330717500_CIT0037","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1017\/S0890060412000170","article-title":"Computer-Aided Design Versus Sketching: An Exploratory Case Study","volume":"26","author":"Veisz","year":"2012","journal-title":"Artif. Intell. Eng. Des. Anal. Manuf."},{"key":"2023112408330717500_CIT0038","first-page":"42","article-title":"Media and Representations in Product Design Education","author":"Babapour","year":"2014"},{"issue":"4","key":"2023112408330717500_CIT0039","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1115\/1.2353856","article-title":"As-Built Modeling of Objects for Performance Assessment","volume":"6","author":"Kokko","year":"2006","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"issue":"3","key":"2023112408330717500_CIT0040","doi-asserted-by":"publisher","first-page":"034501","DOI":"10.1115\/1.4050531","article-title":"Tool Wear Online Monitoring Method Based on DT and SSAE-PHMM","volume":"21","author":"Zhang","year":"2021","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"issue":"2","key":"2023112408330717500_CIT0041","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1109\/TPAMI.2018.2798607","article-title":"Multimodal Machine Learning: A Survey and Taxonomy","volume":"41","author":"Baltrusaitis","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"issue":"3","key":"2023112408330717500_CIT0042","doi-asserted-by":"publisher","first-page":"478","DOI":"10.1109\/JSTSP.2020.2987728","article-title":"Multimodal Intelligence: Representation Learning, Information Fusion, and Applications","volume":"14","author":"Zhang","year":"2019","journal-title":"IEEE J. Select. Top. Signal Process."},{"issue":"2","key":"2023112408330717500_CIT0043","doi-asserted-by":"publisher","first-page":"022001","DOI":"10.1088\/2516-1091\/acc2fe","article-title":"Deep Multi-Modal Fusion of Image and Non-Image Data in Disease Diagnosis and Prognosis: A Review","volume":"5","author":"Cui","year":"2022","journal-title":"Progr. Biomed. Eng."},{"issue":"4","key":"2023112408330717500_CIT0044","doi-asserted-by":"publisher","first-page":"041401","DOI":"10.1115\/1.4056436","article-title":"Deep-Learning Methods of Cross-Modal Tasks for Conceptual Design of Product Shapes: A Review","volume":"145","author":"Li","year":"2023","journal-title":"ASME J. Mech. Des."},{"key":"2023112408330717500_CIT0045","first-page":"8780","article-title":"Diffusion Models Beat GANs on Image Synthesis","volume":"11","author":"Dhariwal","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"2023112408330717500_CIT0046","first-page":"16784","article-title":"GLIDE: Towards Photorealistic Image Generation and Editing With Text-Guided Diffusion Models","author":"Nichol","year":"2022"},{"key":"2023112408330717500_CIT0047","first-page":"2426","article-title":"DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation","author":"Kim","year":"2021"},{"key":"2023112408330717500_CIT0048","article-title":"DeViSE: A Deep Visual-Semantic Embedding Model","author":"Frome","year":"2013"},{"key":"2023112408330717500_CIT0049","first-page":"171","article-title":"Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning","author":"Rajendran","year":"2016"},{"key":"2023112408330717500_CIT0050","first-page":"171","article-title":"Multimodal Learning With Deep Boltzmann Machines","author":"Srivastava","year":"2012"},{"key":"2023112408330717500_CIT0051","doi-asserted-by":"crossref","DOI":"10.1109\/RIVF51545.2021.9642125","article-title":"Multimodal Fusion With BERT and Attention Mechanism for Fake News Detection","author":"Duc Tuan","year":"2021"},{"key":"2023112408330717500_CIT0052","first-page":"V006T06A017","article-title":"Hey, AI! Can You See What I See? Multimodal Transfer Learning-Based Design Metrics Prediction for Sketches With Text Descriptions","volume-title":"International Design Engineering Technical Conferences and Computers and Information in Engineering Conference","author":"Song","year":"2022"},{"issue":"2","key":"2023112408330717500_CIT0053","doi-asserted-by":"publisher","first-page":"021403","DOI":"10.1115\/1.4052366","article-title":"Leveraging End-User Data for Enhanced Design Concept Evaluation: A Multimodal Deep Regression Model","volume":"144","author":"Yuan","year":"2022","journal-title":"ASME J. Mech. Des."},{"key":"2023112408330717500_CIT0054","first-page":"10484","article-title":"Multi-Task Learning of Hierarchical Vision-Language Representation","author":"Nguyen","year":"2018"},{"key":"2023112408330717500_CIT0055","first-page":"11336","article-title":"Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training","author":"Li","year":"2020"},{"key":"2023112408330717500_CIT0056","article-title":"VL-BERT: Pre-Training of Generic Visual-Linguistic Representations","author":"Su","year":"2019"},{"key":"2023112408330717500_CIT0057","article-title":"VisualBERT: A Simple and Performant Baseline for Vision and Language","author":"Li","year":"2019"},{"key":"2023112408330717500_CIT0058","first-page":"2131","article-title":"Fusion of Detected Objects in Text for Visual Question Answering","author":"Alberti","year":"2019"},{"key":"2023112408330717500_CIT0059","first-page":"7463","article-title":"VideoBERT: A Joint Model for Video and Language Representation Learning","author":"Sun","year":"2019"},{"key":"2023112408330717500_CIT0060","article-title":"Multimodal Deep Learning","author":"Ngiam","year":"2011"},{"key":"2023112408330717500_CIT0061","first-page":"721","article-title":"Learning Grounded Meaning Representations With Autoencoders","author":"Silberer","year":"2014"},{"key":"2023112408330717500_CIT0062","first-page":"7","article-title":"Cross-Modal Retrieval With Correspondence Autoencoder","author":"Feng","year":"2014"},{"key":"2023112408330717500_CIT0063","first-page":"8748","article-title":"Learning Transferable Visual Models From Natural Language Supervision","author":"Radford","year":"2021"},{"key":"2023112408330717500_CIT0064","first-page":"1247","article-title":"Deep Canonical Correlation Analysis","author":"Andrew","year":"2013"},{"key":"2023112408330717500_CIT0065","first-page":"5447","article-title":"Deep Multimodal Representation Learning From Temporal Data","author":"Yang","year":"2017"},{"key":"2023112408330717500_CIT0066","first-page":"15535","article-title":"Learning Representations by Maximizing Mutual Information Across Views","author":"Bachman","year":"2019"},{"key":"2023112408330717500_CIT0067","first-page":"1","article-title":"Contrastive Learning of Medical Visual Representations From Paired Images and Text","volume":"182","author":"Zhang","year":"2020","journal-title":"Proc. Mach. Learn. Res."},{"key":"2023112408330717500_CIT0068","article-title":"Unifying Visual-Semantic Embeddings With Multimodal Neural Language Models","author":"Kiros","year":"2014"},{"key":"2023112408330717500_CIT0069","first-page":"2333","article-title":"Learning Deep Structured Semantic Models for Web Search Using Clickthrough Data","author":"Huang","year":"2013"},{"issue":"4","key":"2023112408330717500_CIT0070","doi-asserted-by":"publisher","first-page":"664","DOI":"10.1109\/TPAMI.2016.2598339","article-title":"Deep Visual-Semantic Alignments for Generating Image Descriptions","volume":"39","author":"Karpathy","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"issue":"Jan.","key":"2023112408330717500_CIT0071","first-page":"1889","article-title":"Deep Fragment Embeddings for Bidirectional Image Sentence Mapping","volume":"3","author":"Karpathy","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"2023112408330717500_CIT0072","first-page":"6602","article-title":"Unified Visual-Semantic Embeddings: Bridging Vision and Language With Structured Meaning Representations","author":"Wu","year":"2019"},{"issue":"1","key":"2023112408330717500_CIT0073","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1007\/s11263-016-0965-7","article-title":"Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models","volume":"123","author":"Plummer","year":"2015","journal-title":"Int. J. Comput. Vision"},{"key":"2023112408330717500_CIT0074","first-page":"5100","article-title":"LXMERT: Learning Cross-Modality Encoder Representations From Transformers","author":"Tan","year":"2019"},{"key":"2023112408330717500_CIT0075","article-title":"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks","author":"Lu","year":"2019"},{"key":"2023112408330717500_CIT0076","article-title":"OmniNet: A Unified Architecture for Multi-modal Multi-task Learning","author":"Pramanik","year":"2019"},{"key":"2023112408330717500_CIT0077","article-title":"IC3D: Image-Conditioned 3D Diffusion for Shape Generation","author":"Sbrolli","year":"2022"},{"key":"2023112408330717500_CIT0078","first-page":"284","article-title":"Deep Multimodal Fusion for Persuasiveness Prediction","author":"Nojavanasghari","year":"2016"},{"key":"2023112408330717500_CIT0079","article-title":"Neural Language Modeling With Visual Features","volume-title":"Undefined.","author":"Anastasopoulos","year":"2019"},{"key":"2023112408330717500_CIT0080","first-page":"575","article-title":"CentralNet: A Multilayer Approach for Multimodal Fusion","volume-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)","author":"Vielzeuf","year":"2019"},{"issue":"5","key":"2023112408330717500_CIT0081","doi-asserted-by":"publisher","first-page":"051004","DOI":"10.1115\/1.4054001","article-title":"Concise and Effective Network for 3D Human Modeling From Orthogonal Silhouettes","volume":"22","author":"Liu","year":"2022","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"key":"2023112408330717500_CIT0082","first-page":"160","article-title":"Black Holes and White Rabbits: Metaphor Identification With Visual Features","author":"Shutova","year":"2016"},{"key":"2023112408330717500_CIT0083","first-page":"1445","article-title":"Deep Visual-Semantic Hashing for Cross-Modal Retrieval","author":"Cao","year":"2016"},{"key":"2023112408330717500_CIT0084","first-page":"517","article-title":"Multiple Kernel Learning for Emotion Recognition in the Wild","author":"Sikka","year":"2013"},{"key":"2023112408330717500_CIT0085","first-page":"153","volume-title":"Majority Vote of Diverse Classifiers for Late Fusion","author":"Morvant","year":"2014"},{"key":"2023112408330717500_CIT0086","first-page":"6959","article-title":"MFAS: Multimodal Fusion Architecture Search","author":"Perez-Rua","year":"2019"},{"issue":"3","key":"2023112408330717500_CIT0087","doi-asserted-by":"publisher","first-page":"1001","DOI":"10.1002\/hbm.24428","article-title":"Effective Feature Learning and Fusion of Multimodality Data Using Stage-Wise Deep Neural Network for Dementia Diagnosis","volume":"40","author":"Zhou","year":"2019","journal-title":"Human Brain Map."},{"key":"2023112408330717500_CIT0088","article-title":"Neural Architecture Search With Reinforcement Learning","author":"Zoph","year":"2016"},{"issue":"6","key":"2023112408330717500_CIT0089","doi-asserted-by":"publisher","first-page":"1247","DOI":"10.1162\/089976600300015349","article-title":"Separating Style and Content With Bilinear Models","volume":"12","author":"Tenenbaum","year":"2000","journal-title":"Neur. Comput."},{"key":"2023112408330717500_CIT0090","first-page":"1103","article-title":"Tensor Fusion Network for Multimodal Sentiment Analysis","author":"Zadeh","year":"2017"},{"issue":"4","key":"2023112408330717500_CIT0091","doi-asserted-by":"publisher","first-page":"757","DOI":"10.1109\/TMI.2020.3021387","article-title":"Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis","volume":"41","author":"Chen","year":"2019","journal-title":"IEEE Trans. Med. Imag."},{"key":"2023112408330717500_CIT0092","article-title":"Hadamard Product for Low-Rank Bilinear Pooling","author":"Kim","year":"2017"},{"key":"2023112408330717500_CIT0093","first-page":"1839","article-title":"Multi-Modal Factorized Bilinear Pooling With Co-Attention Learning for Visual Question Answering","author":"Yu","year":"2017"},{"issue":"12","key":"2023112408330717500_CIT0094","doi-asserted-by":"publisher","first-page":"5947","DOI":"10.1109\/TNNLS.2018.2817340","article-title":"Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering","volume":"29","author":"Yu","year":"2017","journal-title":"IEEE Trans. Neur. Netw. Learn. Syst."},{"key":"2023112408330717500_CIT0095","first-page":"317","article-title":"Compact Bilinear Pooling","author":"Gao","year":"2015"},{"key":"2023112408330717500_CIT0096","first-page":"457","article-title":"Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding","author":"Fukui","year":"2016"},{"key":"2023112408330717500_CIT0097","first-page":"2631","article-title":"MUTAN: Multimodal Tucker Fusion for Visual Question Answering","author":"Ben-Younes","year":"2017"},{"issue":"3","key":"2023112408330717500_CIT0098","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF02289464","article-title":"Some Mathematical Notes on Three-Mode Factor Analysis","volume":"31","author":"Tucker","year":"1966","journal-title":"Psychometrika"},{"key":"2023112408330717500_CIT0099","first-page":"8102","article-title":"BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection","author":"Ben-Younes","year":"2019"},{"key":"2023112408330717500_CIT0100","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TEM.2022.3152216","article-title":"Deep Learning for Technical Document Classification","author":"Jiang","year":"2022","journal-title":"IEEE Trans. Eng. Manage."},{"key":"2023112408330717500_CIT0101","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1016\/j.media.2018.06.001","article-title":"Disease Prediction Using Graph Convolutional Networks: Application to Autism Spectrum Disorder and Alzheimer\u2019s Disease","volume":"48","author":"Parisot","year":"2018","journal-title":"Med. Image Anal."},{"key":"2023112408330717500_CIT0102","doi-asserted-by":"publisher","first-page":"103015","DOI":"10.1016\/j.bspc.2021.103015","article-title":"Using DeepGCN to Identify the Autism Spectrum Disorder From Multi-site Resting-state Data","volume":"70","author":"Cao","year":"2021","journal-title":"Biomed. Signal Process. Contr."},{"issue":"2","key":"2023112408330717500_CIT0103","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1109\/TPAMI.2018.2798607","article-title":"Multimodal Machine Learning: A Survey and Taxonomy","volume":"41","author":"Baltrusaitis","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"2023112408330717500_CIT0104","first-page":"5999","article-title":"Attention is All You Need","author":"Vaswani","year":"2017"},{"key":"2023112408330717500_CIT0105","article-title":"Neural Turing Machines","author":"Graves","year":"2014"},{"key":"2023112408330717500_CIT0106","article-title":"Neural Machine Translation by Jointly Learning to Align and Translate","author":"Bahdanau","year":"2014"},{"key":"2023112408330717500_CIT0107","first-page":"4995","article-title":"Visual7W: Grounded Question Answering in Images","author":"Zhu","year":"2016"},{"key":"2023112408330717500_CIT0108","first-page":"4613","article-title":"Where To Look: Focus Regions for Visual Question Answering","author":"Shih","year":"2015"},{"key":"2023112408330717500_CIT0109","first-page":"451","article-title":"Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering","volume-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)","author":"Xu","year":"2015"},{"key":"2023112408330717500_CIT0110","first-page":"6077","article-title":"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering","author":"Anderson","year":"2017"},{"key":"2023112408330717500_CIT0111","article-title":"Generating Images From Captions With Attention","author":"Mansimov","year":"2015"},{"key":"2023112408330717500_CIT0112","first-page":"1316","article-title":"AttnGAN: Fine-Grained Text to Image Generation With Attentional Generative Adversarial Networks","author":"Xu","year":"2018"},{"key":"2023112408330717500_CIT0113","first-page":"12166","article-title":"Object-Driven Text-to-Image Synthesis Via Adversarial Training","author":"Li","year":"2019"},{"key":"2023112408330717500_CIT0114","first-page":"2156","article-title":"Dual Attention Networks for Multimodal Reasoning and Matching","author":"Nam","year":"2017"},{"key":"2023112408330717500_CIT0115","first-page":"737","article-title":"Hierarchical Question-Image Co-Attention for Visual Question Answering","author":"Elsen","year":"2016"},{"key":"2023112408330717500_CIT0116","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1016\/j.cviu.2019.05.001","article-title":"Dual Recurrent Attention Units for Visual Question Answering","volume":"185","author":"Osman","year":"2018","journal-title":"Comput. Vision Imag. Understand."},{"key":"2023112408330717500_CIT0117","first-page":"3665","article-title":"High-Order Attention Models for Visual Question Answering","author":"Schwartz","year":"2017"},{"key":"2023112408330717500_CIT0118","first-page":"21","article-title":"Stacked Attention Networks for Image Question Answering","author":"Yang","year":"2015"},{"key":"2023112408330717500_CIT0119","first-page":"1072","article-title":"Stacked Latent Attention for Multimodal Reasoning","author":"Fan","year":"2018"},{"key":"2023112408330717500_CIT0120","first-page":"3574","article-title":"Dynamic Memory Networks for Visual and Textual Question Answering","author":"Xiong","year":"2016"},{"key":"2023112408330717500_CIT0121","first-page":"6","article-title":"Faster R-CNN: Towards Real-Time Object Detection With Region Proposal Networks","author":"Ren","year":"2015"},{"key":"2023112408330717500_CIT0122","first-page":"7218","article-title":"Co-Attending Free-Form Regions and Detections With Multi-modal Multiplicative Feature Embedding for Visual Question Answering","author":"Lu","year":"2018"},{"key":"2023112408330717500_CIT0123","first-page":"10674","article-title":"High-Resolution Image Synthesis With Latent Diffusion Models","author":"Rombach","year":"2021"},{"key":"2023112408330717500_CIT0124","article-title":"Data2vec: A General Framework for Self-Supervised Learning in Speech, Vision and Language","author":"Baevski","year":"2022"},{"key":"2023112408330717500_CIT0125","first-page":"361","article-title":"Multimodal Residual Learning for Visual QA","volume-title":"Advances in Neural Information Processing Systems","author":"Kim","year":"2016"},{"key":"2023112408330717500_CIT0126","article-title":"Gated Multimodal Units for Information Fusion","volume-title":"5th International Conference on Learning Representations, ICLR 2017 \u2013 Workshop Track Proceedings","author":"Arevalo","year":"2017"},{"key":"2023112408330717500_CIT0127","first-page":"30","article-title":"Image Question Answering Using Convolutional Neural Network With Dynamic Parameter Prediction","volume-title":"Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition","author":"Noh","year":"2015"},{"issue":"11","key":"2023112408330717500_CIT0128","doi-asserted-by":"publisher","first-page":"111405","DOI":"10.1115\/1.4044229","article-title":"Deep Generative Design: Integration of Topology Optimization and Generative Models","volume":"141","author":"Oh","year":"2019","journal-title":"ASME J. Mech. Des."},{"issue":"2","key":"2023112408330717500_CIT0129","doi-asserted-by":"publisher","first-page":"021712","DOI":"10.1115\/1.4052846","article-title":"Inverse Design of Two-Dimensional Airfoils Using Conditional Generative Models and Surrogate Log-Likelihoods","volume":"144","author":"Chen","year":"2022","journal-title":"ASME J. Mech. Des."},{"key":"2023112408330717500_CIT0130","first-page":"1","article-title":"Generative Adversarial Networks","volume-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), NeurIPS","author":"Tolstikhin","year":"2014"},{"key":"2023112408330717500_CIT0131","article-title":"Conditional Generative Adversarial Nets","author":"Mirza","year":"2014"},{"key":"2023112408330717500_CIT0132","first-page":"1681","article-title":"Generative Adversarial Text to Image Synthesis","author":"Reed","year":"2016"},{"key":"2023112408330717500_CIT0133","first-page":"5908","article-title":"StackGAN: Text to Photo-Realistic Image Synthesis With Stacked Generative Adversarial Networks","author":"Zhang","year":"2016"},{"issue":"8","key":"2023112408330717500_CIT0134","doi-asserted-by":"publisher","first-page":"1947","DOI":"10.1109\/TPAMI.2018.2856256","article-title":"StackGAN++: Realistic Image Synthesis With Stacked Generative Adversarial Networks","volume":"41","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"2023112408330717500_CIT0135","first-page":"5795","article-title":"DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis","author":"Zhu","year":"2019"},{"key":"2023112408330717500_CIT0136","first-page":"6199","article-title":"Photographic Text-to-Image Synthesis With a Hierarchically-nested Adversarial Network","author":"Zhang","year":"2018"},{"key":"2023112408330717500_CIT0137","article-title":"TAC-GAN \u2013 Text Conditioned Auxiliary Classifier Generative Adversarial Network","author":"Dash","year":"2017"},{"key":"2023112408330717500_CIT0138","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v33i01.33013272","article-title":"Adversarial Learning of Semantic Relevance in Text to Image Synthesis","author":"Cha","year":"2019"},{"key":"2023112408330717500_CIT0139","first-page":"1505","article-title":"MirrorGAN: Learning Text-to-Image Generation by Redescription","author":"Qiao","year":"2019"},{"key":"2023112408330717500_CIT0140","first-page":"217","article-title":"Learning What and Where to Draw","author":"Reed","year":"2016"},{"key":"2023112408330717500_CIT0141","first-page":"8576","article-title":"Image Generation From Layout","author":"Zhao","year":"2018"},{"key":"2023112408330717500_CIT0142","article-title":"Generating Multiple Objects at Spatially Distinct Locations","author":"Hinz","year":"2019"},{"key":"2023112408330717500_CIT0143","first-page":"7986","article-title":"Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis","author":"Hong","year":"2018"},{"key":"2023112408330717500_CIT0144","first-page":"1219","article-title":"Image Generation From Scene Graphs","author":"Johnson","year":"2018"},{"key":"2023112408330717500_CIT0145","article-title":"Deep Captioning With Multimodal Recurrent Neural Networks (m-RNN)","author":"Mao","year":"2014"},{"key":"2023112408330717500_CIT0146","article-title":"Neural Discrete Representation Learning","author":"van den Oord","year":"2017"},{"key":"2023112408330717500_CIT0147","first-page":"18582","article-title":"Clip-Forge: Towards Zero-Shot Text-to-Shape Generation","author":"Sanghi","year":"2022"},{"key":"2023112408330717500_CIT0148","first-page":"4155","article-title":"Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training","author":"Shetty","year":"2017"},{"key":"2023112408330717500_CIT0149","doi-asserted-by":"crossref","DOI":"10.1109\/ic-ETITE47903.2020.049","article-title":"A Review of Convolutional Neural Networks","author":"Ajit","year":"2020"},{"issue":"12","key":"2023112408330717500_CIT0150","doi-asserted-by":"publisher","first-page":"6999","DOI":"10.1109\/TNNLS.2021.3084827","article-title":"A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects","volume":"33","author":"Li","year":"2021","journal-title":"IEEE Trans. Neur. Netw. Learning Syst."},{"key":"2023112408330717500_CIT0151","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1016\/bs.host.2018.07.006","article-title":"Deep Neural Networks for Natural Language Processing","volume":"38","author":"Fathi","year":"2018","journal-title":"Handb. Statist."},{"key":"2023112408330717500_CIT0152","first-page":"1","article-title":"Distributed Representations of Words and Phrases and Their Compositionality","author":"Mikolov","year":"2013"},{"key":"2023112408330717500_CIT0153","doi-asserted-by":"crossref","DOI":"10.3115\/v1\/P15-2018","article-title":"A Distributed Representation Based Query Expansion Approach for Image Captioning","author":"Yagcioglu","year":"2015"},{"key":"2023112408330717500_CIT0154","article-title":"On the Relationship Between Self-Attention and Convolutional Layers","author":"Cordonnier","year":"2020"},{"key":"2023112408330717500_CIT0155","article-title":"An Image is Worth 16\u00d716 Words: Transformers for Image Recognition at Scale","author":"Dosovitskiy","year":"2021"},{"issue":"3","key":"2023112408330717500_CIT0156","doi-asserted-by":"publisher","first-page":"2585","DOI":"10.1609\/aaai.v36i3.20160","article-title":"End-to-End Transformer Based Model for Image Captioning","volume":"36","author":"Wang","year":"2022","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"2023112408330717500_CIT0157","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1016\/j.neucom.2022.09.136","article-title":"A Survey of Transformer-Based Multimodal Pre-Trained Modals","volume":"515","author":"Han","year":"2023","journal-title":"Neurocomputing"},{"key":"2023112408330717500_CIT0158","first-page":"2246","article-title":"Deep Unsupervised Learning Using Nonequilibrium Thermodynamics","author":"Sohl-Dickstein","year":"2015"},{"issue":"6","key":"2023112408330717500_CIT0159","doi-asserted-by":"publisher","first-page":"060811","DOI":"10.1115\/1.4062542","article-title":"Deep Learning-Driven Design of Robot Mechanisms","volume":"23","author":"Purwar","year":"2023","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"key":"2023112408330717500_CIT0160","article-title":"Denoising Diffusion Probabilistic Models","author":"Ho","year":"2020"},{"key":"2023112408330717500_CIT0161","article-title":"Denoising Diffusion Implicit Models","author":"Song","year":"2021"},{"key":"2023112408330717500_CIT0162","article-title":"Score-Based Generative Modeling Through Stochastic Differential Equations","author":"Song","year":"2020"},{"key":"2023112408330717500_CIT0163","first-page":"11287","article-title":"Score-Based Generative Modeling in Latent Space","author":"Vahdat","year":"2021"},{"key":"2023112408330717500_CIT0164","first-page":"2836","article-title":"Diffusion Probabilistic Models for 3D Point Cloud Generation","author":"Luo","year":"2021"},{"key":"2023112408330717500_CIT0165","first-page":"5806","article-title":"3D Shape Generation and Completion Through Point-Voxel Diffusion","author":"Zhou","year":"2021"},{"key":"2023112408330717500_CIT0166","article-title":"LION: Latent Point Diffusion Models for 3D Shape Generation","author":"Zeng","year":"2022"},{"key":"2023112408330717500_CIT0167","first-page":"7","article-title":"Point-Voxel CNN for Efficient 3D Deep Learning","author":"Liu","year":"2019"},{"key":"2023112408330717500_CIT0168","article-title":"Classifier-Free Diffusion Guidance","author":"Ho","year":"2022"},{"key":"2023112408330717500_CIT0169","article-title":"Point-E: A System for Generating 3D Point Clouds From Complex Prompts","author":"Nichol","year":"2022"},{"key":"2023112408330717500_CIT0170","article-title":"Hierarchical Text-Conditional Image Generation With CLIP Latents","author":"Ramesh","year":"2022"},{"key":"2023112408330717500_CIT0171","first-page":"11","article-title":"Generation and Comprehension of Unambiguous Object Descriptions","author":"Mao","year":"2015"},{"key":"2023112408330717500_CIT0172","first-page":"3156","article-title":"Show and Tell: A Neural Image Caption Generator","author":"Vinyals","year":"2014"},{"key":"2023112408330717500_CIT0173","first-page":"209","article-title":"The Long-Short Story of Movie Description","volume-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)","author":"Rohrbach","year":"2015"},{"issue":"5","key":"2023112408330717500_CIT0174","doi-asserted-by":"publisher","first-page":"054501","DOI":"10.1115\/1.4054090","article-title":"Prediction of Remaining Useful Life Using Fused Deep Learning Models: A Case Study of Turbofan Engines","volume":"22","author":"Zheng","year":"2022","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"key":"2023112408330717500_CIT0175","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2206.10789","article-title":"Scaling Autoregressive Models for Content-Rich Text-to-Image Generation","author":"Yu","year":"2022","journal-title":"ArXiv"},{"key":"2023112408330717500_CIT0176","first-page":"19822","article-title":"CogView: Mastering Text-to-Image Generation Via Transformers","author":"Ding","year":"2021"},{"key":"2023112408330717500_CIT0177","first-page":"11157","article-title":"VirTex: Learning Visual Representations From Textual Annotations","author":"Desai","year":"2020"},{"key":"2023112408330717500_CIT0178","first-page":"153","article-title":"Learning Visual Representations With Caption Annotations","author":"Bulent Sariyildiz","year":"2020"},{"key":"2023112408330717500_CIT0179","article-title":"Density Estimation Using Real NVP","author":"Dinh","year":"2017"},{"key":"2023112408330717500_CIT0180","article-title":"Flow-Based GAN for 3D Point Cloud Generation From a Single Image","author":"Wei","year":"2022"},{"key":"2023112408330717500_CIT0181","first-page":"5932","article-title":"Learning Implicit Fields for Generative Shape Modeling","author":"Chen","year":"2018"},{"key":"2023112408330717500_CIT0182","first-page":"11","article-title":"Learning to Infer Implicit Surfaces Without 3D Supervision","author":"Liu","year":"2019"},{"key":"2023112408330717500_CIT0183","first-page":"165","article-title":"DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation","author":"Park","year":"2019"},{"key":"2023112408330717500_CIT0184","first-page":"2234","article-title":"Improved Techniques for Training GANs","author":"Salimans","year":"2016"},{"key":"2023112408330717500_CIT0185","article-title":"GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium","author":"Heusel","year":"2017"},{"key":"2023112408330717500_CIT0186","first-page":"4043","article-title":"Conditional Image Synthesis With Auxiliary Classifier GANs","author":"Odena","year":"2016"},{"key":"2023112408330717500_CIT0187","first-page":"7877","article-title":"ManiGAN: Text-Guided Image Manipulation","author":"Li","year":"2019"},{"key":"2023112408330717500_CIT0188","first-page":"67","article-title":"Learning Representations and Generative Models for 3D Point Clouds","author":"Achlioptas","year":"2017"},{"key":"2023112408330717500_CIT0189","first-page":"3858","article-title":"3D Point Cloud Generative Adversarial Network Based on Tree Structured Graph Convolutions","author":"Shu","year":"2019"},{"key":"2023112408330717500_CIT0190","first-page":"13554","article-title":"3D Shape Generation With Grid-Based Implicit Functions","author":"Ibing","year":"2021"},{"key":"2023112408330717500_CIT0191","article-title":"Zero-Shot Learning Through Cross-Modal Transfer","author":"Socher","year":"2013"},{"key":"2023112408330717500_CIT0192","article-title":"Learning Factorized Multimodal Representations","author":"Tsai","year":"2018"},{"key":"2023112408330717500_CIT0193","first-page":"4247","article-title":"Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions","volume-title":"2015 IEEE International Conference on Computer Vision, ICCV 2015","author":"Ba","year":"2015"},{"key":"2023112408330717500_CIT0194","first-page":"49","article-title":"Learning Deep Representations of Fine-Grained Visual Descriptions","author":"Reed","year":"2016"},{"key":"2023112408330717500_CIT0195","doi-asserted-by":"crossref","DOI":"10.3115\/1699648.1699682","article-title":"Improved Statistical Machine Translation for Resource-Poor Languages Using Related Resource-Rich Languages","author":"Nakov","year":"2009"},{"key":"2023112408330717500_CIT0196","first-page":"1","article-title":"Deep Compositional Captioning: Describing Novel Object Categories Without Paired Training Data","author":"Hendricks","year":"2015"},{"key":"2023112408330717500_CIT0197","first-page":"966","article-title":"Connecting Modalities: Semi-Supervised Segmentation and Annotation of Images Using Unaligned Text Corpora","author":"Socher","year":"2010"},{"key":"2023112408330717500_CIT0198","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1162\/tacl_a_00177","article-title":"Grounded Compositional Semantics for Finding and Describing Images with Sentences","volume":"2","author":"Socher","year":"2014","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"2023112408330717500_CIT0199","article-title":"Visual Information in Semantic Representation","volume-title":"June.","author":"Feng","year":"2010"},{"key":"2023112408330717500_CIT0200","article-title":"Distributional Semantics in Technicolor","volume-title":"July.","author":"Bruni","year":"2012"},{"key":"2023112408330717500_CIT0201","first-page":"4985","article-title":"VisualWord2Vec (Vis-W2V): Learning Visually Grounded Word Embeddings Using Abstract Scenes","author":"Kottur","year":"2016"},{"key":"2023112408330717500_CIT0202","first-page":"7424","article-title":"ViCo: Word Embeddings From Visual Co-occurrences","author":"Gupta","year":"2019"},{"key":"2023112408330717500_CIT0203","article-title":"Image-to-Word Transformation Based on Dividing and Vector Quantizing Images With Words","author":"Mori","year":"1999"},{"key":"2023112408330717500_CIT0204","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2007.383173","article-title":"Learning Visual Representations Using Images With Captions","author":"Quattoni","year":"2007"},{"key":"2023112408330717500_CIT0205","first-page":"67","article-title":"Learning Visual Features From Large Weakly Supervised Data","volume-title":"ECCV 2016: Computer Vision \u2013 ECCV","author":"Joulin","year":"2015"},{"key":"2023112408330717500_CIT0206","first-page":"4193","article-title":"Learning Visual N-Grams From Web Data","author":"Li","year":"2016"},{"key":"2023112408330717500_CIT0207","first-page":"185","article-title":"Exploring the Limits of Weakly Supervised Pretraining","volume-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)","author":"Mahajan","year":"2018"},{"key":"2023112408330717500_CIT0208","first-page":"231","article-title":"Grounding Semantics in Olfactory Perception","author":"Kiela","year":"2015"},{"key":"2023112408330717500_CIT0209","first-page":"92","article-title":"Combining Labeled and Unlabeled Data With Co-Training","author":"Blum","year":"1998"},{"key":"2023112408330717500_CIT0210","first-page":"626","article-title":"Unsupervised Improvement of Visual Detectors Using Co-Training","author":"Levin","year":"2003"},{"key":"2023112408330717500_CIT0211","article-title":"Multi-View Learning in the Presence of View Disagreement","author":"Christoudias","year":"2012"},{"key":"2023112408330717500_CIT0212","doi-asserted-by":"publisher","first-page":"1440","DOI":"10.48550\/arXiv.1504.08083","article-title":"Fast R-CNN","volume-title":"IEEE International Conference on Computer Vision (ICCV)","author":"Girshick","year":"2015"},{"issue":"2","key":"2023112408330717500_CIT0213","doi-asserted-by":"publisher","first-page":"111","DOI":"10.3233\/AIC-210172","article-title":"Explaining Transformer-Based Image Captioning Models: An Empirical Analysis","volume":"35","author":"Cornia","year":"2022","journal-title":"AI Commun."},{"key":"2023112408330717500_CIT0214","article-title":"Image Captioning: Transforming Objects Into Words","author":"Herdade","year":"2019"},{"key":"2023112408330717500_CIT0215","first-page":"4633","article-title":"Attention on Attention for Image Captioning","author":"Huang","year":"2019"},{"key":"2023112408330717500_CIT0216","first-page":"153","article-title":"Image Captioning through Image Transformer","volume-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)","author":"He","year":"2020"},{"key":"2023112408330717500_CIT0217","first-page":"8927","article-title":"Entangled Transformer for Image Captioning","author":"Li","year":"2019"},{"key":"2023112408330717500_CIT0218","first-page":"5561","article-title":"Convolutional Image Captioning","author":"Aneja","year":"2017"},{"key":"2023112408330717500_CIT0219","first-page":"10687","article-title":"Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech","author":"Deshpande","year":"2018"},{"key":"2023112408330717500_CIT0220","first-page":"9","article-title":"Controllable Text-to-Image Generation","author":"Li","year":"2019"},{"key":"2023112408330717500_CIT0221","first-page":"16494","article-title":"Df-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis","author":"Tao","year":"2022"},{"key":"2023112408330717500_CIT0222","article-title":"A Style-Based Generator Architecture for Generative Adversarial Networks","author":"Karras","year":"2018"},{"key":"2023112408330717500_CIT0223","first-page":"2065","article-title":"StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery","author":"Patashnik","year":"2021"},{"issue":"4","key":"2023112408330717500_CIT0224","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1145\/3528223.3530164","article-title":"StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators","volume":"41","author":"Gal","year":"2021","journal-title":"ACM Trans. Graph."},{"key":"2023112408330717500_CIT0225","first-page":"695","article-title":"Image-Based Clip-Guided Essence Transfer","author":"Chefer","year":"2022"},{"key":"2023112408330717500_CIT0226","first-page":"8821","article-title":"Zero-Shot Text-to-Image Generation","author":"Ramesh","year":"2021"},{"key":"2023112408330717500_CIT0227","first-page":"88","article-title":"Vqgan-clip: Open Domain Image Generation and Editing With Natural Language Guidance","author":"Crowson","year":"2022"},{"key":"2023112408330717500_CIT0228","article-title":"Vector-Quantized Image Modeling With Improved VQGAN","author":"Yu","year":"2022"},{"key":"2023112408330717500_CIT0229","article-title":"Photorealistic Text-to-Image Diffusion Models With Deep Language Understanding","author":"Saharia","year":"2022"},{"key":"2023112408330717500_CIT0230","article-title":"CLIPDraw: Exploring Text-to-Drawing Synthesis Through Language-Image Encoders","author":"Frans","year":"2022"},{"issue":"3","key":"2023112408330717500_CIT0231","doi-asserted-by":"publisher","first-page":"031008","DOI":"10.1115\/1.4053077","article-title":"Prediction of Mechanical Properties of Three-Dimensional Printed Lattice Structures Through Machine Learning","volume":"22","author":"Ma","year":"2022","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"issue":"1","key":"2023112408330717500_CIT0232","doi-asserted-by":"publisher","first-page":"011009","DOI":"10.1115\/1.4041777","article-title":"Triangular Mesh and Boundary Representation Combined Approach for 3D CAD Lightweight Representation for Collaborative Product Development","volume":"19","author":"Nguyen","year":"2019","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"issue":"4","key":"2023112408330717500_CIT0233","doi-asserted-by":"publisher","first-page":"418","DOI":"10.1115\/1.2353852","article-title":"Point Cloud to CAD Model Registration Methods in Manufacturing Inspection","volume":"6","author":"Tucker","year":"2006","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"issue":"1","key":"2023112408330717500_CIT0234","doi-asserted-by":"publisher","first-page":"011101","DOI":"10.1115\/1.4040169","article-title":"Implementation of Design Rules for Perception Into a Tool for Three-Dimensional Shape Generation Using a Shape Grammar and a Parametric Model","volume":"141","author":"Mata","year":"2019","journal-title":"ASME J. Mech. Des."},{"issue":"4","key":"2023112408330717500_CIT0235","doi-asserted-by":"publisher","first-page":"041008","DOI":"10.1115\/1.4056566","article-title":"Teeth Mold Point Cloud Completion Via Data Augmentation and Hybrid RL-GAN","volume":"23","author":"Toscano","year":"2023","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"key":"2023112408330717500_CIT0236","first-page":"628","article-title":"3D-R2N2: A Unified Approach for Single and Multi-View 3D Object Reconstruction","volume-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)","author":"Choy","year":"2016"},{"key":"2023112408330717500_CIT0237","first-page":"9784","article-title":"Mesh R-CNN","author":"Gkioxari","year":"2019"},{"key":"2023112408330717500_CIT0238","first-page":"1290","article-title":"MeshMVS: Multi-View Stereo Guided Mesh Reconstruction","author":"Shrestha","year":"2021"},{"key":"2023112408330717500_CIT0239","first-page":"2463","article-title":"A Point Set Generation Network for 3D Object Reconstruction From a Single Image","author":"Fan","year":"2016"},{"key":"2023112408330717500_CIT0240","first-page":"216","article-title":"A Papier-Mache Approach to Learning 3D Surface Generation","author":"Groueix","year":"2018"},{"issue":"11","key":"2023112408330717500_CIT0241","doi-asserted-by":"publisher","first-page":"114501","DOI":"10.1115\/1.4054906","article-title":"A Predictive and Generative Design Approach for Three-Dimensional Mesh Shapes Using Target-Embedding Variational Autoencoder","volume":"144","author":"Li","year":"2022","journal-title":"ASME J. Mech. Des."},{"key":"2023112408330717500_CIT0242","article-title":"Learning a Probabilistic Latent Space of Object Shapes Via 3D Generative-Adversarial Modeling","author":"Wu","year":"2016"},{"key":"2023112408330717500_CIT0243","doi-asserted-by":"publisher","first-page":"9731","DOI":"10.1109\/cvpr.2019.00997","article-title":"Unsupervised Primitive Discovery for Improved 3D Generative Modeling","author":"Khan","year":"2019"},{"issue":"1","key":"2023112408330717500_CIT0244","doi-asserted-by":"publisher","first-page":"011005","DOI":"10.1115\/1.4063275","article-title":"Three-Dimensional-Slice-Super-Resolution-Net: A Fast Few Shooting Learning Model for 3D Super-Resolution Using Slice-Up and Slice-Reconstruction","volume":"24","author":"Lin","year":"2024","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"issue":"4","key":"2023112408330717500_CIT0245","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1145\/3072959.3073616","article-title":"Convolutional Neural Networks on Surfaces Via Seamless Toric Covers","volume":"36","author":"Maron","year":"2017","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"2023112408330717500_CIT0246","doi-asserted-by":"crossref","DOI":"10.1145\/3272127.3275052","article-title":"Multi-chart Generative Surface Modeling","author":"Ben-Hamu","year":"2018"},{"key":"2023112408330717500_CIT0247","first-page":"5586","article-title":"Rank3DGAN: Semantic Mesh Generation Using Relative Attributes","author":"Saquil","year":"2020"},{"key":"2023112408330717500_CIT0248","article-title":"Xdgan: Multi-modal 3D Shape Generation in 2D Space","author":"Alhaija","year":"2022"},{"key":"2023112408330717500_CIT0249","article-title":"Shapecrafter: A Recursive Text-Conditioned 3d Shape Generation Model","author":"Fu","year":"2022"},{"key":"2023112408330717500_CIT0250","article-title":"Meshdiffusion: Score-Based Generative 3D Mesh Modeling","author":"Liu","year":"2023"},{"key":"2023112408330717500_CIT0251","first-page":"3763","article-title":"Pre-Train, Self-Train, Distill: A Simple Recipe for Supersizing 3D Reconstruction","author":"Alwala","year":"2022"},{"key":"2023112408330717500_CIT0252","article-title":"ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation","author":"Liu","year":"2022"},{"key":"2023112408330717500_CIT0253","article-title":"3D-LDM: Neural Implicit 3D Shape Generation With Latent Diffusion Models","author":"Nam","year":"2022"},{"key":"2023112408330717500_CIT0254","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/978-3-031-20062-5_18","volume-title":"Computer Vision \u2013 ECCV 2022","author":"Cheng","year":"2022"},{"key":"2023112408330717500_CIT0255","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1007\/978-3-030-01252-6_4","article-title":"Pixel2Mesh: Generating 3D Mesh Models From Single RGB Images","volume":"11215","author":"Wang","year":"2018","journal-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)"},{"key":"2023112408330717500_CIT0256","doi-asserted-by":"publisher","first-page":"134926","DOI":"10.1109\/cvpr52688.2022.01313","article-title":"Text2Mesh: Text-Driven Neural Stylization for Meshes","author":"Michel","year":"2021"},{"key":"2023112408330717500_CIT0257","article-title":"Clipmatrix: Text-Controlled Creation of 3D Textured Meshes","author":"Jetchev","year":"2021","journal-title":"ArXiv"},{"issue":"6","key":"2023112408330717500_CIT0258","doi-asserted-by":"publisher","first-page":"060816","DOI":"10.1115\/1.4062939","article-title":"The Role of Deep Learning in Manufacturing Applications: Challenges and Opportunities","volume":"23","author":"Malhan","year":"2023","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"issue":"3","key":"2023112408330717500_CIT0259","first-page":"2267","article-title":"Hybrid Contrastive Learning of Tri-Modal Representation for Multimodal Sentiment Analysis","volume":"14","author":"Mai","year":"2022","journal-title":"IEEE Trans. Affect. Comput."},{"key":"2023112408330717500_CIT0260","doi-asserted-by":"crossref","DOI":"10.1109\/ICME55011.2023.00480","article-title":"Multimodal Fake News Detection Via Clip-Guided Learning","author":"Zhou","year":"2023"},{"issue":"15","key":"2023112408330717500_CIT0261","doi-asserted-by":"publisher","first-page":"4316","DOI":"10.1093\/bioinformatics\/btaa501","article-title":"A Multimodal Deep Learning Framework for Predicting Drug-Drug Interaction Events","volume":"36","author":"Deng","year":"2020","journal-title":"Bioinformatics"},{"key":"2023112408330717500_CIT0262","doi-asserted-by":"crossref","DOI":"10.1145\/3411764.3445563","article-title":"Deeptake: Prediction of Driver Takeover Behavior Using Multimodal Data","author":"Pakdamanian","year":"2021"},{"issue":"4","key":"2023112408330717500_CIT0263","doi-asserted-by":"publisher","first-page":"041407","DOI":"10.1115\/1.4056500","article-title":"DDE-GAN: Integrating a Data-Driven Design Evaluator Into Generative Adversarial Networks for Desirable and Diverse Concept Generation","volume":"145","author":"Yuan","year":"2023","journal-title":"ASME J. Mech. Des."},{"key":"2023112408330717500_CIT0264","article-title":"Im2text: Describing Images Using 1 Million Captioned Photographs","author":"Ordonez","year":"2011"},{"key":"2023112408330717500_CIT0265","doi-asserted-by":"publisher","first-page":"100","DOI":"10.3115\/v1\/P15-2017","article-title":"Language Models for Image Captioning: The Quirks and What Works","author":"Devlin","year":"2015"},{"key":"2023112408330717500_CIT0266","doi-asserted-by":"publisher","first-page":"e22","DOI":"10.1017\/S0890060422000130","article-title":"Enabling Multi-modal Search for Inspirational Design Stimuli Using Deep Learning","volume":"36","author":"Kwon","year":"2022","journal-title":"Artif. Intell. Eng. Des. Anal. Manuf."},{"key":"2023112408330717500_CIT0267","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1007\/978-3-642-15561-1_2","article-title":"Every Picture Tells a Story: Generating Sentences From Images","volume":"6314","author":"Farhadi","year":"2010","journal-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)"},{"key":"2023112408330717500_CIT0268","doi-asserted-by":"publisher","first-page":"2346","DOI":"10.1609\/aaai.v29i1.9512","article-title":"Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework","author":"Xu","year":"2015"},{"key":"2023112408330717500_CIT0269","doi-asserted-by":"publisher","first-page":"853","DOI":"10.1613\/jair.3994","article-title":"Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics","volume":"47","author":"Hodosh","year":"2013","journal-title":"J. Artif. Intell. Res."},{"issue":"2","key":"2023112408330717500_CIT0270","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1080\/15710882.2019.1654524","article-title":"The Situated Function-Behavior-Structure Co-Design Model","volume":"17","author":"Gero","year":"2021","journal-title":"CoDesign"},{"key":"2023112408330717500_CIT0271","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1007\/978-3-319-10602-1_48","article-title":"Microsoft COCO: Common Objects in Context","volume":"8693","author":"Lin","year":"2014","journal-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)"},{"issue":"1","key":"2023112408330717500_CIT0272","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1007\/s11263-016-0981-7","article-title":"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations","volume":"123","author":"Krishna","year":"2016","journal-title":"Int. J. Comput. Vision"},{"issue":"2","key":"2023112408330717500_CIT0273","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1145\/2812802","article-title":"YFCC100M: The New Data in Multimedia Research","volume":"59","author":"Thomee","year":"2015","journal-title":"Commun. ACM"},{"key":"2023112408330717500_CIT0274","doi-asserted-by":"publisher","first-page":"843","DOI":"10.1109\/iccv.2017.97","article-title":"Revisiting Unreasonable Effectiveness of Data in Deep Learning Era","author":"Sun","year":"2017"},{"key":"2023112408330717500_CIT0275","first-page":"2408","article-title":"AVA: A Large-Scale Database for Aesthetic Visual Analysis","author":"Murray","year":"2012"},{"key":"2023112408330717500_CIT0276","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1007\/978-3-030-20893-6_7","article-title":"Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings","volume":"11363","author":"Chen","year":"2018","journal-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)"},{"key":"2023112408330717500_CIT0277","first-page":"1","article-title":"Parkinson\u2019s Disease Detection Using CNN Architectures With Transfer Learning","author":"Jahan","year":"2021"},{"key":"2023112408330717500_CIT0278","doi-asserted-by":"crossref","DOI":"10.1016\/j.cad.2023.103609","article-title":"Beyond Statistical Similarity: Rethinking Metrics for Deep Generative Models in Engineering Design","author":"Regenwetter","year":"2023"},{"issue":"1","key":"2023112408330717500_CIT0279","doi-asserted-by":"publisher","first-page":"011006","DOI":"10.1115\/1.4044507","article-title":"Physics-Driven Regularization of Deep Neural Networks for Enhanced Engineering Design and Analysis","volume":"20","author":"Nabian","year":"2020","journal-title":"ASME J. Comput. Inf. Sci. Eng."},{"issue":"1","key":"2023112408330717500_CIT0280","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1109\/TPAMI.2022.3148853","article-title":"Deep Learning for Free-Hand Sketch: A Survey","volume":"45","author":"Xu","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"2023112408330717500_CIT0281","first-page":"1152","article-title":"Multi-Level 3D CNN for Learning Multi-Scale Spatial Features","author":"Ghadai","year":"2019"},{"key":"2023112408330717500_CIT0282","first-page":"3558","article-title":"What Are You Talking About? Text-to-Image Coreference","author":"Kong","year":"2014"},{"issue":"10","key":"2023112408330717500_CIT0283","doi-asserted-by":"publisher","first-page":"5","DOI":"10.3390\/ijerph19106046","article-title":"Research on the Design Strategy of Healing Products for Anxious Users During COVID-19","volume":"19","author":"Wu","year":"2022","journal-title":"Int. J. Environ. Res. Public Health"},{"issue":"1","key":"2023112408330717500_CIT0284","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3390\/e23010018","article-title":"Explainable AI: A Review of Machine Learning Interpretability Methods","volume":"23","author":"Linardatos","year":"2021","journal-title":"Entropy"},{"key":"2023112408330717500_CIT0285","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1016\/j.inffus.2019.12.012","article-title":"Explainable Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges Toward Responsible AI","volume":"58","author":"Barredo Arrieta","year":"2020","journal-title":"Inf. Fusion"}],"container-title":["Journal of Computing and Information Science in Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/asmedigitalcollection.asme.org\/computingengineering\/article-pdf\/24\/1\/010801\/7062966\/jcise_24_1_010801.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/asmedigitalcollection.asme.org\/computingengineering\/article-pdf\/24\/1\/010801\/7062966\/jcise_24_1_010801.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,24]],"date-time":"2023-11-24T08:33:41Z","timestamp":1700814821000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmedigitalcollection.asme.org\/computingengineering\/article\/24\/1\/010801\/1169855\/Multi-Modal-Machine-Learning-in-Engineering-Design"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,24]]},"references-count":285,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1,1]]}},"URL":"https:\/\/doi.org\/10.1115\/1.4063954","relation":{},"ISSN":["1530-9827","1944-7078"],"issn-type":[{"value":"1530-9827","type":"print"},{"value":"1944-7078","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,24]]},"article-number":"010801"}}