{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T15:20:26Z","timestamp":1774279226791,"version":"3.50.1"},"reference-count":40,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T00:00:00Z","timestamp":1754956800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>i-Motifs (iMs) are cytosine-rich, four-stranded DNA structures with emerging roles in gene regulation and genome stability. Despite their biological relevance, genome-wide prediction of iM-forming sequences remains limited by low specificity and high false-positive rates, leading to considerable experimental burden.<\/jats:p><\/jats:sec><jats:sec><jats:title>Method<\/jats:title><jats:p>To address this, we developed a refined computational approach that prioritizes high-confidence iM candidates using a Position-Specific Similarity Matrix (PSSM) derived from multiple sequence alignments. The human reference genome (hg38) was scanned using a custom regular expression targeting cytosine-rich motifs, followed by scoring each sequence with the PSSM. Statistical significance was assessed via permutation testing, one-sided t-tests, Benjamini-Hochberg correction, and Z-scores.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>This pipeline identified 37,075 candidate sequences (15\u201346 nucleotides) with strong iM-forming potential. Validation against experimentally confirmed iMs and known G-quadruplexes (G4s) demonstrated significant differences in alignment scores and sequence similarity, confirming structural specificity. A random forest classifier trained on nucleotide features further supported the distinctiveness of the candidates, achieving a high classification performance.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>This work presents a scalable and statistically robust method to enrich for biologically relevant iM sequences, providing a valuable resource for future experimental validation and the rational design of ligands targeting iMs to modulate gene expression in contexts such as cancer.<\/jats:p><\/jats:sec>","DOI":"10.3389\/fbinf.2025.1657841","type":"journal-article","created":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T05:31:53Z","timestamp":1754976713000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Sequence-based prioritization of i-Motif candidates in the human genome"],"prefix":"10.3389","volume":"5","author":[{"given":"Veronica","family":"Remori","sequence":"first","affiliation":[]},{"given":"Michela","family":"Prest","sequence":"additional","affiliation":[]},{"given":"Mauro","family":"Fasano","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,8,12]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"8038","DOI":"10.1093\/nar\/gky735","article-title":"i-Motif DNA: structural features and significance to cell biology","volume":"46","author":"Abou Assi","year":"2018","journal-title":"Nucleic Acids Res."},{"key":"B2","doi-asserted-by":"publisher","first-page":"139582","DOI":"10.1016\/j.ijbiomac.2025.139582","article-title":"I-motif formation in the promoter region of the B-MYB proto-oncogene","volume":"296","author":"Alves","year":"2025","journal-title":"Int. J. Biol. Macromol."},{"key":"B3","doi-asserted-by":"publisher","first-page":"100081","DOI":"10.1016\/j.patter.2020.100081","article-title":"Spectral jaccard similarity: a new approach to estimating pairwise sequence alignments","volume":"1","author":"Baharav","year":"2020","journal-title":"Patterns"},{"key":"B4","doi-asserted-by":"publisher","first-page":"13530","DOI":"10.1093\/nar\/gkae1001","article-title":"Profiling of i-motif-binding proteins reveals functional roles of nucleolin in regulation of high-order DNA structures","volume":"52","author":"Ban","year":"2024","journal-title":"Nucleic Acids Res."},{"key":"B5","doi-asserted-by":"publisher","first-page":"3287","DOI":"10.1109\/tit.2020.2996543","article-title":"Levenshtein distance, sequence comparison and biological database search","volume":"67","author":"Berger","year":"2021","journal-title":"IEEE Trans. Inf. Theory"},{"key":"B6","doi-asserted-by":"publisher","first-page":"D1228","DOI":"10.1093\/nar\/gks1147","article-title":"InnateDB: systems biology of innate immunity and Beyond\u2014recent updates and continuing curation","volume":"41","author":"Breuer","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"B7","doi-asserted-by":"publisher","first-page":"96","DOI":"10.3390\/ph14020096","article-title":"The i-Motif as a molecular target: more than a complementary DNA secondary structure","volume":"14","author":"Brown","year":"2021","journal-title":"Pharmaceuticals"},{"key":"B8","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1016\/j.ygeno.2012.04.003","article-title":"Random forests for genomic data analysis","volume":"99","author":"Chen","year":"2012","journal-title":"Genomics"},{"key":"B9","doi-asserted-by":"publisher","first-page":"11921","DOI":"10.1093\/nar\/gkz1008","article-title":"A DNA G-quadruplex\/i-motif hybrid","volume":"47","author":"Chu","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"B10","doi-asserted-by":"publisher","first-page":"2942","DOI":"10.1002\/anie.201813288","article-title":"Chemical regulation of DNA i-Motifs for nanobiotechnology and therapeutics","volume":"58","author":"Debnath","year":"2019","journal-title":"Angew. Chem. Int. Ed."},{"key":"B11","doi-asserted-by":"publisher","first-page":"102474","DOI":"10.1016\/j.omtn.2025.102474","article-title":"i-Motifs as regulatory switches: mechanisms and implications for gene expression","volume":"36","author":"Deep","year":"2025","journal-title":"Mol. Ther. Nucleic Acids."},{"key":"B12","doi-asserted-by":"publisher","first-page":"176","DOI":"10.1016\/j.biochi.2023.07.002","article-title":"Non-B DNA structures as a booster of genome instability","volume":"214","author":"Duardo","year":"2023","journal-title":"Biochimie"},{"key":"B13","doi-asserted-by":"publisher","first-page":"7858","DOI":"10.1093\/nar\/gkq639","article-title":"How long is too long? Effects of loop size on G-quadruplex stability","volume":"38","author":"Gu\u00e9din","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"B14","doi-asserted-by":"publisher","first-page":"1267","DOI":"10.1038\/ng.3662","article-title":"G-quadruplex structures mark human regulatory chromatin","volume":"48","author":"H\u00e4nsel-Hertsch","year":"2016","journal-title":"Nat. Genet."},{"key":"B15","doi-asserted-by":"publisher","first-page":"D947","DOI":"10.1093\/nar\/gkaa609","article-title":"HRT atlas v1.0 database: redefining human and mouse housekeeping genes and candidate reference transcripts by mining massive RNA-seq datasets","volume":"49","author":"Hounkpe","year":"2021","journal-title":"Nucleic Acids Res."},{"key":"B16","doi-asserted-by":"publisher","first-page":"4172","DOI":"10.1021\/ja4109352","article-title":"The transcriptional complex between the BCL2 i-Motif and hnRNP LL is a molecular switch for control of gene expression that can be modulated by small molecules","volume":"136","author":"Kang","year":"2014","journal-title":"J. Am. Chem. Soc."},{"key":"B17","doi-asserted-by":"publisher","first-page":"4161","DOI":"10.1021\/ja410934b","article-title":"The dynamic character of the BCL2 promoter i-Motif provides a mechanism for modulation of gene expression by compounds that bind selectively to the alternative DNA hairpin structure","volume":"136","author":"Kendrick","year":"2014","journal-title":"J. Am. Chem. Soc."},{"key":"B18","doi-asserted-by":"publisher","first-page":"3307","DOI":"10.1016\/j.bmcl.2015.05.064","article-title":"Stabilization of the i-motif structure by the intra-strand cross-link formation","volume":"25","author":"Kikuta","year":"2015","journal-title":"Bioorg. Med. Chem. Lett."},{"key":"B19","doi-asserted-by":"publisher","first-page":"2947","DOI":"10.1093\/bioinformatics\/btm404","article-title":"Clustal W and clustal X version 2.0","volume":"23","author":"Larkin","year":"2007","journal-title":"Bioinformatics"},{"key":"B20","doi-asserted-by":"publisher","first-page":"108827","DOI":"10.1016\/j.dib.2022.108827","article-title":"Comprehensive 100-bp resolution genome-wide epigenomic profiling data for the hg38 human reference genome","volume":"46","author":"Li","year":"","journal-title":"Data Brief."},{"key":"B21","doi-asserted-by":"publisher","first-page":"e202301666","DOI":"10.1002\/anie.202301666","article-title":"Crystal structure of an i-Motif from the HRAS oncogene promoter","volume":"62","author":"Li","year":"","journal-title":"Angew. Chem. Int. Ed."},{"key":"B22","doi-asserted-by":"publisher","first-page":"1136251","DOI":"10.3389\/fphar.2023.1136251","article-title":"Emerging roles of i-motif in gene expression and disease treatment","volume":"14","author":"Luo","year":"2023","journal-title":"Front. Pharmacol."},{"key":"B23","doi-asserted-by":"publisher","first-page":"e50","DOI":"10.1093\/nar\/gkr1135","article-title":"A highly efficient and effective motif discovery method for ChIP-seq\/ChIP-chip data using positional information","volume":"40","author":"Ma","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"B24","doi-asserted-by":"publisher","first-page":"3445","DOI":"10.1093\/nar\/gkac158","article-title":"i-Motif formation and spontaneous deletions in human cells","volume":"50","author":"Martella","year":"2022","journal-title":"Nucleic Acids Res."},{"key":"B25","doi-asserted-by":"publisher","first-page":"645","DOI":"10.3390\/genes14030645","article-title":"Structural and functional classification of G-Quadruplex families within the human genome","volume":"14","author":"Neupane","year":"2023","journal-title":"Genes"},{"key":"B26","doi-asserted-by":"publisher","first-page":"4786","DOI":"10.1038\/s44318-024-00210-5","article-title":"Human genomic DNA is widely interspersed with i-motif structures","volume":"43","author":"Pe\u00f1a Martinez","year":"2024","journal-title":"EMBO J."},{"key":"B27","doi-asserted-by":"publisher","first-page":"130038","DOI":"10.1016\/j.bmcl.2024.130038","article-title":"Switching off cancer \u2013 an overview of G-quadruplex and i-motif functional role in oncogene expression","volume":"116","author":"Roxo","year":"2025","journal-title":"Bioorg. Med. Chem. Lett."},{"key":"B28","doi-asserted-by":"publisher","first-page":"gkae1305","DOI":"10.1093\/nar\/gkae1305","article-title":"The iMab antibody selectively binds to intramolecular and intermolecular i-motif structures","volume":"53","author":"Ruggiero","year":"2025","journal-title":"Nucleic Acids Res."},{"key":"B29","doi-asserted-by":"publisher","first-page":"364","DOI":"10.1038\/334364a0","article-title":"Formation of parallel four-stranded complexes by guanine-rich motifs in DNA and its implications for meiosis","volume":"334","author":"Sen","year":"1988","journal-title":"Nature"},{"key":"B30","first-page":"45","article-title":"Chapter three - a beginner\u2019s handbook to identify and characterize i-motif DNA","volume-title":"Methods in enzymology","author":"Sengupta","year":"2024"},{"key":"B31","doi-asserted-by":"publisher","first-page":"11593","DOI":"10.1073\/pnas.182256799","article-title":"Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription","volume":"99","author":"Siddiqui-Jain","year":"2002","journal-title":"Proc. Natl. Acad. Sci."},{"key":"B32","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high\u2010quality protein multiple sequence alignments using clustal omega","volume":"7","author":"Sievers","year":"2011","journal-title":"Mol. Syst. Biol."},{"key":"B33","doi-asserted-by":"publisher","first-page":"853","DOI":"10.1016\/j.tig.2024.05.011","article-title":"i-Motif DNA: identification, formation, and cellular functions","volume":"40","author":"Tao","year":"2024","journal-title":"Trends Genet."},{"key":"B34","doi-asserted-by":"publisher","first-page":"2279","DOI":"10.1111\/febs.13307","article-title":"DNA structure and function","volume":"282","author":"Travers","year":"2015","journal-title":"FEBS J."},{"key":"B35","doi-asserted-by":"publisher","first-page":"322","DOI":"10.1186\/s12859-015-0749-z","article-title":"DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment","volume":"16","author":"Wright","year":"2015","journal-title":"BMC Bioinforma."},{"key":"B36","doi-asserted-by":"publisher","first-page":"143555","DOI":"10.1016\/j.ijbiomac.2025.143555","article-title":"Advances in i-motif structures: stability, gene expression, and therapeutic applications","volume":"311","author":"Wu","year":"2025","journal-title":"Int. J. Biol. Macromol."},{"key":"B37","doi-asserted-by":"publisher","first-page":"12020","DOI":"10.1093\/nar\/gkad981","article-title":"Decoding complexity in biomolecular recognition of DNA i-motifs with microarrays","volume":"51","author":"Yazdani","year":"2023","journal-title":"Nucleic Acids Res."},{"key":"B38","doi-asserted-by":"publisher","first-page":"gkae315","DOI":"10.1093\/nar\/gkae315","article-title":"iM-Seeker: a webserver for DNA i-motifs prediction and scoring via automated machine learning","volume":"27","author":"Yu","year":"2024","journal-title":"Nucleic Acids Res."},{"key":"B39","doi-asserted-by":"publisher","first-page":"8309","DOI":"10.1093\/nar\/gkad626","article-title":"Genome-wide mapping of i-motifs reveals their association with transcription regulation in live human cells","volume":"51","author":"Zanin","year":"2023","journal-title":"Nucleic Acids Res."},{"key":"B40","doi-asserted-by":"publisher","first-page":"7742","DOI":"10.1002\/anie.201301278","article-title":"Combination of i-Motif and G-Quadruplex structures within the same strand: formation and application","volume":"52","author":"Zhou","year":"2013","journal-title":"Angew. Chem. Int. Ed."}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1657841\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T05:31:54Z","timestamp":1754976714000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1657841\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,12]]},"references-count":40,"alternative-id":["10.3389\/fbinf.2025.1657841"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2025.1657841","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,12]]},"article-number":"1657841"}}