{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T16:32:46Z","timestamp":1740155566089,"version":"3.37.3"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,4,15]],"date-time":"2024-04-15T00:00:00Z","timestamp":1713139200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,4,15]],"date-time":"2024-04-15T00:00:00Z","timestamp":1713139200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/X032701\/1"],"award-info":[{"award-number":["EP\/X032701\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Purpose<\/jats:title>\n                <jats:p>Wiswesser Line Notation (WLN) is an old line notation for encoding chemical compounds for storage and processing by computers. Whilst the notation itself has long since been surpassed by SMILES and InChI, distribution of WLN during its active years was extensive. In the context of modernising chemical data, we present a comprehensive WLN parser developed using the OpenBabel toolkit, capable of translating WLN strings into various formats supported by the library. Furthermore, we have devised a specialised Finite State Machine, constructed from the rules of WLN, enabling the recognition and extraction of chemical strings out of large bodies of text. 
Available open-access WLN data with corresponding SMILES or InChI notation is rare, however ChEMBL, ChemSpider and PubChem all contain WLN records which were used for conversion scoring. Our investigation revealed a notable proportion of inaccuracies within the database entries, and we have taken steps to rectify these errors whenever feasible.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Scientific contribution<\/jats:title>\n                <jats:p>Tools for both the extraction and conversion of WLN from chemical documents have been successfully developed. Both the Deterministic Finite Automaton (DFA) and parser handle the majority of WLN rules officially endorsed in the three major WLN manuals, with the parser showing a clear jump in accuracy and chemical coverage over previous submissions. The GitHub repository can be found here: <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/Mblakey\/wiswesser\">https:\/\/github.com\/Mblakey\/wiswesser<\/jats:ext-link>.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s13321-024-00831-2","type":"journal-article","created":{"date-parts":[[2024,4,15]],"date-time":"2024-04-15T10:02:09Z","timestamp":1713175329000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Zombie cheminformatics: extraction and conversion of Wiswesser Line Notation (WLN) from chemical documents"],"prefix":"10.1186","volume":"16","author":[{"given":"Michael","family":"Blakey","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Samantha","family":"Pearman-Kanza","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeremy 
G.","family":"Frey","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,4,15]]},"reference":[{"key":"831_CR1","doi-asserted-by":"publisher","DOI":"10.1002\/wcms.36","author":"WA Warr","year":"2011","unstructured":"Warr WA (2011) Representation of chemical structures. Wiley Interdiscip Rev Comput Mol Sci. https:\/\/doi.org\/10.1002\/wcms.36","journal-title":"Wiley Interdiscip Rev Comput Mol Sci"},{"key":"831_CR2","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1186\/1758-2946-4-22","volume":"4","author":"NM O\u2019Boyle","year":"2012","unstructured":"O\u2019Boyle NM (2012) Towards a universal smiles representation - a standard method to generate canonical smiles based on the inchi. J Cheminf 4:22","journal-title":"J Cheminf"},{"key":"831_CR3","first-page":"217","volume":"4","author":"EE Bolton","year":"2008","unstructured":"Bolton EE, Wang Y, Thiessen PA, Bryant SH (2008) Pubchem: integrated platform of small molecules and biological activities. Annu Rep Prog Chem 4:217\u2013241","journal-title":"Annu Rep Prog Chem"},{"key":"831_CR4","unstructured":"Landrum G et al (2013) Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum"},{"issue":"1","key":"831_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-14-257","volume":"14","author":"S Beisken","year":"2013","unstructured":"Beisken S, Meinl T, Wiswedel B, Figueiredo LF, Berthold M, Steinbeck C (2013) Knime-cdk: Workflow-driven cheminformatics. BMC Bioinformatics 14(1):1\u20134","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"831_CR6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1758-2946-3-1","volume":"3","author":"NM O\u2019Boyle","year":"2011","unstructured":"O\u2019Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open babel: an open chemical toolbox. 
J Cheminf 3(1):1\u201314","journal-title":"J Cheminf"},{"key":"831_CR7","unstructured":"Mazzarella L (2018) Chemical file format conversion tools: an overview. https:\/\/api.semanticscholar.org\/CorpusID:212442689"},{"issue":"3","key":"831_CR8","doi-asserted-by":"publisher","first-page":"739","DOI":"10.1021\/ci100384d","volume":"51","author":"DM Lowe","year":"2011","unstructured":"Lowe DM, Corbett PT, Murray-Rust P, Glen RC (2011) Chemical name to structure: opsin, an open source solution. J Chem Inf Model 51(3):739\u201353","journal-title":"J Chem Inf Model"},{"key":"831_CR9","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1186\/1758-2946-7-S1-S5","volume":"7","author":"DM Lowe","year":"2015","unstructured":"Lowe DM, Sayle RA (2015) Leadmine: a grammar and dictionary driven approach to entity recognition. J Cheminf 7:5","journal-title":"J Cheminf"},{"key":"831_CR10","unstructured":"Lowe DM, Sayle RA (2015) Recognition of chemical entities in patents using leadmine. https:\/\/api.semanticscholar.org\/CorpusID:12533862"},{"key":"831_CR11","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1186\/1758-2946-6-S1-P4","volume":"6","author":"SR Heller","year":"2014","unstructured":"Heller SR (2014) Inchi - the worldwide chemical structure standard. J Cheminf 6:4","journal-title":"J Cheminf"},{"key":"831_CR12","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1186\/1752-153X-2-S1-P40","volume":"2","author":"A Dalke","year":"2008","unstructured":"Dalke A (2008) Parsers for smiles and smarts. Chem Cent J 2:40","journal-title":"Chem Cent J"},{"key":"831_CR13","unstructured":"Sayle R, O\u2019Boyle N, Landrum G, Affentranger R (2019) open sourcing a wiswesser line notation (wln) parser to facilitate electronic lab notebook (eln) record transfer using the pistoia alliance\u2019s udm (unified data model) standard. NextMove Software"},{"key":"831_CR14","unstructured":"Smith EG, Wiswesser WJ (1968) The wiswesser line-formula chemical notation. 
https:\/\/api.semanticscholar.org\/CorpusID:118420484"},{"issue":"1","key":"831_CR15","doi-asserted-by":"publisher","first-page":"54","DOI":"10.1021\/c160036a018","volume":"10","author":"E Garfield","year":"1970","unstructured":"Garfield E, Revesz GS, Granito CE, Dorr HA, Calderon MM, Warner A (1970) Index chemicus registry system: pragmatic approach to substructure chemical retrieval. J Chem Doc 10(1):54\u201358. https:\/\/doi.org\/10.1021\/c160036a018. (Accessed 2023-05-09)","journal-title":"J Chem Doc"},{"issue":"3","key":"831_CR16","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1002\/ps.2780050316","volume":"5","author":"DR Eakin","year":"1974","unstructured":"Eakin DR, Hyde E, Palmer G (1974) The use of computers with chemical structural information: ICI CROSSBOW system. Pestic Sci 5(3):319\u2013326. https:\/\/doi.org\/10.1002\/ps.2780050316. (Accessed 2023-07-24)","journal-title":"Pestic Sci"},{"issue":"4","key":"831_CR17","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1021\/ci60008a008","volume":"16","author":"AV Tomea","year":"1976","unstructured":"Tomea AV, Sorter PF (1976) On-line substructure searching utilizing wiswesser line notations. J Chem Inf Comput Sci 16(4):223\u2013227","journal-title":"J Chem Inf Comput Sci"},{"issue":"3","key":"831_CR18","doi-asserted-by":"publisher","first-page":"46","DOI":"10.1515\/ci-2019-0315","volume":"41","author":"E Hepler-Smith","year":"2019","unstructured":"Hepler-Smith E, McEwen L (2019) A century of nomenclature for chemists and machines. Chem Int 41(3):46\u201349","journal-title":"Chem Int"},{"issue":"12","key":"831_CR19","doi-asserted-by":"publisher","first-page":"7673","DOI":"10.1021\/acs.chemrev.6b00851","volume":"117","author":"M Krallinger","year":"2017","unstructured":"Krallinger M, Rabal O, Louren\u00e7o A, Oyarz\u00e1bal J, Valencia A (2017) Information retrieval and text mining technologies for chemistry. 
Chem Rev 117(12):7673\u20137761","journal-title":"Chem Rev"},{"key":"831_CR20","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1186\/1758-2946-6-17","volume":"6","author":"S Eltyeb","year":"2014","unstructured":"Eltyeb S, Salim N (2014) Chemical named entities recognition: a review on approaches and applications. J Cheminf 6:17","journal-title":"J Cheminf"},{"key":"831_CR21","unstructured":"Todsen WL. ChemDoodle WLN Parser. https:\/\/web.chemdoodle.com\/demos\/wiswesser-line-notation Accessed 10 Dec 2023."},{"key":"831_CR22","doi-asserted-by":"crossref","unstructured":"Wiswesser WJ (1954) A line-formula chemical notation","DOI":"10.1108\/eb049470"},{"key":"831_CR23","volume-title":"The Wiswesser line-formula chemical notation","author":"GS Elbert","year":"1968","unstructured":"Elbert GS (1968) The Wiswesser line-formula chemical notation. McGraw-Hill Book Company publishers, USA"},{"issue":"6","key":"831_CR24","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1145\/512274.512278","volume":"7","author":"S Gorn","year":"1964","unstructured":"Gorn S, Bemer RW, Green J, Lohse E (1964) Proposed American standard: bit sequencing of the American standard code for information interchange (ASCII) in serial-by-bit data transmission. Commun ACM 7(6):333\u2013336","journal-title":"Commun ACM"},{"key":"831_CR25","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-53622-3","volume-title":"Graph theory","author":"R Diestel","year":"2017","unstructured":"Diestel R (2017) Graph theory, 5th edn. Springer, Cham","edition":"5"},{"issue":"1","key":"831_CR26","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1021\/ci200463r","volume":"52","author":"R Sayle","year":"2012","unstructured":"Sayle R, Xie PH, Muresan S (2012) Improved chemical text mining of patents with infinite dictionaries and automatic spelling correction. J Chem Inf Model 52(1):51\u201362. https:\/\/doi.org\/10.1021\/ci200463r. 
(PMID: 22148717)","journal-title":"J Chem Inf Model"},{"key":"831_CR27","doi-asserted-by":"publisher","unstructured":"Parr T, Fisher K (2011) Ll(*): The foundation of the antlr parser generator. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI \u201911. Association for Computing Machinery, New York, NY, USA, pp 425\u2013436. https:\/\/doi.org\/10.1145\/1993498.1993548","DOI":"10.1145\/1993498.1993548"},{"issue":"12","key":"831_CR28","doi-asserted-by":"publisher","first-page":"1305","DOI":"10.1002\/spe.872","volume":"38","author":"J Bovet","year":"2008","unstructured":"Bovet J, Parr T (2008) Antlrworks: an antlr grammar development environment. Softw Pract Exp 38(12):1305\u20131332. https:\/\/doi.org\/10.1002\/spe.872","journal-title":"Softw Pract Exp"},{"key":"831_CR29","unstructured":"Hopcroft JE, Ullman JD (1979) Introduction to automata theory, languages and computation. https:\/\/api.semanticscholar.org\/CorpusID:31901407"},{"key":"831_CR30","unstructured":"Telephone B (1968) Regular expression search algorithm. https:\/\/api.semanticscholar.org\/CorpusID:61948707"},{"issue":"2","key":"831_CR31","doi-asserted-by":"publisher","first-page":"106","DOI":"10.1021\/ci00034a010","volume":"22","author":"LE Fritts","year":"1982","unstructured":"Fritts LE, Schwind MM (1982) Using the wiswesser line notation (WLN) for online, interactive searching of chemical structures. J Chem Inf Comput Sci 22(2):106\u2013109","journal-title":"J Chem Inf Comput Sci"},{"issue":"2","key":"831_CR32","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1021\/ed074p194","volume":"74","author":"S Kikuchi","year":"1997","unstructured":"Kikuchi S (1997) A history of the structural theory of benzene - the aromatic sextet rule and Huckel\u2019s rule. J Chem Educ 74(2):194. 
https:\/\/doi.org\/10.1021\/ed074p194","journal-title":"J Chem Educ"},{"key":"831_CR33","doi-asserted-by":"publisher","first-page":"449","DOI":"10.4153\/CJM-1965-045-4","volume":"17","author":"J Edmonds","year":"1965","unstructured":"Edmonds J (1965) Paths, trees, and flowers. Can J Math 17:449\u2013467","journal-title":"Can J Math"},{"key":"831_CR34","doi-asserted-by":"publisher","first-page":"1100","DOI":"10.1093\/nar\/gkr777","volume":"40","author":"A Gaulton","year":"2011","unstructured":"Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP (2011) Chembl: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40:1100\u20131107","journal-title":"Nucleic Acids Res"},{"key":"831_CR35","doi-asserted-by":"publisher","first-page":"1123","DOI":"10.1021\/ed100697w","volume":"87","author":"HE Pence","year":"2010","unstructured":"Pence HE, Williams AJ (2010) Chemspider: an online chemical information resource. J Chem Educ 87:1123\u20131124","journal-title":"J Chem Educ"},{"key":"831_CR36","doi-asserted-by":"publisher","first-page":"1202","DOI":"10.1093\/nar\/gkv951","volume":"44","author":"S Kim","year":"2015","unstructured":"Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH (2015) Pubchem substance and compound databases. Nucleic Acids Res 44:1202\u20131213","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"831_CR37","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1021\/acs.jcim.8b00537","volume":"59","author":"J Wang","year":"2018","unstructured":"Wang J, Ge Y, Xie X-QS (2018) Development and testing of druglike screening libraries. 
J Chem Inf Model 59(1):53\u201365","journal-title":"J Chem Inf Model"},{"issue":"1","key":"831_CR38","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1021\/ci00037a001","volume":"23","author":"SB Walker","year":"1983","unstructured":"Walker SB (1983) Development of CAOCI and its use in ICIction division. J Chem Inf Comput Sci 23(1):3\u20135","journal-title":"J Chem Inf Comput Sci"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00831-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-024-00831-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00831-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,15]],"date-time":"2024-04-15T10:16:21Z","timestamp":1713176181000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-024-00831-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,15]]},"references-count":38,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["831"],"URL":"https:\/\/doi.org\/10.1186\/s13321-024-00831-2","relation":{},"ISSN":["1758-2946"],"issn-type":[{"type":"electronic","value":"1758-2946"}],"subject":[],"published":{"date-parts":[[2024,4,15]]},"assertion":[{"value":"8 November 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 March 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 April 
2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no Conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"42"}}