{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T15:25:48Z","timestamp":1781105148040,"version":"3.54.1"},"reference-count":0,"publisher":"IGI Global Scientific Publishing","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,1]]},"abstract":"<jats:p>This article presents experiments based on the extensible domain-specific web corpus for \u201clayfication\u201d. For these experiments, both the existing layfication corpus (in Swedish and in English) and a new addition in English (the NHS-PubMed subcorpus) are used. With this extended corpus, methods to classify lay-specialized medical sublanguages cross-linguistically using small data and noisy web documents are investigated. Sublanguage is a language variety used in specific domains. Here, the authors focus on two medical sublanguages, namely the \u201cpatientspeak\u201d (lay) and the medical jargon (specialized). Cross-lingual sublanguage classification is still largely underexplored although it can be crucial in downstream applications for digital health and cyber-physical systems. Classification models are built using small and noisy training sets in Swedish and evaluated on English test sets. The performance of Naive Bayes classifiers\u2014built with stopwords and with Bag-of-Words\u2014is compared with convolutional neural network classifiers leveraging on MUSE multi-lingual word embeddings. Results are promising and nuanced. These results are proposed as a first baseline for cross-lingual sublanguage classification.<\/jats:p>","DOI":"10.4018\/ijcps.2020010102","type":"journal-article","created":{"date-parts":[[2021,2,8]],"date-time":"2021-02-08T19:29:59Z","timestamp":1612812599000},"page":"20-32","source":"Crossref","is-referenced-by-count":0,"title":["Exploring the Potential of an Extensible Domain-Specific Web Corpus for \u201cLayfication\u201d"],"prefix":"10.4018","volume":"2","author":[{"given":"Marina","family":"Santini","sequence":"first","affiliation":[{"name":"RISE Research Institutes of Sweden, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Min-Chun","family":"Shih","sequence":"additional","affiliation":[{"name":"Link\u00f6ping University, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"2432","container-title":["International Journal of Cyber-Physical Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=272559","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,5,6]],"date-time":"2022-05-06T08:50:48Z","timestamp":1651827048000},"score":1,"resource":{"primary":{"URL":"http:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/IJCPS.2020010102"}},"subtitle":["The Case of Cross-Lingual Classification"],"short-title":[],"issued":{"date-parts":[[2020,1]]},"references-count":0,"journal-issue":{"issue":"1"},"URL":"https:\/\/doi.org\/10.4018\/ijcps.2020010102","relation":{},"ISSN":["2577-4867","2577-4875"],"issn-type":[{"value":"2577-4867","type":"print"},{"value":"2577-4875","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,1]]}}}