{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T03:46:28Z","timestamp":1761709588448,"version":"build-2065373602"},"reference-count":29,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2019,3,7]],"date-time":"2019-03-07T00:00:00Z","timestamp":1551916800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>We propose an extended scheme for selecting related stocks for themed mutual funds. This scheme was designed to support fund managers who are building themed mutual funds. In our preliminary experiments, building a themed mutual fund was found to be quite difficult. Our scheme is a type of natural language processing method and based on words extracted according to their similarity to a theme using word2vec and our unique similarity based on co-occurrence in company information. We used data including investor relations and official websites as company information data. We also conducted several other experiments, including hyperparameter tuning, in our scheme. The scheme achieved a 172% higher F1 score and 21% higher accuracy than a standard method. 
Our research also showed the possibility that official websites are not necessary for our scheme, contrary to our preliminary experiments for assessing data collaboration.<\/jats:p>","DOI":"10.3390\/info10030102","type":"journal-article","created":{"date-parts":[[2019,3,8]],"date-time":"2019-03-08T04:58:35Z","timestamp":1552021115000},"page":"102","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Related Stocks Selection with Data Collaboration Using Text Mining"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5883-8250","authenticated-orcid":false,"given":"Masanori","family":"Hirano","sequence":"first","affiliation":[{"name":"Department of Systems Innovation, Faculty of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hiroki","family":"Sakaji","sequence":"additional","affiliation":[{"name":"Department of Systems Innovation, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shoko","family":"Kimura","sequence":"additional","affiliation":[{"name":"Quantitative Investment Department, Daiwa Asset Management Co. 
Ltd., 1-9-1 Marunouchi, Chiyoda-ku, Tokyo 100-6753, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kiyoshi","family":"Izumi","sequence":"additional","affiliation":[{"name":"Department of Systems Innovation, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hiroyasu","family":"Matsushima","sequence":"additional","affiliation":[{"name":"Department of Systems Innovation, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shintaro","family":"Nagao","sequence":"additional","affiliation":[{"name":"Quantitative Investment Department, Daiwa Asset Management Co. Ltd., 1-9-1 Marunouchi, Chiyoda-ku, Tokyo 100-6753, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Atsuo","family":"Kato","sequence":"additional","affiliation":[{"name":"Frontier Technologies Research & Consulting Department, Daiwa Institute of Research Ltd., 15-6 Fuyuki, Koto-ku, Tokyo 135-8460, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,3,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Hirano, M., Sakaji, H., Kimura, S., Izumi, K., Matsushima, H., Nagao, S., and Kato, A. (2018, January 17\u201320). Selection of related stocks using financial text mining. Proceedings of the 18th IEEE International Conference on Data Mining Workshops (ICDMW 2018), Singapore.","DOI":"10.1109\/ICDMW.2018.00036"},{"key":"ref_2","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013, January 2\u20134). Efficient estimation of word representations in vector space. Proceedings of the International Conference on Learning Representations (ICLR 2013), Scottsdale, AZ, USA."},{"key":"ref_3","unstructured":"Nagata, R., Nishite, S., and Ototake, H. 
(2018, January 5\u20138). A method for detecting overgeneralized be-verb based on subject-compliment identification. Proceedings of the 32nd Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2018), Kagoshima, Japan. (In Japanese)."},{"key":"ref_4","unstructured":"Neubig, G., Nakata, Y., and Mori, S. Pointwise prediction for robust, adaptable Japanese morphological analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL HLT 2011)."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Morita, H., Kawahara, D., and Kurohashi, S. (2015, January 17\u201321). Morphological analysis for unsegmented languages using recurrent neural network language model. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal.","DOI":"10.18653\/v1\/D15-1276"},{"key":"ref_6","unstructured":"Kudo, T., Yamamoto, K., and Matsumoto, Y. (2004, January 25\u201326). Applying conditional random fields to Japanese morphological analysis. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain."},{"key":"ref_7","unstructured":"Toshinori, S. (2019, March 06). Neologism Dictionary Based on the Language Resources on the Web for Mecab. Available online: https:\/\/github.com\/neologd\/mecab-ipadic-neologd."},{"key":"ref_8","unstructured":"Toshinori, S., Taiichi, H., and Manabu, O. (2016). Operation of a word segmentation dictionary generation system called NEologd. Information Processing Society of Japan, Special Interest Group on Natural Language Processing (IPSJ SIGNL 2016), Information Processing Society of Japan. (In Japanese)."},{"key":"ref_9","unstructured":"Toshinori, S., Taiichi, H., and Manabu, O. (2017). Implementation of a word segmentation dictionary called mecab-ipadic-NEologd and study on how to use it effectively for information retrieval. 
Proceedings of the Twenty-three Annual Meeting of the Association for Natural Language Processing (NLP 2017), The Association for Natural Language Processing. (In Japanese)."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1177\/001316446002000104","article-title":"A coefficient of agreement for nominal scales","volume":"20","author":"Cohen","year":"1960","journal-title":"Educ. Psychol. Meas."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1037\/h0026256","article-title":"Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit","volume":"70","author":"Cohen","year":"1968","journal-title":"Psychol. Bull."},{"key":"ref_12","unstructured":"Jarvelin, K., and Kekalainen, J. IR evaluation methods for retrieving highly relevant documents. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR \u201900)."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1145\/582415.582418","article-title":"Cumulated gain-based evaluation of IR techniques","volume":"20","year":"2002","journal-title":"ACM Trans. Inf. Syst."},{"key":"ref_14","unstructured":"Koppel, M., and Shtrimberg, I. (2006). Good news or bad news? Let the market decide. Computing Attitude and Affect in Text: Theory and Applications, Springer."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Fellbaum, C. (1998). WordNet: An Electronic Lexical Database, The MIT Press.","DOI":"10.7551\/mitpress\/7287.001.0001"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Low, B.T., Chan, K., Choi, L.L., Chin, M.Y., and Lay, S.L. (2001, January 16\u201318). Semantic expectation-based causation knowledge extraction: A study on Hong Kong stock movement analysis. 
Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2001), Hong Kong, China.","DOI":"10.1007\/3-540-45357-1_15"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1145\/1462198.1462204","article-title":"Textual analysis of stock market prediction using breaking financial news: The AZFin text system","volume":"27","author":"Schumaker","year":"2009","journal-title":"ACM Trans. Inf. Syst."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Ito, T., Sakaji, H., Tsubouchi, K., Izumi, K., and Yamashita, T. (2018, January 3\u20136). Text-visualizing neural network model: Understanding online financial. Proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018), Melbourne, Australia.","DOI":"10.1007\/978-3-319-93040-4_20"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ito, T., Sakaji, H., Izumi, K., Tsubouchi, K., and Yamashita, T. (2018). GINN: Gradient interpretable neural networks for visualizing financial texts. Int. J. Data Sci. Anal., 1\u201315.","DOI":"10.1007\/s41060-018-0160-8"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Milea, V., Sharef, N.M., Almeida, R.J., Kaymak, U., and Frasincar, F. (2010, January 7\u201310). Prediction of the MSCI EURO index based on fuzzy grammar fragments extracted from European Central Bank statements. Proceedings of the 2010 International Conference of Soft Computing and Pattern Recognition, Paris, France.","DOI":"10.1109\/SOCPAR.2010.5686083"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1016\/j.knosys.2018.11.035","article-title":"Growing semantic vines for robust asset allocation","volume":"165","author":"Xing","year":"2018","journal-title":"Knowl. Based Syst."},{"key":"ref_22","unstructured":"Sakai, H., and Masuyama, S. (2007, January 19\u201321). Extraction of cause information from newspaper articles concerning business performance. 
Proceedings of the 4th IFIP Conference on Artificial Intelligence Applications & Innovations (AIAI 2007), Athens, Greece."},{"key":"ref_23","unstructured":"Sakaji, H., Sakai, H., and Masuyama, S. (2008, January 20\u201323). Automatic extraction of basis expressions that indicate economic trends. Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2008), Osaka, Japan."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Sakaji, H., Murono, R., Sakai, H., Bennett, J., and Izumi, K. (December, January 27). Discovery of rare causal knowledge from financial statement summaries. Proceedings of the 2017 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr 2017), Honolulu, HI, USA.","DOI":"10.1109\/SSCI.2017.8285265"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kitamori, S., Sakai, H., and Sakaji, H. (December, January 27). Extraction of sentences concerning business performance forecast and economic forecast from summaries of financial statements by deep learning. Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2017), Honolulu, HI, USA.","DOI":"10.1109\/SSCI.2017.8285335"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J. GroupLens: An open architecture for collaborative filtering of netnews. Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work (CSCW \u201994).","DOI":"10.1145\/192844.192905"},{"key":"ref_27","unstructured":"Shardanand, U., and Maes, P. Social information filtering: Algorithms for automating \u201cWord of Mouth\u201d. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI \u201995)."},{"key":"ref_28","unstructured":"Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. Item-based collaborative filtering recommendation algorithms. 
Proceedings of the 10th International Conference on World Wide Web."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1109\/MIC.2003.1167344","article-title":"Amazon.com recommendations: Item-to-item collaborative filtering","volume":"7","author":"Linden","year":"2003","journal-title":"IEEE Int. Comput."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/3\/102\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:37:05Z","timestamp":1760186225000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/3\/102"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,3,7]]},"references-count":29,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["info10030102"],"URL":"https:\/\/doi.org\/10.3390\/info10030102","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2019,3,7]]}}}